Laurie Goodman at the AIBS Changing Practices in Data Pub workshop: Beyond Data Release Mandates - Helping Authors Make Data Available. 3rd December 2014
1. Beyond Data Release Mandates
Helping Authors Make Data Available
Laurie Goodman, PhD
Editor-in-Chief GigaScience
ORCID ID: 0000-0001-9724-5976
2. Beyond Reproducibility and Retraction
• I’m not going to talk about:
– The 10-30% of papers that can’t be reproduced
– The 15x increase in the number of retractions in the last decade
I’m going to talk about the other reason for making data available (in an accessible and reusable format):
More eyes * More innovation * More widespread use
3. Why?
Infectious Disease
Measles: 122,000 per year
Hepatitis C-related liver disease: 350,000-500,000 per year
Malaria: 627,000 per year
HIV/AIDS: 1.4-1.7 million per year
Non-communicable, with genetic predisposition
Prostate cancer: 307,000 per year
Breast cancer: 522,000 per year
Suicide: 800,000 per year
Diabetes: 1.5 million per year
Cancer: 8.2 million per year
Cardiovascular Disease: 17.5 million per year
Non-genetic/Non-infectious
Pesticide Poisoning: 250,000 per year
Malnutrition: 2.8 million children (under 5) per year
World Health Organization Fact Sheets http://www.who.int/en/
Environment
Extinction Rate: 1,000-10,000x higher than natural extinction rate
World Wildlife Fund (WWF) http://wwf.panda.org/
4. This week: Genome Biology Soap Box article on Future of Data Publishing in
By Kahn R., Goodman L., & Mittelman D. http://genomebiology.com/
5. Scientific Communication
Via Publication
• Scholarly articles are merely advertisement of scholarship. The actual scholarly artefacts, i.e. the data and computational methods, which support the scholarship, remain largely inaccessible
(Jon B. Buckheit and David L. Donoho, WaveLab and Reproducible Research, 1995)
• Core scientific statements or assertions are intertwined and hidden in the conventional scholarly narratives
• Lack of transparency, lack of credit for anything other than “regular” dead tree publication
6. Wiley Researcher Data Insights Survey
Why Researchers Do Not Share
• Intellectual property or confidentiality issues (59%)
• Concerned research might be “scooped” (39%)
• Concerns about misinterpretation or misuse (32%)
• Concerns about attribution/citation credit (31%)
• Ethical concerns (24%)
• Insufficient time/resources (19%)
• Funder/institution does not require sharing (13%)
• Lack of funding (13%)
• Not sure where to share (5%)
• Not sure how to share (3%)
Slide from Catherine Giffi, Director, Strategic Market Analysis, Global Research, Wiley
Report is underway: but See:
http://exchanges.wiley.com/blog/2014/11/03/how-and-why-researchers-share-data-and-why-they-dont/
http://scholarlykitchen.sspnet.org/2014/11/11/to-share-or-not-to-share-that-is-the-research-data-question/
8. How Can Publishers Promote Data Sharing
And why us?
Researchers are never so captive as when they are publishing.
But we need to help, not just harass.
Carrots and Sticks
• Sticks
– Create Journal Data Release Policies
– Check Data Release Policy is followed
• Carrots
– Find Ways to Aid Researchers in Releasing Data
– Consider ways to support/protect researchers who do share ahead of publication
– Promote Data Citation
– Data Curation
– Data Hosting (short term or long term, depending on need)
9. How We Envision Research Publication
(Communicating Science)
• Data Publishing Platform: Data Sets in GigaDB
• Data Analysis Platform: Analyses in GigaGalaxy
• Open-access journal: Paper in GigaScience
10. Why have Journal-linked Database?
Example #1: Direct Data Citation
Encourages data release prior to publication of the data analysis article
http://blogs.biomedcentral.com/gigablog/2014/05/14/the-latest-weapon-in-publishing-data-the-polar-bear/
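To make "direct data citation" concrete, here is a minimal sketch of turning a dataset's metadata into a citable reference string. It assumes a generic creators, year, title, publisher, DOI pattern; this is not necessarily the format GigaDB or any particular journal uses. The `DatasetRecord` class, the `format_data_citation` function, and every value in the example (including the DOI) are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass
class DatasetRecord:
    """Minimal dataset metadata (hypothetical fields, for illustration only)."""
    creators: list[str]
    year: int
    title: str
    publisher: str
    doi: str


def format_data_citation(rec: DatasetRecord) -> str:
    """Build a reference-list entry for a dataset.

    Assumed pattern: Creators (Year): Title. Publisher. DOI link.
    """
    # Abbreviate long author lists, as reference lists commonly do.
    if len(rec.creators) > 3:
        authors = f"{rec.creators[0]} et al."
    else:
        authors = "; ".join(rec.creators)
    return (
        f"{authors} ({rec.year}): {rec.title}. "
        f"{rec.publisher}. https://doi.org/{rec.doi}"
    )


# Placeholder record; none of these values refer to a real dataset.
example = DatasetRecord(
    creators=["Author A", "Author B"],
    year=2011,
    title="Example genome dataset",
    publisher="Example Repository",
    doi="10.0000/example.100000",
)
print(format_data_citation(example))
```

A dataset with its own DOI can sit in a reference list like any article, which is what lets data released before the analysis paper collect citations of its own.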
11. The polar bear DATA were released, prepublication, in 2011
Data were used and cited in at least 5 studies (Below)
1. Hailer, F et al., Nuclear genomic sequences reveal that polar bears are an old and
distinct bear lineage. Science. 2012 Apr 20;336(6079):344-7.
doi:10.1126/science.1216424.
2. Cahill, JA et al., Genomic evidence for island population conversion resolves
conflicting theories of polar bear evolution. PLoS Genet. 2013;9(3):e1003345.
doi:10.1371/journal.pgen.1003345.
3. Morgan, CC et al., Heterogeneous models place the root of the placental mammal
phylogeny. Mol Biol Evol. 2013 Sep;30(9):2145-56. doi:10.1093/molbev/mst117.
4. Cronin, MA et al., Molecular Phylogeny and SNP Variation of Polar Bears (Ursus
maritimus), Brown Bears (U. arctos), and Black Bears (U. americanus) Derived from
Genome Sequences. J Hered. 2014; 105(3):312-23. doi:10.1093/jhered/est133.
5. Bidon, T et al., Brown and Polar Bear Y Chromosomes Reveal Extensive Male-Biased
Gene Flow within Brother Lineages. Mol Biol Evol. 2014 Apr 4.
doi:10.1093/molbev/msu109
http://blogs.biomedcentral.com/gigablog/2014/05/14/the-latest-weapon-in-publishing-data-the-polar-bear/
Formal article by the data producers on these data was published in 2014 in Cell
(But the data were not cited in the references…)
http://www.cell.com/cell/abstract/S0092-8674%2814%2900488-7
12. Why have Journal-linked Database?
Example #2: Provide a persistent database for data types that have no repository
13. Why Journal-linked Database?
Example #3
• New sequencing technology
• MinION (Oxford Nanopore)
• New sequence data type
• EBI and NCBI databases not ready
• High community interest in testing the data
• >100 GB of data
• Uploaded prior to publication
• Deployed on Amazon CloudFront
• Ongoing testing/comparison/information sharing prior to publication
• When EBI was ready for the data, it used our cloud to upload them
• EBI transferred the data to NCBI when they were ready
14. Why have Journal-linked Database?
Example #4: Provide the specific information (and forms) for accessing data in protected databases (EGA and dbGaP)
What needs to be done and who to contact for permission to use data
Forms needed to access this dataset
15. Beyond Data Availability
Reviewing Data:
Issue: We can’t ask our reviewers to do that!
Our finding: Reviewers don’t mind
Reviewer Dr. Christophe Pouzat on neuroscience manuscript: “In addition to making the presented research trustworthy, the reproducible research paradigm definitely makes the reviewers job more fun!”
Can also use specific Data Reviewers (we do)
Data Curation:
Data availability without metadata is useless
Provide or engage data curators (in your target community or elsewhere).
16. Thanks to:
Scott Edmunds, Executive Editor
Nicole Nogoy, Commissioning Editor
Peter Li, Lead Data Manager
Chris Hunter, Lead BioCurator
Rob Davidson, Data Scientist
Xiao (Jesse) Si Zhe, Database Developer
Amye Kenall, Journal Development Manager
Contact us:
editorial@gigasciencejournal.com
database@gigasciencejournal.com
Follow us:
@GigaScience
facebook.com/GigaScience
blogs.openaccesscentral.com/blogs/gigablog
www.gigasciencejournal.com
www.gigadb.org