Knowledge Exchange, Nov 2011, Bonn


Keynote presented to the KE workshop held in conjunction with the release of the report "A Surfboard for Riding the Wave: Towards a four country action programme on research data".

Published in: Technology, Education

  1. How many solutions does it take to change the face of research data? Todd Vision, Dryad Digital Repository, University of North Carolina at Chapel Hill. KE Workshop, 14-15 November 2011, Bonn, Germany
  2. “Es sollte nur ein Magazin der Kunst in der Welt sein wo der Künstler seine Kunstwerke nur hinzugeben hätte um zu nehmen was er brauchte” “There ought to be in the world a repository of art, to which the artist need only bring his artworks in order to take what he needed” Beethoven, letter to publisher F.A. Hoffmeister, 15 January 1801
  3. Open dissection of research: the Beethoven Repository
  4. n=3824. Source: Publishing Research Consortium
  5. [Figure: Information Content (vertical axis) versus Time (horizontal axis). Labels: Time of publication, Specific details, General details, Retirement or career change, Accident, Death.] (Michener et al. 1997)
  6. Henry Oldenburg
  7. Bumpus HC (1898) The Elimination of the Unfit as Illustrated by the Introduced Sparrow, Passer domesticus. Biological Lectures from the Marine Biological Laboratory: 209-226.
  8. Transparency
  9. Failure of peer-to-peer data sharing: Wicherts and colleagues requested data from 141 articles in American Psychological Association journals. “6 months later, after … 400 emails, [sending] detailed descriptions of our study aims, approvals of our ethical committee, signed assurances not to share data with others, and even our full resumes…” only 27% of authors complied. Wicherts, J.M., Borsboom, D., Kats, J., Molenaar, D. (2006). The poor availability of psychological research data for reanalysis. American Psychologist, 61, 726-728.
  10. News alert: scientists are human. “We related the reluctance to share research data for reanalysis to 1148 statistically significant results reported in 49 papers published in two major psychology journals. We found the reluctance to share data to be associated with weaker evidence (against the null hypothesis of no effect) and a higher prevalence of apparent errors in the reporting of statistical results. The unwillingness to share data was particularly clear when reporting errors had a bearing on statistical significance”. [Figure panels: Not shared, Shared] Wicherts et al. (2011) doi:10.1371/journal.pone.0026828
  11. Lang GI, Botstein D (2011) PLoS ONE doi:10.1371/journal.pone.0025290. 101 pages!
  12. Joint Data Archiving Policy (JDAP): Data are important products of the scientific enterprise, and they should be preserved and usable for decades in the future. As a condition for publication, data supporting the results in the article should be deposited in an appropriate public archive. Authors may elect to embargo access to the data for a period up to a year after publication. Exceptions may be granted at the discretion of the editor, especially for sensitive information. Whitlock, M. C., M. A. McPeek, M. D. Rausher, L. Rieseberg, and A. J. Moore. 2010. Data Archiving. American Naturalist. 175(2):145-146.
  13. Infrastructure
  14. [Workflow diagram. Author: prepare manuscript and related data files. JOURNAL: submit manuscript; manuscript review; editor; accepted? (yes/no); send article description; published article (with data citation). DRYAD: upload data; accepted? (yes/no); Dryad data package; send data identifier (DOI); curation; data curator; published data (with article citation).]
  15. See poster from Brian Hole
  16. Heather Piwowar
  17. Survey of authors: What are the policies of your funder as they apply to online public archiving? (n=983)
     • Forbids: 1%
     • Recommends: 21%
     • Requires: 9%
     • No policy: 40%
     • I don’t know: 26%
     • Other: 3%
  18. Data policies among bioscience journals. [Figure: n=70; impact factor labels IF=3.6, IF=6.0, IF=4.5] Piwowar HA, Chapman WW (2008) A review of journal policies for sharing research data. Presented at ELPUB2008, Nature Precedings hdl:10101/npre.2008.1700.1
  19. Reuse
  20. Tracking data reuse. Piwowar, Carlson, Vision, unpublished
  21. H. Piwowar, J. Carlson, T. Vision, unpubl.
  22. H. Piwowar, unpubl.
  23. Incentives
  24. Does sharing imply that it need be altruistic?
     • For a set of 85 cancer microarray clinical trials
         • 48% had publicly available data
         • These received 85% of the article citations
         • Independent of journal impact factor, publication date, author nationality
     Piwowar H, et al. (2007) Sharing Detailed Research Data Is Associated with Increased Citation Rate. PLoS ONE 2(3): e308.
  25. Taxonomy of data archiving benefits
     • Direct: Verification of published research; Preserving accessibility to data; Allowing reuse and repurposing of data; Discoverability of data
     • Indirect (costs avoided): Redundant data collection; Inefficient legacy data curation; Burden of sharing-upon-request; Opportunity cost of science not done
     • Near term: Protection against personnel turnover; Availability for review and validation
     • Long term: Secure long-term stewardship; Increased impact per publication
     • Private: Increased citations; New collaborations; New research opportunities; Fulfilling funding mandates
     • Public: More efficient use of research dollars; Public trust in science; Educational opportunities; Improved methodologies; More informed policy
     Modified from Beagrie et al. (2009) Keeping Research Data Safe 2
  26. Galilei, Galileo (1638) Discorsi e dimostrazioni matematiche, intorno à due nuove scienze Attenenti alla Mecanica I Movimenti Locali. Elsevier
  27. Funding
  28. Costs
     • Moderate economies of scale are required
         • At 10K packages/yr, $50/deposit, depending on curation
     • What are the costs for SOM?
         • Journal of Clinical Investigation: $300 flat fee
         • Ecological Archives: $250 for 10 MB, more fees beyond that
         • FASEB: $100 per file
     Beagrie N, Eakin-Richards L, Vision TJ (2009) Business models and cost estimation: Dryad repository case study. iPRES 2010
  29. What is the return on investment?
     • A rigorous framework is lacking
         • But we can look at comparators
     • Marginal cost of data archiving
         • $50/article is 2% of publication costs ($2.5K)
         • And 0.2% of grant costs/article (~$25K)
     • Is the data worth 2% of the research investment?
         • Using DNA microarray data in GEO as a model
         • 2,711 submissions in 2007
         • Data reused by 3rd parties in 1,150 articles
     Vision (2011) Open data and social contract of scientific publishing. BioScience, 60(5):330-330
     Piwowar H, Vision TJ, Whitlock MC (2011) Data archiving is a good investment. Nature 473:285
  30. Training
  31. Building solutions
  32. DataONE network: three major components for flexibility, scalability and sustainability
     • Member Nodes
         • diverse institutions
         • serve local community
         • provide resources for managing their data
         • retain copies of data
     • Coordinating Nodes
         • retain complete metadata catalog
         • indexing for search
         • network-wide services
         • ensure content availability (preservation)
         • replication services
     • Investigator Toolkit
  33. Concluding thoughts
     • Archiving is essential
     • Journals and learned societies will be at least as important as institutions
     • Funders cannot be shy about policy, and must drive the marketplace
     • We can leverage for data lots of things that work well for traditional publications
     • International cooperation is a must
  34. @datadryad; Dryad
  35. Images and sources
     1. © Yael Fitzpatrick and AAAS
     2. Beethoven mit der Missa solemnis, by Joseph Stieler; photo CC BY-NC-SA 2.0 Taran Rampersad; Letter from Beethoven to Franz Anton Hoffmeister, © Beethoven-Haus Bonn
     3. The Wikipedia Lesson of Dr Nicolaes Tulp, by Alasdair Forrest, http://
     4. © National Evolutionary Synthesis Center (
     5. © Publishing Research Consortium, source:
     6. After Michener et al. (1997) Ecological Applications 7(1):330–342.
     7. Title page of Philosophical Transactions of the Royal Society, Vol. 1, 1665, public domain; portrait of Henry Oldenburg, source:, public domain
     8. source: Bumpus HC (1898) The Elimination of the Unfit as Illustrated by the Introduced Sparrow, Passer domesticus. Biological Lectures from the Marine Biological Laboratory: 209-226, public domain.
     9. CC BY-NC-ND 2.0 by lebatihem, source: 2154686107/
     11. CC-BY Wicherts JM, Bakker M, Molenaar D source: Willingness to Share Research Data Is Related to the Strength of the Evidence and the Quality of Reporting of Statistical Results. PLoS ONE 6(11): e26828, 2011. doi:10.1371/journal.pone.0026828
  36. Images and sources, continued
     12. CC-BY, Lang GI, Botstein D source: A Test of the Coordinated Expression Hypothesis for the Origin and Maintenance of the GAL Cluster in Yeast. PLoS ONE 6(9): e25290, 2011. doi:10.1371/journal.pone.0025290
     13. CC BY-SA 2.0 avlxyz, source:
     16. courtesy of Peggy Schaeffer
     20. CC-BY Piwowar HA, Chapman WW source: A review of journal policies for sharing research data, Nature Precedings, hdl:10101/npre.2008.1700.1
     21. CC BY 2.0, sashafatcat source:
     23, 24. CC-BY H Piwowar, J Carlson, T Vision, unpublished
     25. CC-BY H Piwowar, source: reviewers/
     26. CC BY-ND 2.0 Sivaprakash Kannan source: 294755142/
     28. After: Beagrie N, Lavoie B, Woollard M (2010) Keeping Research Data Safe 2, http://
     29. Galilei, Galileo (1638) Discorsi e dimostrazioni matematiche, intorno à due nuove scienze Attenenti alla Mecanica I Movimenti Locali. Elsevier. Source: original unknown.
     30. CC BY-NC-SA 2.0 Coralie Mercer, source:
     33. CC BY-NC-ND 2.0 by, source: 2904115612/
     34. Liberty ship under construction, source:, public domain
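
The cost comparisons on slides 28 and 29 come down to a few ratios, which can be checked directly. The sketch below is illustrative only. The variable names are hypothetical, and the dollar amounts and counts are the round figures quoted on the slides. The annual total is just the product of the two slide 28 figures; the talk does not state it. The reuse ratio is a rough comparison of two counts, since the slides do not say how many distinct submissions were reused.

```python
# Figures quoted on slides 28 and 29 (round numbers from the talk).
COST_PER_DEPOSIT = 50.0            # USD per data package, "depending on curation"
PACKAGES_PER_YEAR = 10_000         # scale at which the $50 figure is quoted
PUBLICATION_COST = 2_500.0         # USD per article
GRANT_COST_PER_ARTICLE = 25_000.0  # USD per article, approximate
GEO_SUBMISSIONS_2007 = 2_711       # DNA microarray submissions to GEO in 2007
GEO_REUSE_ARTICLES = 1_150         # third-party articles reusing those data

# Marginal archiving cost as a share of the other per-article costs.
publication_share = COST_PER_DEPOSIT / PUBLICATION_COST
grant_share = COST_PER_DEPOSIT / GRANT_COST_PER_ARTICLE

# Assumption: annual curation cost is simply volume times unit cost.
annual_budget = PACKAGES_PER_YEAR * COST_PER_DEPOSIT

# Rough ratio of reuse articles to submissions (not a per-dataset reuse rate).
reuse_ratio = GEO_REUSE_ARTICLES / GEO_SUBMISSIONS_2007

print(f"Share of publication cost: {publication_share:.1%}")
print(f"Share of grant cost per article: {grant_share:.1%}")
print(f"Annual cost at quoted scale: ${annual_budget:,.0f}")
print(f"Reuse articles per submission: {reuse_ratio:.2f}")
```

Changing `COST_PER_DEPOSIT` or `PACKAGES_PER_YEAR` shows how sensitive the shares are to curation effort and deposit volume, which is the economies-of-scale point made on slide 28.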