Research data and scholarly publications: going from casual acquaintances to something more
Presented to ALPSP annual meeting 2011 in Oxfordshire UK during a session entitled "Abort Retry Fail? Data and the scholarly literature"

Published in: Technology, Education
  • This is a riskier workflow – it is more dependent on the author to make sure the publication contains a link to the data.
  • Getting authors and journals to do this sensibly on the article side is not easy. This is a relatively good example, but actual practice is all over the map: sometimes in the acknowledgements, sometimes in the main text, sometimes in a standardized data availability section set up by the journal. One interesting area of agreement at the Data Citation workshop Micah mentioned was that the original article should list the data citation in the reference list, for better indexing. At any rate, there is much room for standardization, and for awareness-raising among both authors and journals.
  • For funders, we have estimated that each publication costs 16K UK pounds worth of NSF funding. For another repository we have studied (GEO), 2711 data sets submitted in 2007 made substantive contributions to more than 1150 published articles in 2007-2010 alone, which would cost >18M UKP in original research grants.
  • Transcript

    • 1. Research data and scholarly publications: Going from casual acquaintances to something more
      Todd Vision
      Dept of Biology, University of North Carolina at Chapel Hill
      and the U.S. National Evolutionary Synthesis Center
      ALPSP, September 2011
      Abort, Retry, Fail? Data and the scholarly literature
    • 2.
    • 3.
    • 4. Peer-to-peer ‘sharing’ fails
      Wicherts and colleagues requested data from 141 articles in American Psychological Association journals.
      “6 months later, after … 400 emails, [sending] detailed descriptions of our study aims, approvals of our ethical committee, signed assurances not to share data with others, and even our full resumes…” only 27% of authors complied
      Wicherts, J.M., Borsboom, D., Kats, J., & Molenaar, D. (2006). The poor availability of psychological research data for reanalysis. American Psychologist, 61, 726-728.
    • 5. [Figure: Information Content plotted against Time. Labels: Time of publication; Specific details; General details; Retirement or career change; Accident; Death]
      (Michener et al. 1997)
    • 6. Bumpus HC (1898) The Elimination of the Unfit as Illustrated by the Introduced Sparrow, Passer domesticus. Biological Lectures from the Marine Biological Laboratory: 209-226.
    • 7.
    • 8. n=3824
      Source: Publishing Research Consortium, http://publishingresearch.net
    • 9.
    • 10. Taxonomy of data archiving benefits
      Modified from Beagrie et al. (2009) Keeping Research Data Safe 2
    • 11. Joint Data Archiving Policy (JDAP)
      Data are important products of the scientific enterprise, and they should be preserved and usable for decades in the future.
      As a condition for publication, data supporting the results in the article should be deposited in an appropriate public archive.
      Authors may elect to embargo access to the data for a period up to a year after publication.
      Exceptions may be granted at the discretion of the editor, especially for sensitive information.
      Whitlock, M. C., M. A. McPeek, M. D. Rausher, L. Rieseberg, and A. J. Moore. 2010. Data Archiving. American Naturalist. 175(2):145-146.
    • 12. The long tail of orphan data in “small science”
      after B. Heidorn
      “Most of the bytes are at the high end, but most of the datasets are at the low end” – Jim Gray
      [Figure: Volume plotted against Rank frequency of datatype. Labels: Specialized repositories (e.g. GenBank, PDB); Orphan data]
    • 13. Smit E (2011) Abelard and Héloise: Why Data and Publications Belong Together. D-Lib Magazine doi:10.1045/january2011-smit
    • 14. The End
      To make data archiving and reuse a standard part of research and publishing.
      The Means
      Enable low-burden data archiving at the time of manuscript submission.
      Promote researcher benefits from data archiving.
      Promote responsible data reuse.
      Empower journals, societies & publishers in shared governance.
      Ensure sustainability and long-term preservation.
      The Scope
      Data underlying peer-reviewed articles in basic and applied biosciences.
    • 15. Integrated
      Submit manuscript
    • 16. Integrated
      Submit manuscript
      Prompt author
      Manuscript metadata
    • 17. Integrated
      Submit manuscript
      Submit data
      Prompt author
      Manuscript metadata
    • 18. Integrated
      Submit manuscript
      Submit data
      Prompt author
      Manuscript metadata
      Review passcode
      Peer review
    • 19. Integrated
      Submit manuscript
      Submit data
      Prompt author
      Manuscript metadata
      Review passcode
      Peer review
      Acceptance notification
      Curation
      Data DOI
      Production
    • 20. Integrated
      Submit manuscript
      Submit data
      Prompt author
      Manuscript metadata
      Review passcode
      Peer review
      Acceptance notification
      Curation
      Data DOI
      Production
      Article metadata
      Curation
    • 21. Integrated
      Submit manuscript
      Submit data
      Prompt author
      Manuscript metadata
      Review passcode
      Peer review
      Acceptance notification
      Curation
      Data DOI
      Production
      Article metadata
      Curation
      Article
      Publication
      Data publication
      Article DOI/final metadata harvested
    • 22.
    • 23. Non-integrated
      Integrated
      Submit manuscript
      Submit data
      Prompt author
      Manuscript metadata
      Review passcode
      Peer review
      Submit data
      Acceptance notification
      Curation
      Data DOI
      Production
      Article metadata
      Curation
      Article
      Publication
      Data publication
      Article DOI/final metadata harvested
    • 24. Non-integrated
      Integrated
      Submit manuscript
      Submit data
      Prompt author
      Manuscript metadata
      Review passcode
      Peer review
      Submit data
      Acceptance notification
      Curation
      Data DOI
      Production
      Author includes data DOI
      Data DOI
      Article metadata
      Curation
      Article
      Publication
      Data publication
      Article DOI/final metadata harvested
    • 25. Non-integrated
      Integrated
      Submit manuscript
      Submit data
      Prompt author
      Manuscript metadata
      Review passcode
      Peer review
      Submit data
      Acceptance notification
      Curation
      Data DOI
      Production
      Author includes data DOI
      Data DOI
      Article metadata
      Curation
      Article
      Publication
      Data publication
      Article publication
      DOI/final metadata harvested
      Article DOI/final metadata harvested
    • 26. Dryad relative to Supplementary Online Materials
      * A few publisher SOM sites are exceptions to the general rule
      ** Practices differ among publishers, see Smit (2011), doi:10.1045/january2011-smit
    • 27. Article citation
      Wu D, Wu M, Halpern A, Rusch DB, Yooseph S, Frazier M, Venter JC, Eisen JA (2011) Stalking the fourth domain in metagenomic data: searching for, discovering, and interpreting novel, deep branches in phylogenetic trees of phylogenetic marker genes. PLoS ONE 6(3): e18011. doi:10.1371/journal.pone.0018011
      Data citation
      Wu D, Wu M, Halpern A, Rusch DB, Yooseph S, Frazier M, Venter JC, Eisen JA (2011) Data from: Stalking the fourth domain in metagenomic data: searching for, discovering, and interpreting novel, deep branches in phylogenetic trees of phylogenetic marker genes. Dryad Digital Repository. doi:10.5061/dryad.8384
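The pairing above suggests a simple pattern: the data citation reuses the article's authors, year, and title, prefixes the title with "Data from:", and ends with the repository name and the data DOI. A minimal Python sketch of that pattern follows. The function name and its arguments are hypothetical, and the layout is inferred only from the single example on this slide, not from any official Dryad specification.

```python
def format_data_citation(authors, year, article_title, data_doi,
                         repository="Dryad Digital Repository"):
    """Build a data citation string in the style shown on the slide.

    Hypothetical helper: the layout is inferred from one example only.
    """
    # Drop a trailing period so the title is not followed by "..".
    title = article_title.rstrip(".")
    return f"{authors} ({year}) Data from: {title}. {repository}. doi:{data_doi}"


citation = format_data_citation(
    authors="Wu D, Wu M, Halpern A, Rusch DB, Yooseph S, Frazier M, Venter JC, Eisen JA",
    year=2011,
    article_title=("Stalking the fourth domain in metagenomic data: searching for, "
                   "discovering, and interpreting novel, deep branches in phylogenetic "
                   "trees of phylogenetic marker genes."),
    data_doi="10.5061/dryad.8384",
)
print(citation)
```

Listing a string like this in the article's reference list is what allows the data package to be indexed alongside the article.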
    • 28. Rebbeck CA, Leroi AM, Burt A (2011) Mitochondrial capture by a transmissible cancer. Science 331, 303
    • 29.
    • 30. Number of data packages
    • 31. 20 papers from Delsuc and Douzery going back to 2002
    • 32. By now, downloaded >1000X
    • 33. Fulfilling the role of a journal
    • 34.
    • 35. Does sharing imply that it need be altruistic?
      For a set of 85 cancer microarray clinical trials
      48% had publicly available data
      These received 85% of the article citations
      Independent of journal impact factor, publication date, author nationality
      Piwowar H, et al. (2007) Sharing Detailed Research Data Is Associated with Increased Citation Rate. PLoS ONE 2(3): e308.
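The two percentages on this slide can be turned into a rough per-trial comparison. The sketch below uses only the 48% and 85% figures; the variable names are illustrative, and the result is a raw, unadjusted ratio of citation shares, not the adjusted effect size reported in the paper.

```python
# Figures from the slide (Piwowar et al. 2007); variable names are illustrative.
share_of_trials_with_data = 0.48     # 48% had publicly available data
share_of_citations_to_them = 0.85    # these received 85% of the citations

# Citations per trial, expressed relative to the overall average (= 1.0).
with_data = share_of_citations_to_them / share_of_trials_with_data
without_data = (1 - share_of_citations_to_them) / (1 - share_of_trials_with_data)

# How many times more citations a data-sharing trial received, on average.
raw_ratio = with_data / without_data

print(f"with data:    {with_data:.2f}x the average")
print(f"without data: {without_data:.2f}x the average")
print(f"raw ratio:    {raw_ratio:.1f}")
```

On these numbers, trials with public data drew roughly six times as many citations per trial as those without, before any adjustment for the covariates listed on the slide.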
    • 36. Does sharing imply that it need be altruistic?
      For a set of 85 cancer microarray clinical trials
      48% had publicly available data
      These received 85% of the article citations
      Independent of journal impact factor, publication date, author nationality
      Piwowar H, et al. (2007) Sharing Detailed Research Data Is Associated with Increased Citation Rate. PLoS ONE 2(3): e308.
    • 37. Data policies among bioscience journals
      IF=3.6
      IF=6.0
      IF=4.5
      n=70
      Piwowar HA, Chapman WW (2008) A review of journal policies for sharing research data. Presented at ELPUB2008, Nature Precedings hdl:10101/npre.2008.1700.1
    • 38. The value proposition
      For researchers
      Increase the impact of, and citations to, published research.
      Preserve and make data available to verify published results, to refine methodologies, and to repurpose.
      Free researchers from the burden of data preservation and access.
      For journals, publishers and societies
      Free journals from the burden of managing supplemental data
      Increase the discoverability, impact, and integrity of articles
      Increase their value to the community they serve.
      For funders
      A cost-effective mechanism to make research more accessible
      Leverage existing investments in order to enable new science
    • 39. Sustainability and governance
      Business model
      Long-term preservation requires a long-term organization
      In Dryad’s case, a membership-based nonprofit
      Revenue received from a broad array of ‘customers’, including journals, societies, publishers, and researchers
      Deposit charges
      Paid upfront, when the majority of costs are incurred
      Ensure free access to the data in perpetuity
      Allow revenue to naturally scale with costs (i.e. volume of deposits)
      Distribute costs fairly among stakeholders
      Governance
      12-member Board of Directors, nominated and elected by the Membership
      Membership serves in advisory capacity, and is a community of practice
    • 40. Costs
      Moderate economies of scale are required
      At 10K packages/yr, <$50/deposit, depending on curation
      What are the costs for SOM?
      Journal of Clinical Investigation: $300 flat fee
      Ecological Archives: $250 <10Mb, more fees beyond that
      FASEB: $100 per file
      Beagrie N, Eakin-Richards L, Vision TJ (2009) Business models and cost estimation: Dryad repository case study. iPRES 2010
    • 41. Proposed payment plans
      Journal-based
      annual fee based on all research articles published/yr (~$25/per*)
      covers any deposits from the journal (even from prior yrs)
      Voucher-based
      pay in advance for some number of deposits (<$50/per deposit)
      Pay-as-you-go:
      be invoiced retrospectively for deposits (>$50/per deposit)
      Author-pays
      Author pays online at time of deposit
      Journal can still facilitate archiving through submission integration
      *These are rates for Members, which include a 10% discount
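To see how the first three plans compare for a given journal, the sketch below computes a yearly cost for each. The slide gives only approximate bounds, so the voucher and pay-as-you-go rates used here ($45 and $55) are assumptions picked for illustration, as are the function names and the example journal. The author-pays plan is left out because the journal carries no cost under it.

```python
# The slide gives approximate bounds only; the exact voucher and pay-as-you-go
# values below are placeholders chosen to illustrate the comparison.
JOURNAL_RATE_PER_ARTICLE = 25.0   # "~$25/per" published research article
VOUCHER_RATE_PER_DEPOSIT = 45.0   # assumed; the slide says only "<$50"
PAYG_RATE_PER_DEPOSIT = 55.0      # assumed; the slide says only ">$50"


def annual_costs(articles_published, deposits):
    """Return the yearly cost of each plan for one journal."""
    return {
        "journal-based": JOURNAL_RATE_PER_ARTICLE * articles_published,
        "voucher-based": VOUCHER_RATE_PER_DEPOSIT * deposits,
        "pay-as-you-go": PAYG_RATE_PER_DEPOSIT * deposits,
    }


def cheapest_plan(articles_published, deposits):
    """Name the plan with the lowest yearly cost."""
    costs = annual_costs(articles_published, deposits)
    return min(costs, key=costs.get)


# Hypothetical journal publishing 200 articles a year.
for deposits in (50, 150):
    print(deposits, annual_costs(200, deposits), cheapest_plan(200, deposits))
```

Under these assumed rates, the journal-based plan wins once more than about 25/45 (roughly 56%) of published articles come with a deposit; below that share, per-deposit pricing is cheaper.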
    • 42. What is the return on investment?
      A rigorous framework is lacking
      But we can look at comparators
      Marginal cost of data archiving
      $50/article is <2% of publication costs (>$2.5K)
      And 0.2% of grant costs/article (~$25K)
      Is the data worth 2% of the research investment?
      Using DNA microarray data in GEO as a model
      2,711 submissions in 2007
      Data reused by 3rd parties in >1,150 articles
      Vision (2011) Open data and social contract of scientific publishing. BioScience, 60(5):330-330
      Piwowar H, Vision TJ, Whitlock MC (2011) Data archiving is a good investment. Nature 473:285
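The comparators on this slide reduce to a few divisions, worked through below. All inputs come from the slide or from the speaker notes (the 16K GBP per publication figure); the variable names are illustrative. Because the slide gives the publication cost and the reuse count as lower bounds, the 2% figure is an upper bound and the reuse figures are lower bounds.

```python
# Marginal cost of archiving, using the figures on the slide.
deposit_cost = 50                 # USD per article
publication_cost = 2_500          # USD, a lower bound (">$2.5K")
grant_cost_per_article = 25_000   # USD, approximate ("~$25K")

share_of_publication = deposit_cost / publication_cost
share_of_grant = deposit_cost / grant_cost_per_article
print(f"{share_of_publication:.1%} of publication cost (upper bound)")
print(f"{share_of_grant:.1%} of grant cost per article")

# Reuse of GEO microarray data, using the slide and the speaker notes.
geo_submissions_2007 = 2_711
reuse_articles = 1_150                  # a lower bound (">1,150")
grant_cost_per_article_gbp = 16_000     # from the speaker notes

reuse_per_submission = reuse_articles / geo_submissions_2007
equivalent_funding_gbp = reuse_articles * grant_cost_per_article_gbp
print(f"{reuse_per_submission:.2f} reuse articles per submission")
print(f"{equivalent_funding_gbp / 1e6:.1f}M GBP of equivalent research funding")
```

The last line reproduces the ">18M UKP" figure quoted in the speaker notes.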
    • 43.
    • 44. http://datadryad.org
      http://blog.datadryad.org
      http://datadryad.org/wiki
      http://code.google.com/p/dryad
      dryad-users@nescent.org
      @datadryad
      Dryad
    • 45. A very incomplete list of contributors
      JDAP: M. Whitlock
      DryadUS: R. Scherle, E. Feinstein, J. Greenberg, H. Piwowar, P. Schaeffer
      DryadUK: B. Hole, Max Wilkinson, D. Shotton
      Sustainability planning: N. Beagrie, L. Eakin-Richards
