Research data and scholarly publications: going from casual acquaintances to something more

Presented at the ALPSP annual meeting 2011 in Oxfordshire, UK, during a session entitled "Abort Retry Fail? Data and the scholarly literature"

  • This is a riskier workflow – it is more dependent on the author to make sure the publication contains a link to the data.
  • Getting authors and journals to do this sensibly on the article side is not easy. This is a relatively good example, but actual practice is all over the map: sometimes the link is in the acknowledgements, sometimes in the main text, sometimes in a standardized data availability section set up by the journal. One interesting area of agreement at the Data Citation workshop Micah mentioned was that the original article should list the data citation in the reference list, for better indexing. At any rate, there is much room for standardization, and for raising awareness among both authors and journals.
  • For funders, we have estimated that each publication costs roughly 16K UK pounds in NSF funding. For another repository we have studied (GEO), 2,711 data sets submitted in 2007 made substantive contributions to more than 1,150 published articles in 2007-2010 alone, which would have cost >18M UK pounds in original research grants.
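The funder estimate in the note above is a simple multiplication. The sketch below reproduces it in Python under the same simplifying assumption the note makes implicitly: each article that reused GEO data would otherwise have required its own grant-funded study at the quoted per-publication cost. The constant and function names are hypothetical, chosen only for illustration.

```python
# Figures from the note above (GEO submissions from 2007)
COST_PER_PUBLICATION_GBP = 16_000  # estimated NSF funding per publication
REUSING_ARTICLES = 1_150           # lower bound: "more than 1150" articles, 2007-2010


def replacement_cost(n_articles: int, cost_per_article: int) -> int:
    """Grant funding needed if every reusing article had required its own study.

    This is the simplifying assumption behind the estimate, not a measured cost.
    """
    return n_articles * cost_per_article


total_gbp = replacement_cost(REUSING_ARTICLES, COST_PER_PUBLICATION_GBP)
print(f"Estimated replacement cost: GBP {total_gbp:,}")  # GBP 18,400,000
```

Because 1,150 is itself a lower bound, the product is a lower bound too, which is consistent with the note's ">18M" phrasing.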

Presentation Transcript

  • Research data and scholarly publications: Going from casual acquaintances to something more
    Todd Vision
    Dept of Biology, University of North Carolina at Chapel Hill
    and the U.S. National Evolutionary Synthesis Center
    ALPSP, September 2011
    Abort, Retry, Fail? Data and the scholarly literature
  • Peer-to-peer ‘sharing’ fails
    Wicherts and colleagues requested data from 141 articles in American Psychological Association journals.
    “6 months later, after … 400 emails, [sending] detailed descriptions of our study aims, approvals of our ethical committee, signed assurances not to share data with others, and even our full resumes…” only 27% of authors complied
    Wicherts, J.M., Borsboom, D., Kats, J., & Molenaar, D. (2006). The poor availability of psychological research data for reanalysis. American Psychologist, 61, 726-728.
  • [Figure: information content vs. time, annotated with time of publication, specific details, general details, retirement or career change, accident, and death (Michener et al. 1997)]
  • Bumpus HC (1898) The Elimination of the Unfit as Illustrated by the Introduced Sparrow, Passer domesticus. Biological Lectures from the Marine Biological Laboratory: 209-226.
  • [Chart: survey results, n=3824]
    Source: Publishing Research Consortium, http://publishingresearch.net
  • Taxonomy of data archiving benefits
    Modified from Beagrie et al. (2009) Keeping Research Data Safe 2
  • Joint Data Archiving Policy (JDAP)
    Data are important products of the scientific enterprise, and they should be preserved and usable for decades in the future.
    As a condition for publication, data supporting the results in the article should be deposited in an appropriate public archive.
    Authors may elect to embargo access to the data for a period up to a year after publication.
    Exceptions may be granted at the discretion of the editor, especially for sensitive information.
    Whitlock, M. C., M. A. McPeek, M. D. Rausher, L. Rieseberg, and A. J. Moore. 2010. Data Archiving. American Naturalist. 175(2):145-146.
  • The long tail of orphan data in “small science”
    after B. Heidorn
    “Most of the bytes are at the high end, but most of the datasets are at the low end” – Jim Gray
    [Figure: volume vs. rank frequency of datatype; specialized repositories (e.g. GenBank, PDB) at the high-volume end, orphan data in the long tail]
  • Smit E (2011) Abelard and Héloise: Why Data and Publications Belong Together. D-Lib Magazine doi:10.1045/january2011-smit
  • The End
    To make data archiving and reuse a standard part of research and publishing.
    The Means
    Enable low-burden data archiving at the time of manuscript submission.
    Promote researcher benefits from data archiving.
    Promote responsible data reuse.
    Empower journals, societies & publishers in shared governance.
    Ensure sustainability and long-term preservation.
    The Scope
    Data underlying peer-reviewed articles in basic and applied biosciences.
  • Integrated and non-integrated submission workflows
    [Figure: integrated vs. non-integrated workflows between journal and repository]
    Integrated: submit manuscript; manuscript metadata; prompt author; submit data; review passcode; peer review; acceptance notification; curation; data DOI; production; article metadata; curation; article publication; data publication; article DOI/final metadata harvested.
    Non-integrated: submit data; data DOI; author includes data DOI; article publication; DOI/final metadata harvested.
  • Dryad relative to Supplementary Online Materials
    * A few publisher SOM sites are exceptions to the general rule
    ** Practices differ among publishers, see Smit (2011), doi:10.1045/january2011-smit
  • Article citation
    Wu D, Wu M, Halpern A, Rusch DB, Yooseph S, Frazier M, Venter JC, Eisen JA (2011) Stalking the fourth domain in metagenomic data: searching for, discovering, and interpreting novel, deep branches in phylogenetic trees of phylogenetic marker genes. PLoS ONE 6(3): e18011. doi:10.1371/journal.pone.0018011
    Data citation
    Wu D, Wu M, Halpern A, Rusch DB, Yooseph S, Frazier M, Venter JC, Eisen JA (2011) Data from: Stalking the fourth domain in metagenomic data: searching for, discovering, and interpreting novel, deep branches in phylogenetic trees of phylogenetic marker genes. Dryad Digital Repository. doi:10.5061/dryad.8384
  • Rebbeck CA, Leroi AM, Burt A (2011) Mitochondrial capture by a transmissible cancer. Science 331, 303
  • Number of data packages
  • 20 papers from Delsuc and Douzery going back to 2002
  • By now, downloaded >1000X
  • Fulfilling the role of a journal
  • Does sharing imply that it need be altruistic?
    For a set of 85 cancer microarray clinical trials
    48% had publicly available data
    These received 85% of the article citations
    Independent of journal impact factor, publication date, author nationality
    Piwowar H, et al. (2007) Sharing Detailed Research Data Is Associated with Increased Citation Rate. PLoS ONE 2(3): e308.
  • Data policies among bioscience journals
    [Chart: data policies among n=70 bioscience journals; series labeled IF=3.6, IF=6.0, IF=4.5]
    Piwowar HA, Chapman WW (2008) A review of journal policies for sharing research data. Presented at ELPUB2008, Nature Precedings hdl:10101/npre.2008.1700.1
  • The value proposition
    For researchers
    Increase the impact of, and citations to, published research.
    Preserve and make data available to verify published results, to refine methodologies, and to repurpose.
    Free researchers from the burden of data preservation and access.
    For journals, publishers and societies
    Free journals from the burden of managing supplemental data
    Increase the discoverability, impact, and integrity of articles
    Increase their value to the community they serve.
    For funders
    A cost-effective mechanism to make research more accessible
    Leverage existing investments in order to enable new science
  • Sustainability and governance
    Business model
    Long-term preservation requires a long-term organization
    In Dryad’s case, a membership-based nonprofit
    Revenue received from a broad array of ‘customers’, including journals, societies, publishers, and researchers
    Deposit charges
    Paid upfront, when the majority of costs are incurred
    Ensure free access to the data in perpetuity
    Allow revenue to naturally scale with costs (i.e. volume of deposits)
    Distribute costs fairly among stakeholders
    Governance
    12-member Board of Directors, nominated and elected by the Membership
    Membership serves in advisory capacity, and is a community of practice
  • Costs
    Moderate economies of scale are required
    At 10K packages/yr, <$50/deposit, depending on curation
    What are the costs for SOM?
    Journal of Clinical Investigation: $300 flat fee
    Ecological Archives: $250 for <10 MB, with additional fees beyond that
    FASEB: $100 per file
    Beagrie N, Eakin-Richards L, Vision TJ (2009) Business models and cost estimation: Dryad repository case study. iPRES 2010
  • Proposed payment plans
    Journal-based
    annual fee based on all research articles published/yr (~$25 per article*)
    covers any deposits from the journal (even from prior yrs)
    Voucher-based
    pay in advance for some number of deposits (<$50 per deposit)
    Pay-as-you-go
    be invoiced retrospectively for deposits (>$50 per deposit)
    Author-pays
    Author pays online at time of deposit
    Journal can still facilitate archiving through submission integration
    *These are rates for Members, which include a 10% discount
  • What is the return on investment?
    A rigorous framework is lacking
    But we can look at comparators
    Marginal cost of data archiving
    $50/article is <2% of publication costs (>$2.5K)
    And 0.2% of grant costs/article (~$25K)
    Is the data worth 2% of the research investment?
    Using DNA microarray data in GEO as a model
    2,711 submissions in 2007
    Data reused by 3rd parties in >1,150 articles
    Vision (2011) Open data and social contract of scientific publishing. BioScience, 60(5):330-330
    Piwowar H, Vision TJ, Whitlock MC (2011) Data archiving is a good investment. Nature 473:285
  • http://datadryad.org
    http://blog.datadryad.org
    http://datadryad.org/wiki
    http://code.google.com/p/dryad
    dryad-users@nescent.org
    @datadryad
    Dryad
  • A very incomplete list of contributors
    JDAP: M. Whitlock
    DryadUS: R. Scherle, E. Feinstein, J. Greenberg, H. Piwowar, P. Schaeffer
    DryadUK: B. Hole, Max Wilkinson, D. Shotton
    Sustainability planning: N. Beagrie, L. Eakin-Richards
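The Article citation / Data citation pair for Wu et al. (2011) shows a regular pattern for a Dryad data package: the data citation reuses the article's authors and year, prefixes the article title with "Data from:", and ends with the repository name and the data DOI. A minimal Python sketch of that formatting rule follows. `ArticleRecord` and `make_data_citation` are hypothetical names, and real repositories may apply further rules (for example, truncating long author lists) that are not modeled here.

```python
from dataclasses import dataclass


@dataclass
class ArticleRecord:
    """Hypothetical minimal metadata for a published article."""
    authors: list[str]  # already formatted, e.g. "Wu D"
    year: int
    title: str


def make_data_citation(article: ArticleRecord, repository: str, data_doi: str) -> str:
    """Build a data citation that mirrors the article citation.

    Same authors and year, the article title prefixed with "Data from:",
    then the repository name and the data DOI.
    """
    title = article.title.rstrip(".")  # avoid a doubled period before the repository name
    return (
        f"{', '.join(article.authors)} ({article.year}) "
        f"Data from: {title}. {repository}. doi:{data_doi}"
    )


# Usage: reproduce the data citation for Wu et al. (2011)
wu_2011 = ArticleRecord(
    authors=["Wu D", "Wu M", "Halpern A", "Rusch DB",
             "Yooseph S", "Frazier M", "Venter JC", "Eisen JA"],
    year=2011,
    title=("Stalking the fourth domain in metagenomic data: searching for, "
           "discovering, and interpreting novel, deep branches in phylogenetic "
           "trees of phylogenetic marker genes"),
)
citation = make_data_citation(wu_2011, "Dryad Digital Repository", "10.5061/dryad.8384")
print(citation)
```

Deriving the data citation from article metadata the journal already holds, rather than relying on each author to compose it, is one way an integrated workflow could reduce the inconsistency in data citation practice described in the speaker notes.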