Research data and scholarly publications: going from casual acquaintances to something more

Presented to ALPSP annual meeting 2011 in Oxfordshire UK during a session entitled "Abort Retry Fail? Data and the scholarly literature"

Notes
  • This is a riskier workflow: it depends more on the author to make sure the publication contains a link to the data.
  • Getting authors and journals to do this sensibly on the article side is not easy. This is a relatively good example, but actual practice is all over the map: sometimes in the acknowledgements, sometimes in the main text, sometimes in a standardized data availability section set up by the journal. One interesting area of agreement at the Data Citation workshop Micah mentioned was that the original article should list the data citation in the reference list, for better indexing. At any rate, there is much room for standardization and awareness-raising among both authors and journals.
  • For funders, we have estimated that each publication costs 16K UK pounds' worth of NSF funding. For another repository we have studied (GEO), 2,711 data sets submitted in 2007 made substantive contributions to more than 1,150 published articles in 2007-2010 alone, which would cost >18M UKP in original research grants.
Transcript of "Research data and scholarly publications: going from casual acquaintances to something more"

    1. Research data and scholarly publications: Going from casual acquaintances to something more
        Todd Vision
        Dept of Biology, University of North Carolina at Chapel Hill
        and the U.S. National Evolutionary Synthesis Center
        ALPSP, September 2011
        Abort, Retry, Fail? Data and the scholarly literature
    2. (no transcribed text)
    3. (no transcribed text)
    4. Peer-to-peer ‘sharing’ fails
        Wicherts and colleagues requested data from 141 articles in American Psychological Association journals.
        “6 months later, after … 400 emails, [sending] detailed descriptions of our study aims, approvals of our ethical committee, signed assurances not to share data with others, and even our full resumes…” only 27% of authors complied.
        Wicherts, J.M., Borsboom, D., Kats, J., & Molenaar, D. (2006). The poor availability of psychological research data for reanalysis. American Psychologist, 61, 726-728.
    5. [Figure. Axes: Information Content; Time. Labels: Time of publication; Specific details; General details; Retirement or career change; Accident; Death.] (Michener et al. 1997)
    6. Bumpus HC (1898) The Elimination of the Unfit as Illustrated by the Introduced Sparrow, Passer domesticus. Biological Lectures from the Marine Biological Laboratory: 209-226.
    7. (no transcribed text)
    8. n=3824
        Source: Publishing Research Consortium, http://publishingresearch.net
    9. (no transcribed text)
    10. Taxonomy of data archiving benefits
        Modified from Beagrie et al. (2009) Keeping Research Data Safe 2
    11. Joint Data Archiving Policy (JDAP)
        Data are important products of the scientific enterprise, and they should be preserved and usable for decades in the future.
        As a condition for publication, data supporting the results in the article should be deposited in an appropriate public archive.
        Authors may elect to embargo access to the data for a period up to a year after publication.
        Exceptions may be granted at the discretion of the editor, especially for sensitive information.
        Whitlock, M. C., M. A. McPeek, M. D. Rausher, L. Rieseberg, and A. J. Moore. 2010. Data Archiving. American Naturalist. 175(2):145-146.
    12. The long tail of orphan data in “small science” (after B. Heidorn)
        “Most of the bytes are at the high end, but most of the datasets are at the low end” (Jim Gray)
        [Figure. Axes: Volume; Rank frequency of datatype. Labels: Specialized repositories (e.g. GenBank, PDB); Orphan data.]
    13. Smit E (2011) Abelard and Héloise: Why Data and Publications Belong Together. D-Lib Magazine doi:10.1045/january2011-smit
    14. The End
            To make data archiving and reuse a standard part of research and publishing.
        The Means
            Enable low-burden data archiving at the time of manuscript submission.
            Promote researcher benefits from data archiving.
            Promote responsible data reuse.
            Empower journals, societies & publishers in shared governance.
            Ensure sustainability and long-term preservation.
        The Scope
            Data underlying peer-reviewed articles in basic and applied biosciences.
    15-21. Integrated workflow (one diagram built up over seven slides). Labels on the completed diagram: Submit manuscript; Submit data; Prompt author; Manuscript metadata; Review passcode; Peer review; Acceptance notification; Curation; Data DOI; Production; Article metadata; Curation; Article publication; Data publication; Article DOI/final metadata harvested.
    22. (no transcribed text)
    23-25. Non-integrated workflow shown alongside the integrated one (one diagram built up over three slides). Labels added to the integrated diagram: Submit data; Author includes data DOI; Data DOI; Article publication; DOI/final metadata harvested.
    26. Dryad relative to Supplementary Online Materials
        * A few publisher SOM sites are exceptions to the general rule
        ** Practices differ among publishers, see Smit (2011), doi:10.1045/january2011-smit
    27. Article citation
            Wu D, Wu M, Halpern A, Rusch DB, Yooseph S, Frazier M, Venter JC, Eisen JA (2011) Stalking the fourth domain in metagenomic data: searching for, discovering, and interpreting novel, deep branches in phylogenetic trees of phylogenetic marker genes. PLoS ONE 6(3): e18011. doi:10.1371/journal.pone.0018011
        Data citation
            Wu D, Wu M, Halpern A, Rusch DB, Yooseph S, Frazier M, Venter JC, Eisen JA (2011) Data from: Stalking the fourth domain in metagenomic data: searching for, discovering, and interpreting novel, deep branches in phylogenetic trees of phylogenetic marker genes. Dryad Digital Repository. doi:10.5061/dryad.8384
    28. Rebbeck CA, Leroi AM, Burt A (2011) Mitochondrial capture by a transmissible cancer. Science 331, 303
    29. (no transcribed text)
    30. Number of data packages
    31. 20 papers from Delsuc and Douzery going back to 2002
    32. By now, downloaded >1000X
    33. Fulfilling the role of a journal
    34. (no transcribed text)
    35-36. Does sharing need to be altruistic?
        For a set of 85 cancer microarray clinical trials:
            48% had publicly available data
            These received 85% of the article citations
            Independent of journal impact factor, publication date, author nationality
        Piwowar H, et al. (2007) Sharing Detailed Research Data Is Associated with Increased Citation Rate. PLoS ONE 2(3): e308.
    37. Data policies among bioscience journals
        [Figure. Labels: IF=3.6; IF=6.0; IF=4.5; n=70.]
        Piwowar HA, Chapman WW (2008) A review of journal policies for sharing research data. Presented at ELPUB2008, Nature Precedings hdl:10101/npre.2008.1700.1
    38. The value proposition
        For researchers
            Increase the impact of, and citations to, published research.
            Preserve and make data available to verify published results, to refine methodologies, and to repurpose.
            Free researchers from the burden of data preservation and access.
        For journals, publishers and societies
            Free journals from the burden of managing supplemental data.
            Increase the discoverability, impact, and integrity of articles.
            Increase their value to the community they serve.
        For funders
            A cost-effective mechanism to make research more accessible.
            Leverage existing investments in order to enable new science.
    39. Sustainability and governance
        Business model
            Long-term preservation requires a long-term organization
            In Dryad’s case, a membership-based nonprofit
            Revenue received from a broad array of ‘customers’, including journals, societies, publishers, and researchers
        Deposit charges
            Paid upfront, when the majority of costs are incurred
            Ensure free access to the data in perpetuity
            Allow revenue to naturally scale with costs (i.e. volume of deposits)
            Distribute costs fairly among stakeholders
        Governance
            12-member Board of Directors, nominated and elected by the Membership
            Membership serves in an advisory capacity, and is a community of practice
    40. Costs
        Moderate economies of scale are required
            At 10K packages/yr, <$50/deposit, depending on curation
        What are the costs for SOM?
            Journal of Clinical Investigation: $300 flat fee
            Ecological Archives: $250 for <10Mb, more fees beyond that
            FASEB: $100 per file
        Beagrie N, Eakin-Richards L, Vision TJ (2009) Business models and cost estimation: Dryad repository case study. iPRES 2010
    41. Proposed payment plans
        Journal-based
            Annual fee based on all research articles published per year (~$25 per article*)
            Covers any deposits from the journal (even from prior years)
        Voucher-based
            Pay in advance for some number of deposits (<$50 per deposit)
        Pay-as-you-go
            Be invoiced retrospectively for deposits (>$50 per deposit)
        Author-pays
            Author pays online at time of deposit
            Journal can still facilitate archiving through submission integration
        * These are rates for Members, which include a 10% discount
    42. What is the return on investment?
        A rigorous framework is lacking
        But we can look at comparators
        Marginal cost of data archiving
            $50/article is <2% of publication costs (>$2.5K)
            And 0.2% of grant costs/article (~$25K)
        Is the data worth 2% of the research investment?
        Using DNA microarray data in GEO as a model
            2,711 submissions in 2007
            Data reused by 3rd parties in >1,150 articles
        Vision (2011) Open data and social contract of scientific publishing. BioScience, 60(5):330-330
        Piwowar H, Vision TJ, Whitlock MC (2011) Data archiving is a good investment. Nature 473:285
    43. (no transcribed text)
    44. http://datadryad.org
        http://blog.datadryad.org
        http://datadryad.org/wiki
        http://code.google.com/p/dryad
        dryad-users@nescent.org
        @datadryad
        Dryad
    45. A very incomplete list of contributors
        JDAP: M. Whitlock
        DryadUS: R. Scherle, E. Feinstein, J. Greenberg, H. Piwowar, P. Schaeffer
        DryadUK: B. Hole, Max Wilkinson, D. Shotton
        Sustainability planning: N. Beagrie, L. Eakin-Richards
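
The return-on-investment figures on slide 42 and in the speaker notes can be checked with a few lines of arithmetic. The sketch below is illustrative only: the variable and function names are made up for this example, and the inputs are the rounded figures quoted in the talk ($50 per deposit, $2.5K publication cost, ~$25K or 16K UK pounds of grant funding per article, 2,711 GEO submissions, 1,150 reusing articles), not independently sourced data.

```python
# Rounded figures quoted in the talk; all names here are illustrative.
DEPOSIT_COST_USD = 50            # marginal archiving cost per article
PUBLICATION_COST_USD = 2_500     # lower bound on publication cost per article
GRANT_COST_USD = 25_000          # approximate grant funding per article
GRANT_COST_GBP = 16_000          # the same figure as quoted in the speaker notes

GEO_SUBMISSIONS_2007 = 2_711     # microarray data sets deposited in GEO in 2007
REUSE_ARTICLES = 1_150           # lower bound on third-party articles reusing them


def percent_of(part: float, whole: float) -> float:
    """Return `part` as a percentage of `whole`."""
    return 100.0 * part / whole


# Archiving cost relative to publication and grant costs.
pub_share = percent_of(DEPOSIT_COST_USD, PUBLICATION_COST_USD)
grant_share = percent_of(DEPOSIT_COST_USD, GRANT_COST_USD)

# Grant funding that the reusing articles would represent as original research.
reuse_value_gbp = REUSE_ARTICLES * GRANT_COST_GBP

# Reusing articles per deposited data set.
articles_per_submission = REUSE_ARTICLES / GEO_SUBMISSIONS_2007

print(f"Share of publication cost: {pub_share:.1f}%")
print(f"Share of grant cost:       {grant_share:.1f}%")
print(f"Reuse value:               {reuse_value_gbp / 1e6:.1f}M GBP")
print(f"Articles per submission:   {articles_per_submission:.2f}")
```

At exactly $2.5K the archiving share is 2%; the slide's "<2%" follows because publication costs are given as a lower bound. The reuse value of about 18.4M UK pounds matches the ">18M UKP" in the speaker notes.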