
Public Sharing of Research Datasets: A Pilot Study of Associations


Presented at ASIST & ISSI Pre-Conference
Symposium on Informetrics and Scientometrics on Nov 7, 2009

http://www.sois.uwm.edu/MetricsPreCon/program.html

Published in: Health & Medicine, Technology

  1. 1. Public sharing of research datasets: a pilot study of associations Heather Piwowar and Wendy Chapman Department of Biomedical Informatics University of Pittsburgh
  2. 2. http://www.flickr.com/photos/vroomvroommm/3457772539 data data
  3. 3. http://www.flickr.com/photos/75166820@N00/5318468/ stale
  4. 4. http://www.flickr.com/photos/ryanr/142455033/ sounds great
  5. 5. http://www.flickr.com/photos/faerie-dust/2315927946/ but not easy
  6. 6. http://www.flickr.com/photos/sunrise/35819369/ http://www.flickr.com/photos/fboyd/2156630044/ persuade
  7. 7. http://www.flickr.com/photos/mesh/14102209/ does it work?
  8. 8. Prior work has focused on surveys and studies of intention. Our aim: measure associations between observed data sharing behaviour and environmental variables aim
  9. 9. Funder, Journal, Investigator, Institution, Study: Is research data shared after publication? aim
  10. 10. Funder, Journal, Investigator, Institution, Study: Is research data shared after publication? aim
  11. 11. http://en.wikipedia.org/wiki/DNA_microarray http://en.wikipedia.org/wiki/Image:Heatmap.png http://commons.wikimedia.org/wiki/File:DNA_double_helix_vertikal.PNG microarray data
  12. 12. microarray data
  13. 13. Ochsner et al. (2008). Much room for improvement in deposition rates of expression microarray datasets. Nature Methods, 5(12), 991. Manually reviewed 20 journals for 2007: 400 studies, of which 200 shared their microarray data. data sample
  14. 14. Journal impact factor, Funder mandates, Journal mandates, Investigator “experience”: Is research data shared after publication? variables
  15. 15. Funder mandates variables
  16. 16. Funder mandates: NIH 2003 Data Sharing Requirement. Requires a data sharing plan for studies funded after October 2003 that receive more than $500 000 in direct funding per year. variables
  17. 17. Funder mandates: Assumed the data sharing requirement was applicable if the NIH grant numbers associated with the PubMed entry had $750 000 in total funding in any year since 2004, plus an NIH grant number with a leading “1” or “2” since 2004. variables
  18. 18. Journal mandates variables
  19. 19. Journal mandates Piwowar and Chapman. A review of journal policies for sharing research data. International Conference on Electronic Publishing (ELPUB) 2008 Journal Policy Strength: Strong, Weak, or None variables
  20. 20. Author experience variables
  21. 21. Author experience Publication history and impact variables
  22. 22. Author experience “experience and impact” proxy: • years since first publication • h-index estimate • a-index estimate Scriptable, to allow scaling up to thousands of authors? variables
  23. 23. Author experience Author publication history variables
  24. 24. Author experience Citation counts variables
  25. 25. Author experience Author name disambiguation Author-ity web service: Torvik & Smalheiser. (2009). Author Name Disambiguation in MEDLINE. ACM Transactions on Knowledge Discovery from Data, 3(3):11. variables
  26. 26. Author experience: PubMed + PubMed Central + Author-ity to compute pubmedi citation estimates ➡ not a comprehensive account of publication accomplishments ➡ for aggregate analysis: free, open, scriptable, flexible, reproducible. variables
  27. 27. Author experience For each first and last author, we used the first principal component of: • years since first publication • pubmedi h-index estimate • pubmedi a-index estimate variables
  28. 28. Journal impact factor, Funder mandates, Journal mandates, Investigator “experience”: Is research data shared after publication? variables
  29. 29. Univariate odds ratios Multivariate logistic regression stats
  30. 30. http://www.flickr.com/photos/paperpariah/3002687604/ results
  31. 31. Legend: Not statistically significant / Statistically significant. Journal impact factor, Funder mandates, Journal mandates, Investigator “experience”: Is research data shared after publication? results
  32. 32. Funder mandates 33% results
  33. 33. Journal impact factor, Journal mandates: Strength of journal data sharing policy is strongly correlated with impact factor. results
  34. 34. Investigator “experience” results
  35. 35. Investigator “experience” results
  36. 36. Investigator “experience” results
  37. 37. Investigator “experience” results
  38. 38. Investigator “experience” results
  39. 39. Investigator “experience” results
  40. 40. http://www.flickr.com/photos/vlastula/300102949/ • Association does not imply causation • Only one datatype • Small sample, limited variables • Dataset contains disproportionate number of high-impact studies limitations
  41. 41. • NIH data sharing plan applies to a minority of NIH microarray studies • NIH data sharing plan does not seem to increase frequency of data sharing • More experienced investigators are more likely to share data prelim conclusions
  42. 42. http://www.flickr.com/photos/krcla/2069243613/ PhD dissertation! • More samples • More variables next steps
  43. 43. http://www.flickr.com/photos/cogdog/123072/ Spin-off projects: • Quantify usefulness of pubmedi h-index • Study the patterns and prevalence of data reuse future
  44. 44. Dept of Biomedical Informatics at U of Pittsburgh NLM for training grant funding Open science online community and those who release their articles, datasets and photos openly Dr Wendy Chapman for her support and feedback thanks
  45. 45. Journal mandates variables
  46. 46. Journal mandates: Policy strength categorization: None: No applicable mention of data sharing. Weak: Request or unenforceable requirement. Strong: Require data deposit accession number as a condition of publication. variables
  47. 47. http://www.flickr.com/photos/myklroventine/892446624/ I post my data, code, and statistical scripts at http://www.dbmi.pitt.edu/piwowar Share yours too! open science
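
Slides 22 and 27 name an h-index estimate and an a-index estimate as parts of the author "experience" proxy, but the transcript does not show how they were computed. The following is a minimal sketch, assuming the usual definitions (h-index: the largest h such that h papers each have at least h citations; a-index: the mean citation count of those h papers). The function names and the citation counts are hypothetical and are not taken from the authors' scripts.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def a_index(citations):
    """Mean citation count of the h most-cited papers (the h-core)."""
    h = h_index(citations)
    if h == 0:
        return 0.0
    ranked = sorted(citations, reverse=True)
    return sum(ranked[:h]) / h


# Hypothetical per-paper citation counts for one author (illustration only).
example_counts = [10, 8, 5, 4, 3, 0]
print(h_index(example_counts))  # 4
print(a_index(example_counts))  # 6.75
```

Because the slides describe the citation counts as estimates drawn from PubMed and PubMed Central, values computed this way would likely understate an author's full citation record.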

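Slide 29 lists univariate odds ratios as the first statistical step. As a rough illustration of what such a ratio measures, the sketch below computes an odds ratio and an approximate 95% confidence interval (Woolf's log method) from a 2x2 table of shared versus not-shared studies. The function name, its parameters, and the counts are hypothetical and are not results from the study.

```python
import math


def odds_ratio_with_ci(shared_with, unshared_with, shared_without, unshared_without, z=1.96):
    """Odds ratio and approximate confidence interval for a 2x2 table.

    shared_with / unshared_with: studies with the factor (e.g. a strong
    journal policy) that did / did not share data.
    shared_without / unshared_without: the same counts without the factor.
    """
    cells = (shared_with, unshared_with, shared_without, unshared_without)
    if any(c <= 0 for c in cells):
        raise ValueError("all four cells must be positive for this method")
    odds_ratio = (shared_with * unshared_without) / (unshared_with * shared_without)
    # Standard error of log(odds ratio), Woolf's method.
    se = math.sqrt(sum(1.0 / c for c in cells))
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# Hypothetical counts, for illustration only.
ratio, low, high = odds_ratio_with_ci(30, 20, 20, 30)
print(ratio, low, high)
```

An interval that excludes 1 would suggest an association between the factor and data sharing; as slide 40 notes, such an association does not imply causation.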