Public sharing of research datasets: a pilot study of associations

Heather Piwowar and Wendy Chapman

Presented at ASIST & ISSI Pre-Conference
Symposium on Informetrics and Scientometrics on Nov 7, 2009

http://www.sois.uwm.edu/MetricsPreCon/program.html

Published in: Health & Medicine, Technology
Public Sharing of Research Datasets: A Pilot Study of Associations

  1. Public sharing of research datasets: a pilot study of associations. Heather Piwowar and Wendy Chapman, Department of Biomedical Informatics, University of Pittsburgh
  2. data data (photo: http://www.flickr.com/photos/vroomvroommm/3457772539)
  3. stale (photo: http://www.flickr.com/photos/75166820@N00/5318468/)
  4. sounds great (photo: http://www.flickr.com/photos/ryanr/142455033/)
  5. but not easy (photo: http://www.flickr.com/photos/faerie-dust/2315927946/)
  6. persuade (photos: http://www.flickr.com/photos/sunrise/35819369/, http://www.flickr.com/photos/fboyd/2156630044/)
  7. does it work? (photo: http://www.flickr.com/photos/mesh/14102209/)
  8. Prior work has focused on surveys and studies of intention. Our aim: measure associations between observed data sharing behaviour and environmental variables. [aim]
  9. Funder; Journal; Investigator; Institution; Study. Is research data shared after publication? [aim]
  10. Funder; Journal; Investigator; Institution; Study. Is research data shared after publication? [aim]
  11. microarray data (images: http://en.wikipedia.org/wiki/DNA_microarray, http://en.wikipedia.org/wiki/Image:Heatmap.png, http://commons.wikimedia.org/wiki/File:DNA_double_helix_vertikal.PNG)
  12. microarray data
  13. Ochsner et al. (2008). Much room for improvement in deposition rates of expression microarray datasets. Nature Methods, 5(12), 991. Manually reviewed 20 journals for 2007: 400 studies; 200 shared their microarray data. [data sample]
  14. Funder mandates; Journal mandates; Journal impact factor; Investigator “experience”. Is research data shared after publication? [variables]
  15. Funder mandates [variables]
  16. Funder mandates. NIH 2003 Data Sharing Requirement: requires a data sharing plan for studies funded after October 2003 that receive more than $500,000 in direct funding per year. [variables]
  17. Funder mandates. Assumed the data sharing requirement was applicable if the NIH grant numbers associated with the PubMed entry had $750,000 in total funding in any year since 2004, plus an NIH grant number with a leading “1” or “2” since 2004. [variables]
  18. Journal mandates [variables]
  19. Journal mandates. Piwowar and Chapman. A review of journal policies for sharing research data. International Conference on Electronic Publishing (ELPUB) 2008. Journal policy strength: Strong, Weak, or None. [variables]
  20. Author experience [variables]
  21. Author experience. Publication history and impact. [variables]
  22. Author experience. “Experience and impact” proxy: years since first publication; h-index estimate; a-index estimate. Scriptable, to allow scaling up to thousands of authors? [variables]
  23. Author experience. Author publication history. [variables]
  24. Author experience. Citation counts. [variables]
  25. Author experience. Author name disambiguation. Author-ity web service: Torvik & Smalheiser. (2009). Author Name Disambiguation in MEDLINE. ACM Transactions on Knowledge Discovery from Data, 3(3):11. [variables]
  26. Author experience. PubMed + PubMed Central + Author-ity to compute pubmedi citation estimates ➡ not a comprehensive account of publication accomplishments ➡ for aggregate analysis: free, open, scriptable, flexible, reproducible. [variables]
  27. Author experience. For each first and last author, we used the first principal component of: years since first publication; pubmedi h-index estimate; pubmedi a-index estimate. [variables]
  28. Funder mandates; Journal mandates; Journal impact factor; Investigator “experience”. Is research data shared after publication? [variables]
  29. Univariate odds ratios; multivariate logistic regression. [stats]
  30. results (photo: http://www.flickr.com/photos/paperpariah/3002687604/)
  31. Legend: not statistically significant; statistically significant. Funder mandates; Journal mandates; Journal impact factor; Investigator “experience”. Is research data shared after publication? [results]
  32. Funder mandates: 33% [results]
  33. Journal impact factor; Journal mandates. Strength of journal data sharing policy is strongly correlated with impact factor. [results]
  34. Investigator “experience” [results]
  35. Investigator “experience” [results]
  36. Investigator “experience” [results]
  37. Investigator “experience” [results]
  38. Investigator “experience” [results]
  39. Investigator “experience” [results]
  40. Association does not imply causation; only one datatype; small sample, limited variables; dataset contains a disproportionate number of high-impact studies. [limitations] (photo: http://www.flickr.com/photos/vlastula/300102949/)
  41. NIH data sharing plan applies to a minority of NIH microarray studies; NIH data sharing plan does not seem to increase frequency of data sharing; more experienced investigators are more likely to share data. [prelim conclusions]
  42. PhD dissertation! More samples; more variables. [next steps] (photo: http://www.flickr.com/photos/krcla/2069243613/)
  43. Spin-off projects: quantify usefulness of pubmedi h-index; study the patterns and prevalence of data reuse. [future] (photo: http://www.flickr.com/photos/cogdog/123072/)
  44. Dept of Biomedical Informatics at U of Pittsburgh; NLM for training grant funding; open science online community and those who release their articles, datasets and photos openly; Dr Wendy Chapman for her support and feedback. [thanks]
  45. Journal mandates [variables]
  46. Journal mandates. Policy strength categorization: None: no applicable mention of data sharing; Weak: request or unenforceable requirement; Strong: require data deposit accession number as a condition of publication. [variables]
  47. I post my data, code, and statistical scripts at http://www.dbmi.pitt.edu/piwowar Share yours too! [open science] (photo: http://www.flickr.com/photos/myklroventine/892446624/)
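
Slides 22 and 27 build the author "experience" proxy from h-index and a-index estimates. The deck does not show how the pubmedi estimates were scripted, so the following is only a minimal sketch. It assumes the usual definitions (h-index: the largest h such that h papers have at least h citations each; a-index: the mean citation count of those h papers). The function names and the example citation counts are hypothetical and are not data from the study.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def a_index(citations):
    """Mean citation count of the h most-cited papers (0.0 when h is 0)."""
    h = h_index(citations)
    if h == 0:
        return 0.0
    ranked = sorted(citations, reverse=True)
    return sum(ranked[:h]) / h


# Hypothetical citation counts for one author (illustration only).
example = [10, 8, 5, 4, 3, 0]
print(h_index(example), a_index(example))
```

Because both functions take a plain list of per-paper citation counts, they could be applied to any citation source, which fits the "scriptable" goal stated on slide 22.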
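
Slide 29 names univariate odds ratios as one of the statistics used. As a hedged illustration of that calculation, the sketch below computes an odds ratio and an approximate 95% confidence interval from a 2x2 table, using the standard log-odds (Woolf) standard error. The function name, the argument layout, and the counts are assumptions for illustration and do not come from the study.

```python
import math


def odds_ratio(a, b, c, d, z=1.96):
    """Odds ratio and approximate confidence interval for a 2x2 table.

    a: factor present, data shared      b: factor present, data not shared
    c: factor absent, data shared       d: factor absent, data not shared
    z: normal quantile for the interval (1.96 gives roughly 95%).
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    ratio = (a * d) / (b * c)
    # Standard error of log(odds ratio), Woolf's method.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(ratio) - z * se)
    upper = math.exp(math.log(ratio) + z * se)
    return ratio, lower, upper


# Hypothetical counts (illustration only, not results from the deck).
ratio, lower, upper = odds_ratio(30, 20, 10, 40)
print(ratio, lower, upper)
```

An interval that excludes 1 would conventionally be read as a statistically significant association, which is the distinction drawn in the legend on slide 31.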