37. Doesn’t work
self-reported denying a request in last 3 years
trainees self-reported denying a request
been denied access to data, materials, code
authors “not able to retrieve raw data”
not willing to release data
[Bar chart of the items above; x-axis 0% to 40%]
Campbell et al. JAMA. 2002.
Kyzas et al. J Natl Cancer Inst. 2005.
Vogeli et al. Acad Med. 2006.
Reidpath et al. Bioethics 2001.
38. Don’t get the email
Evangelou et al. FASEB J. 2006.
Wren. Bioinformatics 2008.
Wren et al. EMBO Rep 2006.
39. Say no
want to publish more papers first
want exclusive use
ensure data confidentiality
control
avoid cost of preparation
[Bar chart of the reasons above; x-axis 0% to 50%]
Hedstrom. Society of Am Archivists Ann Meeting. 2008.
40. Ask why
“Before I send you the data could I ask what you want it for?”
“Can you be more explicit, please, about the analyses you have in mind and what you plan to do with them?”
“We'll have to discuss your request with the other coauthors. Before we do that, I'd like to know your proposed analysis plan.”
“We are not finished using the data, but when we are finished with it, we would be open to requests for the data.”
“Any use of the data other than for the specific purpose laid down in the contract of collaboration is effectively ruled out.”
Reidpath et al. Bioethics 2001.
43. Has real costs.
Survey of doctoral students and postdocs:
28-50% reported negative effects of data withholding:
• hurt progress of their research,
• hurt rate of discovery in their lab/research group,
• hurt quality of their relationships with academic
scientists,
• hurt quality of their education,
• hurt level of communication in their lab/research
group.
Vogeli et al. Acad Med. 2006 Feb; 81(2):128-36
44. Ok, then on a website?
No. URLs stop working.
Evangelou et al. FASEB J. 2006.
Wren. Bioinformatics 2008.
Wren et al. EMBO Rep 2006.
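A minimal sketch of how link rot could be measured on a list of data URLs. The function names `url_is_alive` and `fraction_dead` are made up for this illustration and are not taken from the cited studies. It assumes a HEAD request is enough to judge availability, which is not true for every server.

```python
import http.client
import urllib.request


def url_is_alive(url, timeout=10.0):
    """Return True if the URL answers a HEAD request with a status below 400."""
    try:
        # Request() raises ValueError for strings that are not URLs.
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except (OSError, ValueError, http.client.HTTPException):
        # URLError, HTTPError (4xx/5xx) and timeouts are all OSError subclasses.
        return False


def fraction_dead(urls, check=url_is_alive):
    """Fraction of URLs for which `check` reports failure (0.0 for an empty list)."""
    urls = list(urls)
    if not urls:
        return 0.0
    dead = sum(1 for url in urls if not check(url))
    return dead / len(urls)
```

Calling `fraction_dead(list_of_urls)` on the URLs printed in a set of articles gives a rough decay rate. The `check` argument can be swapped for a stricter test, for example one that also confirms the dataset file is still at the address.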
54. Funder Journal Investigator Institution Study
Is research data shared
after publication?
55. Funder Journal Investigator Institution Study
• Funder: funded by NIH? size of grant; sharing plan req’d? funded by non-NIH?
• Journal: impact factor; strength of policy; open access? number of microarray studies published
• Investigator: years since first paper; # pubs; # citations; previously shared? previously reused? gender
• Institution: sector; size; impact rank; country
• Study: humans? mice? plants? cancer? clinical trial? number of authors; year
56. journal data sharing policy
“An inherent principle of publication is that
others should be able to replicate and build
upon the authors' published claims.
Therefore, a condition of publication
in a Nature journal is that authors are
required to make materials, data and
associated protocols available in a publicly
accessible database …”
http://www.nature.com/authors/editorial_policies/availability.html
http://www.nature.com/nature/journal/v453/n7197/index.html
64. Proportion of articles with shared datasets, by year
[Figure “Across time”: proportion of articles with datasets found in GEO or ArrayExpress (y-axis, 0.05 to 0.35) against year article published (x-axis, 2000 to 2009)]
69. We looked at data sharing policies
within Instructions to Authors
statements of 70 journals, as they
apply to gene expression microarray
data.
Piwowar and Chapman. ELPUB 2008
70. strength of data sharing policies
No applicable policy (43%)
Weak policy (24%)
- should, recommend, request
- must, but without requiring database accession number
Strong policy (33%)
- must, required, condition of publication
- requires database accession number
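The three policy levels above can be sketched as a keyword rule. This is only an illustration of the categories on the slide, not the coding procedure actually used in the review. The function name `classify_policy` is hypothetical, and the sketch assumes the input is already the data-sharing passage of a journal's author instructions.

```python
import re

# Wording cues taken from the slide; matching is by word prefix.
STRONG_TERMS = ("must", "required", "condition of publication")
WEAK_TERMS = ("should", "recommend", "request")


def _mentions(text, terms):
    """True if any term occurs in text, starting at a word boundary."""
    return any(re.search(r"\b" + re.escape(term), text) for term in terms)


def classify_policy(policy_text):
    """Return 'none', 'weak', or 'strong' for a data-sharing policy excerpt."""
    lowered = policy_text.lower()
    has_strong_wording = _mentions(lowered, STRONG_TERMS)
    has_weak_wording = _mentions(lowered, WEAK_TERMS)
    requires_accession = "accession number" in lowered

    # Strong: mandatory wording AND a database accession number is required.
    if has_strong_wording and requires_accession:
        return "strong"
    # Weak: advisory wording, or "must" without an accession number.
    if has_strong_wording or has_weak_wording:
        return "weak"
    return "none"
```

Keyword matching will misread some policies (for instance, "must" applied to something other than data), so a manual check of each classification is still needed.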
72. Articles published in journals
with a strong data-sharing policy
are more likely to have publicly
available datasets
73. What can we do
about it?
Learn
• Learn from those who do it well
• Focus on places that need it
74. Proportion of datasets shared
[Bar chart: proportion of datasets shared (axis 0.0 to 1.0), one bar per journal listed below]
Physiol Genomics
PLoS Genet
Genome Biol
Microbiology
PLoS One
BMC Genomics
Plant Cell
Genome Res
Eukaryot Cell
Appl Environ Microbiol
BMC Med Genomics
Hum Mol Genet
Proc Natl Acad Sci U S A
Infect Immun
Am J Respir Cell Mol Biol
Dev Biol
J Bacteriol
Mol Endocrinol
BMC Cancer
Plant Physiol
Biol Reprod
Blood
J Immunol
FASEB J
Toxicol Sci
J Exp Bot
Nucleic Acids Res
Diabetes
Mol Cell Biol
Mol Cancer Ther
BMC Bioinformatics
Stem Cells
FEBS Lett
J Neurosci
Am J Pathol
J Biol Chem
J Virol
OTHER
Cancer Res
J Clin Endocrinol Metab
Plant Mol Biol
Clin Cancer Res
Genomics
Invest Ophthalmol Vis Sci
Mol Hum Reprod
Carcinogenesis
Gene
Endocrinology
Oncogene
Cancer Lett
Biochem Biophys Res Commun
(Physiological Genomics)
75. Proportion of datasets shared
[Bar chart: proportion of datasets shared (axis 0.0 to 1.0), one bar per institution listed below; callout: Stanford]
Stanford University
University of Pennsylvania
University of Illinois
University of California, Los Angeles
University of Wisconsin, Madison
University of Washington
University of California, Davis
The University of British Columbia
University of California, San Francisco
University of Florida
University of California, San Diego
University of Minnesota, Twin Cities
Baylor College of Medicine
OTHER
Max Planck Gesellschaft
Harvard University
Duke University Medical Center
Yale University
Johns Hopkins University
University of Pittsburgh
Washington University in Saint Louis
University of Toronto
University of California, Berkeley
University of Michigan, Ann Arbor
Michigan State University
National Cancer Institute
Tokyo Daigaku
77. Multivariate nonlinear regressions with interactions
[Figure: odds ratios (axis ticks 0.25, 0.50, 1.00, 2.00, 4.00, 8.00; plot annotation 0.95) for the following factors]
• Has journal policy
• Count of R01 & other NIH grants
• Authors prev GEOAE sharing & OA & microarray creation
• NO K funding or P funding
• Journal impact
• Institution high citations & collaboration
• Journal policy consequences & long halflife
• NOT animals or mice
• Institution is government & NOT higher ed
• Last author num prev pubs & first year pub
• Large NIH grant
• Humans & cancer
• NO geo reuse + YES high institution output
• First author num prev pubs & first year pub
79. Multivariate nonlinear regression with interactions
[Figure: odds ratios (axis ticks 0.25, 0.50, 1.00, 2.00, 4.00; plot annotation 0.95) for the following factors]
• OA journal & previous GEO-AE sharing
• Amount of NIH funding
• Journal impact factor and policy
• Higher Ed in USA
• Cancer & humans
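For readers unfamiliar with the odds ratios in the regression slides above, here is a minimal sketch of the simplest case: one yes/no factor and a 2x2 table. The slides report odds ratios from a multivariate model, which this does not reproduce. The function name and the counts in the usage note are invented for illustration, and the interval is the standard Wald approximation.

```python
import math


def odds_ratio_with_ci(shared_with_factor, unshared_with_factor,
                       shared_without_factor, unshared_without_factor,
                       z=1.96):
    """Odds ratio of sharing (factor present vs absent) with a Wald interval.

    z=1.96 corresponds to an approximate 95% confidence interval.
    """
    cells = (shared_with_factor, unshared_with_factor,
             shared_without_factor, unshared_without_factor)
    if min(cells) <= 0:
        raise ValueError("all four cell counts must be positive")

    odds_with = shared_with_factor / unshared_with_factor
    odds_without = shared_without_factor / unshared_without_factor
    odds_ratio = odds_with / odds_without

    # Standard error of log(odds ratio) is sqrt of summed reciprocal counts.
    log_se = math.sqrt(sum(1.0 / count for count in cells))
    lower = math.exp(math.log(odds_ratio) - z * log_se)
    upper = math.exp(math.log(odds_ratio) + z * log_se)
    return odds_ratio, lower, upper
```

With made-up counts, `odds_ratio_with_ci(30, 70, 10, 90)` compares 30 of 100 articles sharing data when the factor is present against 10 of 100 when it is absent. An odds ratio above 1.00 whose interval excludes 1.00 suggests the factor is associated with more sharing.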
83. currency of value?
Citations.
$50!
Diamond, Arthur M. What is a Citation Worth?
The Journal of Human Resources (1986)
vol. 21 (2) pp. 200-215
84. dataset
85 cancer microarray trials published in 1999-2003, as
identified by Ntzani and Ioannidis (2003)
citations
ISI Web of Science Citation index, citations from
2004-2005
data sharing locations
Publisher and lab websites, microarray databases, WayBack
Internet Archive, Oncomine
statistics
Multivariate linear regression
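A minimal sketch of what a multivariate linear regression does, assuming ordinary least squares solved through the normal equations. The predictor names (`data_shared`, `journal_impact_factor`) and every number below are synthetic and chosen only so the fit can be checked; they are not the study's variables or results.

```python
def solve_linear_system(matrix, vector):
    """Solve matrix * x = vector by Gaussian elimination with partial pivoting."""
    size = len(vector)
    aug = [[float(v) for v in row] + [float(b)] for row, b in zip(matrix, vector)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors are collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, size):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, size + 1):
                aug[row][k] -= factor * aug[col][k]
    solution = [0.0] * size
    for row in range(size - 1, -1, -1):
        tail = sum(aug[row][k] * solution[k] for k in range(row + 1, size))
        solution[row] = (aug[row][size] - tail) / aug[row][row]
    return solution


def fit_linear_model(predictor_rows, outcomes):
    """Least-squares fit; returns [intercept, coef_1, ..., coef_k]."""
    design = [[1.0] + [float(v) for v in row] for row in predictor_rows]
    width = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(width)]
           for i in range(width)]
    xty = [sum(r[i] * y for r, y in zip(design, outcomes)) for i in range(width)]
    return solve_linear_system(xtx, xty)


# Synthetic rows: [data_shared (0 or 1), journal_impact_factor]
predictors = [[0, 2.0], [1, 2.0], [0, 5.0], [1, 5.0], [0, 9.0], [1, 9.0]]
# Outcome built from known coefficients (1.0, 0.5, 0.2) so the fit is checkable.
log_citations = [1.0 + 0.5 * shared + 0.2 * impact for shared, impact in predictors]
coefficients = fit_linear_model(predictors, log_citations)
```

Here `coefficients[1]` is the estimated change in the outcome associated with sharing data while holding the other predictor fixed, which is the kind of adjusted comparison a multivariate model allows.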
96. a) in our
communities
- strengthening policies:
- journal, conference, institutional
- decision-makers
- role-models and educators
97. b) in our tools
- measure opinions
- measure use
- be transparent!
98. c) with our data
- share it.
- ugly? incomplete? strange?
“Flawed, but out there”
is a million times better than
“perfect, but unattainable”
http://sciblogs.co.nz/seeing-data/2010/10/12/the-zen-of-open-data/
99. “Does anyone want your data?
That’s hard to predict […]
After all, no one ever knocked on your
door asking to buy those figurines
collecting dust in your cabinet before you
listed them on eBay.
Your data, too, may simply be awaiting an
effective matchmaker.”
Got data? Nature Neuroscience (2007)
100. I post my data, code, and statistical scripts:
http://researchremix.org
Share yours too!
http://www.flickr.com/photos/myklroventine/892446624/
101. More info?
• OATP oa.data tag
on Connotea, Twitter
• FriendFeed
• Mendeley
“data sharing” group
• @researchremix
piwowar@zoology.ubc.ca
102. thank you
Todd Vision,
Michael Whitlock,
Wendy Chapman
The open science online community and those who
release their articles, datasets and photos openly