Open research data
            Heather Piwowar
 DataONE postdoc with Dryad and NESCent, UBC
              @researchremix

                OA week 2010
         University of British Columbia
#1

It matters
http://www.metmuseum.org/toah/ho/09/euwf/ho_24.45.1.htm
http://www.flickr.com/photos/jsmjr/62443357/
http://www.flickr.com/photos/camilleharrington/3587294608/
http://www.flickr.com/photos/rkuhnau/3318245976/
http://www.flickr.com/photos/conformpdx/1796399674/
http://www.flickr.com/photos/rkuhnau/3317418699/
http://www.flickr.com/photos/zemlinki/261617721/
http://www.flickr.com/photos/tracenmatt/3020786491/
http://www.flickr.com/photos/the-o/2078239333/
http://www.flickr.com/photos/75166820@N00/5318468/
#2

Wayfinding + progress
http://www.flickr.com/photos/paulhami/1020538523//
Which data?
Where?
With whom?
When?
Under what terms?
Find
Organize
Document
Deidentify
Format
Ask
Submit

Answer questions
Worry about mistakes being found
Worry about data being misinterpreted
Worry about being scooped
Forgo money and IP and prestige???
not very motivating.
http://www.flickr.com/photos/johnnyvulkan/381941233/
     http://www.flickr.com/photos/tonivc/2283676770/
a) policies +
   expectations

- NSF
- Joint Data Archiving Policy
- BioMed Central
- PLoS
b) repositories


- datatype-based
- institution-based
- discipline-based
- journal-based
c) standards


- data licenses
- data citation
- IDs for datasets, people, entities
d) part of something
   bigger

- open government data
- citizen science
- supplemental materials
- dataset-based usage metrics
- awards, recognition
#3

Is it working?
lots of data sharing!




                        http://www.genome.jp/en/db_growth.html
but how much isn’t shared?
what isn’t shared?
who isn’t sharing it?
why not?
how much does it matter?
what can we do about it?
you can not manage
what you do not measure




               quote: Lord Kelvin
               http://www.flickr.com/photos/archeon/2941655917/
http://www.flickr.com/photos/ryanr/142455033/
Why is it important?
Are we sure?
Errors.

More than half of all papers contain errors
5‐10% contain errors that change the conclusions




        Gore et al. 1977, Kanter and Taylor 1994, McGuigan 1995, Hurlbert and White 1993
Ok, let’s share on
request.
Doesn’t work
[Bar chart, axis 0% to 40%; bar values not recoverable from the extracted text]
- self-reported denying a request in last 3 years
- trainees self-reported denying a request
- been denied access to data, materials, code
- authors “not able to retrieve raw data”
- not willing to release data

Campbell et al. JAMA. 2002.
Kyzas et al. J Natl Cancer Inst. 2005.
Vogeli et al. Acad Med. 2006.
Reidpath et al. Bioethics 2001.
Don’t get the email




                Evangelou et al.  FASEB J.  2006.
                    Wren.  Bioinformatics 2008.
                   Wren et al.  EMBO Rep 2006.
Say no

[Bar chart, axis 0% to 50%; bar values not recoverable from the extracted text]
- want to publish more papers first
- want exclusive use
- ensure data confidentiality
- control
- avoid cost of preparation

Hedstrom. Society of Am Archivists Ann Meeting. 2008.
Ask why
`Before I send you the data could I ask what you want it for?'
`Can you be more explicit, please, about the analyses you have in 
  mind and what you plan to do with them?'

`We'll have to discuss your request with the other coauthors.  
 Before we do that, I'd like to know your proposed analysis plan.' 

`We are not finished using the data, but when we are finished with 
 it, we would be open to requests for the data.'

`Any use of the data other than for the specific purpose laid down 
  in the contract of collaboration is effectively ruled out.'
                                                Reidpath et al. Bioethics 2001.
Not efficient.
Not fair.
    Not random:
    ‐ young
    ‐ productive



                   Campbell et al. 2000
Has real costs.
Survey of doctoral students and postdocs:


28-50% reported negative effects of data withholding:
  • hurt progress of their research,
  • hurt rate of discovery in their lab/research group,
  • hurt quality of their relationships with academic
    scientists,
  • hurt quality of their education,
  • hurt level of communication in their lab/research
    group.
                                   Vogeli et al. Acad Med. 2006 Feb; 81(2):128-36
Ok, then on a website?
No. Urls stop working.




               Evangelou et al.  FASEB J.  2006.
                   Wren.  Bioinformatics 2008.
                  Wren et al.  EMBO Rep 2006.
Ok, in a repository?
lots of data sharing!




                        http://www.genome.jp/en/db_growth.html
http://www.flickr.com/photos/g_kat26/4255119413/
http://www.flickr.com/photos/jima/606588905/
Combined, these full-text portals reach 85%
of the articles available through
U of Pittsburgh library subscriptions.
microarray data




 http://en.wikipedia.org/wiki/DNA_microarray
 http://en.wikipedia.org/wiki/Image:Heatmap.png




 http://commons.wikimedia.org/wiki/File:DNA_double_helix_vertikal.PNG
11,603 studies that created
gene expression microarray data
Is research data shared after publication?

Funder:
  - funded by NIH?
  - size of grant
  - sharing plan req’d?
  - funded by non-NIH?
Journal:
  - impact factor
  - strength of policy
  - open access?
  - number of microarray studies published
Investigator:
  - years since first paper
  - # pubs
  - # citations
  - previously shared?
  - previously reused?
  - gender
Institution:
  - sector
  - size
  - impact rank
  - country
Study:
  - humans?
  - mice?
  - plants?
  - cancer?
  - clinical trial?
  - number of authors
  - year
journal data sharing policy


          “An inherent principle of publication is that
           others should be able to replicate and build
           upon the authors' published claims.
           Therefore, a condition of publication
           in a Nature journal is that authors are
           required to make materials, data and
           associated protocols available in a publicly
           accessible database …”


                          http://www.nature.com/authors/editorial_policies/availability.html
                              http://www.nature.com/nature/journal/v453/n7197/index.html
journal rank
institution rank




Yu et al. BMC medical
  informatics and decision
  making (2007) vol. 7 pp. 17
funding level

PubMed grant lists   + NIH grant details
study type
author gender
and so on...


    124 variables
11,603 studies


25% had links from datasets in databases
Proportion of articles with shared datasets, by year
[Figure: proportion of articles with datasets found in GEO or ArrayExpress (y-axis, 0.05 to 0.35) by year article published (x-axis, 2000 to 2009)]
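A minimal sketch of how a per-year proportion like the one plotted could be computed. The record layout (a year plus a flag for whether a dataset link was found) and the toy records are assumptions made for illustration; they are not data from the study.

```python
from collections import defaultdict


def proportion_shared_by_year(articles):
    """Return {year: fraction of articles with a linked dataset}.

    `articles` is an iterable of (year, has_dataset_link) pairs.
    """
    totals = defaultdict(int)
    shared = defaultdict(int)
    for year, has_link in articles:
        totals[year] += 1
        if has_link:
            shared[year] += 1
    return {year: shared[year] / totals[year] for year in sorted(totals)}


# Hypothetical toy records, NOT data from the study.
toy_articles = [
    (2007, True), (2007, False), (2007, False), (2007, False),
    (2008, True), (2008, True), (2008, False), (2008, False),
]

by_year = proportion_shared_by_year(toy_articles)
print(by_year)  # {2007: 0.25, 2008: 0.5}
```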
What can we do
about it?

Funder policies.
19%

Piwowar and Chapman. Journal of Informetrics 2010
What can we do
about it?

Journal policies.
We looked at data sharing policies
 within Instructions to Authors
 statements of 70 journals, as they
 apply to gene expression microarray
 data.




                         Piwowar and Chapman. ELPUB 2008
strength of data sharing policies

    No applicable policy (43%)


    Weak policy (24%)
      should, recommend, request
      must, but without requiring database accession number
    Strong policy (33%)
      must, required, condition of publication
      requires database accession number
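The three policy categories above can be sketched as a keyword rule. This is only an illustration: the slides do not describe how the policies were actually coded, and the function name, the simple substring matching, and the sample sentences are all assumptions.

```python
# Keywords taken from the category definitions on the slide.
STRONG_TERMS = ("must", "required", "condition of publication")
WEAK_TERMS = ("should", "recommend", "request")
ACCESSION_TERM = "accession number"


def policy_strength(policy_text):
    """Classify a policy statement as 'strong', 'weak', or 'none'.

    Assumed rule: strong wording counts as 'strong' only when an
    accession number is also mentioned; otherwise any strong or weak
    wording counts as 'weak'. Substring matching is crude and would
    need manual review on real policy text.
    """
    lowered = policy_text.lower()
    has_strong = any(term in lowered for term in STRONG_TERMS)
    has_weak = any(term in lowered for term in WEAK_TERMS)
    has_accession = ACCESSION_TERM in lowered
    if has_strong and has_accession:
        return "strong"
    if has_strong or has_weak:
        return "weak"
    return "none"


# Hypothetical sentences, not quotes from any journal.
print(policy_strength("Authors must deposit microarray data and provide the database accession number."))
print(policy_strength("Authors should deposit microarray data in a public database."))
print(policy_strength("Manuscripts are limited to 5000 words."))
```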
High-impact journals
     tend to have
a strong data-sharing
        policy
Articles published in journals
with a strong data-sharing policy
are more likely to have publicly
        available datasets
What can we do
about it?

Learn
• Learn from those who do it well
• Focus on places that need it
Proportion of datasets shared
[Figure: proportion of datasets shared (0.0 to 1.0), by journal; highlighted: Physiological Genomics]
Journals, in the order extracted from the chart:
Physiol Genomics; PLoS Genet; Genome Biol; Microbiology; PLoS One; BMC Genomics; Plant Cell; Genome Res; Eukaryot Cell; Appl Environ Microbiol; BMC Med Genomics; Hum Mol Genet; Proc Natl Acad Sci U S A; Infect Immun; Am J Respir Cell Mol Biol; Dev Biol; J Bacteriol; Mol Endocrinol; BMC Cancer; Plant Physiol; Biol Reprod; Blood; J Immunol; FASEB J; Toxicol Sci; J Exp Bot; Nucleic Acids Res; Diabetes; Mol Cell Biol; Mol Cancer Ther; BMC Bioinformatics; Stem Cells; FEBS Lett; J Neurosci; Am J Pathol; J Biol Chem; J Virol; OTHER; Cancer Res; J Clin Endocrinol Metab; Plant Mol Biol; Clin Cancer Res; Genomics; Invest Ophthalmol Vis Sci; Mol Hum Reprod; Carcinogenesis; Gene; Endocrinology; Oncogene; Cancer Lett; Biochem Biophys Res Commun
Proportion of datasets shared
[Figure: proportion of datasets shared (0.0 to 1.0), by institution; highlighted: Stanford]
Institutions, in the order extracted from the chart:
Stanford University; University of Pennsylvania; University of Illinois; University of California, Los Angeles; University of Wisconsin, Madison; University of Washington; University of California, Davis; The University of British Columbia; University of California, San Francisco; University of Florida; University of California, San Diego; University of Minnesota, Twin Cities; Baylor College of Medicine; OTHER; Max Planck Gesellschaft; Harvard University; Duke University Medical Center; Yale University; Johns Hopkins University; University of Pittsburgh; Washington University in Saint Louis; University of Toronto; University of California, Berkeley; University of Michigan, Ann Arbor; Michigan State University; National Cancer Institute; Tokyo Daigaku
Proportion of datasets shared
[Figure: proportion of datasets shared (0.0 to 1.0) by institution rank (1 to 1901)]
Multivariate nonlinear regressions with interactions
[Figure: odds ratios (axis 0.25 to 8.00, labelled 0.95) for each term]
- Has journal policy
- Count of R01 & other NIH grants
- Authors prev GEOAE sharing & OA & microarray creation
- NO K funding or P funding
- Journal impact
- Institution high citations & collaboration
- Journal policy consequences & long halflife
- NOT animals or mice
- Institution is government & NOT higher ed
- Last author num prev pubs & first year pub
- Large NIH grant
- Humans & cancer
- NO geo reuse + YES high institution output
- First author num prev pubs & first year pub
Multivariate nonlinear regression with interactions
[Figure: odds ratios (axis 0.25 to 4.00, labelled 0.95) for each term]
- OA journal & previous GEO-AE sharing
- Amount of NIH funding
- Journal impact factor and policy
- Higher Ed in USA
- Cancer & humans
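For readers unfamiliar with odds ratios, here is a minimal sketch of the simplest case: an unadjusted odds ratio from a 2x2 table with a Wald confidence interval. This is not the multivariate model used in the slides. It assumes the 0.95 label on the plots denotes 95% confidence intervals, and the counts are hypothetical.

```python
import math


def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio and Wald interval from a 2x2 table.

    a: factor present, data shared      b: factor present, not shared
    c: factor absent, data shared       d: factor absent, not shared
    z=1.96 gives an approximate 95% interval. All cells must be > 0.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(odds ratio)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, NOT data from the study:
# e.g. articles in journals with a policy (40 shared, 60 not)
# versus journals without one (20 shared, 80 not).
odds_ratio, lower, upper = odds_ratio_with_ci(40, 60, 20, 80)
print(f"OR = {odds_ratio:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

An interval that excludes 1.00 suggests an association between the factor and sharing; an interval that spans 1.00 does not.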
Carrot?




          http://www.flickr.com/photos/sunrise/35819369/
currency of value?

     Citations.

           $50!




                     Diamond, Arthur M. What is a Citation Worth?
                        The Journal of Human Resources (1986)
                        vol. 21 (2) pp. 200-215
dataset
85 cancer microarray trials published in 1999-2003, as
identified by Ntzani and Ioannidis (2003)

citations
ISI Web of Science Citation index, citations from
2004-2005

data sharing locations
Publisher and lab websites, microarray databases, WayBack
Internet Archive, Oncomine

statistics
Multivariate linear regression
Note: log scale
~70%
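When the outcome of a linear regression is on a log scale, a coefficient converts to a percent change. A minimal sketch, assuming the natural log of citation counts as the outcome and reading the ~70% figure as a percent change in citations; the slides state neither the log base nor the fitted coefficient, so the function names and numbers are illustrative only.

```python
import math


def percent_change_from_log_coef(beta):
    """Percent change in the outcome per unit of the predictor,
    given a coefficient from a regression on ln(outcome)."""
    return (math.exp(beta) - 1.0) * 100.0


def log_coef_from_percent_change(percent):
    """Inverse: the ln-scale coefficient implied by a percent change."""
    return math.log(1.0 + percent / 100.0)


# A 70% increase corresponds to a coefficient of ln(1.7).
beta_for_70 = log_coef_from_percent_change(70.0)
print(round(beta_for_70, 3))  # 0.531
```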
Next?




        http://www.flickr.com/photos/gatewaystreets/3838452287/
Abadie et al. Journal of the American Statistical Association 2010
http://www.flickr.com/photos/boitabulle/3668162701/
http://upload.wikimedia.org/wikipedia/commons/thumb/e/e6/Gamma_distribution_pdf.svg/500px-Gamma_distribution_pdf.svg.png
#4

We are the culture.
    Let’s do it.
http://www.flickr.com/photos/joellevand/279468607/
http://www.flickr.com/photos/huzzahvintage/4577075021/
a) in our
   communities

- strengthening policies:
  - journal, conference, institutional
- decision-makers
- role-models and educators
b) in our tools


- measure opinions
- measure use
- be transparent!
c) with our data

- share it.
- ugly? incomplete? strange?

       “Flawed, but out there”
     is a million times better than
     “perfect, but unattainable”

                 http://sciblogs.co.nz/seeing-data/2010/10/12/the-zen-of-open-data/
“Does anyone want your data?

That’s hard to predict […]
After all, no one ever knocked on your
door asking to buy those figurines
collecting dust in your cabinet before you
listed them on eBay.

Your data, too, may simply be awaiting an
effective matchmaker.”
                Got data? Nature Neuroscience (2007)
I post my data, code, and statistical scripts:
http://researchremix.org
Share yours too!


                         http://www.flickr.com/photos/myklroventine/892446624/
More info?
 • OATP oa.data tag 
  on Connotea, Twitter
 • FriendFeed
 • Mendeley 
  “data sharing” group
 • @researchremix 
  piwowar@zoology.ubc.ca 
thank you
Todd Vision,
  Michael Whitlock,
  Wendy Chapman
The open science online community and those who
  release their articles, datasets and photos openly
http://www.flickr.com/photos/youraddresshere/6649228/

Measuring richness. A RCT to quantify the benefits of metadata quality; Scott...Measuring richness. A RCT to quantify the benefits of metadata quality; Scott...
Measuring richness. A RCT to quantify the benefits of metadata quality; Scott...
 
ELPUB 2008: A review of journal policies for sharing research data
ELPUB 2008:    A review of journal policies for sharing research dataELPUB 2008:    A review of journal policies for sharing research data
ELPUB 2008: A review of journal policies for sharing research data
 
Scio12 sem web_final
Scio12 sem web_finalScio12 sem web_final
Scio12 sem web_final
 
The Dryad Digital Repository: Published evolutionary data as part of the gre...
The Dryad Digital Repository: Published evolutionary data as part of the gre...The Dryad Digital Repository: Published evolutionary data as part of the gre...
The Dryad Digital Repository: Published evolutionary data as part of the gre...
 
Futures for scholarly journals: a researchers' perspective
Futures for scholarly journals: a researchers' perspectiveFutures for scholarly journals: a researchers' perspective
Futures for scholarly journals: a researchers' perspective
 
The Dryad Digital Repository: Published data as part of the greater data ecos...
The Dryad Digital Repository: Published data as part of the greater data ecos...The Dryad Digital Repository: Published data as part of the greater data ecos...
The Dryad Digital Repository: Published data as part of the greater data ecos...
 
Reproducibility (and the R*) of Science: motivations, challenges and trends
Reproducibility (and the R*) of Science: motivations, challenges and trendsReproducibility (and the R*) of Science: motivations, challenges and trends
Reproducibility (and the R*) of Science: motivations, challenges and trends
 
Biomedical Research as Part of the Digital Enterprise
Biomedical Research as Part of the Digital EnterpriseBiomedical Research as Part of the Digital Enterprise
Biomedical Research as Part of the Digital Enterprise
 

More from Heather Piwowar

Calculating how much your University spends on Open Access--and what to do ab...
Calculating how much your University spends on Open Access--and what to do ab...Calculating how much your University spends on Open Access--and what to do ab...
Calculating how much your University spends on Open Access--and what to do ab...Heather Piwowar
 
How to Calculate OA APC Spend for Your University
How to Calculate OA APC Spend for Your UniversityHow to Calculate OA APC Spend for Your University
How to Calculate OA APC Spend for Your UniversityHeather Piwowar
 
Intro to Managing Serials with Net Cost per Paid Use
Intro to Managing Serials with Net Cost per Paid UseIntro to Managing Serials with Net Cost per Paid Use
Intro to Managing Serials with Net Cost per Paid UseHeather Piwowar
 
The Future of OA: 
The Impact of Open Access on Readership and Subscription ...
 The Future of OA: 
The Impact of Open Access on Readership and Subscription ... The Future of OA: 
The Impact of Open Access on Readership and Subscription ...
The Future of OA: 
The Impact of Open Access on Readership and Subscription ...Heather Piwowar
 
The time has come to talk of... who should own scholarly infrastructure?
 The time has come to talk of... who should own scholarly infrastructure? The time has come to talk of... who should own scholarly infrastructure?
The time has come to talk of... who should own scholarly infrastructure?Heather Piwowar
 
What kinds of open have 
made a difference in scholarly communication infrast...
What kinds of open have 
made a difference in scholarly communication infrast...What kinds of open have 
made a difference in scholarly communication infrast...
What kinds of open have 
made a difference in scholarly communication infrast...Heather Piwowar
 
Data science needs Data and lots of it
Data science needs Data and lots of itData science needs Data and lots of it
Data science needs Data and lots of itHeather Piwowar
 
Impactstory OA week 2017
Impactstory OA week 2017Impactstory OA week 2017
Impactstory OA week 2017Heather Piwowar
 
Software-Native metrics: Depsy lessons learned
Software-Native metrics: Depsy lessons learnedSoftware-Native metrics: Depsy lessons learned
Software-Native metrics: Depsy lessons learnedHeather Piwowar
 
What's your Impactstory?
What's your Impactstory?What's your Impactstory?
What's your Impactstory?Heather Piwowar
 
capturing the impact of software AAS 2017
capturing the impact of software AAS 2017capturing the impact of software AAS 2017
capturing the impact of software AAS 2017Heather Piwowar
 
Software-Native metrics: Depsy lessons learned
Software-Native metrics: Depsy lessons learnedSoftware-Native metrics: Depsy lessons learned
Software-Native metrics: Depsy lessons learnedHeather Piwowar
 
submission summary for #WSSSPE Policy session on Credit, Citation, and Impact
submission summary for #WSSSPE Policy session on Credit, Citation, and Impactsubmission summary for #WSSSPE Policy session on Credit, Citation, and Impact
submission summary for #WSSSPE Policy session on Credit, Citation, and ImpactHeather Piwowar
 
Building Skyscrapers with our Scholarship
Building Skyscrapers with our ScholarshipBuilding Skyscrapers with our Scholarship
Building Skyscrapers with our ScholarshipHeather Piwowar
 
Right time, right place, to change the world
Right time, right place, to change the worldRight time, right place, to change the world
Right time, right place, to change the worldHeather Piwowar
 
No more waiting! Tools that work Today to reveal dataset use
No more waiting!  Tools that work Today to reveal dataset useNo more waiting!  Tools that work Today to reveal dataset use
No more waiting! Tools that work Today to reveal dataset useHeather Piwowar
 
Analyzing data about our data
Analyzing data about our dataAnalyzing data about our data
Analyzing data about our dataHeather Piwowar
 

More from Heather Piwowar (20)

Calculating how much your University spends on Open Access--and what to do ab...
Calculating how much your University spends on Open Access--and what to do ab...Calculating how much your University spends on Open Access--and what to do ab...
Calculating how much your University spends on Open Access--and what to do ab...
 
Unsub Lightning Talk
Unsub Lightning TalkUnsub Lightning Talk
Unsub Lightning Talk
 
How to Calculate OA APC Spend for Your University
How to Calculate OA APC Spend for Your UniversityHow to Calculate OA APC Spend for Your University
How to Calculate OA APC Spend for Your University
 
Intro to Managing Serials with Net Cost per Paid Use
Intro to Managing Serials with Net Cost per Paid UseIntro to Managing Serials with Net Cost per Paid Use
Intro to Managing Serials with Net Cost per Paid Use
 
The Future of OA: 
The Impact of Open Access on Readership and Subscription ...
 The Future of OA: 
The Impact of Open Access on Readership and Subscription ... The Future of OA: 
The Impact of Open Access on Readership and Subscription ...
The Future of OA: 
The Impact of Open Access on Readership and Subscription ...
 
The time has come to talk of... who should own scholarly infrastructure?
 The time has come to talk of... who should own scholarly infrastructure? The time has come to talk of... who should own scholarly infrastructure?
The time has come to talk of... who should own scholarly infrastructure?
 
What kinds of open have 
made a difference in scholarly communication infrast...
What kinds of open have 
made a difference in scholarly communication infrast...What kinds of open have 
made a difference in scholarly communication infrast...
What kinds of open have 
made a difference in scholarly communication infrast...
 
Data science needs Data and lots of it
Data science needs Data and lots of itData science needs Data and lots of it
Data science needs Data and lots of it
 
Oadoi and libraries
Oadoi and librariesOadoi and libraries
Oadoi and libraries
 
Impactstory OA week 2017
Impactstory OA week 2017Impactstory OA week 2017
Impactstory OA week 2017
 
Paperbuzz sneak peek
Paperbuzz sneak peekPaperbuzz sneak peek
Paperbuzz sneak peek
 
Software-Native metrics: Depsy lessons learned
Software-Native metrics: Depsy lessons learnedSoftware-Native metrics: Depsy lessons learned
Software-Native metrics: Depsy lessons learned
 
What's your Impactstory?
What's your Impactstory?What's your Impactstory?
What's your Impactstory?
 
capturing the impact of software AAS 2017
capturing the impact of software AAS 2017capturing the impact of software AAS 2017
capturing the impact of software AAS 2017
 
Software-Native metrics: Depsy lessons learned
Software-Native metrics: Depsy lessons learnedSoftware-Native metrics: Depsy lessons learned
Software-Native metrics: Depsy lessons learned
 
submission summary for #WSSSPE Policy session on Credit, Citation, and Impact
submission summary for #WSSSPE Policy session on Credit, Citation, and Impactsubmission summary for #WSSSPE Policy session on Credit, Citation, and Impact
submission summary for #WSSSPE Policy session on Credit, Citation, and Impact
 
Building Skyscrapers with our Scholarship
Building Skyscrapers with our ScholarshipBuilding Skyscrapers with our Scholarship
Building Skyscrapers with our Scholarship
 
Right time, right place, to change the world
Right time, right place, to change the worldRight time, right place, to change the world
Right time, right place, to change the world
 
No more waiting! Tools that work Today to reveal dataset use
No more waiting!  Tools that work Today to reveal dataset useNo more waiting!  Tools that work Today to reveal dataset use
No more waiting! Tools that work Today to reveal dataset use
 
Analyzing data about our data
Analyzing data about our dataAnalyzing data about our data
Analyzing data about our data
 

Recently uploaded

Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024The Digital Insurer
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Principled Technologies
 

Recently uploaded (20)

Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
 

Research into Open Research Data

  • 1. Open research data Heather Piwowar DataONE postdoc with Dryad and NESCent, UBC @researchremix OA week 2010 University of British Columbia
  • 16. Which data? http://www.flickr.com/photos/paulhami/1020538523//
  • 17. Where? http://www.flickr.com/photos/paulhami/1020538523//
  • 18. With whom? http://www.flickr.com/photos/paulhami/1020538523//
  • 19. When? http://www.flickr.com/photos/paulhami/1020538523//
  • 20. Under what terms? http://www.flickr.com/photos/paulhami/1020538523//
  • 22. Find Organize Document Deidentify Format Ask Submit Answer questions Worry about mistakes being found Worry about data being misinterpreted Worry about being scooped Forgo money and IP and prestige???
  • 24. http://www.flickr.com/photos/johnnyvulkan/381941233/ http://www.flickr.com/photos/tonivc/2283676770/
  • 25. a) policies + expectations - NSF - Joint Data Archiving Policy - BioMed Central - PLoS
  • 26. b) repositories - datatype-based - institution-based - discipline-based - journal-based
  • 27. c) standards - data licenses - data citation - IDs for datasets, people, entities
  • 28. d) part of something bigger - open government data - citizen science - supplemental materials - dataset-based usage metrics - awards, recognition
  • 30. lots of data sharing! http://www.genome.jp/en/db_growth.html
  • 31. but how much isn’t shared? what isn’t shared? who isn’t sharing it? why not? how much does it matter? what can we do about it?
  • 32. “you can not manage what you do not measure” (quote: Lord Kelvin) http://www.flickr.com/photos/archeon/2941655917/
  • 34. Why is it important? Are we sure?
  • 35. Errors. More than half of all papers contain errors; 5-10% contain errors that change the conclusions. Gore et al. 1977, Kanter and Taylor 1994, McGuigan 1995, Hurlbert and White 1993
  • 36. Ok, let’s share on request.
  • 37. Doesn’t work. [Bar chart, 0% to 40%: self-reported denying a request in last 3 years; trainees self-reported denying a request; been denied access to data, materials, code; authors “not able to retrieve raw data”; not willing to release data.] Campbell et al. JAMA. 2002. Kyzas et al. J Natl Cancer Inst. 2005. Vogeli et al. Acad Med. 2006. Reidpath et al. Bioethics 2001.
  • 38. Don’t get the email. Evangelou et al. FASEB J. 2006. Wren. Bioinformatics 2008. Wren et al. EMBO Rep 2006.
  • 39. Say no. [Bar chart, 0% to 50%, reasons given: want to publish more papers first; want exclusive use; ensure data confidentiality control; avoid cost of preparation.] Hedstrom. Society of Am Archivists Ann Meeting. 2008.
  • 40. Ask why. “Before I send you the data could I ask what you want it for?” “Can you be more explicit, please, about the analyses you have in mind and what you plan to do with them?” “We'll have to discuss your request with the other coauthors. Before we do that, I'd like to know your proposed analysis plan.” “We are not finished using the data, but when we are finished with it, we would be open to requests for the data.” “Any use of the data other than for the specific purpose laid down in the contract of collaboration is effectively ruled out.” Reidpath et al. Bioethics 2001.
  • 42. Not efficient. Not fair. Not random: young, productive. Campbell et al. 2000
  • 43. Has real costs. Survey of doctoral students and postdocs: 28-50% reported withholding negative effects: • hurt progress of their research, • hurt rate of discovery in their lab/research group, • hurt quality of their relationships with academic scientists, • hurt quality of their education, • hurt level of communication in their lab/research group. Vogeli et al. Acad Med. 2006 Feb; 81(2):128-36
  • 44. Ok, then on a website? No. URLs stop working. Evangelou et al. FASEB J. 2006. Wren. Bioinformatics 2008. Wren et al. EMBO Rep 2006.
  • 45. Ok, in a repository?
  • 46. lots of data sharing! http://www.genome.jp/en/db_growth.html
  • 50. Combined, these full-text portals reach 85% of the articles available through U of Pittsburgh library subscriptions.
  • 51. microarray data http://en.wikipedia.org/wiki/DNA_microarray http://en.wikipedia.org/wiki/Image:Heatmap.png http://commons.wikimedia.org/wiki/File:DNA_double_helix_vertikal.PNG
  • 53. 11,603 studies that created gene expression microarray data
  • 54. Funder Journal Investigator Institution Study Is research data shared after publication?
  • 55. [Diagram: candidate variables by level. Funder: funded by NIH?; size of grant; sharing plan req’d?; funded by non-NIH?. Journal: impact factor; strength of policy; open access?. Investigator: years since first paper; # pubs; # citations; previously shared?; previously reused?; gender. Institution: sector; size; impact rank; country; number of microarray studies published. Study: humans?; mice?; plants?; cancer?; clinical trial?; number of authors; year.]
  • 56. journal data sharing policy “An inherent principle of publication is that others should be able to replicate and build upon the authors' published claims. Therefore, a condition of publication in a Nature journal is that authors are required to make materials, data and associated protocols available in a publicly accessible database …” http://www.nature.com/authors/editorial_policies/availability.html http://www.nature.com/nature/journal/v453/n7197/index.html
  • 58. institution rank Yu et al. BMC medical informatics and decision making (2007) vol. 7 pp. 17
  • 59. funding level PubMed grant lists + NIH grant details
  • 62. and so on... 124 variables
  • 63. 11,603 studies 25% had links from datasets in databases
  • 64. Proportion of articles with shared datasets, by year. [Line chart across time: proportion of articles with datasets found in GEO or ArrayExpress (y-axis, 0.05 to 0.35) by year article published (x-axis, 2000 to 2009).]
  • 65. What can we do about it?
  • 66. What can we do about it? Funder policies.
  • 67. 19% Piwowar and Chapman. Journal of Informetrics 2010
  • 68. What can we do about it? Journal policies.
  • 69. We looked at data sharing policies within Instruction to Author statements of 70 journals, as they apply to gene expression microarray data. Piwowar and Chapman. ELPUB 2008
  • 70. strength of data sharing policies. No applicable policy (43%). Weak policy (24%): should, recommend, request; must, but without requiring database accession number. Strong policy (33%): must, required, condition of publication; requires database accession number.
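The three policy tiers on this slide can be approximated with a simple keyword check. The sketch below is only an illustration under stated assumptions: the function name `classify_policy`, the keyword tuples, and the reliance on the literal phrase "accession number" are all hypothetical, and nothing here is claimed to be the procedure used in the study.

```python
import re

# Hypothetical keyword lists, taken from the wording on the slide.
STRONG_TERMS = ("must", "required", "condition of publication")
WEAK_TERMS = ("should", "recommend", "request")


def _mentions(text: str, terms: tuple) -> bool:
    """True if any term starts at a word boundary in the text."""
    return any(re.search(r"\b" + re.escape(term), text) for term in terms)


def classify_policy(policy_text: str) -> str:
    """Return 'strong', 'weak', or 'none' for an Instructions to Authors excerpt."""
    text = policy_text.lower()
    has_strong_wording = _mentions(text, STRONG_TERMS)
    has_weak_wording = _mentions(text, WEAK_TERMS)
    asks_for_accession = "accession number" in text

    if has_strong_wording and asks_for_accession:
        return "strong"
    if has_strong_wording or has_weak_wording:
        # Covers "should/recommend/request" and "must" without an accession number.
        return "weak"
    return "none"


print(classify_policy("Authors must deposit data and provide the database accession number."))
print(classify_policy("Authors should deposit data in a public database."))
print(classify_policy("Manuscripts are limited to 5000 words."))
```

A keyword check like this would misread negations and unusual phrasing, so it is best treated as a first pass before a manual review of each policy.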
  • 71. High-impact journals tend to have a strong data-sharing policy
  • 72. Articles published in journals with a strong data-sharing policy are more likely to have publicly available datasets
  • 73. What can we do about it? Learn • Learn from those who do it well • Focus on places that need it
  • 74. Proportion of datasets shared, by journal. [Bar chart, 0.0 to 1.0; Physiological Genomics highlighted. Journals shown: Physiol Genomics; PLoS Genet; Genome Biol; Microbiology; PLoS One; BMC Genomics; Plant Cell; Genome Res; Eukaryot Cell; Appl Environ Microbiol; BMC Med Genomics; Hum Mol Genet; Proc Natl Acad Sci U S A; Infect Immun; Am J Respir Cell Mol Biol; Dev Biol; J Bacteriol; Mol Endocrinol; BMC Cancer; Plant Physiol; Biol Reprod; Blood; J Immunol; FASEB J; Toxicol Sci; J Exp Bot; Nucleic Acids Res; Diabetes; Mol Cell Biol; Mol Cancer Ther; BMC Bioinformatics; Stem Cells; FEBS Lett; J Neurosci; Am J Pathol; J Biol Chem; J Virol; OTHER; Cancer Res; J Clin Endocrinol Metab; Plant Mol Biol; Clin Cancer Res; Genomics; Invest Ophthalmol Vis Sci; Mol Hum Reprod; Carcinogenesis; Gene; Endocrinology; Oncogene; Cancer Lett; Biochem Biophys Res Commun.]
  • 75. Proportion of datasets shared, by institution. [Bar chart, 0.0 to 1.0; Stanford highlighted. Institutions shown: Stanford University; University of Pennsylvania; University of Illinois; University of California, Los Angeles; University of Wisconsin, Madison; University of Washington; University of California, Davis; The University of British Columbia; University of California, San Francisco; University of Florida; University of California, San Diego; University of Minnesota, Twin Cities; Baylor College of Medicine; OTHER; Max Planck Gesellschaft; Harvard University; Duke University Medical Center; Yale University; Johns Hopkins University; University of Pittsburgh; Washington University in Saint Louis; University of Toronto; University of California, Berkeley; University of Michigan, Ann Arbor; Michigan State University; National Cancer Institute; Tokyo Daigaku.]
  • 76. Proportion of datasets shared, by institution rank. [Chart: proportion of datasets shared (0.0 to 1.0) against institution rank (1 to 1901).]
  • 77. Multivariate nonlinear regressions with interactions. [Odds ratio plot, axis 0.25 to 8.00, marked 0.95. Factors: Has journal policy; Count of R01 & other NIH grants; Authors prev GEOAE sharing & OA & microarray creation; NO K funding or P funding; Journal impact; Institution high citations & collaboration; Journal policy consequences & long halflife; NOT animals or mice; Institution is government & NOT higher ed; Last author num prev pubs & first year pub; Large NIH grant; Humans & cancer; NO geo reuse + YES high institution output; First author num prev pubs & first year pub.]
  • 78. Multivariate nonlinear regressions with interactions. [Same odds ratio plot and factors as slide 77.]
  • 79. [Figure: Multivariate nonlinear regression with interactions. Odds ratio per factor (axis ticks 0.25 to 4.00; annotation 0.95). Factors: OA journal & previous GEO-AE sharing; Amount of NIH funding; Journal impact factor and policy; Higher Ed in USA; Cancer & humans]
  • 80. [Figure: same plot as slide 79]
  • 81. Carrot? http://www.flickr.com/photos/sunrise/35819369/
  • 82. currency of value? Citations.
  • 83. currency of value? Citations. $50! Diamond, Arthur M. What is a Citation Worth? The Journal of Human Resources (1986) vol. 21 (2) pp. 200-215
  • 84. dataset: 85 cancer microarray trials published in 1999-2003, as identified by Ntzani and Ioannidis (2003) · citations: ISI Web of Science Citation index, citations from 2004-2005 · data sharing locations: publisher and lab websites, microarray databases, WayBack Internet Archive, Oncomine · statistics: multivariate linear regression
  • 86. ~70%
  • 87. Next? http://www.flickr.com/photos/gatewaystreets/3838452287/
  • 88. Abadie et al. Journal of the American Statistical Association 2010
  • 90. http://upload.wikimedia.org/wikipedia/commons/thumb/e/e6/Gamma_distribution_pdf.svg/500px-Gamma_distribution_pdf.svg.png
  • 93. #4 We are the culture. Let’s do it.
  • 96. a) in our communities - strengthening policies: - journal, conference, institutional - decision-makers - role-models and educators
  • 97. b) in our tools - measure opinions - measure use - be transparent!
  • 98. c) with our data - share it. - ugly? incomplete? strange? “Flawed, but out there” is a million times better than “perfect, but unattainable” http://sciblogs.co.nz/seeing-data/2010/10/12/the-zen-of-open-data/
  • 99. “Does anyone want your data? That’s hard to predict […] After all, no one ever knocked on your door asking to buy those figurines collecting dust in your cabinet before you listed them on eBay. Your data, too, may simply be awaiting an effective matchmaker.” Got data? Nature Neuroscience (2007)
  • 100. I post my data, code, and statistical scripts: http://researchremix.org Share yours too! http://www.flickr.com/photos/myklroventine/892446624/
  • 101. More info? • OATP oa.data tag on Connotea, Twitter • FriendFeed • Mendeley “data sharing” group • @researchremix piwowar@zoology.ubc.ca
  • 102. thank you Todd Vision, Michael Whitlock, Wendy Chapman The open science online community and those who release their articles, datasets and photos openly