The document discusses open research data and the factors that influence data sharing. It notes that although data sharing has increased, much research data is still not shared. It examines variables such as journal policies, funder policies, researcher practices, and institution reputation that correlate with increased data sharing. Stronger policies from journals and funders are associated with more data being publicly available. Fields and institutions that share data well can serve as models for improving sharing in other fields or countries.
"Leaders and Laggards in the preservation of raw biomedical research data" presented at NEDCC 2010, The Tectonics of Digital Curation
A Symposium on the Shifting Preservation and Access Landscape
Thesis Proposal, as presented for dissertation proposal defense (Heather Piwowar)
The slides I presented for my PhD proposal defense for my project, "Foundational studies for measuring the impact, prevalence, and patterns of publicly sharing biomedical research data." Dept of Biomedical Informatics, University of Pittsburgh.
Thesis defense, Heather Piwowar, Sharing biomedical research data (Heather Piwowar)
Presentation by Heather Piwowar as PhD dissertation defense on March 24, 2010 at the Dept of Biomedical Informatics, U of Pittsburgh. "Foundational studies for measuring the impact, prevalence, and patterns of publicly sharing biomedical research data." I passed :)
Presented at ASIS&T 2009 in the student awards section. The presentation contains an overview of my dissertation proposal, as 2009 winner of the Thomson Reuters Information Science Doctoral Dissertation Proposal Scholarship, administered by the ASIS&T Information Science Education Committee
Open Access and Property Rights on a Collision Course with Scholars (Kimberly Yang)
This seminar talk focused on open access as a philosophy directly affecting scholars, academics, and consumers and its tension with property rights (intellectual property, copyright, proprietary databases). These impact our ability (or inability) to search, identify, retrieve, access, and use full-text publications or data relevant to research topics and investigations. It impacts our ability to use others' works (and our own work) in our research, writing, publishing and teaching.
The need to redefine genomic data sharing - moving towards Open Science Oct ... (Fiona Nielsen)
This presentation was given at the symposium: Genomics for Health and Environment in Nijmegen on Oct 30, 2014
http://www.studiegids.science.ru.nl/2014/science/prospectus/biology_bachelor/course/34732/
The presentation introduces Open Science and Open Access Publishing and discusses these concepts in relation to (human) genomics.
The discussion includes a presentation of the concept behind http://repositive.io, the social enterprise software platform which was spun out of the DNAdigest research activities.
As a special addition for the students in the audience who are curious about their future scientific career, I included a couple of slides about my move from academic research to being a social entrepreneur.
DNAdigest works to promote and enable easier and more efficient sharing of genomics data for research. We educate and engage the community about the hurdles and dilemmas for data sharing as faced from the perspective of stakeholders in academia, industry and patient communities. As part of our work we are working with our community and supporters to prototype new mechanisms and concepts for data sharing and data access.
Please visit our website to learn more about our activities and events: http://DNAdigest.org
Follow us on twitter: @DNAdigest
Laurie Goodman on "Overcoming Hurdles to Data Publication" for the Alan Turing Institute Symposium on Reproducibility for Data-Intensive Research, Oxford, 7th April 2016.
Open Access, Open Research, Open Data, Open Science, Open what? #gfm2013 (Christian Heise)
Drawing from a quick overview of the recent discourses on Open Access, Open Research and Open Science we will challenge the all too often unspecific and generalized notion of "Openness". What does it mean to be open in contrast to being closed? On the activist level, the Open Knowledge Foundation proposed an 'Open Definition' which lists eleven criteria for openness. By discussing this definition we intend to outline some controversial issues in the current struggle for openness.
Public Sharing of Research Datasets: A Pilot Study of Associations (Heather Piwowar)
Presented at ASIST & ISSI Pre-Conference
Symposium on Informetrics and Scientometrics on Nov 7, 2009
http://www.sois.uwm.edu/MetricsPreCon/program.html
Why study Data Sharing? (+ why share your data) (Heather Piwowar)
A presentation to the DBMI department at the University of Pittsburgh about data sharing and reuse: what this means, why it is important, some of what we’ve learned, and what we still don’t know.
Recommendations for infrastructure and incentives for open science, presented to the Research Data Alliance 6th Plenary. Presenter: William Gunn, Director of Scholarly Communications for Mendeley.
Research in the time of Covid: Surveying impacts on Early Career Researchers (Rebecca Grant)
Based on a survey of over 4,500 researchers published in the white paper The State of Open Data 2020, this session will explore the impacts of the pandemic on early career researchers (ECRs), their research practice, and how they interact with open data. We will discuss the specific challenges reported by ECRs, as well as the gaps in training and support that they have identified that would encourage their sharing and reuse of research data.
Presentation at the E-ARMA conference 2021.
Laurie Goodman at #aibsdata: Beyond Data Release Mandates - Helping Authors M... (GigaScience, BGI Hong Kong)
Laurie Goodman at the AIBS Changing Practices in Data Pub workshop: Beyond Data Release Mandates - Helping Authors Make Data Available. 3rd December 2014
ELPUB 2008: A review of journal policies for sharing research data (Heather Piwowar)
Abstract: Sharing data is a tenet of science, yet commonplace in only a few subdisciplines. Recognizing that a data sharing culture is unlikely to be achieved without policy guidance, some funders and journals have begun to request and require that investigators share their primary datasets with other researchers. The purpose of this study is to understand the current state of data sharing policies within journals, the features of journals that are associated with the strength of their data sharing policies, and whether the strength of data sharing policies impacts the observed prevalence of data sharing.
Methods: We investigated these relationships with respect to gene expression microarray data in the journals that most often publish studies about this type of data. We measured data sharing prevalence as the proportion of papers with submission links from NCBI’s Gene Expression Omnibus (GEO) database. We conducted univariate and linear multivariate regressions to understand the relationship between the strength of data sharing policy and journal impact factor, journal subdiscipline, journal publisher (academic societies vs. commercial), and publishing model (open vs. closed access).
Results: Of the 70 journal policies, 53 made some mention of sharing publication-related data within their Instruction to Author statements. Of the 40 policies with a data sharing policy applicable to gene expression microarrays, we classified 17 as weak and 23 as strong (strong policies required an accession number from database submission prior to publication). Existence of a data sharing policy was associated with the type of journal publisher: 46% of commercial journals had a data sharing policy, compared to 82% of journals published by an academic society. All five of the open access journals had a data sharing policy. Policy strength was associated with impact factor: the journals with no data sharing policy, a weak policy, and a strong policy had respective median impact factors of 3.6, 4.9, and 6.2. Policy strength was positively associated with measured data sharing submission into the GEO database: the journals with no data sharing policy, a weak policy, and a strong policy had median data sharing prevalence of 8%, 20%, and 25%, respectively.
Conclusion: This review and analysis begins to quantify the relationship between journal policies and data sharing outcomes. We hope it contributes to assessing the incentives and initiatives designed to facilitate widespread, responsible, effective data sharing.
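The policy counts reported in the abstract can be converted into shares of the 70 reviewed journals with a few lines of arithmetic. The sketch below uses only the numbers stated in the abstract (70 journals, 40 with an applicable policy, 17 weak, 23 strong); the variable and function names are illustrative and not taken from the study.

```python
# Counts as stated in the abstract.
total_journals = 70
applicable = 40   # journals with a policy applicable to microarray data
weak = 17
strong = 23

# Journals without an applicable policy are the remainder.
no_applicable = total_journals - applicable


def share(count, total=total_journals):
    """Return count as a whole-number percentage of total."""
    return round(100 * count / total)


shares = {
    "no applicable policy": share(no_applicable),
    "weak policy": share(weak),
    "strong policy": share(strong),
}

for label, pct in shares.items():
    print(f"{label}: {pct}%")
```

The rounded shares come out to 43%, 24%, and 33%, which add up to 100%.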
Presentation by RIN's Director, Michael Jubb, at the Association of Subscription Agents' annual conference in February 2010. http://www.subscription-agents.org/conferences/asa-conference-2010
The Dryad Digital Repository: Published data as part of the greater data ecos... (Hilmar Lapp)
Presented at the M3 and Biosharing Special Interest Group (SIG) meeting at ISMB 2010 in Boston, MA: http://gensc.org/gc_wiki/index.php/M3_%26_BioSharing
Calculating how much your University spends on Open Access--and what to do ab... (Heather Piwowar)
#NASIG2020 presentation
Librarians are working hard to understand how much money their university is spending on open access article processing fees (APCs), and how much of what they subscribe to is available as OA. This information is useful when making subscription decisions, considering Read and Publish agreements, rethinking library open access budgets, and designing Institution-wide OA policies.
This session will talk concretely about how to calculate the impact of Open Access on *your* university. It will provide an overview on how to estimate the amount of money spent across a university on Open Access fees: we will discuss underlying concepts behind calculating OA article-processing fee (APC) spend and give an overview of useful data sources, including Unsub.
Follow at @unsub_org
How to Calculate OA APC Spend for Your University (Heather Piwowar)
Universities are hungry to know how much they spend on Open Access fees. This data is important to planning transformative and read and publish agreements, forming library strategy, and understanding scholarly communication on your campus. Unfortunately, it hasn’t been easy to calculate how much your university is spending on Open Access.
Learn how recent developments in data sources and tools have made this easier during this webinar. We will discuss the underlying concepts behind calculating OA article-processing fee (APC) spend, and provide you with paths to calculate the Open Access fees paid by your institution. ALCTS webinar.
Intro to Managing Serials with Net Cost per Paid Use (Heather Piwowar)
This webinar will introduce a new metric for evaluating the cost effectiveness of Serials: Net Cost Per Paid Use (NCPPU). NCPPU goes beyond the standard Cost Per Use calculation to exclude free content (OA and back catalog), incorporate ILL costs, and value citation and authorship. ALCTS webinar.
submission summary for #WSSSPE Policy session on Credit, Citation, and Impact (Heather Piwowar)
submission summary for #WSSSPE Policy session on Credit, Citation, and Impact
presentation by Heather Piwowar
November 2013
agenda: http://wssspe.researchcomputing.org.uk/
37. Doesn’t work
self-reported denying a request in last 3 years
trainees self-reported denying a request
been denied access to data, materials, code
authors “not able to retrieve raw data”
not willing to release data
[Bar chart; x-axis: 0% to 40%]
Campbell et al. JAMA. 2002.
Kyzas et al. J Natl Cancer Inst. 2005.
Vogeli et al. Acad Med. 2006.
Reidpath et al. Bioethics 2001.
38. Don’t get the email
Evangelou et al. FASEB J. 2006.
Wren. Bioinformatics 2008.
Wren et al. EMBO Rep 2006.
39. Say no
want to publish more papers first
want exclusive use
ensure data confidentiality
control
avoid cost of preparation
[Bar chart; x-axis: 0% to 50%]
Hedstrom. Society of Am Archivists Ann Meeting. 2008.
40. Ask why
"Before I send you the data could I ask what you want it for?"
"Can you be more explicit, please, about the analyses you have in mind and what you plan to do with them?"
"We'll have to discuss your request with the other coauthors. Before we do that, I'd like to know your proposed analysis plan."
"We are not finished using the data, but when we are finished with it, we would be open to requests for the data."
"Any use of the data other than for the specific purpose laid down in the contract of collaboration is effectively ruled out."
Reidpath et al. Bioethics 2001.
43. Has real costs.
Survey of doctoral students and postdocs:
28-50% reported negative effects of withholding:
• hurt progress of their research,
• hurt rate of discovery in their lab/research group,
• hurt quality of their relationships with academic scientists,
• hurt quality of their education,
• hurt level of communication in their lab/research group.
Vogeli et al. Acad Med. 2006 Feb; 81(2):128-36
44. Ok, then on a website?
No. URLs stop working.
Evangelou et al. FASEB J. 2006.
Wren. Bioinformatics 2008.
Wren et al. EMBO Rep 2006.
54. Is research data shared after publication?
Funder, Journal, Investigator, Institution, Study
55. Funder, Journal, Investigator, Institution, Study
Funder: funded by NIH?; size of grant; sharing plan req’d?; funded by non-NIH?
Journal: impact factor; strength of policy; open access?; number of microarray studies published
Investigator: years since first paper; # pubs; # citations; previously shared?; previously reused?; gender
Institution: sector; size; impact rank; country
Study: humans?; mice?; plants?; cancer?; clinical trial?; number of authors; year
56. journal data sharing policy
“An inherent principle of publication is that
others should be able to replicate and build
upon the authors' published claims.
Therefore, a condition of publication
in a Nature journal is that authors are
required to make materials, data and
associated protocols available in a publicly
accessible database …”
http://www.nature.com/authors/editorial_policies/availability.html
http://www.nature.com/nature/journal/v453/n7197/index.html
64. Proportion of articles with shared datasets, by year
[Line chart across time; y-axis: proportion of articles with datasets found in GEO or ArrayExpress (0.05 to 0.35); x-axis: year article published (2000 to 2009)]
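A per-year proportion like the one plotted on this slide can be computed from a list of articles tagged with a publication year and a flag for whether a dataset was found. The sketch below is a minimal illustration under that assumption; the function name and the example records are made up and are not data from the study.

```python
from collections import defaultdict


def proportion_shared_by_year(articles):
    """Map each publication year to the fraction of its articles with a shared dataset.

    `articles` is an iterable of (year, has_dataset) pairs.
    """
    totals = defaultdict(int)
    shared = defaultdict(int)
    for year, has_dataset in articles:
        totals[year] += 1
        if has_dataset:
            shared[year] += 1
    # Years are returned in ascending order for plotting.
    return {year: shared[year] / totals[year] for year in sorted(totals)}


# Made-up records for illustration only.
example = [
    (2000, False), (2000, False),
    (2005, True), (2005, False),
    (2009, True), (2009, False), (2009, False), (2009, True),
]
proportions = proportion_shared_by_year(example)
print(proportions)
```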
69. We looked at data sharing policies
within Instruction to Author
statements of 70 journals, as they
apply to gene expression microarray
data.
Piwowar and Chapman. ELPUB 2008
70. strength of data sharing policies
No applicable policy (43%)
Weak policy (24%)
should, recommend, request
must, but without requiring database accession number
Strong policy (33%)
must, required, condition of publication
requires database accession number
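The wording cues on this slide can be turned into a rough keyword rule. The sketch below is an illustration only and is not the procedure used in the study: the function name, the term lists, and the simple substring matching are all assumptions. Real policy text would need human review, since a phrase such as "accession number not required" would fool it.

```python
# Cue words copied from the slide; matching is naive substring search.
WEAK_TERMS = ("should", "recommend", "request")
STRONG_TERMS = ("must", "required", "condition of publication")


def classify_policy(text):
    """Classify an Instruction to Author excerpt as 'none', 'weak', or 'strong'."""
    lowered = text.lower()
    has_strong_term = any(term in lowered for term in STRONG_TERMS)
    has_weak_term = any(term in lowered for term in WEAK_TERMS)
    requires_accession = "accession number" in lowered

    # Strong: mandatory wording plus a database accession number.
    if has_strong_term and requires_accession:
        return "strong"
    # Weak: advisory wording, or mandatory wording without an accession number.
    if has_strong_term or has_weak_term:
        return "weak"
    return "none"


print(classify_policy("Authors must deposit data and supply an accession number."))
print(classify_policy("Authors should deposit data in a public database."))
```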
72. Articles published in journals
with a strong data-sharing policy
are more likely to have publicly
available datasets
73. What can we do
about it?
Learn
• Learn from those who do it well
• Focus on places that need it
74. Proportion of datasets shared, by journal
[Bar chart; axis: proportion of datasets shared (0.0 to 1.0); Physiological Genomics is highlighted. Journals, as extracted:]
Physiol Genomics, PLoS Genet, Genome Biol, Microbiology, PLoS One, BMC Genomics, Plant Cell, Genome Res, Eukaryot Cell, Appl Environ Microbiol, BMC Med Genomics, Hum Mol Genet, Proc Natl Acad Sci U S A, Infect Immun, Am J Respir Cell Mol Biol, Dev Biol, J Bacteriol, Mol Endocrinol, BMC Cancer, Plant Physiol, Biol Reprod, Blood, J Immunol, FASEB J, Toxicol Sci, J Exp Bot, Nucleic Acids Res, Diabetes, Mol Cell Biol, Mol Cancer Ther, BMC Bioinformatics, Stem Cells, FEBS Lett, J Neurosci, Am J Pathol, J Biol Chem, J Virol, OTHER, Cancer Res, J Clin Endocrinol Metab, Plant Mol Biol, Clin Cancer Res, Genomics, Invest Ophthalmol Vis Sci, Mol Hum Reprod, Carcinogenesis, Gene, Endocrinology, Oncogene, Cancer Lett, Biochem Biophys Res Commun
75. Proportion of datasets shared
[Figure: bar chart of the proportion of datasets shared (axis 0.0 to 1.0) for each institution, with Stanford highlighted. Institutions shown: Stanford University; University of Pennsylvania; University of Illinois; University of California, Los Angeles; University of Wisconsin, Madison; University of Washington; University of California, Davis; The University of British Columbia; University of California, San Francisco; University of Florida; University of California, San Diego; University of Minnesota, Twin Cities; Baylor College of Medicine; OTHER; Max Planck Gesellschaft; Harvard University; Duke University Medical Center; Yale University; Johns Hopkins University; University of Pittsburgh; Washington University in Saint Louis; University of Toronto; University of California, Berkeley; University of Michigan, Ann Arbor; Michigan State University; National Cancer Institute; Tokyo Daigaku]
77. Multivariate nonlinear regressions with interactions
[Figure: odds ratios (log-scale axis, 0.25 to 8.00) with 0.95 confidence intervals, one row per factor]
• Has journal policy
• Count of R01 & other NIH grants
• Authors prev GEOAE sharing & OA & microarray creation
• NO K funding or P funding
• Journal impact
• Institution high citations & collaboration
• Journal policy consequences & long halflife
• Institution is government & NOT higher ed
• NOT animals or mice
• Last author num prev pubs & first year pub
• Large NIH grant
• Humans & cancer
• NO geo reuse + YES high institution output
• First author num prev pubs & first year pub
79. Multivariate nonlinear regression with interactions
[Figure: odds ratios (log-scale axis, 0.25 to 4.00) with 0.95 confidence intervals, one row per factor]
• OA journal & previous GEO-AE sharing
• Amount of NIH funding
• Journal impact factor and policy
• Higher Ed in USA
• Cancer & humans
83. currency of value?
Citations.
$50!
Diamond, Arthur M. What is a Citation Worth?
The Journal of Human Resources (1986)
vol. 21 (2) pp. 200-215
84. dataset
85 cancer microarray trials published in 1999-2003, as
identified by Ntzani and Ioannidis (2003)
citations
ISI Web of Science Citation index, citations from
2004-2005
data sharing locations
Publisher and lab websites, microarray databases, WayBack
Internet Archive, Oncomine
statistics
Multivariate linear regression
96. a) in our
communities
- strengthening policies:
- journal, conference, institutional
- decision-makers
- role-models and educators
97. b) in our tools
- measure opinions
- measure use
- be transparent!
98. c) with our data
- share it.
- ugly? incomplete? strange?
“Flawed, but out there”
is a million times better than
“perfect, but unattainable”
http://sciblogs.co.nz/seeing-data/2010/10/12/the-zen-of-open-data/
99. “Does anyone want your data?
That’s hard to predict […]
After all, no one ever knocked on your
door asking to buy those figurines
collecting dust in your cabinet before you
listed them on eBay.
Your data, too, may simply be awaiting an
effective matchmaker.”
Got data? Nature Neuroscience (2007)
100. I post my data, code, and statistical scripts:
http://researchremix.org
Share yours too!
http://www.flickr.com/photos/myklroventine/892446624/
101. More info?
• OATP oa.data tag
on Connotea, Twitter
• FriendFeed
• Mendeley
“data sharing” group
• @researchremix
piwowar@zoology.ubc.ca
102. thank you
Todd Vision,
Michael Whitlock,
Wendy Chapman
The open science online community and those who
release their articles, datasets and photos openly