The Creative Commons license: authors/copyright owners irrevocably grant to anyone the right to use, reproduce or disseminate the research article, in its entirety or in part, in perpetuity, provided that:
- no substantive errors are introduced
- authorship attribution is correct
- citation details are provided
- bibliographic details are unchanged
The electronic version of the article is authoritative.
“Additional files”, not “Supplementary material”: additional files can be central to the reported findings of the paper.
Efficient online publication processes can facilitate dataset publication. Only a fraction of experimental data sets make it into the literature. Many more datasets have the potential to be useful but do not warrant a traditional publication. For certain standard types of data, appropriate databases exist (e.g. nucleotide sequences). But what if such databases do not exist, or if further description of the experimental context is required?
Publishers are not best placed to run repositories for long-term preservation of large datasets. Mirrors of publisher content are not able to accept arbitrary amounts of additional data. Long-term preservation presents a challenge with respect to continuity; redundant international mirrors with independent governance and funding could help to reduce the risk. BGI is capable of sequencing ~2000 genomes per day (6 Tb/day, or roughly 2 Pb/year).
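As a rough check of the BGI figure, assuming sequencing runs every day of the year and decimal prefixes (1 Pb = 1000 Tb), the daily output scales to an annual total as follows:

$$
6\ \text{Tb/day} \times 365\ \text{days/year} = 2190\ \text{Tb/year} \approx 2.2\ \text{Pb/year}
$$

The "2 Pb/year" on the slide is therefore a slightly rounded-down figure.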
Bioinformaticists have been rapid adopters of cloud computing (as they were of the web). Cloud computing can reduce the barriers to reproducibility: publications can include or refer to the necessary datasets and to the computational tools that can be launched to carry out or reproduce the analysis. Large datasets can live in the cloud, so the analysis is taken to the data rather than vice versa. Deposited data sets are assigned DOIs, as are data papers.
The accession number system in genomics is one example. Researchers sometimes deposit data to meet institutional or funder requirements, or for personal reasons.
Dryad is a mechanism for enforcing the Joint Data Archiving Policy, a community requirement in ecology and evolutionary biology. As part of a publisher’s service provision to these scientific communities, we are implementing integration that enables accepted articles to be associated with data sets in Dryad. Dryad meets the criteria for permanent linking to articles by assigning DOIs to data sets.
Data preservation and re-use maximise the value of data, but restrictive licensing, intellectual property (IP) and similar constraints are barriers to effective re-use and sharing.
Transcript of "Iain Hrynaszkiewicz - Research Integrity: Integrity of the published record"
Open data and the integrity of the published record – an open access publisher’s perspective
JISC research integrity conference, 13th September 2011
Iain Hrynaszkiewicz, Journal Publisher, BioMed Central
iain.hrynaszkiewicz@biomedcentral.com
About BioMed Central
- Launched in 2000 and now the largest global publisher of peer-reviewed open access journals
- Publisher of over 210 open access journals
- More than 100,000 peer-reviewed open access articles published
- Part of Springer Science+Business Media
- All research articles published under the Creative Commons Attribution license
- Article processing charges levied for accepted research
- Established institutional membership scheme
The publisher as a service provider
- Help maximise research impact and pace
- Collect, organise and distribute knowledge
- Preservation and (rapid) dissemination
- Development of innovative content and tools
- Collaboration with the scientific community
BioMed Central and open data
- Increasing transparency in scientific research and scholarly communication is at the core of our strategy
- Data are an increasingly integral part of scholarly communication, with many opportunities for increasing the pace of knowledge discovery
- Publishers, particularly open access publishers, are well placed to share information across domain boundaries
- http://blogs.openaccesscentral.com/blogs/bmcblog/resource/opendatastatementdraft.pdf
- http://www.biomedcentral.com/info/about/openaccess
- “We believe the concept of open data, analogous to our policy on open access, goes beyond making data freely accessible. Data should also be free to distribute, copy, re-format, and integrate into new research, without legal impediments.”
Problem: Lack of credit/recognition for data sharing and publication
- In science, credit is everything
- Data sets are not generally as discoverable as journal articles
- Data sets are not yet generally as citable as journal articles
- Requirements for data sharing are field- and location-specific
- Empirical evidence of the benefits is still emerging
Solution #1: Innovative journals and article types enabling data publication
Solution #2: Open Data Award
“We ... recognize researchers who have ... have demonstrated leadership in the sharing, standardization, publication, or re-use of biomedical research data.”
http://www.biomedcentral.com/researchawards/opendata
Problem: Where can data be stored – permanently?
- Publishers not best placed to run repositories for long-term preservation of large datasets
- Mirrors of publisher content not able to accept arbitrary amounts of additional data
- How do you deal with the “data tsunami”?
- Many data repositories exist, but most are domain- or location-specific, and there are many different types of funding model, license agreement and persistent identifier in use
Solution #1: Integrated (cloud-based) data repository and journal
http://www.gigasciencejournal.com
“GigaScience aims to revolutionize data dissemination, organization, understanding, and use. An online open-access open-data journal, we publish 'big-data' studies from the entire spectrum of life and biomedical sciences. To achieve our goals, the journal has a novel publication format: one that links standard manuscript publication with an extensive database that hosts all associated data and provides data analysis tools and cloud-computing resources.”
Solution #2: Comprehensive author information on available data repositories
http://datacite.org/repolist
http://www.biomedcentral.com/info/about/supportingdata
Problem: Data are not consistently linked to publications
- Data deposition policies are not established in all fields
- Even where they are, links and accession numbers tend to be presented inconsistently
- Researchers may deposit data in repositories independently of journal requirements
- A missed opportunity to enhance the literature
Solution #1: ‘Availability of supporting data’ article section
- A tool for editors, authors and scientific communities to put data deposition policies into practice at the appropriate time
- Provides links to supporting data in a consistent place within an article, regardless of the location or format of the data, and makes it clear to readers when they can access the data as well as the article
- Data must be permanently available (DOI or equivalent)
- Participating journals include GigaScience and BMC Research Notes
- http://www.biomedcentral.com/info/about/supportingdata
- http://blogs.openaccesscentral.com/blogs/bmcblog/entry/availability_of_supporting_data_crediting
Solution #2: Submission integration with the Dryad repository
Problem: Ambiguous and suboptimal licensing that restricts data (re)use
“The data should be released in standardized formats without intellectual property constraints.” Conway PH, VanLare JM: Improving Access to Health Care Data: The Open Government Strategy. JAMA 2010; 304(9):1007-1008.
http://pantonprinciples.org/
http://www.isitopendata.org/
“[P]eople mis-use copyright licenses on uncopyrightable materials and data sets: the confusion of the legal right of attribution in copyright with the academic and professional norm of citation of one's efforts.” John Wilbanks, VP, Science, Creative Commons, http://bit.ly/djl5Fa, August 11, 2010
Solution: Stakeholder engagement, community collaboration and leadership
Problem: Lack of practical guidance and exemplars to help overcome barriers
- Online publishing makes data sharing possible, but sharing or publishing detailed data on human subjects without explicit consent can potentially infringe privacy, both ethically and legally
- Data are more (re)usable if published in community-endorsed, standard formats
- Standards and appropriate guidance do not yet exist in all domains
Solution #1: Work with journal editors to produce guidance where it is needed
BMJ 2010;340:c181
Co-published in: Trials 2010, 11:9
Solution #3: Incentivize, promote and share best practice and standards
http://www.biomedcentral.com/bmcresnotes/series/datasharing
http://biosharing.org/standards_view
Conclusions
- Rather than ‘why share data?’, the questions are ‘what’, ‘how’, ‘where’ and ‘when’?
- The future of scholarly communication depends on a commitment to data as well as papers
- Supporting and investing in open data is a service to the scientific community
- Through transparency, we can better serve the funders and beneficiaries of scientific research