BioMed Central's open data initiatives

Alliance for Permanent Access conference
7th November 2012

Iain Hrynaszkiewicz
Publisher (Open Science), BioMed Central
iain.hrynaszkiewicz@biomedcentral.com
@iainh_z

About BioMed Central
• Launched in 2000, largest global publisher of peer-
reviewed open access journals (>240)
• >136,000 peer-reviewed open access articles published
• Part of Springer Science+Business Media since 2008
• Publish using Creative Commons (CC-BY) licenses
• Non-journal products include ISRCTN database
• Interested in innovation and recognise the growing need
for data sharing and publication
http://blogs.biomedcentral.com/bmcblog/tag/Open-Data/

BioMed Central and open data
• Increasing transparency in scientific research and
scholarly communication is at the core of strategy
• Data are an increasingly integral part of scholarly
communication, with many opportunities for increasing
the pace of knowledge discovery
• Publishers, particularly open access publishers, are well-
placed to share information across domain boundaries
http://www.biomedcentral.com/about/access

“By ‘open data’ BioMed Central means that these data are freely available on the public
internet permitting any user to download, copy, analyse, re-process, pass them to
software or use them for any other purpose without financial, legal, or technical
barriers other than those inseparable from gaining access to the internet itself. BioMed
Central encourages the use of fully open formats wherever possible.”

BioMed Central open data initiatives
• Data journals and article types
• Open Data Award
• Data hosting, citation, deposition and linking
• Lab notebook-journal integration (LabArchives)
• Data licensing
• Guidance and best practice e.g. human subjects –
confidentiality and consent
• Data formats and standards – efficient reuse
• Facilitation of data/text mining research

Problem: Lack of credit/recognition for
data sharing and publication
• In science, credit is everything, but incentives for data
publication are still emerging
• Datasets are not generally as discoverable and
citable as journal articles – yet
• Requirements for data sharing are field/location-
specific
• Need more empirical evidence of the benefits of data
publication for individual scientists

Solution #1: Journals and article types
enabling data publication
Data notes: “[B]riefly describe a biomedical data
set or database, with the data being readily
accessible and attributed to a source”
http://bit.ly/y3Jb3b
Research: E.g. The International Stroke Trial
database
http://www.trialsjournal.com/content/12/1/101

Data notes: “[E]xceptional datasets deposited
in our GigaScience repository that have been
selected for further peer review”
http://bit.ly/yPBsAA

Solution #2: Open Data Award

“We ... recognize researchers who have ... demonstrated
leadership in the sharing, standardization, publication, or
re-use of biomedical research data.”
http://www.biomedcentral.com/researchawards/opendata

Solution #3: Enable and
encourage/require data citation
“References
...
Only articles, datasets and abstracts that have been published or
are in press, or are available through public e-print/preprint servers,
may be cited
…
“Dataset with persistent identifier
Zheng, L-Y; Guo, X-S; He, B; Sun, L-J; Peng, Y; Dong, S-S; Liu, T-F;
Jiang, S; Ramachandran, S; Liu, C-M; Jing, H-C (2011): Genome
data from sweet and grain sorghum (Sorghum bicolor).
GigaScience. http://dx.doi.org/10.5524/100012.”
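
The dataset reference format quoted on this slide can be produced mechanically from a few metadata fields. The following is a minimal sketch, not BioMed Central's own tooling: the class name `DatasetCitation` and its field names are hypothetical, and the layout (authors; year; title; publisher; DOI link) is inferred only from the single example shown here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DatasetCitation:
    """Hypothetical container for the fields seen in the example reference."""
    authors: List[str]   # each entry already in "Surname, Initials" form
    year: int
    title: str
    publisher: str       # the repository or journal hosting the data
    doi: str             # bare DOI, e.g. "10.5524/100012"

    def format(self) -> str:
        # Join authors with "; " and append the DOI as a resolvable link,
        # mirroring the layout of the example reference on the slide.
        author_list = "; ".join(self.authors)
        return (
            f"{author_list} ({self.year}): {self.title}. "
            f"{self.publisher}. http://dx.doi.org/{self.doi}."
        )


citation = DatasetCitation(
    authors=[
        "Zheng, L-Y", "Guo, X-S", "He, B", "Sun, L-J", "Peng, Y",
        "Dong, S-S", "Liu, T-F", "Jiang, S", "Ramachandran, S",
        "Liu, C-M", "Jing, H-C",
    ],
    year=2011,
    title="Genome data from sweet and grain sorghum (Sorghum bicolor)",
    publisher="GigaScience",
    doi="10.5524/100012",
)

print(citation.format())
```

Keeping the DOI as a bare identifier and adding the resolver prefix only at formatting time means the same record can be rendered with a different resolver address if needed.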

http://blogs.biomedcentral.com/bmcblog/2012/01/19/citing-and-linking-dat

Problem: Where can data be stored –
permanently?
• Publishers not best placed to run repositories for long
term preservation of large datasets
• Mirrors of publisher content not able to accept
arbitrary amounts of additional data
• Many data repositories exist but most are
domain/location specific and there are many different
types of funding model, license agreement and
persistent identifiers in use

Solution #1: Journal with integrated database

Editor-in-Chief: Laurie Goodman, BGI (USA)
Editor: Scott Edmunds, BGI (China)
Assistant Editor: Alexandra Basford, BGI (China)

GigaScience publishes ‘big-
data’ studies from the entire
spectrum of life sciences

Benefits
• Novel publishing format -
manuscript publication and
data hosting
• Assignment of data DOIs
allows separate data citation
• The BGI is covering all APCs
for the first year after
launch

www.gigasciencejournal.com
www.biomedcentral.c

GigaDB is a new database integrated with the GigaScience journal to meet the needs of a new generation of biological
and biomedical research as it enters the era of “big-data”…

Anatomy of a GigaScience Publication
Idea, Study, Metadata, Data, Analysis, Answer

Solution #2: Comprehensive author
information on available data repositories

http://datacite.org/repolist

http://www.biomedcentral.com/about/su

Solution #3: Research on repositories

http://publicationethics.org/files/u661/EthicalEditing_Autumn2012_final.pdf

We are looking for
repositories with interests
in clinical research data –
can you help?

Problem: Data are not consistently
linked to publications
• Data deposition policies are not established in all
fields
• Even where they are, links/accession numbers tend to
be inconsistently presented and rarely cited
• Researchers may, independently of journal
requirements, deposit data in repositories
• A missed opportunity to enhance the literature

Solution #1: ‘Availability of supporting
data’ article section
• A tool to put data deposition policies – encouraged or
mandated – into practice
• Provides links in a consistent place within an article to
supporting data, regardless of the location or format
of the data
• Data must be permanently available (DOI or
equivalent)
• ~50 journals including GigaScience, BMC series

http://www.biomedcentral.com/about/supportingdata

Availability of supporting data

BMC Res Notes 2012, 5:21 http://www.biomedcentral.com/1756-0500/5/21/

GigaScience 2012, 1:3 http://www.gigasciencejournal.com/content/1/1/3

Solution #3: Lab notebook integration
• BMC authors are entitled to LabArchives’ online lab notebook
(http://www.labarchives.com/bmc) with 100 MB of free storage
• Features include:
- Data publishing with DOI assignment
- Citable, linkable data supporting publications
- Reusable, integrable data under a CC0 waiver
- Integrated manuscript submission to BMC journals
- Additional free storage (standard is 25 MB)
http://blogs.openaccesscentral.com/blogs/bmcblog/entry/labarchives_and_biomed_central_a

24 Oct 2012
Open data partnership leads to release of data from
Nobel Prize-winning laboratory for public use
http://www.biomedcentral.com/presscenter/pressreleases/20121024c

Problem: Licensing that restricts efficient
data integration and (re)use

http://pantonprinciples.org/
“[P]eople mis-use copyright licenses on
uncopyrightable materials and data sets: the
confusion of the legal right of attribution in
copyright with the academic and professional
norm of citation of one's efforts.” John
Wilbanks, VP, Science, Creative Commons,
http://bit.ly/djl5Fa August 11, 2010
“...any restrictions on use should be strongly
resisted and we endorse explicit encouragement
of open sharing.” Schofield et al.: Post-publication
sharing of data and tools. Nature 2009, 461:171.

“The data should be released in standardized
formats without intellectual property constraints.”
Conway PH, VanLare JM: Improving Access to
Health Care Data: The Open Government
Strategy. JAMA 2010;304(9):1007-1008.
http://www.isitopendata.org/

Why Creative Commons CC0?
• interoperability: CC0 is human and machine-
readable
• universality: CC0 is global and universal and
widely recognized
• simplicity: no need for humans to make, and
respond to, individual data requests – avoids
“attribution stacking” with CC-BY licenses
Schaeffer P: Why does Dryad use CC0?
http://blog.datadryad.org/2011/10/05/why-does-dryad-use-cc0/

http://creativecommons.org/publicdomain/zero/1.0/

Solution: Stakeholder engagement and
community collaboration, leadership

Public consultation on
implementing CC0 for
data published in open
access journals: closes
10th November 2012
http://blogs.biomedcentral.com/bmcblog/2012/09/10/put-the-open-in-open-data/

Hrynaszkiewicz I, Cockerill MJ:
Open by default: a proposed
copyright license and waiver
agreement for open access
research and data in
peer-reviewed journals. BMC Research
Notes 2012, 5:494
http://www.biomedcentral.com/1756-0500/5/494

Implementing CC0 in journals – how?

• Specify a date from which the new license would
apply to data (CC-BY remains for other content)
• Only applies to data submitted to the journal
• Some relatively minor technical and operational
implications
• Cultural change may be the biggest challenge
• Consultation is identifying common concerns, FAQs,
and further definitions and use cases for open data in
journal publications
Hrynaszkiewicz I, Cockerill MJ: Open by default: a proposed copyright
license and waiver agreement for open access research and data in
peer-reviewed journals. BMC Research Notes 2012, 5:494
http://www.biomedcentral.com/1756-0500/5/494

Problem: Lack of guidance, exemplars,
incentives to make data reusable
• Sharing/publishing detailed human subjects data, in
the absence of explicit consent, can potentially
infringe privacy (ethically and legally)
• Data are more (re)usable if published in community
endorsed, standard formats
• Standards and appropriate guidance do not yet exist
in all domains
• Few incentives to follow data standards

Solution #1: Work with journal editors
to produce guidance where it is needed

BMJ 2010;340:c181
Co-published in:
Trials 2010, 11:9

Solution #2: Publish exemplars

Solution #3: Incentivize, promote and
share best practice and standards
http://www.biomedcentral.com/bmcresnotes/series/datasharing
http://biosharing.org/standards_view

Problem: Adding value to data of use to
researchers, readers and publishers
• Text/data mining applications are often specific to a
single research project and not always attractive
to commercial publishing platforms and their
customers
• Value to the non-expert can be limited
• Makes business model/case challenging for
publishers

http://www.biomedcentral.com/about/datamining/

www.casesdatabase.com –
coming soon

The future...

Image adapted from Gillam
et al: The Healthcare
Singularity and the Age
of Semantic Medicine. In
The Fourth Paradigm (2009)

Questions?

Iain Hrynaszkiewicz
Publisher (Open Science), BioMed Central
iain.hrynaszkiewicz@biomedcentral.com

http://www.mendeley.com/profiles/iain-hrynaszkiewicz/
http://uk.linkedin.com/in/iainhz
@iainh_z
