BioMed Central's open data initiatives

An overview of most of BioMed Central's open data projects in publishing. Presented at the Alliance for Permanent Access conference, 7th November 2012

  1. BioMed Central’s open data initiatives. Alliance for Permanent Access conference, 7th November 2012. Iain Hrynaszkiewicz, Publisher (Open Science), BioMed Central. iain.hrynaszkiewicz@biomedcentral.com, @iainh_z
  2. About BioMed Central
     • Launched in 2000, largest global publisher of peer-reviewed open access journals (>240)
     • >136,000 peer-reviewed open access articles published
     • Part of Springer Science+Business Media since 2008
     • Publish using Creative Commons (CC-BY) licenses
     • Non-journal products include ISRCTN database
     • Interested in innovation and recognise the growing need for data sharing and publication
     http://blogs.biomedcentral.com/bmcblog/tag/Open-Data/
  3. BioMed Central and open data
     • Increasing transparency in scientific research and scholarly communication is at the core of strategy
     • Data are an increasingly integral part of scholarly communication, with many opportunities for increasing the pace of knowledge discovery
     • Publishers, particularly open access publishers, are well-placed to share information across domain boundaries
     http://www.biomedcentral.com/about/access
     “By ‘open data’ BioMed Central means that these data are freely available on the public internet permitting any user to download, copy, analyse, re-process, pass them to software or use them for any other purpose without financial, legal, or technical barriers other than those inseparable from gaining access to the internet itself. BioMed Central encourages the use of fully open formats wherever possible.”
  4. BioMed Central open data initiatives
     • Data journals and article types
     • Open Data Award
     • Data hosting, citation, deposition and linking
     • Lab notebook-journal integration (LabArchives)
     • Data licensing
     • Guidance and best practice e.g. human subjects – confidentiality and consent
     • Data formats and standards – efficient reuse
     • Facilitation of data/text mining research
  5. Problem: Lack of credit/recognition for data sharing and publication
     • In science credit is everything but incentives for data publication are still emerging
     • Datasets are not generally as discoverable and citable as journal articles – yet
     • Requirements for data sharing are field/location-specific
     • Need more empirical evidence of the benefits of data publication for individual scientists
  6. Solution #1: Journals and article types enabling data publication
     • Data notes: “[B]riefly describe a biomedical data set or database, with the data being readily accessible and attributed to a source” http://bit.ly/y3Jb3b
     • Research: E.g. The International Stroke Trial database http://www.trialsjournal.com/content/12/1/101
     • Data notes: “[E]xceptional datasets deposited in our GigaScience repository that have been selected for further peer review” http://bit.ly/yPBsAA
  7. Solution #2: Open Data Award
     “We ... recognize researchers who have ... demonstrated leadership in the sharing, standardization, publication, or re-use of biomedical research data.”
     http://www.biomedcentral.com/researchawards/opendata
  8. Solution #3: Enable and encourage/require data citation
     “References...Only articles, datasets and abstracts that have been published or are in press, or are available through public e-print/preprint servers, may be cited…
     “Dataset with persistent identifier
     Zheng, L-Y; Guo, X-S; He, B; Sun, L-J; Peng, Y; Dong, S-S; Liu, T-F; Jiang, S; Ramachandran, S; Liu, C-M; Jing, H-C (2011): Genome data from sweet and grain sorghum (Sorghum bicolor). GigaScience. http://dx.doi.org/10.5524/100012."
     http://blogs.biomedcentral.com/bmcblog/2012/01/19/citing-and-linking-dat
  9. Problem: Where can data be stored – permanently?
     • Publishers not best placed to run repositories for long term preservation of large datasets
     • Mirrors of publisher content not able to accept arbitrary amounts of additional data
     • Many data repositories exist but most are domain/location specific and there are many different types of funding model, license agreement and persistent identifiers in use
  10. Solution #1: Journal with integrated database
  11. Editor-in-Chief: Laurie Goodman, BGI (USA). Editor: Scott Edmunds, BGI (China). Assistant Editor: Alexandra Basford, BGI (China)
     GigaScience publishes ‘big-data’ studies from the entire spectrum of life sciences
     Benefits
     • Novel publishing format - manuscript publication and data hosting
     • Assignment of data DOIs allows separate data citation
     • The BGI is covering all APCs for the first year after launch
     www.gigasciencejournal.com www.biomedcentral.c
  12. http://gigadb.org/
  13. GigaDB is a new database integrated with the GigaScience journal to meet the needs of a new generation of biological and biomedical research as it enters the era of “big-data”… (see more)
  14. http://gigadb.org/
  15. Anatomy of a GigaScience Publication: Idea, Study, Metadata, Data, Analysis, Answer
  16. Solution #2: Comprehensive author information on available data repositories
     http://datacite.org/repolist http://www.biomedcentral.com/about/su
  17. Solution #3: Research on repositories
     http://publicationethics.org/files/u661/EthicalEditing_Autumn2012_final.pdf
     We are looking for repositories with interests in clinical research data – can you help?
  18. Problem: Data are not consistently linked to publications
     • Data deposition policies are not established in all fields
     • Even where they are, links/accession numbers tend to be inconsistently presented and rarely cited
     • Researchers may, independently of journal requirements, deposit data in repositories
     • A missed opportunity to enhance the literature
  19. Solution #1: ‘Availability of supporting data’ article section
     • A tool to put data deposition policies – encouraged or mandated – into practice
     • Provides links in a consistent place within an article to supporting data, regardless of the location or format of the data
     • Data must be permanently available (DOI or equivalent)
     • ~50 journals including GigaScience, BMC series
     http://www.biomedcentral.com/about/supportingdata
  20. Availability of supporting data
     • BMC Res Notes 2012, 5:21 http://www.biomedcentral.com/1756-0500/5/21/
     • GigaScience 2012, 1:3 http://www.gigasciencejournal.com/content/1/1/3
  21. Solution #3: Lab notebook integration
     • BMC authors entitled to LabArchives’ (http://www.labarchives.com/bmc) online lab notebook with 100 MB of free storage
     • Features include:
       - Data publishing with DOI assignment
       - Citable, linkable data supporting publications
       - Reusable/integrable data with CC0 waiver
       - Integrated manuscript submission to BMC journals
       - Additional free storage (standard is 25 MB)
     http://blogs.openaccesscentral.com/blogs/bmcblog/entry/labarchives_and_biomed_central_a
  22. LabArchives partnership
  23. 24 Oct 2012: Open data partnership leads to release of data from Nobel Prize-winning laboratory for public use
     http://www.biomedcentral.com/presscenter/pressreleases/20121024c
  24. Problem: Licensing that restricts efficient data integration and (re)use
     http://pantonprinciples.org/ http://www.isitopendata.org/
     • “[P]eople mis-use copyright licenses on uncopyrightable materials and data sets: the confusion of the legal right of attribution in copyright with the academic and professional norm of citation of one's efforts.” John Wilbanks, VP, Science, Creative Commons, August 11, 2010 http://bit.ly/djl5Fa
     • “...any restrictions on use should be strongly resisted and we endorse explicit encouragement of open sharing.” Schofield et al.: Post-publication sharing of data and tools. Nature 2009, 461:171.
     • “The data should be released in standardized formats without intellectual property constraints.” Conway PH, VanLare JM: Improving Access to Health Care Data: The Open Government Strategy. JAMA 2010;304(9):1007-1008.
  25. Why Creative Commons CC0?
     • interoperability: CC0 is human and machine-readable
     • universality: CC0 is global and universal and widely recognized
     • simplicity: no need for humans to make, and respond to, individual data requests – avoids “attribution stacking” with CC-BY licenses
     Schaeffer P: Why does Dryad use CC0? http://blog.datadryad.org/2011/10/05/why-does-dryad-use-cc0/
     http://creativecommons.org/publicdomain/zero/1.0/
  26. Solution: Stakeholder engagement and community collaboration, leadership
  27. Public consultation on implementing CC0 for data published in open access journals: closes 10th November 2012
     http://blogs.biomedcentral.com/bmcblog/2012/09/10/put-the-open-in-open-data/
     Hrynaszkiewicz I, Cockerill MJ: Open by default: a proposed copyright license and waiver agreement for open access research and data in peer-reviewed journals. BMC Research Notes 2012, 5:494 http://www.biomedcentral.com/1756-0500/5/494
  28. Implementing CC0 in journals – how?
     • Specify a date from which the new license would apply to data (CC-BY remains for other content)
     • Only applies to data submitted to the journal
     • Some relatively minor technical and operational implications
     • Cultural change may be the biggest challenge
     • Consultation is identifying common concerns, FAQs, and further definitions and use cases for open data in journal publications
     Hrynaszkiewicz I, Cockerill MJ: Open by default: a proposed copyright license and waiver agreement for open access research and data in peer-reviewed journals. BMC Research Notes 2012, 5:494 http://www.biomedcentral.com/1756-0500/5/494
  29. Problem: Lack of guidance, exemplars, incentives to make data reusable
     • Sharing/publishing detailed human subjects data, in the absence of explicit consent, can potentially infringe privacy (ethically and legally)
     • Data are more (re)usable if published in community endorsed, standard formats
     • Standards and appropriate guidance do not yet exist in all domains
     • Few incentives to follow data standards
  30. Solution #1: Work with journal editors to produce guidance where it is needed
     BMJ 2010;340:c181. Co-published in: Trials 2010, 11:9
  31. Solution #2: Publish exemplars
  32. Solution #2: Publish exemplars
  33. Solution #3: Incentivize, promote and share best practice and standards
     http://www.biomedcentral.com/bmcresnotes/series/datasharing http://biosharing.org/standards_view
  34. Problem: Adding value to data of use to researchers, readers and publishers
     • Text/data mining applications are often research project or research specific, and not always attractive to commercial publishing platforms and their customers
     • Value to the non-expert can be limited
     • Makes business model/case challenging for publishers
  35. http://www.biomedcentral.com/about/datamining/
  36. www.casesdatabase.com
  37. www.casesdatabase.com – coming soon
  38. www.casesdatabase.com – coming soon
  39. www.casesdatabase.com – coming soon
  40. The future... Image adapted from Gillam et al.: The Healthcare Singularity and the Age of Semantic Medicine. In The Fourth Paradigm (2009)
  41. Questions? Iain Hrynaszkiewicz, Publisher (Open Science), BioMed Central. iain.hrynaszkiewicz@biomedcentral.com http://www.mendeley.com/profiles/iain-hrynaszkiewicz/ http://uk.linkedin.com/in/iainhz @iainh_z
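
The dataset reference on slide 8 follows a visible pattern: authors, year, title, publisher, DOI link. As a rough illustration only, the pattern could be reproduced in code like this. The `DatasetCitation` class and its field names are hypothetical, and the layout is read off the one example on the slide rather than taken from any published BioMed Central style rule.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DatasetCitation:
    """Hypothetical container for the fields visible in the slide 8 example."""
    authors: List[str]
    year: int
    title: str
    publisher: str
    doi: str

    def format(self) -> str:
        # Layout assumed from the slide: Authors (Year): Title. Publisher. DOI link.
        author_list = "; ".join(self.authors)
        return (
            f"{author_list} ({self.year}): {self.title}. "
            f"{self.publisher}. http://dx.doi.org/{self.doi}"
        )


# Values copied from the sorghum dataset reference shown on slide 8.
sorghum = DatasetCitation(
    authors=[
        "Zheng, L-Y", "Guo, X-S", "He, B", "Sun, L-J", "Peng, Y", "Dong, S-S",
        "Liu, T-F", "Jiang, S", "Ramachandran, S", "Liu, C-M", "Jing, H-C",
    ],
    year=2011,
    title="Genome data from sweet and grain sorghum (Sorghum bicolor)",
    publisher="GigaScience",
    doi="10.5524/100012",
)

print(sorghum.format())
```

Keeping the DOI as its own field, separate from the resolver prefix, means the same record could be rendered with a different resolver address if that were ever needed.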

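
Slide 19 asks that supporting data be permanently available via a "DOI or equivalent". A minimal sketch of how an editorial tool might flag links that carry a DOI is given below. The function name `extract_doi` is hypothetical, and the regular expression is a common heuristic for DOI-like strings, not a rule stated in the slides; links without a DOI may still be acceptable "equivalent" identifiers and would need manual review.

```python
import re
from typing import Optional

# Heuristic (an assumption, not from the slides): a DOI starts with "10.",
# then a 4 to 9 digit registrant code, a slash, and a non-empty suffix.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+")


def extract_doi(link: str) -> Optional[str]:
    """Return the DOI-like substring of a link, or None if none is found."""
    match = DOI_PATTERN.search(link)
    return match.group(0) if match else None


# Links taken from the slides.
links = [
    "http://dx.doi.org/10.5524/100012",  # dataset reference on slide 8
    "http://gigadb.org/",                # database home page, no DOI
]

for link in links:
    doi = extract_doi(link)
    status = f"DOI found: {doi}" if doi else "no DOI, check manually"
    print(f"{link} -> {status}")
```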