The Rise of the Data Journal



  1. IASSIST, Cologne, May 2013. Funded by: This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 2.5 UK: Scotland License. To view a copy of this license, visit ; or, (b) send a letter to Creative Commons, 543 Howard Street, 5th Floor, San Francisco, California, 94105, USA. The Rise of the Data Journal. IASSIST, Cologne, Germany, May 31, 2013. Marieke Guy & Monica Duke, DCC, University of Bath
  2. Digital Curation Centre (DCC) • Consortium comprising units from the Universities of Bath (UKOLN), Edinburgh (DCC Centre) and Glasgow (HATII) • Launched 1st March 2004 as a national centre for solving challenges in digital curation that could not be tackled by any single institution or discipline • Funded by JISC with additional HEFCE funding from 2011 for targeted institutional development • Support selection of tools: DAF, CARDIO, DMP Online, tools and metadata schema catalogues • Offer advice and support through ‘How to Guides’, ‘Briefing papers’ and Web site
  3. Support from the DCC • Assess needs • Make the case • Develop support and services. RDM policy development; DAF & CARDIO assessments; Guidance and training; Workflow assessment; DCC support team; Advocacy with senior management; Institutional data catalogues; Pilot RDM tools; Customised Data Management Plans • …and support policy implementation
  4. A history of journals and data • Data publication in journals is not new • Earliest academic scientific journal is Journal des sçavans, first published on 5 Jan 1665 • Data usually structured • Recently data has grown significantly in volume and more data is digital • Data published in supplementary materials • As supplementary files have become too big, journals have begun to stop accepting them, e.g. Journal of Neuroscience • Journals become ‘data dumping grounds’
  5. Changing face of journals • Image from V.Kiemer, Nature Publishing Group
  6. Enhanced publications • A publication enhanced with: • research data (evidence of the research) • extra materials (to illustrate or clarify) • post-publication data (commentaries, ranking) - Driver II • Extra materials: • Audio files, illustrative images and video fragments • GIS or interactive maps • Models, algorithms • Metadata sets • Open issues: who creates enhancements, and who manages them?
  7. Why publish data? “Data that underpin a journal article should be made concurrently available in an accessible database” • ‘Science as an open enterprise’, report by the Royal Society, June 2012 • Data should be accessible, intelligible, assessable and usable
  8. Other reasons… BENEFITS • Avoid duplication • Scientific integrity • More collaboration • Better research • More reuse & value • Increased citation: 9-30% increase depending on e.g. discipline (Piwowar et al., 2007, 2013) DRIVERS • Public expectations • Government agenda • Funder policy • Institutional policy • EU expectations • Preservation of data
  9. Data publication benefits • Image from Ubiquity Press
  10. Research data vs journal articles. Research data • Difficult to manage after funding stops • Who has it? • Where is it? • Who does it belong to? • How do I make it available? • Where’s the reward? Journal articles • Held by libraries • Well preserved • Impact monitored • Easy to find • Is published • Promotion and tenure processes recognise it • In the past, data have been a “second-class citizen in the scholarly record”
  11. ODE data publication pyramid
  12. What is a data paper/article? • A paper that describes a data set – usually stored in a repository • Gives details of the data collection (when, why, how) • Gives details of processing, software, file formats etc. • Has a cover sheet and set of links to archived artefacts • The cover sheet contains familiar elements such as title, date, authors, abstract, persistent identifiers (DOI, ARK) • There are no novel analyses or ground-breaking conclusions • Authors could include those involved in data management and processing • The data paper/article format is widened out into a data journal
  13. How to submit a data paper • Image from Ubiquity Press
  14. Data journal benefits • Academic credit for data scientists and curators • Data likely to be uploaded to a trusted repository • Data available for peer review, integrity of data checked • Data journals helpful for those wanting to reuse the data • Use of data journals shows transparency in the process • Process allows collaboration with others working in the data area • The result is more than just a metadata landing page!
  15. Data journal challenges • Linking issues - problems when linking data to the scientific record e.g. issues with persistence, granularity, attribution • Validation issues - validation of data sets puts a burden on the peer review process. Who reviews the data? • Effort issues - a need to use already submitted metadata, use tried and tested approaches e.g. DOIs • Access issues - need trusted repository (?), open access • Consistency issues - journal workflows vary: ‘engaged submitter’, ‘data dumper’, ‘third party requester’, variations in wording, approach, across disciplines • Responsibility issues - Who is responsible for the data? Data Availability Policy (DAP). Is the data checked?
  16. Jisc MRD programme projects • Projects looking at innovative research data publication • What policies would achieve greater levels of data sharing, citation and linkages between publications and datasets? • What partnerships between journals, data centres and research organisations are necessary? • How can costs of long term data archiving be met? • What characterises a suitable repository? • What peer review of data is appropriate before publication? • Projects: Peer REview for Publication & Accreditation of Research data in the Earth sciences (PREPARDE), Journal Research Data Policy Bank (JoRD) and Publisher, Repository and Institutional Metadata Exchange (PRIME)
  17. PREPARDE project • 12 month JISC-funded activity, 7 partners from academia, publishing and libraries • Peer REview for Publication & Accreditation of Research data in the Earth sciences (PREPARDE) project • Aiming to capture the processes and procedures required to publish a scientific dataset, ranging from ingestion into a data repository through to formal publication in a data journal • Looking at key issues arising in the data publication paradigm: • How does one peer-review a dataset? • How can datasets and journal publications be cross-linked for the benefit of the wider research community?
  18. PREPARDE list of data journals • Very varied • Lots of earth science, many disciplines missing • Repository criteria vary • Some hold data set too • Majority require OA for article, some for data set too
  19. Current data journals • GigaScience – BioMed Central - publishes 'big-data' studies from the entire spectrum of life and biomedical sciences • Standard manuscript publication linked to a database that hosts all associated data and provides data analysis tools and cloud-computing resources • Journal of Open Archaeology Data (JOAD) – Ubiquity Press - features peer reviewed data papers describing archaeology datasets with high reuse potential • Work with institutional data repositories to ensure associated data are professionally archived, preserved, and openly available • Geoscience Data Journal - Wiley-Blackwell
  20. Forthcoming data journals • Scientific Data - Nature journal - focuses on the life, biomedical and environmental science communities • Open-access, online-only publication, open for submissions in Autumn 2013 and launching in Spring 2014
  21. Journal Research Data Policy Bank • JoRD conducted a feasibility study into the scope and shape of a sustainable service to collate and summarise journal policies on research data (JoRD policy bank service) • Carried out by the Centre for Research Communications at Nottingham University (UK), Research Information Network and Mark Ware Consulting Ltd. • Have carried out a literature review and a study of journal policies (400 international and national journals) • “Although idea of making scientific data openly accessible for share is widely accepted in the scientific community, the practice confronts serious obstacles. The most immediate of these obstacles is the lack of a consolidated infrastructure for the easy sharing of data.” JoRD
  22. PRIME project • Publisher, Repository and Institutional Metadata Exchange (PRIME) aims to enable the automated exchange of metadata between publishers and repositories • Partners: UCL, Ubiquity Press and Archaeology Data Service • Building on the work of three other Jisc-funded projects: DryadUK, REWARD, and SWORD-ARM • Plan to enable the exchange of metadata between UCL Discovery, the ADS, and JOAD • Release a metadata schema, open source plugins and case studies
  23. Joint Data Archiving Policy (JDAP) • JDAP describes a requirement by a journal that supporting data be publicly available • Evolved in 2011 in the field of evolution and has since been adopted by other journals across various disciplines • Journals that adopt JDAP often recommend Dryad as a data repository; however, the JDAP initiative is distinct from Dryad (Dryad uses CC0 public domain dedication) • Policy text: << Journal >> requires, as a condition for publication, that data supporting the results in the paper should be archived in an appropriate public archive, such as << list of approved archives here >>. Data are important products of the scientific enterprise, and they should be preserved and usable for decades in the future. Authors may elect to have the data publicly available at time of publication, or, if the technology of the archive allows, may opt to embargo access to the data for a period up to a year after publication. Exceptions may be granted at the discretion of the editor, especially for sensitive information such as human subject data or the location of endangered species.
  24. DCC support • The DCC will continue to support institutions in the data publication area • We will do this by: • Writing briefing papers in this area • Support for stakeholder engagement e.g. workshops • Awareness of pros and cons of the different models • Engaging with institutions on data publication issues
  25. Final thoughts… • More journals are encouraging publication of data in some way • Often data are becoming the focus of the publication alongside a supporting narrative • Is there a role for the data journal when traditional journals are also coming on board? What will be cited? • Mixed market – lots of different approaches working alongside each other • Changing landscape offers opportunities and challenges for the publisher, author and data manager • Collaboration & communication are the most effective way forward, e.g. the ‘Now and Future of Data Publishing’ meeting
  26. Thanks - any questions? Acknowledgements: Thanks to Sarah Callaghan and Angus Whyte (PREPARDE project), Brian Hole (Ubiquity Press) & presenters from IDCC data publishing workshop for help with slides
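
The cover sheet described on the "What is a data paper/article?" slide (title, date, authors, abstract, persistent identifiers, links to archived artefacts) could be modelled roughly as below. This is only an illustrative sketch: the class name, field names, date format and the placeholder DOI are all assumptions, not part of any data journal's actual submission schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CoverSheet:
    """Hypothetical model of a data paper cover sheet (names are assumptions)."""
    title: str
    date: str                     # ISO date string; the format is an assumption
    authors: List[str]
    abstract: str
    persistent_identifiers: List[str] = field(default_factory=list)  # e.g. DOI, ARK
    artefact_links: List[str] = field(default_factory=list)          # archived artefacts

    def missing_elements(self) -> List[str]:
        """Return the names of cover sheet elements that are still empty."""
        return [name for name, value in vars(self).items() if not value]


# Placeholder values only; the DOI below is not a real identifier.
sheet = CoverSheet(
    title="Example survey dataset",
    date="2013-05-31",
    authors=["A. Author"],
    abstract="Describes how the dataset was collected and processed.",
    persistent_identifiers=["doi:10.0000/example"],
)
print(sheet.missing_elements())  # ['artefact_links']
```

A check like `missing_elements` is one way a submitter or editor might spot an incomplete cover sheet before review; real journals may require different or additional elements.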

Editor's Notes

  • Science is about reproducibility – if we don’t have the data we can’t do that. The Internet allows us to link to things easily. In science you want a fixed thing; there are still problems when linking data to the scientific record – data persistence, data and metadata quality, attribution and credit for data producers
  • Maybe aware of history of journals – historically data published in journals – data grown in volume, extent to which digital – less in paper. Contained scientific material: obituaries, church history, & legal reports. Philosophical Transactions of the Royal Society first journal in the world exclusively devoted to science (6 March 1665). Journals have always published data
  • Setting the scene
  • Opportunities for Data Exchange (ODE) – The Data Publications Pyramid illustrates the most common ways to make data accessible. Research data comes in many different manifestation forms. Publications have always contained data, usually in a very condensed, processed and summarised way via graphs, tables and illustrations. At the other end of the spectrum is raw data and original data sets, which too often remain inaccessible on people's computers, hard disks or in drawers. Many authors add their underlying research data in supplements to journal articles. In disciplines with community supported data archives (examples are Genbank, World Protein Database and Pangaea) researchers can deposit their data in a safe and reliable way and publishers can ensure persistent links between the data and related publications.
  • Challenge that data journals address: turning supplementary material into something that can be mined
  • Better example – Journal of Open Psychology Data – link with DANS. Addresses issues around fraud in the Netherlands. GigaScience – no article processing costs. Rapid peer review: Our insistence on fast and thorough peer review enables us to process manuscripts quickly; we aim to reach initial decisions within 6 weeks. Rapid publication: Following the acceptance of a manuscript, it is published, with final citation details, as a provisional PDF file with minimal delay (subject to formatting checks, copy-editing and author verification). Fully formatted versions of the article replace the accepted manuscript within 4 weeks. Open access: All articles published in GigaScience are open access (freely available on the journal website, with the copyright retained by the author). Research articles are deposited in at least one widely and internationally-recognized open access repository complying with the NIH Public Access Policy and the Wellcome Trust Open Access Policy. To cover the cost of open access publishing we levy an article-processing charge. High visibility within the field: Your work is freely accessible to a global audience. In addition, articles are available through INIST in France and in e-Depot, the National Library of the Netherlands' digital archive of all electronic publications. We are also in discussion with other permanent digital archives including the British Library. Permanence: All articles published in GigaScience are archived in a number of safe open access archives so permanent accessibility is assured.