Science is about reproducibility: if we don't have the data, we can't do that. The Internet allows us to link to things easily, but in science you want a fixed thing to point to. We still have problems when linking data to the scientific record: data persistence, data and metadata quality, and attribution and credit for data producers.
You may be aware of the history of journals. Historically, data were published in journals, but data have grown in volume and in the extent to which they are digital, so less appears on paper. The earliest journals mixed scientific material with obituaries, church history and legal reports. Philosophical Transactions of the Royal Society was the first journal in the world exclusively devoted to science (6 March 1665). Journals have always published data.
Setting the scene
Opportunities for Data Exchange (ODE): The Data Publications Pyramid illustrates the most common ways to make data accessible. Research data come in many different forms. Publications have always contained data, usually in a very condensed, processed and summarised way via graphs, tables and illustrations. At the other end of the spectrum are raw data and original data sets, which too often remain inaccessible on people's computers, hard disks or in drawers. Many authors add their underlying research data in supplements to journal articles. In disciplines with community-supported data archives (examples are Genbank, World Protein Database and Pangaea) researchers can deposit their data in a safe and reliable way, and publishers can ensure persistent links between the data and related publications.
Challenge that data journals address: turning supplementary material into something that can be mined.
Better example: Journal of Open Psychology Data, which links with DANS. Address issues around fraud in the Netherlands.
GigaScience: no article processing costs.
Rapid peer review: Our insistence on fast and thorough peer review enables us to process manuscripts quickly; we aim to reach initial decisions within 6 weeks.
Rapid publication: Following the acceptance of a manuscript, it is published, with final citation details, as a provisional PDF file with minimal delay (subject to formatting checks, copy-editing and author verification). Fully formatted versions of the article replace the accepted manuscript within 4 weeks.
Open access: All articles published in GigaScience are open access (freely available on the journal website, with the copyright retained by the author). Research articles are deposited in at least one widely and internationally-recognized open access repository complying with the NIH Public Access Policy and the Wellcome Trust Open Access Policy. To cover the cost of open access publishing we levy an article-processing charge.
High visibility within the field: Your work is freely accessible to a global audience. In addition, articles are available through INIST in France and in e-Depot, the National Library of the Netherlands' digital archive of all electronic publications. We are also in discussion with other permanent digital archives including the British Library.
Permanence: All articles published in GigaScience are archived in a number of safe open access archives so permanent accessibility is assured.
Transcript of "The Rise of the Data Journal"
IASSIST, Cologne, May 2013.
Funded by:
This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 2.5 UK: Scotland License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/2.5/scotland/ ; or, (b) send a letter to Creative Commons, 543 Howard Street, 5th Floor, San Francisco, California, 94105, USA.
The Rise of the Data Journal
IASSIST, Cologne, Germany, May 31, 2013
Marieke Guy & Monica Duke
DCC, University of Bath
m.firstname.lastname@example.org
Digital Curation Centre (DCC)
• Consortium comprising units from the Universities of Bath (UKOLN), Edinburgh (DCC Centre) and Glasgow (HATII)
• Launched 1st March 2004 as a national centre for solving challenges in digital curation that could not be tackled by any single institution or discipline
• Funded by JISC with additional HEFCE funding from 2011 for targeted institutional development
• Support selection of tools: DAF, CARDIO, DMP Online, tools and metadata schema catalogues
• Offer advice and support through ‘How to Guides’, ‘Briefing papers’ and Web site
Support from the DCC
• Assess needs
• Make the case
• Develop support and services
• RDM policy development
• DAF & CARDIO assessments
• Guidance and training
• Workflow assessment
• DCC support team
• Advocacy with senior management
• Institutional data catalogues
• Pilot RDM tools
• Customised Data Management Plans
• …and support policy implementation
A history of journals and data
• Data publication in journals is not new
• Earliest academic scientific journal is Journal des sçavans, first published on 5 Jan 1665
• Data usually structured
• Recently data has grown significantly in volume and more data is digital
• Data published in supplementary materials
• Supplementary files have become too big, and journals have begun to stop accepting them, e.g. Journal of Neuroscience
• Journals become ‘data dumping grounds’
Changing face of journals
• Image from V. Kiemer, Nature Publishing Group
Enhanced publications
• A publication enhanced with:
  • research data (evidence of the research)
  • extra materials (to illustrate or clarify)
  • post-publication data (commentaries, ranking) - Driver II
• Extra materials:
  • Audio files, illustrative images and video fragments
  • GIS or interactive maps
  • Models, algorithms
  • Metadata sets
• Issues around who creates enhancements, who manages them?
• http://xposre.nl/epfeatures/
Why publish data?
“Data that underpin a journal article should be made concurrently available in an accessible database”
• Science as an open enterprise, report by the Royal Society, June 2012
• Data should be accessible, intelligible, assessable and usable
Other reasons…
BENEFITS
• Avoid duplication
• Scientific integrity
• More collaboration
• Better research
• More reuse & value
• Increased citation: 9-30% increase depending on e.g. discipline (Piwowar et al, 2007, 2013)
DRIVERS
• Public expectations
• Government agenda
• Funder policy
• Institutional policy
• EU expectations
• Preservation of data
Data publication benefits
• Image from Ubiquity Press
Research data vs journal articles
Research data
• Difficult to manage after funding stops
• Who has it?
• Where is it?
• Who does it belong to?
• How do I make it available?
• Where’s the reward?
Journal articles
• Held by libraries
• Well preserved
• Impact monitored
• Easy to find
• Is published
• Promotion and tenure processes recognise it
• In the past data have been a “second-class citizen in the scholarly record”
ODE data publication pyramid
What is a data paper/article?
• A paper that describes a data set, usually stored in a repository
• Gives details of the data collection (when, why, how)
• Gives details of processing, software, file formats etc.
• Has a cover sheet and set of links to archived artefacts
• The cover sheet contains familiar elements such as title, date, authors, abstract, persistent identifiers (DOI, ARK)
• There are no novel analyses or ground-breaking conclusions
• Authors could include those involved in data management and processing
• The data paper/article format is widened out into a data journal
How to submit a data paper
• Image from Ubiquity Press
Data journal benefits
• Academic credit for data scientists and curators
• Data likely to be uploaded to a trusted repository
• Data available for peer review, integrity of data checked
• Data journals helpful for those wanting to reuse the data
• Use of data journals shows transparency in the process
• Process allows collaboration with others working in data area
• The result is more than just a metadata landing page!
Data journal challenges
• Linking issues - problems when linking data to the scientific record e.g. issues with persistence, granularity, attribution
• Validation issues - validation of data sets puts a burden on the peer review process. Who reviews the data?
• Effort issues - a need to use already submitted metadata, use tried and tested approaches e.g. DOIs
• Access issues - need trusted repository (?), open access
• Consistency issues - journal workflows vary: ‘engaged submitter’, ‘data dumper’, ‘third party requester’, variations in wording, approach, across disciplines
• Responsibility issues - Who is responsible for the data? Data Availability Policy (DAP). Is the data checked?
Jisc MRD programme projects
• Projects looking at innovative research data publication
• What policies would achieve greater levels of data sharing, citation and linkages between publications and datasets?
• What partnerships between journals, data centres and research organisations are necessary?
• How can costs of long term data archiving be met?
• What characterises a suitable repository?
• What peer review of data is appropriate before publication?
• Projects: Peer REview for Publication & Accreditation of Research data in the Earth sciences (PREPARDE), Journal Research Data Policy Bank (JoRD) and Publisher, Repository and Institutional Metadata Exchange (PRIME)
PREPARDE project
• 12 month JISC-funded activity, 7 partners from academia, publishing and library
• Peer REview for Publication & Accreditation of Research data in the Earth sciences (PREPARDE) project
• Aiming to capture the processes and procedures required to publish a scientific dataset, ranging from ingestion into a data repository, through to formal publication in a data journal
• Looking at key issues arising in the data publication paradigm:
  • How does one peer-review a dataset?
  • How can datasets and journal publications be cross-linked for the benefit of the wider research community?
PREPARDE list of data journals
• Very varied
• Lots of earth science, many disciplines missing
• Repository criteria vary
• Some hold data set too
• Majority require OA for article, some for data set too
• http://proj.badc.rl.ac.uk/preparde/blog/DataJournalsList
Current data journals
• GigaScience - BioMed Central - publishes big-data studies from the entire spectrum of life and biomedical sciences
  • Standard manuscript publication linked to a database that hosts all associated data and provides data analysis tools and cloud-computing resources
• Journal of Open Archaeology Data (JOAD) - Ubiquity Press - features peer reviewed data papers describing archaeology datasets with high reuse potential
  • Work with institutional data repositories to ensure associated data are professionally archived, preserved, and openly available
• Geoscience Data Journal - Wiley-Blackwell
Forthcoming data journals
• Scientific Data - Nature journal - focuses on the life, biomedical and environmental science communities
• Launching in Spring 2014, and open for submissions in Autumn 2013; open-access, online-only publication
Journal Research Data Policy Bank
• JoRD conducted a feasibility study into the scope and shape of a sustainable service to collate and summarise journal policies on research data (JoRD policy bank service)
• Carried out by Centre for Research Communications at Nottingham University (UK), Research Information Network and Mark Ware Consulting Ltd.
• Have carried out literature review, study of journal policies (400 international and national journals)
• “Although idea of making scientific data openly accessible for share is widely accepted in the scientific community, the practice confronts serious obstacles. The most immediate of these obstacles is the lack of a consolidated infrastructure for the easy sharing of data.” JoRD
PRIME project
• Publisher, Repository and Institutional Metadata Exchange (PRIME) aims to enable the automated exchange of metadata between publishers and repositories
• Partners: UCL, Ubiquity Press and Archaeology Data Service
• Building on the work of three other Jisc-funded projects: DryadUK, REWARD, and SWORD-ARM
• Plan to enable the exchange of metadata between UCL Discovery, the ADS, and JOAD
• Release a metadata schema, open source plugins and case studies
Joint Data Archiving Policy (JDAP)
• JDAP describes a requirement by a journal that supporting data be publicly available
• Evolved in 2011 in the field of evolution and has since been adopted by other journals across various disciplines
• Journals that adopt JDAP often recommend Dryad as a data repository, however the JDAP initiative is distinct from Dryad (Dryad uses CC0 public domain dedication)
• << Journal >> requires, as a condition for publication, that data supporting the results in the paper should be archived in an appropriate public archive, such as << list of approved archives here >>. Data are important products of the scientific enterprise, and they should be preserved and usable for decades in the future. Authors may elect to have the data publicly available at time of publication, or, if the technology of the archive allows, may opt to embargo access to the data for a period up to a year after publication. Exceptions may be granted at the discretion of the editor, especially for sensitive information such as human subject data or the location of endangered species.
DCC support
• The DCC will continue to support institutions in the data publication area
• We will do this by:
  • Writing briefing papers in this area
  • Support for stakeholder engagement e.g. workshops
  • Awareness of pros and cons of the different models
  • Engaging with institutions on data publication issues
Final thoughts…
• More journals are encouraging publication of data in some way
• Often data become the focus of the publication alongside a supporting narrative
• Is there a role for the data journal when traditional journals are also coming on board? What will be cited?
• Mixed market: lots of different approaches working alongside each other
• Changing landscape offers opportunities and challenges for the publisher, author and data manager
• Collaboration & communication are the most effective way forward, e.g. the ‘Now and Future of Data Publishing’ meeting
Thanks - any questions?
Acknowledgements: Thanks to Sarah Callaghan and Angus Whyte (PREPARDE project), Brian Hole (Ubiquity Press) & presenters from IDCC data publishing workshop for help with slides