Perspectives on the Role of Trustworthy Repository Standards in Data Journal Publication


Presentation to IASSIST 2013, in the session Expanding Scholarship: Research Journals and Data Linkages. Describes PREPARDE workshop on repository accreditation for data publication and invites comments on guidelines.


    1. Perspectives on the Role of Trustworthy Repository Standards in Data Journal Publication
       IASSIST Cologne, 31 May 2013
       Angus Whyte, Sarah Callaghan, Jonathan Tedds, Matthew S. Mayernik, and the PREPARDE project
    2. Aims
       1. Introduce the PREPARDE project
          Data journal and repository links
          Data peer-review
          Repository trust accreditation *
       2. Repository certification background
          Why relevant to data journals
          Standards developed
          Issues being discussed
       3. PREPARDE Guidelines
          Input to them from IDCC workshop
          Hopefully also your comments…
       Q. What should repositories, depositors and journals expect from one another?
       Q. What are use cases for data journals in social sciences?
       Q. What support should institutions offer?
       PREPARDE Guidelines
    3. PREPARDE: Peer REview for Publication & Accreditation of Research Data in the Earth sciences
       Lead Institution: University of Leicester
       Partners
       – British Atmospheric Data Centre (BADC)
       – US National Centre for Atmospheric Research (NCAR)
       – California Digital Library (CDL)
       – Digital Curation Centre (DCC)
       – University of Reading
       – Wiley-Blackwell
       – Faculty of 1000 Ltd
       Project Lead: Dr Jonathan Tedds (University of Leicester)
       Project Manager: Dr Sarah Callaghan (BADC)
       Length of Project: 12 months
       Project Start Date: 1st July 2012
       Project End Date: 30th June 2013
    4. PREPARDE topics
       3 main areas of interest (in orange)
       1. Workflows and cross-linking between journal and repository
       2. Repository accreditation
       3. Scientific peer-review of data
       Main aim: to put in place the policies and procedures needed for data publication in the Geoscience Data Journal and to generalise those policies for application outside the Earth Sciences.
    5. Why: Reasons for citing and publishing data
       • Pressure from (UK) government to make data from publicly funded research available for free.
       • Scientists want attribution and credit for their work
       • Public want to know what the scientists are doing
       • Research funders want reassurance that they’re getting value for money
       • Relies on peer-review of science publications (well established) and data (not done yet!)
       • Allows the wider research community to find and use datasets, and understand the quality of the data
       • Extra incentive for scientists to submit their data to data centres in appropriate formats and with full metadata
    6. How: Geoscience Data Journal, Wiley-Blackwell and the Royal Meteorological Society
       • Partnership to develop a mechanism for the formal publication of data in the Open Access Geoscience Data Journal
       • GDJ publishes short data articles cross-linked to and citing datasets that have been deposited in approved data centres and awarded DOIs or other permanent identifiers.
       • A data article describes a dataset collection, processing, software, file formats, etc., without the requirement of novel analyses or groundbreaking conclusions.
       • the when, how and why data was collected and what the data-product is.
    7. Dataset submission
       “authors must complete the following two-tiered process:
       The dataset, along with supporting metadata, must be formally archived in a Geoscience Data Journal approved repository or data centre (and preferably have been assigned a digital object identifier (DOI))…
       An approved repository is one that is commonly used by the scientific community it supports, has a formal data management policy in place, and can mint a DOI or provide a stable URL and unique identifier for the dataset.”
       Author Guidelines
       Current approved repositories are:
       • 3TU.Datacentrum
       • British Atmospheric Data Centre (BADC)
       • British Oceanographic Data Centre (BODC)
       • CSIRO Data Access Portal
       • Environmental Information Data Centre (EIDC)
       • Figshare
       • National Geoscience Data Centre (NGDC)
       • NERC Earth Observation Data Centre (NEODC)
       • PANGAEA
       • Polar Data Centre (PDC)
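The approval criteria quoted from the Author Guidelines can be read as a simple predicate. What follows is a minimal Python sketch under stated assumptions, not the journal's actual procedure: the `Repository` class and all of its field names are hypothetical, introduced only to make the three conditions explicit.

```python
from dataclasses import dataclass


@dataclass
class Repository:
    """Hypothetical record of the facts an editor would need to know."""
    name: str
    community_use: bool      # commonly used by the community it supports
    has_data_policy: bool    # formal data management policy in place
    can_mint_doi: bool
    has_stable_url: bool
    has_unique_id: bool


def meets_approval_criteria(repo: Repository) -> bool:
    """Return True if the repository satisfies all three quoted conditions."""
    # Either a DOI, or a stable URL together with a unique identifier.
    identifiable = repo.can_mint_doi or (repo.has_stable_url and repo.has_unique_id)
    return repo.community_use and repo.has_data_policy and identifiable


example = Repository(
    name="Example Data Centre",  # made-up name, not a real repository
    community_use=True,
    has_data_policy=True,
    can_mint_doi=False,
    has_stable_url=True,
    has_unique_id=True,
)
print(meets_approval_criteria(example))
```

In practice the first condition (community use) is a judgement call rather than a boolean, which is part of why the guidelines later in the talk look for external evidence such as certification.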
    8. Dataset submission
       … Subject to satisfactory reviews of both dataset and paper, Geoscience Data Journal will publish the data description paper, along with a link to the underlying dataset (usually by means of the dataset’s DOI).
       Author Guidelines
    9. How we publish data
       [Diagram: data held in repositories (BADC, BODC); a journal (any online journal system) publishing PDF files; a data journal (Geoscience Data Journal) publishing html pages; authors use word processing software with the journal template]
       The traditional online journal model
       1) Author prepares the paper using word processing software.
       2) Author submits the paper as a PDF/Word file.
       3) Reviewer reviews the PDF file against the journal’s acceptance criteria.
       Overlay journal model for publishing data
       1) Author prepares the data paper using word processing software and the dataset using appropriate tools.
       2a) Author submits the data paper to the journal.
       2b) Author submits the dataset to a repository.
       3) Reviewer reviews the data paper and the dataset it points to against the journal’s acceptance criteria.
    10. Peer Review
        GDJ Reviewers consider three sets of questions
        Review I – Data description document
        1. Is the method used to create the data of a high scientific standard?
        2. Is enough information provided (in metadata also) to enable the data to be re-used or the experiment to be repeated?
        3. Does the document provide a comprehensive description of all the data that is there?
        4. Does the data make an important and unique contribution to the meteorological sciences?
        5. What range of applications to meteorological sciences does it have?
        6. Are all contributors and existing work acknowledged?
    11. Peer Review
        GDJ Reviewers consider three sets of questions
        Review II – Metadata
        7. Does the metadata establish the ownership of the data fairly?
        8. Is enough information provided (in data description document also) to enable the data to be re-used or the experiment to be repeated?
        9. Are the data present as described, and accessible from a registered repository using the software provided?
        Overlaps with repository appraisal, curation processes… and trust certification?
    12. Peer Review
        GDJ Reviewers consider three sets of questions
        Review III – Data themselves
        10. Are the data easily readable, e.g. across different platforms such as Linux, Mac and Windows?
        11. Are the data of high quality, e.g. are error limits and quality statements adequate to assess fitness for purpose, is spatial or temporal coverage good enough to make the data useable?
        12. Are the data values physically possible and plausible?
        13. Are there missing data that might compromise its usefulness?
        Overlaps with repository appraisal, curation processes… and trust certification?
    13. Repository accreditation schemes
        European Framework for Audit and Certification of Digital Repositories.
        Three levels, in increasing trustworthiness:
        1. Basic Certification is granted to repositories which obtain DSA (Data Seal of Approval) certification;
        2. Extended Certification is granted to Basic Certification repositories which in addition perform a structured, externally reviewed and publicly available self-audit based on ISO 16363 or DIN 31644;
        3. Formal Certification is granted to repositories which in addition to Basic Certification obtain full external audit and certification based on ISO 16363 or equivalent DIN 31644.
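The three levels build on one another, and a short Python sketch can make the dependency explicit. The function name, the `Level` enum and the three boolean flags are hypothetical; the mapping follows only the wording of the slide and should not be read as an official implementation of the framework.

```python
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical labels for the framework's levels, ordered by trust."""
    NONE = 0
    BASIC = 1
    EXTENDED = 2
    FORMAL = 3


def certification_level(has_dsa: bool,
                        public_self_audit: bool,
                        external_audit: bool) -> Level:
    """Map certification evidence to the highest level it supports.

    has_dsa: Data Seal of Approval obtained.
    public_self_audit: structured, externally reviewed, publicly available
        self-audit based on ISO 16363 or DIN 31644.
    external_audit: full external audit and certification based on
        ISO 16363 or DIN 31644.
    """
    if not has_dsa:
        # Both higher levels are defined as additions to Basic Certification.
        return Level.NONE
    if external_audit:
        return Level.FORMAL
    if public_self_audit:
        return Level.EXTENDED
    return Level.BASIC


print(certification_level(has_dsa=True, public_self_audit=True, external_audit=False).name)
```

Because the enum is ordered, a journal policy such as "at least Basic" could be written as `certification_level(...) >= Level.BASIC`.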
    14. Repository accreditation – IDCC workshop
        [Diagram label: Data Centre]
        Link between data paper and dataset is crucial!
        • How can data journal editors know a repository is trustworthy?
        • How can repositories prove they’re trustworthy?
        • What does “trustworthy” mean for data journal peer review?
        What guidelines can journals use?
        • General, cross-disciplinary and concrete
        • How far do certification standards help?
    15. IDCC Workshop background
        PREPARDE Workshop, Amsterdam 17 Jan. 2013
        • Research Data Alliance - Working Group on Repository Accreditation
        • Previous work on integrating data and publications e.g. DRIVER project and Opportunities for Data Exchange report
        • Innovation in data integration, e.g. PANGAEA – Elsevier since 2010
        • New data journals e.g. Journal of Open Psychology Data (Ubiquity Press, DANS)
    16. Workshop perspectives
        36 Participants – range of roles
        Data Centres - UKDA, PANGAEA, BADC
        Learned Society - Royal Society of Chemistry
        Publisher - Elsevier
        Institutions - UK, US, De, Aus, NL, Ch.
        National Libraries & Orgs - STM Assoc., DANS (NL), NRF (SA), BL, DCC (UK)
        Common Ground
        • Data journals offer reuse and citation. But a passing phase?
        • Data journals offer credit to data managers
        • Certification yes, it offers journals some assurances
        • Collaboration key as roles & infrastructure evolve
    17. For data publication “trust” means…
        What certification standards say it is…
        • Collections policy
        • Active curation & mgmt
        • Long-term preservation plans
        • Persistent links
        • Landing pages
        • Continuity plan
        Support for multi-stage review
        • Repository – QA, appraisal
        • Peer – open or closed
        • User – e.g. DANS study
        Journals can plan how to integrate more data into article
        Don’t have to look at process detail for each dataset reviewed
        Data centres can support policy compliance – track outputs against grants (e.g. IDEA Data Compliance Reporting Tool) or data sharing statements
    18. Cloudier issues
        How do repository accreditation and data quality relate to each other?
        What about quality of service to depositors, users?
        Researchers’ and other stakeholder roles… e.g. advocacy, tool support to gather provenance info for publication earlier?
        Repository directories – informing decisions on trust?
        Indicators of repository value… not covered in certification?
        • Funding
        • Community acceptance
        • Alt-metrics – access and reuse metrics
        Service level agreements, memorandums of understanding may better meet some needs than certification
    19. Draft guidelines for journal editors
        For data publication, repositories must:
        1. Ensure persistence and stability of published datasets
        2. Have a clear and public indication to preserve the data or have responsibility for providing access to the data over the long term
        3. Assign globally unique persistent IDs to the published datasets and maintain all URLs associated with those IDs
        4. Provide persistent, actionable links to enable citations to data
        5. Ensure that data will be accessible (open data, or info on license terms)
        6. Actively manage and curate the data in their archive
        7. Have an appropriate, formal succession plan and contingency plans / escrow in case they cease
        8. Provide info on numbers of deposits and frequency of user access
    20. Draft guidelines for journal editors
        Repositories can ‘prove’ capabilities to provide persistent access by…
        1. Certification on any of 3 levels in TrustedDigitalRepository.eu
        2. Regular or network membership of ICSU World Data System
        3. Data Centre accreditation via MEDIN
        4. Contractual arrangement with DataCite managing agent to mint DOIs
        5. Operate using the OAIS reference model
        6. Clear intent in mission statement, institutional data mgmt policy, data preservation plan, collections policy
        7. Evidence of community take-up e.g. user numbers, service level agreements, partnership agreements with well established journals, a learned society or equivalent body.
        Use directory e.g. Re3data for reference on some of above.
    21. Landing page requirements
        Permanent IDs for the dataset must resolve to a publicly accessible landing page which must:
        • be open and human readable (can also be provided in a format which is machine readable)
        • describe the data object and include metadata and permanent identifier
        • be maintained, even if the data is no longer available.
        Metadata:
        • Must be human readable, where possible machine readable (e.g. DataCite metadata schema)
        • Freely available for discovery purposes
        • Repo must develop and implement suitable quality control measures to ensure the metadata is correct
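The landing page requirements lend themselves to a checklist. The Python sketch below is illustrative only: the dictionary keys (`publicly_accessible`, `human_readable`, `description`, `metadata`, `identifier`, `maintained`) are hypothetical names chosen for this example, and a real check would need a person or a crawler to establish each fact.

```python
# Items the slide says a landing page must include.
REQUIRED_CONTENT = ("description", "metadata", "identifier")


def landing_page_problems(page: dict) -> list:
    """Return a list of unmet landing page requirements (empty if none)."""
    problems = []
    if not page.get("publicly_accessible"):
        problems.append("landing page is not publicly accessible")
    if not page.get("human_readable"):
        problems.append("landing page is not human readable")
    for field in REQUIRED_CONTENT:
        if not page.get(field):
            problems.append(f"missing {field}")
    # The page must persist even when the data itself has been withdrawn.
    if not page.get("maintained"):
        problems.append("landing page is not maintained")
    return problems


page = {
    "publicly_accessible": True,
    "human_readable": True,
    "description": "Hourly surface temperature observations",  # made-up example
    "metadata": {"creator": "Example Author"},                  # made-up example
    "identifier": "",  # deliberately left empty to trigger a finding
    "maintained": True,
}
print(landing_page_problems(page))
```

Returning the list of problems, rather than a single pass/fail flag, lets a journal tell a repository exactly which requirement is not met.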
    22. Social Science Use Cases?
        Data centres – increase reuse
        Funders, data centres, researchers, learned societies – improve transparency
        Data centres, researchers, learned societies, institutions – improve visibility
        Data managers – publication route, get credit
        Researchers – provide snapshot of rich content, sensitive data
        Reusers – support meta-analysis; mine structured description; visualisation
    23. Thank you
        And please! Tell us what you think
    24. Peer-review of data
        Summary Recommendations from Workshop at the British Library, 11 March 2013
        Workshop attendees included funders, publishers, repository managers and other interested parties.
        Draft recommendations put up for discussion and feedback from audience captured.
        Feedback from the community still welcome!
    25. Connecting data review with data management planning
        1. All research funders should at least require a “data sharing plan” as part of all funding proposals, and if a submitted data sharing plan is inadequate, appropriate amendments should be proposed.
        2. Research organisations should manage research data according to recognised standards, providing relevant assurance to funders so that additional technical requirements do not need to be assessed as part of the funding application peer review. (Additional note: Research organisations need to provide adequate technical capacity to support the management of the data that the researchers generate.)
        3. Research organisations and funders should ensure that adequate funding is available within an award to encourage good data management practice.
        4. Data sharing plans should indicate how the data can and will be shared and publishers should refuse to publish papers which do not clearly indicate how underlying data can be accessed, where appropriate.
    26. Connecting scientific, technical review and curation
        1. Articles and their underlying data or metadata (by the same or other authors) should be multi-directionally linked, with appropriate management for data versioning.
        2. Journal editors should check data repository ingest policies to avoid duplication of effort, but provide further technical review of important aspects of the data where needed. (Additional note: A map of ingest/curation policies of the different repositories should be generated.)
        3. If there is a practical/technical issue with data access (e.g. files don’t open or exist), then the journal should inform the repository of the issue. If there is a scientific issue with the data, then the journal should inform the author in the first instance; if the author does not respond adequately to serious issues, then the journal should inform the institution who should take the appropriate action. Repositories should have a clear policy in place to deal with any feedback.
    27. Connecting data review with article review
        1. For all articles where the underlying data is being submitted, authors need to provide adequate methods and software/infrastructure information as part of their article. Publishers of these articles should have a clear data peer review process for authors and referees.
        2. Publishers should provide simple and, where appropriate, discipline-specific data review (technical and scientific) checklists as basic guidance for reviewers.
        3. Authors should clearly state the location of the underlying data. Publishers should provide a list of known trusted repositories or, if necessary, provide advice to authors and reviewers of alternative suitable repositories for the storage of their data.
        4. For data peer review, the authors (and journal) should ensure that the data underpinning the publication, and any tools required to view it, should be fully accessible to the referee. The referees and the journal need to then ensure appropriate access is in place following publication.
        5. Repositories need to provide clear terms and conditions for access, and ensure that datasets have permanent and unique identifiers.