Jonathan Tedds Distinguished Lecture at DLab, UC Berkeley, 12 Sep 2013: "The Open Research Challenge: Peer Review and Publication of Research Data"
 


http://dlab.berkeley.edu/event/open-research-challenge-peer-review-and-publication-research-data

A talk by Dr. Jonathan Tedds, Senior Research Fellow, D2K Data to Knowledge, Dept of Health Sciences, University of Leicester.

PI: #BRISSKit www.brisskit.le.ac.uk
PI: #PREPARDE www.le.ac.uk/projects/preparde

The Peer REview for Publication & Accreditation of Research data in the Earth sciences (PREPARDE) project seeks to capture the processes and procedures required to publish a scientific dataset, ranging from ingestion into a data repository, through to formal publication in a data journal. It will also address key issues arising in the data publication paradigm, namely, how does one peer-review a dataset, what criteria are needed for a repository to be considered objectively trustworthy, and how can datasets and journal publications be effectively cross-linked for the benefit of the wider research community.

I will discuss this and alternative approaches to research data management and publishing through examples in astronomy, biomedical and interdisciplinary research, including the arts and humanities. Who can help in the long tail of research where established data centers, archives or adequate institutional support are lacking? How much can we transfer from the so-called “big data” sciences to other settings, and where does the institution fit in with all this? What about software?

Publishing research data brings a wide and differing range of challenges for all involved, whatever the discipline. In PREPARDE we also considered the pre and post publication peer review paradigm, as implemented in the F1000 Research Publishing Model for the life sciences. Finally, in an era of truly international research how might we coordinate the many institutional, regional, national and international initiatives – has the time come for an international Research Data Alliance?

Usage Rights

© All Rights Reserved


    Presentation Transcript

    • THE OPEN RESEARCH CHALLENGE: PEER REVIEW AND PUBLICATION OF RESEARCH DATA Dr Jonathan Tedds jat26@le.ac.uk @jtedds Senior Research Fellow, D2K Data to Knowledge Research Group (University of Leicester) PI #PREPARDE http://www.le.ac.uk/projects/preparde
    • http://www.astrogrid.org (April 2008 1st public release)
    • Science as an Open Enterprise Report Why open? • As a first step towards this intelligent openness, data that underpin a journal article should be made concurrently available in an accessible database • We are now on the brink of an achievable aim: for all science literature to be online, for all of the data to be online and for the two to be interoperable. [p.7] • Royal Society June 2012, Science as an Open Enterprise, http://royalsociety.org/policy/projects/science-public-enterprise/report/ • Issues linking data to the scientific record: – Data persistence – Data and metadata quality – Attribution and credit for data producers • Geoffrey Boulton (Edinburgh), Lead author: – “Science has been sleepwalking into crisis of replicability...and of the credibility of science” – “Publishing articles without making the data available is scientific malpractice”
    • Data Reuse: asking new questions Hubble Space Telescope • Papers based upon reuse of archived observations now exceed those based on the use described in the original proposal. – http://archive.stsci.edu/hst/bibliography/pubstat.html • See also work by Piwowar & Vision re life sciences: “Data reuse and the open data citation advantage” – http://peerj.com/preprints/1/
    • Oh, and…. says so :P We are committed to openness in scientific research data to speed up the progress of scientific discovery, create innovation, ensure that the results of scientific research are as widely available as practical, enable transparency in science and engage the public in the scientific process. • To the greatest extent and with the fewest constraints possible publicly funded scientific research data should be open, while at the same time respecting concerns in relation to privacy, safety, security and commercial interests, whilst acknowledging the legitimate concerns of private partners. • Open scientific research data should be easily discoverable, accessible, assessable, intelligible, useable, and wherever possible interoperable to specific quality standards. • To ensure successful adoption by scientific communities, open scientific research data principles will need to be underpinned by an appropriate policy environment, including recognition of researchers fulfilling these principles, and appropriate digital infrastructure.
    • Scale of the problem: who, what, when, where….? http://blogs.scientificamerican.com/absolutely-maybe/2013/09/10/opening-a-can-of-data-sharing-worms/ • Timothy Vines and colleagues studied reproducibility of data sets in zoology and changes through time – gathered 516 papers published between 1991 and 2011 – then they tried to track the data down… • Even tracking down the authors was a challenge – Over time a dwindling minority of papers were accompanied by author email addresses that still functioned • only 37% of the data - even from papers in 2011 - were still findable and retrievable – proportion dropped each earlier year • For papers published in 1991 – only 7% of the data could be determined to truly still be in existence and retrievable – few authors could be found, and most of them were reporting that their data were lost or inaccessible
    • This isn’t new… Henry Oldenburg – inveterate correspondent – now thought of as a scientist • Had idea to publish Philosophical Transactions (1665): o Should be written in vernacular not Latin o Underlying evidence must be concurrently published o Helped propel Europe at the time o Concept of scientific self correction • able to right its errors • Wrote: “thought fit to employ the [printing] press……..Universal Good of Mankind” o How do we achieve these ends in the post-Gutenberg era?
    • Science ecosystems (Peter Fox, Rensselaer) • These elements are what enable scientists to explore/ confirm/ deny their research ideas and collaborate! • Abduction as well as induction and deduction • Accountability, Proof, Explanation, Justification, Verifiability, ‘Transparency’ -> Translucency, Trust, Identity, Citability, Integrateability
    • Data as a “public good” (2011) • Public good • Preservation • Discovery • Confidentiality • First use • Recognition • Public funding
    • http://osc.universityofcalifornia.edu/openaccesspolicy/
    • So what do we mean by publishing data? • The familiar: – Supplementary tables via journal or – Archived raw or calibrated facility data – Discipline specific and institutional / national archives • Data under the graph? – In order to reproduce and adapt article analysis • “Research ready” open data – In order to reuse and repurpose – for interdisciplinary researchers, community, business – Ideally peer reviewed?
    • Research data example - level 1: • A typical example from physical sciences (astronomy) distinguishes between broad categories within the research data spectrum: • raw/initially auto-processed data produced at a research facility such as an observatory • typically made publicly available in this format after an embargo period of e.g. 1 year • in some cases available immediately - e.g. Swift Gamma Ray Burst satellite
    • Research data example – level 2 • "research ready" processed data which has been fully calibrated, combined and cleaned/annotated • often produced by individuals or collaborations • rarely available to anyone outside the collaboration except upon request/collaboration • needed for re-analysis or reuse for science unless you have detailed sub domain specific knowledge and detailed contextual information to reproduce from raw • considered to enable a competitive advantage for producers • may be produced by dedicated data scientists on behalf of the community for major survey/missions e.g. ESA XMM-Survey Science Centre (Leicester), NASA, NCAR…
    • Research data example – level 3 • output dataset – following detailed analysis of research ready datasets • forms the data under the graph in a journal publication following analysis of research ready datasets • Might be available as a Table via journal, CDS etc • May not be available outside the collaboration except upon request/collaboration • may well generate future additional samples and papers for the owning collaboration on top of the original • other researchers may request the data for their own research but may not get it!
    • ….and STOP! • Next project – Proposal long since written – Probably already underway… • Feel free to email ME if you would like to work on an idea using this dataset or code – As long as I’m a co-author on the paper! – You have to go through me to find out what you really need to know to reuse the data/code
    • Research data example – level 4a • published catalogue type representation of published output dataset • NOT a “data paper”….but could be • optional in many cases, mandatory for most major surveys • usually made available via project specific online resource • may be provided as table of parameters based on research ready dataset, usually linked from and associated with a journal • specifically produced in order for the wider community to reuse (cite!) and repurpose if wanted • The well-known Sloan Digital Sky Survey is a classic example, or more recently the 2XMMi X-ray catalogue I have a close involvement with (largest X-ray survey of the sky).
    • e.g. http://adsabs.harvard.edu/abs/2013arXiv1302.5329E
    • Research data example – level 4b • data paper describing and linking to output dataset(s) • Live Data paper! Dataset citation is the first thing in the paper and is also included in the reference list (to take advantage of citation count systems) • DOI: 10.1002/gdj3.2
    • Data Publications • Initiatives for open publication of data, datasets, data papers and open peer review • RDMF8 ‘Engaging with the publishers’: http://www.dcc.ac.uk/events/research-data-management-forum-rdmf/rdmf8-engaging-publishers • Particularly Rebecca Lawrence on peer review policies • Earth System Science Data: http://www.earth-system-science-data.net/ • Pensoft Data Publishing Policies and Guidelines for Biodiversity Data http://www.pensoft.net/news.php?n=59 Slide: Simon Hodson (Jisc / CODATA)
    • ODE Data Publication Pyramid (top to bottom): Pubs / Supps / Data Archives / Data on Disks and in Drawers • (1) Top of the pyramid is stable but small • (2) Risk that supplements to articles turn into data dumping places • (3) Too many disciplines lack a community endorsed data archive • (4) Estimates are that at least 75% of research data is never made openly available
    • From Mayernik et al (in prep), PREPARDE project: most cited Bulletin of the American Meteorological Society (BAMS) articles. Data from Web of Science, gathered on June 11, 2013.
      Article | Data paper? | Citations | Article details
      1 | Yes | 10,113 | Kalnay, E; et al. The NCEP/NCAR 40-year reanalysis project, 1996.
      2 | No | 3,201 | Torrence, C; Compo, GP. A practical guide to wavelet analysis, 1998.
      3 | No | 2,367 | Mantua, NJ; et al. A Pacific interdecadal climate oscillation with impacts on salmon production, 1997.
      4 | Yes | 1,987 | Kistler, R; et al. The NCEP-NCAR 50-year reanalysis: Monthly means CD-ROM and documentation, 2001.
      5 | Yes | 1,791 | Xie, PP; Arkin, PA. Global precipitation: A 17-year monthly analysis based on gauge observations, satellite estimates, and numerical model outputs, 1997.
      6 | Yes | 1,448 | Kanamitsu, M; et al. NCEP-DOE AMIP-II reanalysis (R-2), 2002.
      7 | No | 1,014 | Baldocchi, D; et al. FLUXNET: A new tool to study the temporal and spatial variability of ecosystem-scale carbon dioxide, water vapor, and energy flux densities, 2001.
      8 | Yes | 902 | Rossow, WB; Schiffer, RA. Advances in understanding clouds from ISCCP, 1999.
      9 | Yes | 900 | Rossow, WB; Schiffer, RA. ISCCP cloud data products, 1991.
      10 | No | 877 | Hess, M; Koepke, P; Schult, I. Optical properties of aerosols and clouds: The software package OPAC, 1998.
      11 | No | 815 | Willmott, CJ. Some comments on the evaluation of model performance, 1982.
      12 | No | 815 | Trenberth, KE. The definition of El Nino, 1997.
      13 | Yes | 785 | Woodruff, SD; Slutz, RJ; et al. A comprehensive ocean-atmosphere data set, 1987.
      14 | Yes | 776 | Meehl, G.A.; et al. The WCRP CMIP3 multimodel dataset - A
    • It’s a long road…. What do researchers need to make this all possible? – Incentives - citations, promotion, support • long way to go – Institutional and funder policy framework • mostly there now? – Appropriate discipline specific community centres of expertise • rare, mostly limited to big science niches or very broad but may be poorly sustained – Institutional support services for the basics • pilots to date – Software tools that are open and can be adapted • on the way
    • PREPARDE: Peer REview for Publication & Accreditation of Research Data in the Earth sciences Jonathan Tedds (Leicester), Sarah Callaghan (BADC), Fiona Murphy (Wiley), Rebecca Lawrence (F1000R), Geraldine Stoneham (MRC), Elizabeth Newbold (BL), Rachel Kotarski (BL), Matthew Mayernik (NCAR), John Kunze, Carly Strasser (CDL), Angus Whyte (DCC), Becca Wilson (Leicester), Simon Hodson (Jisc) and #PREPARDE project team + Geraldine Clement Stoneham (MRC), Elizabeth Newbold, Rachel Kotarski (BL) on data peer review http://www.le.ac.uk/projects/preparde
    • PREPARDE key use case: Geoscience Data Journal, Wiley-Blackwell and the Royal Meteorological Society • Partnership formed between Royal Meteorological Society and academic publishers Wiley Blackwell to develop a mechanism for the formal publication of data in the Open Access Geoscience Data Journal • GDJ publishes short data articles cross-linked to, and citing, datasets that have been deposited in approved data centres and awarded DOIs (or other permanent identifier). • A data article describes a dataset, giving details of its collection, processing, software, file formats, etc., without the requirement of novel analyses or ground breaking conclusions. • the when, how and why data was collected and what the data-product is. http://www.geosciencedata.com/
    • PREPARDE: Peer REview for Publication & Accreditation of Research Data in the Earth sciences http://www.le.ac.uk/projects/preparde • capture the processes and procedures required to publish a scientific dataset – ingestion into a data repository – formal publication in a data journal • address key issues in data publication – how to peer-review a dataset? – what criteria are needed for a repository to be considered objectively trustworthy? – how can datasets and journal publications be effectively cross-linked for the benefit of the wider research community? • PREPARDE team includes key expertise in – research – academic publishing – data management • Earth Sciences focus but produce general guidelines applicable to a wide range of scientific disciplines and data publication types incl life sciences (F1000R)
    • How to publish data in GDJ [diagram comparing two models; data centres shown: BADC, BODC] • The traditional online journal model (any online journal system): 1) Author prepares the paper using word processing software with the journal template. 2) Author submits the paper as a PDF/Word file. 3) Reviewer reviews the PDF file against the journal’s acceptance criteria. • Overlay journal model for publishing data (Geoscience Data Journal): 1) Author prepares the data paper using word processing software with the journal template, and the dataset using appropriate tools. 2a) Author submits the data paper to the journal. 2b) Author submits the dataset to a repository. 3) Reviewer reviews the data paper and the dataset it points to against the journal’s acceptance criteria.
    • Data Centre Trust: Repository accreditation • Link between data paper and dataset is crucial! • How do data journal editors know a repository is trustworthy? • How can repositories prove they’re trustworthy? • What makes a repository trustworthy? • Many things: mission, processes, expertise, workflows, history, systems, documentation, … • Assessing trustworthiness requires assessing the entire repository workflow • PREPARDE / IDCC13 Workshop – report out soon! • Peer review of data is implicitly peer review of repository And what does “trustworthy” mean, when you get right down to it?
    • DataCite Repository List • working document • initiated via a collaboration between the British Library, BioMed Central and the Digital Curation Centre • aims to capture growing number of repositories for research data • provided for information purposes only: • DataCite provides no endorsements of quality or suitability of the repositories • encourage community participation in developing this resource http://www.datacite.org/repolist/
    • Dryad Data Repository JDAP: Joint Data Archiving Policy • Joint Data Archiving Policy: http://datadryad.org/jdap • Joint declarations, Feb 2010, in American Naturalist, Evolution, the Journal of Evolutionary Biology, Molecular Ecology, Heredity, and other key journals in evolution and ecology: http://www.journals.uchicago.edu/doi/full/10.1086/650340 • This journal requires, as a condition for publication, that data supporting the results in the paper should be archived in an appropriate public archive, such as GenBank, TreeBASE, Dryad, or the Knowledge Network for Biocomplexity. • Allows embargos of up to one year; allows exceptions for, e.g., sensitive information such as human subject data or the location of endangered species. • ‘Data that have an established standard repository, such as DNA sequences, should continue to be archived in the appropriate repository, such as GenBank. For more idiosyncratic data, the data can be placed in a more flexible digital data library such as the National Science Foundation-sponsored Dryad archive at http://datadryad.org.’ Slide: Simon Hodson (Dryad / Jisc / CODATA)
    • PREPARDE and Bi-Directional Data Linking • already have a link from the GDJ data article to the data repository via DOI • GDJ can also pull the standard DOI metadata attached to that DOI from the DataCite metadata store • need to figure out a way so GDJ can inform the repository that their dataset has been cited/published! • At this time, we have a manual workaround (i.e. email) • Workshop on cross-linking between data centres and publishers 30th April 2013 at RAL, UK • Report out soon! • [Diagram labels: BADC, NCAR, GDJ, DataCite Metadata Store, standardised metadata]
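    The missing return link described on this slide (journal informing the repository) can be pictured as a pair of machine-readable link records. The following is a minimal sketch under stated assumptions, not the PREPARDE or GDJ mechanism: the `Link` record and the `make_link_pair` function are hypothetical, the dataset DOI is a placeholder, and the relation names are borrowed from the DataCite metadata schema's relationType vocabulary. Only the data paper DOI (10.1002/gdj3.2) comes from the slides.

```python
from dataclasses import dataclass

# Inverse pairs borrowed from the DataCite relationType vocabulary.
INVERSE = {
    "Documents": "IsDocumentedBy",
    "IsDocumentedBy": "Documents",
    "Cites": "IsCitedBy",
    "IsCitedBy": "Cites",
}


@dataclass(frozen=True)
class Link:
    """One directed link between two DOIs (hypothetical record shape)."""
    source_doi: str
    relation: str
    target_doi: str


def make_link_pair(article_doi, dataset_doi, relation="Documents"):
    """Return the forward link (article -> dataset) and its inverse.

    The inverse link is what the journal would need to send back to the
    repository so that the dataset record can point at the data paper.
    """
    if relation not in INVERSE:
        raise ValueError(f"unsupported relation: {relation}")
    forward = Link(article_doi, relation, dataset_doi)
    backward = Link(dataset_doi, INVERSE[relation], article_doi)
    return forward, backward


# "10.0000/example-dataset" is a placeholder, not a real dataset DOI.
forward, backward = make_link_pair("10.1002/gdj3.2", "10.0000/example-dataset")
print(forward)
print(backward)
```

    How the inverse record would actually be delivered to the repository (the step currently done by email) is left open here, as it is on the slide.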
    • Peer review of data: the Perfect Disaster? • Support for the peer review process – scholars contributing peer reviews with little formal reward – opportunity to polish and refine understanding of the cutting edge of research • But peer review system under stress – exploding number of journals, conferences, and grant applications – self-publication tools - blogs and wikis - allow scholars to disseminate their research results and products • Faster and more directly • Now adding research data into the publication and peer review queues …
    • Peer-review of data • Technical – author guidelines for GDJ – Funder Data Value Checklist – implicit peer review of repository? • Scientific – pre-publication? – post-publication? E.g. F1000R – guidelines on uncertainty e.g. IPCC – discipline specific? – EU Inspire spatial formatting • Societal – contribution to human knowledge – reliability http://libguides.luc.edu/content.php?pid=5464&sid=164619
    • Open Peer Review of Data • ESSD peer review ensures that the datasets are: – Plausible, with no immediately detectable problems; – Of sufficiently high quality, with their limitations clearly stated; – Well annotated by standard metadata and available from a certified data center/repository; – Customary with regard to their format(s) and/or access protocol, and expected to be useable for the foreseeable future; – Openly accessible (toll free) • Earth System Science Data journal: http://www.earth-system-science-data.net/ • Faculty 1000 Open Peer Review – Sanity check: format and suitable basic structure adherence; a standard basic protocol structure is adhered to; data stored in the most appropriate and stable location – Open Peer Review: Is the method used appropriate for the scientific question being asked? Has enough information been provided to be able to replicate the experiment? Have appropriate controls been conducted, and the data presented? Is the data in a useable format/structure? Are stated data limitations and possible sources of error appropriately described? Does the data ‘look’ ok (optional; e.g. Microarray data)? • Source: Rebecca Lawrence, Data Publishing: peer review, shared standards and collaboration, http://www.dcc.ac.uk/events/research-data-management-forum-rdmf/rdmf8-engaging-publishers
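    Review criteria like these lend themselves to a simple checklist structure. The following is a minimal sketch, assuming a reviewer records a yes/no answer per criterion; the criterion wording paraphrases the ESSD list on this slide, and the names `ESSD_CRITERIA` and `unmet_criteria` are hypothetical rather than part of any journal's tooling.

```python
# Paraphrase of the ESSD criteria listed on the slide.
ESSD_CRITERIA = [
    "plausible, with no immediately detectable problems",
    "sufficiently high quality, with limitations clearly stated",
    "well annotated by standard metadata, held in a certified repository",
    "customary format and/or access protocol, useable for the foreseeable future",
    "openly accessible (toll free)",
]


def unmet_criteria(answers):
    """Return the criteria the reviewer did not mark as satisfied.

    `answers` maps criterion text to True/False; a criterion with no
    recorded answer is treated as unmet.
    """
    return [c for c in ESSD_CRITERIA if not answers.get(c, False)]


# Example: everything satisfied except open accessibility.
answers = {c: True for c in ESSD_CRITERIA}
answers["openly accessible (toll free)"] = False
print(unmet_criteria(answers))
```

    Treating a missing answer as unmet is a deliberate choice in this sketch: it keeps an incomplete review from passing silently.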
    • Draft Recommendations on Peer-review of data • Summary Recommendations from Workshop at British Library, 11 March 2013 • Workshop attendees included funders, publishers, repository managers, researchers …. • Draft recommendations put up for discussion and feedback captured • Feedback from the community still welcome • 2nd workshop 24 June: put recommendations to peer reviewers! http://libguides.luc.edu/content.php?pid=5464&sid=164619 Document at: http://bit.ly/DataPRforComment Feedback to: https://www.jiscmail.ac.uk/DATA-PUBLICATION
    • Draft Recommendations on data peer review Summary Recommendations from Workshop at the British Library, 11 March 2013 • Connecting data review with data management planning • Connecting scientific, technical review and curation • Connecting data review with article review • 4-5 draft recommendations in each of above • Assist Researchers, Publishers, Journal Editors, Reviewers, Data Centres, Institutional Repositories to map requirements for data peer review • Matrix of stakeholders vs processes – Assist in assigning responsibilities for given context – New for most disciplines – Learn from disciplines where this already happens
    • Connecting data review with data management planning 1. All research funders should at least require a “data sharing plan” as part of all funding proposals, and if a submitted data sharing plan is inadequate, appropriate amendments should be proposed. 2. Research organisations should manage research data according to recognised standards, providing relevant assurance to funders so that additional technical requirements do not need to be assessed as part of the funding application peer review. (Additional note: Research organisations need to provide adequate technical capacity to support the management of the data that the researchers generate.) 3. Research organisations and funders should ensure that adequate funding is available within an award to encourage good data management practice. 4. Data sharing plans should indicate how the data can and will be shared and publishers should refuse to publish papers which do not clearly indicate how underlying data can be accessed, where appropriate.
    • Connecting scientific, technical review and curation 1. Articles and their underlying data or metadata (by the same or other authors) should be multi-directionally linked, with appropriate management for data versioning. 2. Journal editors should check data repository ingest policies to avoid duplication of effort, but provide further technical review of important aspects of the data where needed. (Additional note: A map of ingest/curation policies of the different repositories should be generated.) 3. If there is a practical/technical issue with data access (e.g. files don’t open or exist), then the journal should inform the repository of the issue. If there is a scientific issue with the data, then the journal should inform the author in the first instance; if the author does not respond adequately to serious issues, then the journal should inform the institution who should take the appropriate action. Repositories should have a clear policy in place to deal with any feedback.
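    Recommendation 3 on this slide amounts to a small escalation rule. The sketch below encodes it under stated assumptions: the enum and function names are hypothetical, and the `author_responded_adequately` flag stands in for an editorial judgement that the recommendation leaves to the journal.

```python
from enum import Enum


class IssueKind(Enum):
    TECHNICAL = "technical"    # e.g. files don't open or don't exist
    SCIENTIFIC = "scientific"  # a problem with the content of the data


def next_contact(kind, author_contacted=False, author_responded_adequately=False):
    """Return who the journal should inform next, or None if no one.

    Technical access problems go to the repository. Scientific problems
    go to the author first, and to the institution only if the author
    has been contacted and has not responded adequately.
    """
    if kind is IssueKind.TECHNICAL:
        return "repository"
    if not author_contacted:
        return "author"
    if author_responded_adequately:
        return None
    return "institution"


print(next_contact(IssueKind.TECHNICAL))
print(next_contact(IssueKind.SCIENTIFIC))
print(next_contact(IssueKind.SCIENTIFIC, author_contacted=True))
```

    The recommendation reserves escalation to the institution for "serious issues"; this sketch does not model severity, so a caller would need to apply that filter before escalating.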
    • Connecting data review with article review 1. For all articles where the underlying data is being submitted, authors need to provide adequate methods and software/infrastructure information as part of their article. Publishers of these articles should have a clear data peer review process for authors and referees. 2. Publishers should provide simple and, where appropriate, discipline-specific data review (technical and scientific) checklists as basic guidance for reviewers. 3. Authors should clearly state the location of the underlying data. Publishers should provide a list of known trusted repositories or, if necessary, provide advice to authors and reviewers of alternative suitable repositories for the storage of their data. 4. For data peer review, the authors (and journal) should ensure that the data underpinning the publication, and any tools required to view it, should be fully accessible to the referee. The referees and the journal need to then ensure appropriate access is in place following publication. 5. Repositories need to provide clear terms and conditions for access, and ensure that datasets have permanent and unique identifiers.
    • Publishing research data • Research is heavily context specific => So keep it context specific? • Publishers and Professional & Learned Societies can help galvanise agreement of researchers • To define how they want their data represented and preserved for reuse & citation => your researchers need you • Along with funder appointed peer review committees often stronger connection than to institution • Institutional managers/services cannot cover wide range of discipline specific expertise • Like publishers don’t cover all fields • Relatively constant as career progresses • Change host institution • Change field(s) • Change technique • In UK RCUK funding for APCs strictly for articles only…. – How to fund APCs for depositing data in repositories? – Will publishers charge APCs to publish data papers?
    • Research and the long tail (slide: Carole Goble; 2012-02-07 DCC roadshow East Midlands, CC-BY-SA) [diagram labels: PDB, GenBank, UniProt, Pfam; CATH, SCOP (Protein Structure Classification); ChemSpider; High throughput experimental methods; Industrial scale; Commons based production; Public data sets; Cherry picked results; Preserved; Spreadsheets, Notebooks; Local, Lost]
    • Enabling Open Data Publishing • Active Data Management Planning – built in at proposal stage • Local institutional tweaks of funder and local templates – Implemented and evolved in project • Data Management Plan as a live, evolving object • Annotate data on the fly – lab notebook approach – Curated & preserved using permanent identifiers • Appropriate repository and data collection descriptors
    • Active Data Storage: Identifying the Holy Grail ? • “what is needed is a tool to transparently sync local and network storage” (Marieke Guy, JISCMRD2 Nottingham) • CKAN (Orbital, Lincoln)? • Research Data Toolkit (Herts) – hybrid solutions • UC3 suite at CDL… • DropBox-like functionality a must • Usability • Technical interoperability • aim to help create databases for research data so • facilitates collaboration and data sharing • enables the subsequent publication of datasets • challenge is to ensure data are • documented • Preserved • service is sustainable
    • http://halogen.le.ac.uk • Portable Antiquities Scheme (British Museum) • Place-names (Nottingham) • Surnames • Genetics • IT hosting and GIS • Best practice: #JISCMRD, UKRDS, DCC, international
    • Halogen as template for research data management #jiscmrd • Requirements Analysis – must be iterative! • Data Management Plan – use DMPonline (UK Digital Curation Centre) • Scalable research data management infrastructure • pilot phase to nationally available resource • LAMP stack IT infrastructure: host research database – work with JISC/DCC • A model for the long term delivery of a data management service within the institution including • support, maintenance, governance & charging policies • Include researchers, IT services, research support office, library services etc.
    • BENEFITS • New research opportunities • Cross database work – seed new research samples • Scholarly communication/access to national resources • Key to English Place Names (Nottingham) • Portable Antiquities Scheme (British Museum) • Verification, re-purposing, re-use of data • Cleaning & enhancing private research datasets for reuse & correlation • No re-creation of data • Increased transparency • excellent training for best practice in research data management • Increasing research productivity • Build in cleaning, annotation, enhancement into normal research workflows • research datasets may immediately be reusable and interoperable • Impact & Knowledge Transfer • Reuse IT infrastructure • Increasing skills base of researchers/students/staff
    • Reward = Leverhulme Trust funding £1.3m!
    • CHALLENGES • interdisciplinary research database • ingest each input dataset in form such that sufficient information is carried forward to enable interoperation • Cultural differences • versioning & provenance for input datasets • which software tools, infrastructure, query interface? • suitable for multi disciplinary researchers • Requirements upon the institution for sustaining the research assets & skills • Requirements upon the researchers • Annotating • Refreshing • Maintenance of datasets
    • Researcher Responses to Contacts Made [chart]: No Response 63%; Response Received 37%
    • Suggested timeline for implementing institutional research data management From Whyte & Tedds (2011), DCC Briefing http://www.dcc.ac.uk/resources/briefing-papers/making-case-rdm
    • Challenge for institutions – Rise to scientific and research challenge • Not just a management challenge • Responsibility for the knowledge they create – Library • “Doing the wrong things through the wrong people”? • Challenge for library to enable: • curation of data and publications • active support from data scientists • from centralised to dispersed support • Expert centres such as D-Lab essential! – IT Service • Provide research data platforms for researchers: – Active storage – Enable collaboration – Connect to preservation services through Library
    • But that’s not all… • What about the software underpinning data driven research? • If we’re going to publish as open data: • How do we help researchers to store, annotate and discover the datasets they create? • How do you sustain and reuse that?
    • Biomedical Research Infrastructure Software Service Kit A vision for cloud-based open source research applications #BRISSKit http://www.brisskit.le.ac.uk
    • BRISSKit context: The I4Health goal of applying knowledge engineering to close the ‘ICT gap’ between research and healthcare (Beck, T. et al 2012)
    • www.brisskit.le.ac.uk Email: brisskit@le.ac.uk
    • http://www.brisskit.le.ac.uk
    • The semantic bridge ? OBiBa Onyx Records participant consent, questionnaire data and primary specimen IDs i2b2 Cohort selection and data querying Bio-ontology!
    • Research Software Sustainability • OS community engagement • standards compliance • consortium approach • work with grain of researchers • discipline specific forks? • Github versioning an example for research data? • OS Community Engagement Charter • defining engagement with existing & new OS communities • including adoption & code commitments See Rob Baxter blog: “The research software engineer” • http://dirkgorissen.com/2012/09/13/the-research-software-engineer/
    • Lessons for institutions? • Can’t do it all in house! • But many disciplines don’t have data centres • Build coalition of institutional actors • Essential to have high level support • Take and shape • Identify what you do have in-house • Access external tools, standards where possible • Active storage, collaboration, eprints… • Propose best of breed for (inter)national reuse • Share benefits (and costs) over academic networks • Sustainability the key challenge • As much cultural as technical – needs networks…
    • But institutions alone aren’t enough – we need an alliance!
    • Accepted Research Data Alliance Interest Group Publishing Data • http://rd-alliance.org/ • Close coordination with ICSU-WDS working group, CODATA and other ongoing initiatives in data publication – WDS under International Council of Science, RDA wider – Avoid duplication within related RDA and WDS WGs – join up – For WDS partnerships between publishers and data centres key • scope the territory – gap analysis • Use RDA Forum and new http://jiscmail.ac.uk/data-publication 350+ list • Take findings from RDA / WDS group(s) and trial in other communities / disciplines / institutional repositories
    • 15-9-2013 Launch meeting discussion
    • “Keep reaching for the stars” • increase the trustworthiness and value of individual data sets • strengthen the findings based on cited data sets • increase the transparency and traceability of data and publications • enable reuse and repurposing i.e. Problems but extraordinary opportunities – all hands on deck!
    • Thank you for listening and thanks to CDL, D-Lab and the project partners • Dr Jonathan Tedds jat26@le.ac.uk @jtedds Senior Research Fellow, D2K Data to Knowledge Research Group (University of Leicester) #PREPARDE http://www.le.ac.uk/projects/preparde Mailing list: http://jiscmail.ac.uk/DATA-PUBLICATION