
RDM for trainee physicians


Invited presentation at the Royal College of Physicians Edinburgh workshop on Critical appraisal and research for trainees

  1. Research Data Management: What you need to know
     Research - an introduction for trainee physicians
     Stuart Macdonald, Associate Data Librarian, EDINA & Data Library, University of Edinburgh (stuart.macdonald@ed.ac.uk)
     Royal College of Physicians of Edinburgh, 22 March 2016
  2. Running order
     • Defining research data & data types
     • Research Data Management (RDM)
     • Funder requirements
     • Data (and software) management planning
     • Organising data
     • File formatting
     • Documentation & metadata
     • Storage & security
     • Data protection, rights & access
     • Preservation, sharing & licensing
  3. Defining research data
     • Research data are collected, observed or created for the purposes of analysis, to produce and validate original research results.
     • Data can also be created by researchers for one purpose and used by another set of researchers at a later date for a completely different research agenda.
     • Digital data can be:
       – created in a digital form ('born digital')
       – converted to a digital form (digitised)
  4. Types of research data
  5. Research Data Management (RDM)
     • RDM is a general term covering how you organise, structure, store and care for the data used or generated during the lifetime of a research project.
     • It includes:
       – how you deal with data on a day-to-day basis over the lifetime of a project
       – what happens to data after the project concludes
     • RDM is considered an essential part of good research practice.
     • Good research needs good data!
  6. Activities involved in RDM
     • Data management planning
     • Creating data
     • Documenting data
     • Storage and backup
     • Sharing data
     • Preserving data
  7. Why manage your data?
     • So you can find and understand it when needed.
     • To avoid unnecessary duplication.
     • To validate results if required.
     • So your research is visible and has impact.
     • To get credit when others cite your work.
  8. Drivers of RDM
     "Publicly funded research data are a public good, produced in the public interest, which should be made openly available with as few restrictions as possible in a timely and responsible manner that does not harm intellectual property."
     RCUK Common Principles on Data Policy: http://www.rcuk.ac.uk/research/datapolicy/
  9. Funding bodies' requirements
     • Funders increasingly require researchers to meet certain data management criteria.
     • When applying for funding, you need to submit a technical or data management plan.
     • You are expected to make your data publicly available at the end of your project, where appropriate, and to include a short statement describing how and on what terms any supporting research data may be accessed.
     • The Horizon 2020 Open Data Pilot is driving many national RDM pilots across Europe.
     • This parallels the response to the EPSRC data policy in the UK.
  10. EPSRC Policy Framework on Research Data
      http://www.epsrc.ac.uk/about/standards/researchdata/impact/
  11. EPSRC expects that:
      • published research papers should include a short statement describing how and on what terms any supporting research data may be accessed;
      • metadata on the research data they hold will be published by institutions within 12 months of data generation;
      • data will be securely preserved for a minimum of 10 years from the date of last third-party access.
      https://www.epsrc.ac.uk/about/standards/researchdata/expectations/
      https://www.epsrc.ac.uk/files/aboutus/standards/clarificationsofexpectationsresearchdatamanagement/
  12. RCUK Concordat
      Research Councils UK (RCUK) published a draft Concordat on Open Research Data (17 August 2015). Its 10 principles aim to ensure that research data generated by UK researchers are made openly available for re-use:
      • in a manner consistent with relevant legal, ethical and regulatory frameworks;
      • recognising the autonomy of researchers;
      • emphasising responsibilities and accountabilities (research institutions, universities, funders).
      The Concordat does not intend to mandate specific activities.
      http://www.rcuk.ac.uk/RCUK-prod/assets/documents/documents/ConcordatOpenResearchData.pdf
  13. University's RDM Policy
      • The University of Edinburgh was one of the first UK universities to adopt a policy for managing research data: http://www.ed.ac.uk/is/research-data-policy
      • The policy was approved by the University Court on 16 May 2011.
      • It is acknowledged that this is an aspirational policy and that implementation will take some years.
  14. Day-to-day management of your research data
  15. What would you do if you lost all your data?
      • Dropping your laptop
      • Hard drive failures
      • Software updates
      • Obsolescence
      • Poorly described data (metadata)
      • Theft of equipment
      • Overwriting data/versioning
      • File formats
      • Media degradation (CD-Rs, memory sticks…)
      It could happen to you too!
  16. …and one from the University of Edinburgh
  17. What to do? Consider:
      • Having a Data Management Plan (DMP)
      • Organising your data:
        – structure
        – file names and versions
      • File formats
      • Documentation & metadata
      • Secure data storage & regular backup
  18. What is a Data Management Plan (DMP)?
      DMPs are written at the start of a project to define:
      • What data will be collected or created?
      • How will the data be documented and described?
      • Where will the data be stored?
      • Who will be responsible for data security and backup?
      • Which data will be shared and/or preserved?
      • How will the data be shared, and with whom?
      DMPs are often submitted as part of grant applications, but are useful in their own right whenever you are creating data.
  19. DMPonline
      A free and open web-based tool to help researchers write plans: https://dmponline.dcc.ac.uk/
      It features:
      • templates based on different funder requirements
      • tailored guidance (disciplinary, funder etc.)
      • customised exports to a variety of formats
      • the ability to share DMPs with others
      DMPonline screencast: http://www.screenr.com/PJHN
  20. Tips to share
      • Keep it simple, short and specific.
      • Avoid jargon.
      • Seek advice: consult and collaborate.
      • Base plans on available skills and support.
      • Make sure implementation is feasible.
      • Justify any resources or restrictions needed.
      Also see: http://www.youtube.com/watch?v=7OJtiA53-Fk
  21. Software Management Plans
      • Software Management Plans (SMPs) are relatively new in research proposals.
      • The EPSRC Software for the Future call requires SMPs as part of the Pathways to Impact. NSF SI2 funding requires software to be addressed as part of the mandatory data management plan.
      • A prototype SMP Service has been developed by the Software Sustainability Institute to help researchers write software management plans.
      • A guide to writing and using a software management plan is available: http://www.software.ac.uk/resources/guides/software-management-plans
  22. Organising data
      Why? To ensure your research data files are identifiable by you and others in the future. Organising and labelling your research data files and folders will help to:
      • prevent file loss through overwriting, deleting or misplacing;
      • facilitate location and future retrieval;
      • save you time (mostly in the future).
      How? With a consistent and disciplined approach:
      • setting conventions at the start of your project;
      • adopting an appropriate file naming and versioning convention.
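One way to make a naming and versioning convention stick is to generate and check file names with a small script. The sketch below assumes a hypothetical convention (date, project, description, two-digit version, e.g. `20160322_asthma-study_clinic-visits_v03.csv`); the pattern, the function names `make_filename` and `follows_convention`, and the example values are illustrations, not a University or funder standard.

```python
import re
from datetime import date

# Hypothetical convention (not prescribed by the slides):
#   YYYYMMDD_project_description_vNN.ext
# lower case, hyphens instead of spaces, zero-padded version number.
NAME_PATTERN = re.compile(r"\d{8}_[a-z0-9-]+_[a-z0-9-]+_v\d{2}\.[a-z0-9]+")


def slug(text: str) -> str:
    """Lower-case text and collapse anything non-alphanumeric into one hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def make_filename(project: str, description: str, version: int,
                  ext: str, when: date) -> str:
    """Build a file name that follows the convention above."""
    suffix = ext.lstrip(".").lower()
    return f"{when:%Y%m%d}_{slug(project)}_{slug(description)}_v{version:02d}.{suffix}"


def follows_convention(name: str) -> bool:
    """Check an existing file name against the convention."""
    return NAME_PATTERN.fullmatch(name) is not None


print(make_filename("Asthma Study", "clinic visits", 3, "csv", date(2016, 3, 22)))
# prints 20160322_asthma-study_clinic-visits_v03.csv
```

Putting the date first (YYYYMMDD) means an ordinary alphabetical sort also sorts files chronologically, and the zero-padded version number keeps `v10` after `v09`.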
  23. File formats
      Recommended formats, and formats to avoid when sharing:
      • Tabular data. Recommended: CSV, TSV, SPSS portable. Avoid: Excel.
      • Text. Recommended: plain text, HTML, RTF; PDF/A only if layout matters. Avoid: Word.
      • Media. Recommended containers: MP4, Ogg; codecs: Theora, Dirac, FLAC. Avoid: QuickTime, H.264.
      • Images. Recommended: TIFF, JPEG 2000, PNG. Avoid: GIF, JPG.
      • Structured data. Recommended: XML, RDF. Avoid: RDBMS.
      Files are encoded as either text or binary:
      • Text encoding: machine- and human-readable, and less likely to become obsolete (.txt, .csv, .html, .xml, .tex, etc.).
      • Binary encoding: readable only with appropriate software (.fcp, .xlsx, .docx, .psd, .nc, etc.).
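The format table can be turned into a quick pre-sharing check of a folder's contents. This is a minimal sketch that looks only at file extensions; the extension lists are an interpretation of the table (for example, mapping SPSS portable to `.por` and QuickTime to `.mov` is an assumption), and an extension cannot reveal the codec inside a media file or whether a PDF is PDF/A.

```python
from pathlib import PurePath

# Extensions derived from the slide's table. The name-to-extension mapping
# is an assumption; extensions say nothing about codecs or PDF/A conformance.
RECOMMENDED = {
    ".csv", ".tsv", ".por",                  # tabular data
    ".txt", ".html", ".rtf", ".pdf",         # text (PDF/A only if layout matters)
    ".mp4", ".ogg", ".ogv", ".flac",         # media
    ".tif", ".tiff", ".jp2", ".png",         # images
    ".xml", ".rdf",                          # structured data
}
AVOID_FOR_SHARING = {
    ".xls", ".xlsx", ".doc", ".docx", ".mov", ".gif", ".jpg", ".jpeg",
}


def check_formats(filenames):
    """Sort file names into 'recommended', 'avoid' and 'unknown' groups."""
    report = {"recommended": [], "avoid": [], "unknown": []}
    for name in filenames:
        ext = PurePath(name).suffix.lower()
        if ext in RECOMMENDED:
            report["recommended"].append(name)
        elif ext in AVOID_FOR_SHARING:
            report["avoid"].append(name)
        else:
            report["unknown"].append(name)  # check these by hand
    return report
```

For example, `check_formats(["results.csv", "notes.docx", "model.nc"])` puts the Word file under `"avoid"` and the NetCDF file under `"unknown"`, flagging both for a closer look before deposit.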
  24. File formatting
      • If you need to convert or migrate your data files to another format, be aware of the potential risk of loss or corruption of your data. Always test the files you convert or migrate.
      • You may also normalise your data, i.e. convert it from one format (e.g. proprietary) into another for use or preservation (e.g. into raw ASCII).
      • When compressing your data files (for storage, sending or sharing), you encode the information using fewer bits than the original representation. Compression tools such as zip, gzip and bzip2 (the latter two usually combined with tar) produce files such as .zip, .tar.gz and .tar.bz2.
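"Always test the files you convert or migrate" can be partly automated: read the converted copy back and compare it with the source, and check that a compressed file decompresses to exactly the original bytes. A minimal sketch using Python's standard `csv` and `gzip` modules; the function names and the example records are hypothetical.

```python
import csv
import gzip
import io


def convert_to_csv(records, fieldnames) -> str:
    """Write a list of dicts (the source data) as CSV text (the converted copy)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


def conversion_is_lossless(records, csv_text) -> bool:
    """Read the CSV back and compare it field by field with the source.

    CSV stores everything as text, so values are compared as strings:
    a type change (7 -> "7") is tolerated, a changed value is not.
    """
    read_back = list(csv.DictReader(io.StringIO(csv_text)))
    if len(read_back) != len(records):
        return False
    for original, copy in zip(records, read_back):
        if {key: str(value) for key, value in original.items()} != copy:
            return False
    return True


def compress_and_verify(data: bytes) -> bytes:
    """gzip-compress data and check that decompression restores it exactly."""
    packed = gzip.compress(data)
    if gzip.decompress(packed) != data:
        raise ValueError("compression round-trip failed")
    return packed
```

Note that the string comparison is deliberately loose about types; if your analysis depends on, say, numeric precision or date formats, the check needs to compare those explicitly.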
  25. Documentation and metadata
      Documentation (intended for reading by humans):
      • Contextual information
        – aims & objectives of the originating project
      • Explanatory material
        – data source
        – collection methodology & process
        – questionnaire, codebook
        – dataset structure
        – technical information
      Metadata (intended for reading by machines):
      • 'data about data'
      • descriptors to facilitate cataloguing and discoverability
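Metadata only helps machines if it is structured. The sketch below writes a minimal record as JSON using some Dublin Core element names (title, creator, date, description, rights, format). The choice of fields, the "required" list and the example values are assumptions; repositories typically define their own metadata schema, so check theirs before depositing.

```python
import json

# A few of the 15 Dublin Core element names. Treating these five as
# mandatory is an illustrative assumption, not a repository requirement.
REQUIRED = ("title", "creator", "date", "description", "rights")


def make_metadata(**fields: str) -> str:
    """Return a machine-readable (JSON) metadata record for a dataset."""
    missing = [name for name in REQUIRED if not fields.get(name)]
    if missing:
        raise ValueError(f"missing metadata fields: {', '.join(missing)}")
    return json.dumps(fields, indent=2, sort_keys=True)


record = make_metadata(
    title="Clinic visit survey (hypothetical example)",
    creator="A. Researcher",
    date="2016-03-22",
    description="Anonymised questionnaire responses; see codebook.txt",
    rights="CC BY 4.0",
    format="text/csv",
)
print(record)
```

The human-readable documentation (codebook, methodology) still matters; the JSON record only points to it and makes the dataset discoverable.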
  26. Why is it necessary?
      To help you:
      • remember the details of your data
      • archive your data for future access & re-use
      To help others:
      • discover your data
      • understand the aims and conduct of the originating research
      • verify your findings
      • replicate your results
  27. Data storage: basic principles
      • Use managed network services whenever possible to ensure:
        – regular backup
        – data security
        – accessibility
      • Avoid using portable hard drives, USB memory sticks, CDs or DVDs, to avoid:
        – data loss due to damage or failure
        – quality control issues due to version confusion
        – unnecessary security risks, e.g. theft
      Digital Preservation Coalition's new promotional USB stick: https://twitter.com/digitalfay/status/411444578122600450/photo/1
  28. Secure storage & backup
      • Make at least 3 copies of the data:
        – on at least 2 different media;
        – keep storage devices in separate locations, with at least 1 offsite;
        – check they work regularly;
        – ensure you know the backup procedure and follow it.
      • Ensure you can keep track of different versions of data, especially when backing up to multiple devices.
        – Use version control software, e.g. Subversion (with the TortoiseSVN client).
      One copy = risk of data loss
      (CC images by Sharyn Morrow and by momboleum on Flickr)
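"Check they work regularly" can include confirming that each backup copy is still bit-for-bit identical to the original. A minimal sketch comparing SHA-256 checksums with Python's standard `hashlib`; the function names are hypothetical, and this checks the integrity of existing copies only, not that your backup procedure itself is sound.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 checksum of a file, read in chunks so large files are fine."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_copies(original: Path, copies) -> dict:
    """Map each copy's path to True if it matches the original, else False.

    A missing copy counts as a failure.
    """
    reference = sha256_of(Path(original))
    results = {}
    for copy in map(Path, copies):
        results[str(copy)] = copy.exists() and sha256_of(copy) == reference
    return results
```

For example, `verify_copies(Path("data.csv"), ["/mnt/usb/data.csv", "/mnt/offsite/data.csv"])` (hypothetical paths) returns one True/False entry per copy; any False means that copy needs replacing.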
  29. Keeping sensitive data secure
      • Ensure PCs, laptops and portable data storage devices are stored securely and encrypted if necessary, e.g. with BitLocker (Windows) or FileVault (Mac).
      • Be aware that encrypted data will be lost if the password/encryption key is lost or if the hard disk fails.
      • Give access to data to authorised people only.
      System lock: image by Yuri Yu. Samoilov, Flickr (CC BY) https://www.flickr.com/photos/110751683@N02/
  30. Data disposal
      • Dispose of confidential data securely.
        – Hard drives: use secure-erasing software such as BCWipe, Wipe File, DeleteOnClick or Eraser on Windows; 'Secure Empty Trash' on Mac.
        – USB drives: physical destruction is the only way.
        – Paper and CDs/optical discs: shredding.
      • UoE has a comprehensive guide on the disposal of confidential and/or sensitive waste held on paper, CDs, DVDs, tapes, discs, hard drives etc.: http://www.ed.ac.uk/schools-departments/estates-buildings/waste-recycling/how/confidential-waste
  31. Things to think about…
      • Ethics
      • Requirements for data relating to human subjects
      • Privacy, confidentiality & disclosure
      • Data protection
      • Intellectual Property Rights (IPR)
      • Copyright
  32. Ethics
      Ethics committees:
      • Review research applications and advise on whether they are ethical.
      • Safeguard the rights of research participants.
      Participants:
      • Must be fully informed as to the purpose and intended uses of the research, and advised of what their involvement will entail.
      • Participation must be voluntary, fully informed and free of any coercion.
      • Confidentiality of information collected and anonymity of subjects must be respected at all times.
  33. Privacy, confidentiality & disclosure
      Privacy
      • An entitlement of an individual subject.
      • Handling, storage and sharing of data must be managed to preserve the privacy of the subject.
      Confidentiality
      • Refers to the behaviour of the researcher, whereby the privacy of the subject is maintained at all times.
      Disclosure
      • Must be guarded against!
      • Various techniques can be used to avoid it, whether for ethical, legal or commercial reasons, e.g.:
        – removing identifiers from personal information (e.g. date of birth, National Insurance number)
        – aggregating geographical data to reduce precision
        – anonymising data, but without overdoing it!
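A minimal sketch of two of the techniques listed above: removing direct identifiers, and reducing the precision of date of birth (to a 10-year age band) and postcode (to its outward code, the district). The field names, the band width and the choice of outward code are assumptions for illustration. Removing direct identifiers alone does not guarantee anonymity, so treat this as a starting point rather than a complete disclosure-control procedure.

```python
from datetime import date

# Hypothetical field names for direct identifiers in a participant record.
DIRECT_IDENTIFIERS = {"name", "ni_number"}


def age_band(dob: date, on: date, width: int = 10) -> str:
    """Replace an exact date of birth with a coarse age band, e.g. '30-39'."""
    age = on.year - dob.year - ((on.month, on.day) < (dob.month, dob.day))
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def deidentify(record: dict, on: date) -> dict:
    """Drop direct identifiers and reduce the precision of DoB and postcode."""
    out = {key: value for key, value in record.items()
           if key not in DIRECT_IDENTIFIERS and key not in ("dob", "postcode")}
    out["age_band"] = age_band(record["dob"], on)
    # Keep only the outward code of a UK postcode (e.g. "EH8" from "EH8 9XX").
    out["postcode_district"] = record["postcode"].split()[0].upper()
    return out
```

Coarser bands and larger geographic units lower the disclosure risk but also the analytic value of the data, which is the trade-off behind "without overdoing it".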
  34. Data protection & Intellectual Property Rights (IPR)
      • The UK Data Protection Act 1998 is an Act of Parliament defining the law on the processing of data about living people.
      • It is the main piece of legislation governing the protection of personal data in the UK.
      • Research data containing personal data fall within the scope of this Act.
      • Failure to observe it can result in:
        – monetary penalty notices
        – prosecutions
        – enforcement notices
        – audit without consent
      • IPR are the legally recognised rights and protection given to persons for 'creations of the mind', e.g. music, literature and other artistic & scholarly works; discoveries, inventions, symbols and designs.
      • IPR grant creators exclusive rights to:
        – publish a work
        – license its distribution to others
        – sue if unlawful copies are made or the work is used unlawfully
  35. Copyright
      • Can be contentious & complex!
      • When data are archived or shared, the creator retains copyright.
      • Data structured within a database as a result of intellectual investment are covered by an additional 'database right', which can sit alongside the copyright attached to the data contents.
  36. Freedom of Information
      • The Freedom of Information Act 2000:
        – gives a right of access to information held by 'public authorities', which includes most universities;
        – covers all records and information held by them, whether digital or print, current or archived.
      • Some research data are exempt (data about human subjects, commercial partners, national security).
  37. Data preservation
      Preservation is key to the long-term existence and future accessibility of research data and is worth thinking about at the planning stage. For the purposes of preservation, data should be deposited in a trusted repository:
      • Research funders
        – ESRC data store: http://store.data-archive.ac.uk/store/
        – Zenodo (EU): https://zenodo.org/
      • Institutional (UoE)
        – Edinburgh DataShare: http://datashare.is.ed.ac.uk/
      • Discipline-specific
        – Archaeology Data Service: http://archaeologydataservice.ac.uk/
      • Discipline-agnostic
        – Figshare: http://figshare.com/
      Image: mapping the preservation process, a workflow devised by S. Higgins, DCC (Digital Curation Centre)
  38. Data sharing
      Data sharing is making your research available for others to reuse & build upon.
      Benefits for the researcher:
      • Comply with funder requirements
      • Research can be validated
      • Increase impact through citation (reputation)
      • Increase visibility of research
      • Long-term data storage (preservation)
      • Enables future re-use (by you & others)
      Benefits for research & society:
      • Avoid duplication of effort & resources
      • Publicly funded research is available
      • Academic & scientific integrity:
        – increases transparency & accountability
        – facilitates scrutiny of research findings
        – prevents fraud
      • Extends the reach of original research & fosters collaboration
  39. Barriers to sharing
      "Scientists would rather share their toothbrush than their data!" Carol Goble, keynote address, EGEE (Enabling Grids for E-sciencE) '06 Conference
      Valid reasons not to share:
      • Research conducted in clinical settings (e.g. clinical trials)
      • Research that includes confidential data pertaining to human subjects
      • Research for national security (e.g. with the MoD)
      • Research with commercial partners to develop patents (e.g. for drug development)
      Future 'shareability' of the data, issues to consider:
      • Format, software, documentation, ethics, consent & confidentiality, anonymisation
      • Timescale for release (embargo)
      • Infrastructure for sharing
      • Rights & licensing
      Toothbrush image: http://openclipart.org/detail/172856/toothbrush-by-bpcomp-172856
  40. Data licensing
      Why?
      • The licence explicitly states how your data may be used.
      • Makes them available to others (where appropriate).
      • Ensures your data are open!
      How?
      • Repository rights statement
      • Creative Commons (CC): http://wiki.creativecommons.org
      • Open Data Commons (ODC): http://opendatacommons.org/
  41. Thank you! Questions? Email: Stuart.macdonald@ed.ac.uk
