Preparing research data for sharing

Workshop session delivered alongside the 'Making your thesis legal' workshop in July and September 2013 to PhD, MPhil and DrPH students who are completing their theses. Discusses standards for sharing data, issues that need addressing, formats, data protection, usability and licences.


  1. Preparing Research Data for Sharing
     An overview for LSHTM students
     Gareth Knight & Victoria Cranna
     This work is licensed under a Creative Commons Attribution 2.0 UK: England & Wales License
     LSHTM eThesis session, presented on 10th and 18th July 2013
  2. Data Sharing in the News
  3. Research Data
     “Data produced during the research activity should be managed appropriately, ensuring that it is stored, organised and documented in a manner that allows it to be understood and used for the intended purpose.”
     Research Degrees Handbook: Academic Year 2012-13
  4. To Share or not to Share
     1. Is the sharing justified?
        • What benefits will it provide?
        • What are the risks associated with sharing data?
     2. Do you have the ability to share?
        • Intellectual Property Rights (IPR)
        • Participant consent
        • Other obligations, e.g. confidentiality
     3. Are there any conditions associated with sharing?
        • What measures need to be in place to protect data? (e.g. record access requests, specific use only)
     Information Commissioner's Office: Data Sharing Code of Practice http://www.ico.org.uk/for_organisations/data_protection/topic_guides/data_sharing
  5. Reasons for
     • Encourages validation of research findings
     • Increases visibility of research findings through attribution and further analysis
     • Complies with sponsor obligations
     • Complies with journal publisher requirements
     • Simple way to deal with annoying data requests
  6. Reasons against
     What are the risks of data release?
     • Ownership issues, e.g. 3rd party rights
     • Participant confidentiality: the DPA 1998 does not apply to anonymised data
     • Sensitivity: implications of release (e.g. geo-references for animal migration)
     • Commercial/research exploitation
     • Contractual, regulatory, & legislative obligations
  7. Protection of Research Participants
     “Researchers must ensure the confidentiality of personal information relating to research participants”
     “Prior to publication or depositing data in a public depository, data should be fully anonymised”
     LSHTM Guidelines on Good Research Practice
  8. Data Protection Act 1998
     Protects living individuals' fundamental rights and freedoms in relation to storage, processing, and disclosure of information held about them.
     Personal Data: information that can be used to identify an individual in isolation, or in tandem with other information, e.g. name, age, address.
     Sensitive Personal Data:
     • racial or ethnic origin
     • political opinions
     • religious beliefs
     • trade union membership
     • physical or mental health
     • sexual life
     • criminal convictions
  9. Data Protection Principles
     Eight principles which broadly state that personal data shall be:
     1. Fairly and lawfully processed
     2. Obtained only for specified purposes, and shall not be further processed for other purposes that are incompatible with the original reason
     3. Adequate, relevant and not excessive in comparison to original purpose
     4. Accurate and where necessary, kept up to date
     5. Held no longer than is necessary
     6. Processed in accordance with the data subject’s rights
     7. Kept securely and safely with appropriate measures to prevent unauthorised or unlawful processing of the data and against accidental loss, destruction or damage
     8. Not transferred to countries without adequate protection
  10. Potential Exemptions
      No blanket exemption, but...
      • Certain exemptions for research purposes, including statistical or historical purposes.
      • If the research processing is not targeted at a particular individual and does not cause substantial distress or damage to a data subject, then:
        • 2nd principle: personal data can be processed for purposes other than those for which they were originally obtained
        • 5th principle: personal data can be held indefinitely
      • Analysis results do not identify data subjects
      Information Commissioner's Office: Guide to Data Protection http://www.ico.org.uk/for_organisations/data_protection/the_guide
  11. Reducing Disclosure Risk
      Avoiding inappropriate attribution of information to a data subject
      Disclosure types:
      • Identity: identify a person directly
      • Attribute: identify sensitive information about a subject
      • Inferential: determine the value of a subject’s characteristic more accurately than would otherwise have been possible
      Techniques:
      • Remove obvious identifiers (DPA 1998)
      • Replace real data with synthetic data
      • Limit the variables that are made available
      • Sampling with a larger group
      • Group significant values / top and bottom coding
      • Limit geographic detail
      Information Commissioner's Office: Anonymisation Code of Practice http://www.ico.org.uk/for_organisations/data_protection/topic_guides/anonymisation
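Three of the techniques on the slide above (removing obvious identifiers, top coding, and limiting geographic detail) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the column names (`name`, `address`, `nhs_number`, `age`, `postcode`) and the age ceiling of 90 are hypothetical, and applying these steps does not by itself make a dataset anonymised in the sense of the ICO code of practice.

```python
from typing import Any, Dict, Iterable, List

# Hypothetical identifier columns; replace with those in your own dataset.
DIRECT_IDENTIFIERS = {"name", "address", "nhs_number"}


def top_code(value: int, ceiling: int) -> int:
    """Cap a value at `ceiling` so that rare extreme values are grouped together."""
    return min(value, ceiling)


def coarsen_postcode(postcode: str) -> str:
    """Keep only the first part of a UK-style postcode to limit geographic detail."""
    parts = postcode.split()
    return parts[0].upper() if parts else ""


def reduce_disclosure_risk(
    records: Iterable[Dict[str, Any]], age_ceiling: int = 90
) -> List[Dict[str, Any]]:
    """Return copies of the records with some common disclosure risks reduced."""
    cleaned = []
    for record in records:
        # Remove obvious identifiers.
        row = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
        # Top-code age (assumed column name).
        if "age" in row:
            row["age"] = top_code(int(row["age"]), age_ceiling)
        # Limit geographic detail (assumed column name).
        if "postcode" in row:
            row["postcode"] = coarsen_postcode(str(row["postcode"]))
        cleaned.append(row)
    return cleaned
```

Combinations of the remaining variables may still identify a person, so the output should be reviewed against the disclosure types listed above before release.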
  12. Ensuring Continued Access
      Problems:
      1. User doesn’t possess relevant software package
      2. User runs a different operating system than the creator (e.g. Linux, MacOS)
      3. Software package is obsolete
      Options:
      • Emulation of original environment
      • Export to other format
  13. Choosing File Formats
      Format should be:
      • Accessible using a wide range of software tools
      • In widespread use
      • Able to support relevant information attributes without loss
      • Based upon a public specification
      • Able to be created without DRM or other limitations
      “turning [a] PDF into XML is like turning a hamburger into a cow” (Peter Murray-Rust)
  14. Recommended Formats
      Quantitative tabular:
      • Preferred: SPSS portable format (.por), delimited text & command/setup file
      • Acceptable: SPSS (.sav), Stata (.dta), MS Access & other proprietary formats
      Geospatial:
      • Preferred: ESRI Shapefile, Geo-referenced TIFF (.tif, .tfw)
      • Acceptable: ESRI Geodatabase format (.mdb), MapInfo Interchange Format (.mif), Keyhole Mark-up Language (KML) (.kml)
      Qualitative text:
      • Preferred: XML-encoded text (e.g. DDI, TEI), Open Document Format (ODF), Rich Text Format (RTF)
      • Acceptable: MS Word, NVivo
      Still images:
      • Preferred: TIFF, uncompressed lossless JPEG 2000
      • Acceptable: PNG, RAW, compressed JPEG 2000
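As an illustration of the "delimited text" option for quantitative tabular data, the sketch below serialises a list of records as tab-delimited text using only the Python standard library. The function name `to_delimited_text` and the sample column names are hypothetical; the accompanying command/setup file mentioned on the slide is not generated here.

```python
import csv
import io
from typing import Any, Dict, Iterable, Sequence


def to_delimited_text(
    rows: Iterable[Dict[str, Any]],
    columns: Sequence[str],
    delimiter: str = "\t",
) -> str:
    """Serialise records as delimited text with a header row.

    Columns not listed in `columns` are ignored; missing values become
    empty fields. Line endings are fixed to "\n" for portability.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=list(columns),
        delimiter=delimiter,
        extrasaction="ignore",
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    sample = [{"id": 1, "age": 34}, {"id": 2, "age": 61}]
    print(to_delimited_text(sample, ["id", "age"]), end="")
```

When saving the result, writing it as UTF-8 with an explicit encoding keeps the file readable across the operating systems mentioned under "Ensuring continued access".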
  15. Ensuring Understandability
      A user (a 3rd party or your future self) has difficulty understanding some aspect of the research data
      Researcher questions:
      • What does the variable mean?
      • How were the results produced?
      • What are the boundaries of the measurement?
      • What instruments and measures were used?
      Sources:
      • Lab notebooks & research protocols
      • Codebooks and data dictionaries
      • Equipment settings & instrument calibration
      Approach:
      1. Check requirements in your field (e.g. clinical)
      2. Look at other collections (e.g. UKDS)
      3. Consider questions that a user may have when accessing the data
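A simple codebook can answer two of the questions above: what a variable means, and what the boundaries of the measurement are. The sketch below is a minimal example, assuming the data is held as a list of dictionaries and that the researcher supplies the variable descriptions by hand; `build_codebook` and the sample variable names are hypothetical, not part of any LSHTM or UKDS tool.

```python
from typing import Any, Dict, List


def build_codebook(
    rows: List[Dict[str, Any]], descriptions: Dict[str, str]
) -> List[Dict[str, Any]]:
    """Summarise each described variable: meaning, missing count, observed range."""
    codebook = []
    for name, description in descriptions.items():
        values = [row.get(name) for row in rows]
        # Treat None and empty strings as missing.
        present = [v for v in values if v is not None and v != ""]
        # Only numeric values contribute to the observed range.
        numeric = [
            v for v in present
            if isinstance(v, (int, float)) and not isinstance(v, bool)
        ]
        codebook.append(
            {
                "variable": name,
                "description": description,
                "n_missing": len(values) - len(present),
                "min": min(numeric) if numeric else None,
                "max": max(numeric) if numeric else None,
            }
        )
    return codebook
```

The resulting entries can be saved alongside the data file, together with notes on instruments and methods that cannot be derived from the data itself.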
  16. Ensuring Usability
      End user is unsure of the permitted use of the data
      Scenarios:
      1. Researcher is uncertain whether they are permitted to analyse the data, so does not use it.
      2. Researcher uses the data in research for a non-permitted purpose.
      Licence should specify:
      • Data that the licence applies to;
      • Who owns each component;
      • Who is permitted access & use;
      • Conditions associated with use
  17. 1. Standard licence model
      Creative Commons:
      • Attribution (BY): Creator must be credited
      • No Derivatives (ND): No editing or manipulation
      • Non-Commercial (NC): No commercial use (e.g. cannot be sold)
      • Share Alike (SA): Share under same licence
      Open Data Commons:
      • Public Domain Dedication & License (PDDL)
      • Attribution License (ODC-By)
      • Open Database License (ODC-ODbL): Attribution Share-Alike
      Various software licence models:
      • GNU General Public License (GPL)
      • GNU Lesser General Public License (LGPL)
      • BSD license
      • Etc.
  18. 2. Tailored licence form
      • National Cancer Research Institute: Data and Material Transfer Agreement template http://www.ncri.org.uk/default.asp?s=1&p=8&ss=9
      • UK Data Service licence http://ukdataservice.ac.uk/deposit-data/support/licence.aspx
      • CeLSIUS Data Access Agreement http://celsius.lshtm.ac.uk/documents/Data%20Access%20Agreement.doc
      • Participant Consent form http://www.lshtm.ac.uk/research/ethicscommittees/
      Digital Curation Centre: How to License Research Data http://www.dcc.ac.uk/resources/how-guides/license-research-data
  19. LSHTM Data Repository
      In-development service capable of curating, preserving, and sharing LSHTM research data
      • Public: Data made available for anonymous access
      • Registered: End user required to register for time-limited access
      • Approved: End user must state the purpose they wish to use the data for
      • Embargoed: Data withheld for a designated time period, e.g. 5 years
      • Request: Data not held in the repository may be requested from the creator
  20. A Few Useful References
      • MANTRA – Data Management training for PhD students: http://datalib.edina.ac.uk/mantra/
      • UK Data Archive – Managing and Sharing Data: http://www.data-archive.ac.uk/media/2894/managingsharing.pdf
      • LSHTM Information Management support material: http://intra.lshtm.ac.uk/infoman/
      • Data Protection web pages: http://intra.lshtm.ac.uk/infoman/data/
      • Guidelines on good research practice: Implementing research governance: http://www.lshtm.ac.uk/research/ethicscommittees/good_research_practice.pdf
      • Research Degrees Handbook: http://www.lshtm.ac.uk/study/currentstudents/studentinformation/rd_handbook_12_13.pdf
      • Information Management and Security Policy: http://intra.lshtm.ac.uk/infoman/security/index.html
  21. Contact
      • Open Access: Andrew.gray@lshtm.ac.uk
      • Data Protection: Victoria.cranna@lshtm.ac.uk
      • Data Management: gareth.knight@lshtm.ac.uk
  22. Image References
      • “Sharing” (CC BY-NC 2.0) http://www.flickr.com/photos/tobanblack/3773116901/
      • “Women slicing tomatoes for food preparation” (CC BY-NC 2.0) http://www.flickr.com/photos/45796762@N03/7999269493/
      • “Warned” (CC BY 2.0) http://www.flickr.com/photos/figgenhoffer/2598487764/
      • “Day 114, Project 365 - 2.13.10” (CC BY 2.0) http://www.flickr.com/photos/93841400@N00/4355611690/
      • “license” (CC BY 2.0) http://www.flickr.com/photos/flowizm/3861998999/
      • “Rosetta Stone” (CC BY-NC 2.0) http://www.flickr.com/photos/65713088@N00/6268592919/
      • “Obsolete Packages” (CC BY-SA 2.0) http://www.flickr.com/photos/floydwilde/160475157/
      • “Activity SpreadSheet. Aug. 1” (CC BY-NC 2.0) http://www.flickr.com/photos/bitchcakes/7993211140/
      • “2006-06-14 012 - Cow” (CC BY-NC 2.0) http://www.flickr.com/photos/chrisq/167074953/
      • “My favorite” (CC BY-SA 2.0) http://www.flickr.com/photos/erwss/3129884643/