Introduction to Research Data Management for postgraduate students
Introduction to Research Data Management For postgraduate Students University of Northampton, 20th February 2013 Marieke Guy DCC, University of Bath firstname.lastname@example.org Funded by: This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 2.5 UK: Scotland License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/2.5/scotland/ ; or, (b) send a letter to Creative Commons, 543 Howard Street, 5th Floor, San Francisco, California, 94105, USA.
Today’s Talk…Will consider:• What are data• What is the research process• What is research data management and why does it matter• How you can start thinking about good research data management We hope you leave able to explain why research data management is important and what roles postgraduate students / researchers play.
What are data?• The lowest level of abstraction from which information and knowledge are derived• Research data are collected, observed or created, for the purposes of analysis to produce and validate original research results• Both analogue and digital materials are data• Digital data can be: • created in a digital form ("born digital") • converted to a digital form (digitised)
What is the research process? Research Process •Research360
What is research data management?• Data curation is “the active management and appraisal of data• Over the lifecycle of scholarly and scientific interest”• Data have importance as the evidential base• of scholarly conclusions• Curation is part of good research practice
Why carry out RDM?Managing your data properly: Sharing data can• Can prevent data loss increase citation• Makes researching easier• Helps with validation of results and research integrity• Offers new research opportunities and collaborations• Is inline with the UON policy released in June 2011 More citations: 69% ↑• Is important for UON’s reputation (Piwowar, 2007 in PLoS)• Is part of good research practice
Data lifecycle 1. What data will you produce? 5. Preservation 1. 2. How will you organise the Create & Re-Use data? 3. Can you/others understand the data 4. 2.Publication& Deposit Active Use 4. What data will be deposited and where? 3. Documentation 5. Who will be interested in re-using the data?
Conceptualise and planActivities• define a research question and design your methodology• bid for funding (incl. data management and sharing plans)• plan data creation (capture methods, standards, formats)• data management plans!RolesPGR student, supervisory team, sponsors / funding bodies, IT,research governance, ethics panelDecisions made now have an impact on every other stage of thelifecycle, so it is worth getting things right from the start!
1. What data will you produce? • What type of data will 5. you produce? 1. Preservation Create & Re-Use • What types of file format? 4. • How easy is it to create or 2.Publication& Deposit Active Use reproduce? • Who owns it and is 3. Documentation responsible for it?
Data can take many forms• Notebooks & lab books• Instrument measurements• Experimental observations •http://www.flickr.com/photos/charleswelch/3 597432481//• Still images, video & audio• Consent forms •http://www.google.co.uk/imgres? q=illumina+bgi&hl=en&client=firefox- a&hs=Jl2&rls=org.mozilla:en-• Text corpuses GB:official&biw=1366&bih• Models & software• Survey results & interview transcripts
Data types Data Type Value ExampleObservational data Usually irreplaceable Sensor readings,captured around the time telemetry, survey results,of the event neuro-imagesExperimental data from Often reproducible but can Gene sequence,lab equipment be expensive chromatograms, toroid magnetic field readingsSimulation data generated Model and metadata Climate models, economicfrom test models (inputs) more important models than output data. Large modules can take a lot of computer time to reproduceDerived or compiled data Reproducible Text and data mining, (but very expensive) compiled databases, 3D models
Who owns or is responsible for your data?Ownership• Data ownership is complex, often defined on a case-by-case basis• May be dependent on individual contractual agreements• Contracts define needs of the University, staff, students, funders, collaboratorsManagement• Contact the University of Northampton ethics department
Responsibilities for dataIn practice - Everyone plays their partIf you’re generating and using data, you should: • Comply with guidelines from your group, department, faculty, collaborators • Make sure your data is securely stored and backed up • Describe your data so that you/others can understand it in futureIf you’re managing a project, you should: • Be fully aware of funder, collaborator and publisher requirements • Ensure you have access to group data • Assess what should be published and/or archived• More info: http://www.data-archive.ac.uk/create-manage
2. How will you look after your data? 5. 1. Preservation & Re-Use Create • Is your data safe? • Is your data organised? 4.Publication 2. Active Use • Can you find your data?& Deposit 3. Documentation
Storage and Security 3… 2… 1… Backup! at least 3 copies of a file on at least 2 different media with at least 1 offsite Test file recovery At set up time and on a regular basis Access Protect your hardware If sensitive use file encryption Keep passwords safe (e.g. Keypass) At least 2 people should have access to your dataMore info: http://www.data-archive.ac.uk/create-manage/storage
Storage & Security – Back up options Media Advantages Disadvantages CDs or DVDs • Useful for quick restore in • Static capture of data the event of minor disaster • Not built to last • Vulnerable to theft • Physical loss of media External hard • Dynamic capture of data • Must store securely and drives, including • Useful for quick restore in remotely to original copy dropbox the event of minor disaster • Vulnerable to theft • Must use file encryption if sensitive Northampton • Resilient backup • Lack of offline access server • Must have a Northampton • TUNDRA2 will replace account Digital scans of • Can carry out using campus • Manipulation of page lab books printer content difficult
Can you find your data?A Clear Directory Structure• Top level folder and substructureFile Version Control• Discard obsolete versions if no longer needed after making backups• Manage using: File naming (see below), version control software (e.g. Git, Mercurial, SVN)File Naming Conventions• Record any naming conventions or abbreviations used e.g. [Experiment]_[Reagent]_[Instrument]_[YYYYMMDD].dat• Date/time stamp or use a separate ID (e.g. v1) for each versionMore info: http://www.jiscdigitalmedia.ac.uk/crossmedia/advice/ choosing-a-file-name/
3. Documenting data • Do you still understand your older work? 5. 1. Preservation Create & Re-Use • Is the file structure / naming understandable to others? 4. 2.Publication& Deposit Active Use • Which data will be kept? 3. • Which data can be Documentation discarded?
Understanding your data• Students: • Will you be able to write up your methods at the end of your studies?• Project leads: • Will you be able to respond to reviewers comments? • Will you be able to find the information you need for final project reports?• Can you reproduce your work if you need to?• What information would someone else need to replicate your work?
Understanding your dataDo you know how you generated your data? • Equipment or software used • Experimental protocol • Other things included in (e.g.) a lab notebook • Can reference a published article, if it covers everythingAre you able to give credit to external sources of data? • Include details of where the data are held, identified & accessed • Cite a publication describing the data • Cite the data itself e.g.
Metadata • Contextual information for data is called metadata — literally data about data • Data repositories & archives require some generic metadata, e.g. author, title, publication date • For data to be useful, it will also need subject-specific metadata e.g. reagent names, experimental conditions, population demographic • Record contextual information in a text file (such as a ‘read me’ file) in the same directory as the data e.g. • codes for categorical survey responses • ‘999 indicates a dummy value in the data’More info: http://www.data-archive.ac.uk/create-manage/document
3.What data will be deposited & where? • Are you expected to share your data? 5. 1. Preservation & Re-Use Create • Are you allowed to share your data? • Define the core data set 4.Publication 2. Active Use of the project& Deposit • Which data will be 3. Documentation included in your publication / thesis?
Data Sharing – Why share your data?• Share with your future self – avoid repeating research!• Promote your research – get cited!• Enable new discoveries• Replication• Store your data in a reliable archive• Comply with funding requirements
Requirements to share your data• Some journal publishers have a policy on data availability.• Most UK funders now expect research data to be made publically available – UON policy• Are you making any of your data available as supplementary information?• Is there sufficient information with the data so that it can be understood and reused? declaration data are a public good and Code of good research conduct should be openly available data should be preserved and accessible for 10 years + Funders’ data policies Common principles on data policywww.dcc.ac.uk/resources/policy-and- www.rcuk.ac.uk/research/Pages/legal/funders-data-policies DataPolicy.aspx
How to share your data• Deposit in a data repository eg. GenBank, UKDA• Data can be licensed• Culture of data sharing: can make available your data under a CC- BY or CC0 declaration to make this explicit • CC-BY license permits reuse but requires attribution • CC0 declaration is a waiver of copyright. • Note that laws about data vary in different countries.• You may have rights to first use or to commercial exploit data How to license research data: http://www.dcc.ac.uk/resources/how-guides/license-research-
5. Preservation and reuse • How long will your data 5. be reusable for? 1. Preservation Create & Re-Use • Do you need to prepare your data for long term archive? 4. 2.Publication Active Use& Deposit • Which data do you need to keep? 3. Documentation
Data retention and archivingHow permanent are the data? • Short term (e.g. 3-5 years) • Long term (e.g. 10 years) • IndefiniteShould discarded data be destroyed? • Keep all versions? Just final version? First and last?What are the re-processing costs? • Keep only software and protocol/methodology informationAre there tools/software needed to create, process or visualisethe data? Archive these with your data
Selection and appraisal Make a start on selection and appraisal from as early a point as possible. Plan for what you think you’ll need to keep to support your research findings. What is the minimum you’ll need to support your findings over time? Know who you are keeping it the data for and what you want them to be able do with it (is for yourself only, or for other people too?). This may affect the way you keep it and what you keep. Conversely, know what you need to dispose of. Destruction is often vital to ensure compliance with legal requirements. Appraise for the here and now but with an eye to the future. Think about resources required / available. These will affect you selection and appraisal decisions. Look for relationships with other data sets in your archive/repository as part of the appraisal process (i.e. does the dataset augment another collection significantly?). Some funders stipulate that you must identify whether the data exists already. This process might highlight additional datasets that your new research might augment significantly.
File formats for long term access • Unencrypted • Uncompressed • Non-proprietary/patent-encumbered • Open, documented standard • Standard representation (ASCII, Unicode) Type Recommended Avoid for data sharing Tabular data CSV, TSV, SPSS portable Excel Text Plain text, HTML, RTF Word PDF/A only if layout matters Media Container: MP4, Ogg Quicktime Codec: Theora, Dirac, FLAC H264 Images TIFF, JPEG2000, PNG GIF, JPG Structured data XML, RDF RDBMSFurther examples: http://www.data-archive.ac.uk/create-manage/format/formats-table
Summary• Data management is important at all stages of a project• There are tools available to help you• Keep your data safe: Back up your data and test your back- ups• Keep your data organised • Find it – good formats and file names • Understand it - check documentation and metadata• Consider publishing your data so that you can get recognition for your work• Help is available at: http://researchsupporthub.northampton.ac.uk/contact-us/
Thanks - any questions? Acknowledgements:Thanks to Research360, DCC staff, UK Data Archive, Mantra (University of Edinburgh) for slides