A bunch of us (in Library, Teaching and Learning) have been talking a lot over the last few months about Research Data Management, and it’s something that we think will become more and more important, so we wanted to give everyone an overview of what it actually is.

Research data can be all sorts of things:
- Soil acidity measurements
- Plots on a map
- Images of seeds, or 3D images of insects
- Videos of oral histories

Data has always been a vital part of research, but increasingly research involves *lots* of data, making it harder to store, organise, keep safe, and publish.

And there’s a big potential for Library, Teaching and Learning to get involved in supporting these processes in the same way that we:
- support bibliographic management with Endnote instruction
- support writing and statistics skills
- support systems like the Research Archive and some of Lincoln’s Open Journals
There are lots of diagrams out there about data management from different points of view. This is a good one showing the cycle from the point of view of a researcher doing a research project – or from the point of view of Library, Teaching and Learning staff providing training, advice and support – and I’m going to structure the rest of this presentation around it.

So we’ll be talking about:
1 – how the current drive for sound data management is coming from university / funder / journal policies
2 – what data management planning involves
3 – ethical and legal considerations
4 – organising, structuring, and documenting data
5 – backing up and version control
6 – security vs collaboration
7 – publishing data (whether in journals or repositories)
8 – archiving (which might or might not be public)

And at the centre of the model is the training and advice and technical systems needed to support all these processes.
Research Data Management ultimately makes life easier for researchers themselves and for the public who fund that research.

But as a matter of policy, it’s being driven by these three groups:
- Funders increasingly require researchers to produce Data Management Plans to get a grant, and to maintain them during the course of research.
- Journals increasingly require authors to provide datasets on request, or even to make them available online at the time the paper is published.
- Universities want value for money – they need data to be secure, they need to guarantee data is valid, they need to defend any IP challenges.
A data management plan needs to cover the type of data to be gathered, the format, size, organisation, structure, metadata, storage, backing up, manipulation, version control, security, privacy, confidentiality, tikanga, collaboration, publication, preservation, licensing, citation…

This is a lot to cope with when you’re new to data management, so a lot of institutions use an online tool like DMPonline or DMPTool that takes the researcher through step by step.

All you need to do is select a template (your funder may require one or your institution may have a default) and then fill it out. Then as you carry out your research and your research direction inevitably changes, you treat your DMP as a living document and tweak it accordingly.
Here I’m showing “Big Brother watching you”, but there are a whole raft of ethical and legal issues that need considering.
- “Privacy is about people. Confidentiality is about data.” (http://www.research.uci.edu/ora/hrpp/privacyAndConfidentiality.htm)
- Research involving or affecting Maori communities/individuals/information requires consultation and may require use of Maori research methods.
- Use/misuse of data – consider the benefits of publishing the data, and how any risks might be mitigated. E.g. liability for information about food or poisons.
- License agreements signed for use of software or external data in your research need to be adhered to.
- And as always you need to attribute the creators of data just as you’d cite any other source.
Organise – use simple, logical folder structures and file naming conventions so you, others in the project, and people reading about it can find things again. (U of Edinburgh has created good naming conventions that are widely used and recommended.) Also organise the data inside the file in some simple and logical way. Tables, graphs, whatever’s most appropriate. Many disciplines have their own standards.

Document – add all the labels and units needed for someone to understand the data if they just open the file with no idea about the research project. You may need a separate file to document other contextual information, e.g. who collected the data, where and when, using what tools, sampling methods? How was the data analysed and manipulated? This is all metadata – data about data.

People are developing software and other systems to make this easier, but it’s a complex problem because (as mentioned at the start) data can be so many things. So that’s one area we’re watching.
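As a rough illustration of the "organise" and "document" advice above, here is a minimal Python sketch. The naming pattern (project, description, date, version) and the metadata field names are assumptions made up for this example; they are not the Edinburgh conventions or any discipline's standard, so swap in whatever your project or discipline actually uses.

```python
import json
import re
from datetime import date


def slug(text):
    """Lowercase the text and replace runs of other characters with hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def build_filename(project, description, collected, version, ext):
    """Build a name of the (hypothetical) form project_description_date_vNN.ext."""
    return (
        f"{slug(project)}_{slug(description)}_"
        f"{collected.isoformat()}_v{version:02d}.{ext}"
    )


def build_metadata(filename, collector, location, instrument, units):
    """Return a JSON string documenting one data file (field names are illustrative)."""
    record = {
        "file": filename,
        "collected_by": collector,
        "location": location,
        "instrument": instrument,
        "units": units,
    }
    return json.dumps(record, indent=2, sort_keys=True)


# Example values are invented for illustration only.
name = build_filename("Soil Survey", "pH readings", date(2013, 2, 15), 2, "csv")
metadata = build_metadata(
    name,
    collector="A. Researcher",
    location="Paddock 7",
    instrument="handheld pH meter",
    units={"depth": "cm", "acidity": "pH"},
)
```

Here `name` comes out as `soil-survey_ph-readings_2013-02-15_v02.csv`, and `metadata` is text you could save alongside the data file so that someone opening it cold can see who collected it, where, with what, and in which units. Putting the date in year-month-day order means the files sort chronologically in an ordinary folder listing.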
Computers die. Files get corrupted, storage media fail. Also cars get stolen along with a laptop containing years’ worth of research. We see stories like this regularly in the news but people keep leaving their laptop in the boots of their cars. It’s convenient.

So backing up is vital. For any important data you need to make multiple copies, on different devices and in different physical locations.

And having a backup is no good if it’s the version from 2009. And humans aren’t good at remembering to do things regularly, so it should be an automated system. Luckily there are lots of these around. The system I use at home checks my computer every hour and copies any changes onto a separate hard drive over the wireless network.

This system also does basic version control. I can go back in time and check what my file looked like yesterday, or last week, or last year.

Version control on a collaborative project will track exactly which words were changed, who changed them, and when, to the second, they were changed. (You can see how this works on Wikipedia’s “history” tabs, for one example.)
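To make the idea of automated backup with basic versioning a bit more concrete, here is a minimal Python sketch. It is not the system I use at home or any particular product; the function names and the timestamped naming scheme are assumptions for illustration. It copies a file into a backup folder under a timestamped name, skips the copy if nothing has changed since the newest backup, and checks the copy against the original using a checksum.

```python
import hashlib
import shutil
from datetime import datetime
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 checksum of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source, backup_dir, now=None):
    """Copy source into backup_dir under a timestamped name.

    Returns the path of the new copy, or None if the newest existing
    backup already has the same contents as source.
    """
    source = Path(source)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)

    # Timestamps of this form sort alphabetically in chronological order,
    # so the last name in the sorted list is the newest backup.
    existing = sorted(backup_dir.glob(f"{source.stem}_*{source.suffix}"))
    if existing and sha256_of(existing[-1]) == sha256_of(source):
        return None

    stamp = (now or datetime.now()).strftime("%Y%m%dT%H%M%S")
    target = backup_dir / f"{source.stem}_{stamp}{source.suffix}"
    shutil.copy2(source, target)

    # Verify the copy: a backup that cannot be trusted is not a backup.
    if sha256_of(target) != sha256_of(source):
        raise OSError(f"Backup of {source} did not verify")
    return target
```

Run on a schedule (say, hourly) with `backup_dir` on a different device, this leaves one dated copy per change, so you can go back to yesterday's or last week's version. It is only a sketch of one file on one machine; a real setup would also need copies in a different physical location.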
While you’re working on a research project you need to prevent unauthorised use of your data. You want to make sure no-one else scoops you, especially if your research might lead to a patent.

And storing data on a US-based cloud service like Amazon or Google Drive means it’s subject to US-based laws like the Patriot Act. Depending on your research, this potential risk to participants’ privacy mightn’t be acceptable.

But at the same time you need to be able to easily access and manipulate your data, and it needs to be easy to work with collaborators as well, whether they’re on campus, off campus, or on the other side of the world.

So this is another thing RDM systems need to take into account.
Researchers love the idea of being able to find other people’s data. Sharing your own data is more intimidating:
- Will people take your work without credit?
- How will you protect your IP?
- What if they misuse the data?

So I think it’s worth talking instead about publishing data, just like you publish an article. In fact there are data journals which accept datasets just like other journals accept papers. Another option is to publish in a discipline-specific repository – or universities are starting now to create data repositories analogous to the research archive, and that’s something we’ll probably be investigating.

So there are times you don’t want to publish, or don’t want to publish yet. When you do publish, you need to have considered the things we’ve been talking about - ethical and legal issues, and documenting it so readers can use it.

And people who’ve done this are finding there are lots of benefits to publication:
- Lots of journals and funders require you to anyway
- People can cite your data – and papers with openly published datasets associated with them tend to get cited more often
- Others can build on it in ways you didn’t expect, and you can build on their work in turn
Even if you don’t publish your data you may need to archive it securely for some period of time. It may be needed to support IP claims if you want to file for a patent. And it may be needed in case someone challenges your research results.

If you’re storing it long-term, you need to think about how. You don’t want to end up with a bunch of 5” floppy disks and nothing that can read them.
- Where/how is it physically stored? (Again, this is something that LTL will be looking into.)
- What file format are you using? (Obscure and/or proprietary, or common and/or open?)
- If file formats change, who’s going to convert your old file to the new format?

And again documentation is important so people can still understand the data even after the research project members have all moved on.
And finally, at the centre of the model is training. We all need this.

Undergraduates don’t need to have all the skills but there are some core competencies we hope can be included in the new curriculum with the qualification reform.

We also hope to work with the Research Committee and develop more advanced workshops for postgrad students and research staff.

And if LTL is going to support RDM with training, advice and systems, we need to have a good understanding of RDM ourselves.

So obviously this presentation isn’t going to make experts out of us, but hopefully at least it’s given us some shared background that we can use as a foundation to build on.
Research data management: a brief introduction
Deborah Fitchett, 15 Feb 2013