Introduction to Research Data Management - 2014-02-26 - Mathematical, Physical and Life Sciences Division, University of Oxford

This slideshow was used in an Introduction to Research Data Management course taught for the Mathematical, Physical and Life Sciences Division, University of Oxford, on 2014-02-26. It provides an overview of some key issues, looking at both day-to-day data management, and longer term issues, including sharing, and curation.

  • The first question to address is what the term ‘data’ actually refers to. Definitions vary, and to some extent, what counts as data will depend on the field of study. For many people, their initial association with the word ‘data’ will be numerical information (statistics, spreadsheets, or experimental results, for example), or perhaps the contents of highly structured information sources such as relational databases. However, data is far from being limited to these. Other examples include: textual sources (literary or historical works that are being analysed, or interview transcripts); websites (including all sorts of sites such as social media sites, as well as established academic sources); works of art and other images; audio files (e.g. oral history, recordings of interviews or focus groups); videos; emails; computer source code; books; papers; and catalogues, concordances and indexes. The Digital Curation Centre suggests that data is “A reinterpretable representation of information in a formalized manner suitable for communication, interpretation, or processing.” Image montage adapted from PrePARe Project slideshow “What is data?”: http://www.lib.cam.ac.uk/dataman/training.html
  • A very broad definition – such as ‘any information you use in your research’ – works well for thinking about data management: it helps make sure you don’t miss out something important! Whatever your area of research is, you will be dealing with data in one form or another. Bear in mind that not all data is digital: print resources, handwritten notes, tape recordings, and hard copies of images may also be important sources. In addition to the data you collect or generate and analyse as part of a research project, it’s also worth thinking about the data you will create. This might include very structured collections of information, such as a relational database – or it might be something much more informal, such as a file of your own notes, summaries you create for your own reference, or a list of items to be examined. Image montage adapted from PrePARe Project slideshow “What is data?”: http://www.lib.cam.ac.uk/dataman/training.html
  • Discussion exercise for small groups. The length of this exercise can be varied depending on the time available – the idea is just to get participants to introduce themselves, and get them thinking about their own data.
  • Most of us find that we have many calls on our time, and that packing everything that needs to be done into the week is often a challenge. That being the case, it’s easy to feel as though research data management is simply one more thing to add to an already endless to-do list – or worse, that it’s a distraction from real work. However, there are a number of key reasons that it’s worth paying some attention to it. Good data management does require an investment of effort – but ultimately it’s something that can actually save you time, by helping you work more efficiently. You want to complete your research project to the best of your ability, but with minimum stress – and good research data management is one of the tools that can help you to do that. Many of us are all too well acquainted with the frustration of trying to track down a fact or a document we know we have somewhere. Setting up an organizational system that works for you – and ensuring everything is properly filed or labelled to enable re-identification and retrieval – can make life a lot easier. And it’s not just a matter of saving time and reducing unnecessary effort (though clearly that’s a major benefit): having everything well ordered can also help you get a better feel for the shape and scope of your research material, which in turn can enable you to spot patterns or connections that might otherwise get missed. As well as this being true for your own research, the data might ultimately be of use to other researchers. Having everything well organized and properly labelled also has the potential to save you a lot of time at the end of a research project, when it comes to deciding what to do with your data – but more of that later. Oxford now has a policy on data management, which sets out the various responsibilities of researchers working in the University. There may also be requirements imposed by your funding body which you need to meet. Image credit: Microsoft clip art.
  • As of summer 2012, the University of Oxford has an official policy on the management of research data and records.
  • Note that the policy uses a specific definition of research data as the information that supports or validates research outputs. The policy only applies explicitly to data in this category – however, it’s still well worth thinking about the management of data construed more broadly, both from the perspective of making life easier for yourself, and because you may produce data that isn’t needed to back up an output from this particular project, but which nevertheless might be of use if shared with other researchers. The policy outlines two broad types of responsibility that researchers have. The first of these is about data integrity – data should be correct and well stored. The second is about data sharing – as far as is reasonably possible, data should be made available for others to use.
  • The key question for day-to-day data management is whether you can locate the material you want quickly and easily. It’s important to have a good system in place for dealing with new data when you acquire it – so you know where and how to store it so you’ll be able to retrieve it again without difficulty.
  • By default, most operating systems will organize things in a hierarchical file structure – files inside folders, which may be nested inside other folders. This is great if your material can easily be grouped into relatively discrete categories. In planning a hierarchical folder structure, aim for a balance between breadth and depth – so no one category gets too big, but also so that you don’t have to click through endless folders to find a file. In some cases, it may be more helpful to use a tag-based system – where each file is assigned one or more tags, or labels. This makes it easier to have overlapping categories, and files can be categorised in multiple ways simultaneously (by subject, by author, and by the project it relates to, for example). Some modern operating systems will allow you to add tags to files; file tagging software is also available. Sometimes it can be quicker to find a file using the desktop search function rather than to look through your folder or tag structure. Windows and Mac both have decent in-built search utilities. It’s also worth taking time every now and then to reassess your folder or tag structure, perhaps moving old, unused items to a folder called ‘Archive’ or something similar so they don’t clutter up the screen.
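To make the contrast between folders and tags concrete, here is a minimal Python sketch of a tag-based lookup. It is an illustration only: the filenames, tag names, and the helper functions `build_tag_index` and `find_files` are all hypothetical, and real tagging tools store tags in their own ways.

```python
from collections import defaultdict


def build_tag_index(file_tags):
    """Map each tag to the sorted list of files that carry it."""
    index = defaultdict(list)
    for filename, tags in file_tags.items():
        for tag in tags:
            index[tag].append(filename)
    return {tag: sorted(files) for tag, files in index.items()}


def find_files(index, *tags):
    """Return the files that carry every one of the given tags."""
    if not tags:
        return []
    matches = set(index.get(tags[0], []))
    for tag in tags[1:]:
        matches &= set(index.get(tag, []))
    return sorted(matches)


# Hypothetical files and tags, invented for this illustration.
FILE_TAGS = {
    "2014-02-20_nmr_run01.csv": {"nmr", "project-alpha", "raw"},
    "2014-02-21_nmr_run02.csv": {"nmr", "project-alpha", "raw"},
    "2014-02-22_interview_notes.txt": {"interview", "project-beta"},
    "2014-02-23_nmr_summary.txt": {"nmr", "project-alpha", "summary"},
}

if __name__ == "__main__":
    index = build_tag_index(FILE_TAGS)
    # One file can be found by subject, by project, or by both at once.
    print(find_files(index, "nmr", "raw"))
    print(find_files(index, "project-alpha"))
```

The point of the sketch is that a single file appears under several tags at once, which a strict folder hierarchy cannot express without duplicating the file.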
  • More recent versions of Windows and Mac OS allow you to add tags to files as you save them. These can be used to retrieve the file at a later point.
  • Even within a hierarchical structure, there are ways of linking related material. Hyperlinks can be used to link to another file on your computer (or a particular place within a file). So you could, for example, create a document listing all the data files which relate to a particular project, with some notes about them, and add hyperlinks to each data file so you can open them from within the document. If you want to be able to put a file in multiple places without duplicating it, try using a shortcut. Recognizable by the small curved arrow on the icon, these allow you to open a file that’s stored elsewhere on your computer. One use of this is to create project folders. If you have a collection of material which is relevant to a particular piece of work – a conference presentation, for example – but which is scattered around your file system because it also relates to other projects, you can create a shortcut to each file, and group these together in a project folder. You’ll have a quick way to access everything you need for that piece of work, without disturbing your original arrangement of material.
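The project-folder idea can also be scripted. The sketch below uses symbolic links, which are a close relative of desktop shortcuts rather than the same thing. The function name `add_to_project_folder` and all paths are hypothetical, and on Windows creating symbolic links may need extra permissions, so treat this as an assumption-laden illustration.

```python
import os
import tempfile
from pathlib import Path


def add_to_project_folder(project_dir, targets):
    """Create one symbolic link per target file inside project_dir."""
    project_dir = Path(project_dir)
    project_dir.mkdir(parents=True, exist_ok=True)
    links = []
    for target in targets:
        target = Path(target).resolve()
        link = project_dir / target.name
        # Skip names that are already present, including dangling links.
        if not (link.is_symlink() or link.exists()):
            os.symlink(target, link)
        links.append(link)
    return links


def demo():
    """Build a throwaway file tree and link one file into a project folder."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        original = root / "spectra" / "run01.csv"  # hypothetical data file
        original.parent.mkdir()
        original.write_text("sample,value\nA,1.0\n")
        links = add_to_project_folder(root / "conference_talk", [original])
        # The link opens the original file; nothing has been duplicated.
        same_content = links[0].read_text() == original.read_text()
        return links[0].is_symlink(), same_content


if __name__ == "__main__":
    print(demo())
```

Because the links only point at the originals, deleting the project folder afterwards leaves the underlying files where they were.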
  • An ideal filename is concise yet informative. Ideally, you should be able to tell what’s in a file without opening it. The order of elements in a filename will also usually make a difference to the order of files within a folder, so a bit of planning can help ensure similar items are grouped together. Using the year-month-day format at the beginning of a filename makes it easy to sort files into chronological order. (The date that a file was created and last edited will often be recorded automatically, but you may sometimes want to associate a file with a date that is neither of these, e.g. when a particular meeting happened.) You can also force a particular order by adding a number to the start of a filename, or by adding a leading underscore to a file you want to appear at the top of the list. Filenames can also be used to record version information, so you can be sure you’re using the most recent one.
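As a rough illustration of these naming ideas, the Python sketch below builds filenames that start with a year-month-day date and end with a zero-padded version number. The function `make_filename`, the underscore separators, and the example descriptions are assumptions made for this sketch, not a prescribed convention.

```python
import re
from datetime import date


def make_filename(day, description, version, extension):
    """Build a name of the form YYYY-MM-DD_short-description_vNN.ext."""
    # Lower-case the description and replace anything awkward with hyphens.
    slug = re.sub(r"[^a-z0-9]+", "-", description.lower()).strip("-")
    return f"{day.isoformat()}_{slug}_v{version:02d}.{extension.lstrip('.')}"


if __name__ == "__main__":
    names = [
        make_filename(date(2014, 2, 26), "RDM course slides", 10, "pptx"),
        make_filename(date(2013, 11, 5), "Group meeting notes", 1, ".txt"),
        make_filename(date(2014, 2, 26), "RDM course slides", 2, "pptx"),
    ]
    # Plain alphabetical sorting now gives chronological order, and the
    # zero-padded version number keeps v02 ahead of v10.
    for name in sorted(names):
        print(name)
```

Putting the date first means an ordinary alphabetical sort in a file browser doubles as a chronological sort.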
  • It’s worth taking some time now and then to assess whether your current methods of handling information are meeting your needs. It can be tempting to stick with a software package because you’re familiar with it and don’t have to spend time learning something new, but if it doesn’t do what you need it to (or doesn’t do it easily), this is likely to cost you time (and cause additional hassle and frustration) in the medium and long term. One good way of finding out about new ways of working is by asking friends and colleagues for their recommendations. What do they use for similar tasks? How helpful do they find it? Image credit: Microsoft clip art.
  • The Research Skills Toolkit website provides an overview of lots of useful software, tools, and University services, plus other resources and things that are useful for researchers to know about. It includes a substantial section on managing information. The Toolkit team also holds a series of hands-on workshops each year. The site is hosted on WebLearn, and you’ll need to log in using your SSO credentials – the same username and password you use for Nexus email.
  • The IT Learning Programme offers an extensive range of IT courses. These cover learning how to use specific pieces of software, IT-related skills (such as database design or programming), and how to make use of new technologies (such as social media or podcasting). Software courses can be a great way of trying out a software package without committing yourself to buying a copy before you’re sure it’s for you. The ITLP Portfolio website offers the course materials, which you can use for self-study, and access to a range of other related resources.
  • A new University service which will become available later this year is ORDS – the Online Research Database Service. It’s designed to allow academic researchers to create relational databases – so it’s a tool that might be used as an alternative to something like Microsoft Access or FileMaker Pro. The service uses cloud storage – so rather than your database being stored on your own computer, it’s hosted on a server, and you access it via a Web interface. This means you can access it from any computer with Internet access, and also has the advantage that back-up is taken care of automatically, without you needing to worry about it. The system is also set up to make collaboration – with people both in and outside Oxford – easy. All members of a project team can access the same version of the database, so there are no worries about whether you’re working with the latest version. The service will also allow users to make their databases publicly available, if they wish to do so. This might happen at the end of a project – or you might want to publish a specific sub-set of the data to accompany a research publication. If ORDS isn’t the most appropriate long-term home for your data, the system will be set up to allow easy transfer to the University’s new data archive (ORA-Data – more of that later) or elsewhere. The system is currently being tested by a small group of early adopters, but will become more widely available later in 2014. ORDS will be a paid-for service – the hope is that people will cost it into a research proposal from the beginning. If you’d be interested in finding out more, please email ords@it.ox.ac.uk
  • Losing crucial research material is the stuff of nightmares… but nightmares come true sometimes. This is a genuine poster from a pub in Cambridge [the picture has only been altered to straighten it, change the contrast to make it easier to read, and remove some of the details, e.g. the address of the pub and the person’s contact information]. You might think ‘Ah, but I would take more care of my laptop/external hard-drive/back-up disks’, but sometimes things are out of your control – fires, floods, and burglaries can all deprive you of your hard-won research data. Slide adapted from PrePARe Project slideshow “What is data?”: http://www.lib.cam.ac.uk/dataman/training.html
  • Back-up is probably the one data management task that most people are aware that they should be doing, or doing better. It’s actually a good idea to have more than one back-up copy, particularly of important and/or irreplaceable material; this is part of the LOCKSS principle (Lots Of Copies Keeps Stuff Safe). It’s also a good idea to keep these copies in different places: for example, you might keep a copy of some material in a cloud-based service (WARNING: if your research deals with sensitive data you may not be able to do this), on an external hard-drive, or on DVDs/CDs. Consider asking a friend/colleague or family member to look after one copy, or keep one copy at home and one in your office, so your material is physically in separate places. This minimises the risk of data loss in the case of flood, fire or theft. But remember that back-up isn’t the same as preservation – it’s just one aspect of it! If you have made a back-up copy of your data, that means you now have two copies in total to look after. But the good news is that this greatly reduces the risks to your data, and goes a long way to helping it stay safe over time. Slide adapted from PrePARe Project slideshow “Store it Safely”: http://www.lib.cam.ac.uk/dataman/training.html Image credits: Microsoft clip art
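The "lots of copies" idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `sha256_of` and `back_up` are hypothetical, and the two destination folders in the demo merely stand in for places such as an external drive or an office machine. It copies one file to several locations and checks that each copy is byte-for-byte identical to the original.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 checksum of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source, destinations):
    """Copy source into each destination folder and verify every copy."""
    source = Path(source)
    expected = sha256_of(source)
    results = {}
    for folder in destinations:
        folder = Path(folder)
        folder.mkdir(parents=True, exist_ok=True)
        copy = folder / source.name
        shutil.copy2(source, copy)  # copy2 also keeps file timestamps
        results[str(copy)] = sha256_of(copy) == expected
    return results


def demo():
    """Back up a throwaway file to two stand-in locations."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        data = root / "results.csv"  # hypothetical data file
        data.write_text("sample,value\nA,1.0\n")
        outcome = back_up(data, [root / "external_drive", root / "office_copy"])
        return len(outcome) == 2 and all(outcome.values())


if __name__ == "__main__":
    print(demo())
```

A script like this does not replace keeping copies in physically separate places; it only makes the copying step repeatable and confirms the copies are intact at the time they are made.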
  • This is the storage and back-up plan of an analytical chemistry postdoc working in Oxford. Storing data on a computer connected to the instruments used to collect it is often a practical and convenient option. However, it places a lot of reliance on one machine, hence it’s important to have proper back-ups. In this case, two types of additional copy are made of most of the material – to DVDs, and to external hard drives attached to desktops. (DVDs are a relatively hardwearing storage solution for medium-term preservation. However, if data is being kept long-term, it’s good practice to refresh the storage media (e.g. by copying the data to new disks) every three years. DVDs may last significantly longer, but unfortunately you generally only find out that something’s gone wrong after the event, when it’s too late to do anything about it.) A shared folder on a departmental server is a handy way of allowing everyone in a particular research group to access files. Using a server which is automatically backed up every day takes a lot of the stress out of keeping data safe – as it doesn’t rely on individual researchers remembering to make a back-up copy. It’s worth remembering that not all data that needs to be taken care of is digital. In this case, hard copies of lab books also make up an important part of the research record.
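One way to find out that a stored copy has gone wrong before it is too late is to record a checksum for each file when the copy is made and to re-check it from time to time. The Python sketch below is an assumption-based illustration of that idea, not part of the postdoc's plan described above; the manifest filename and the function names are hypothetical.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 checksum of a file."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def write_manifest(folder, manifest_name="MANIFEST.json"):
    """Record a checksum for every file under folder."""
    folder = Path(folder)
    entries = {
        path.relative_to(folder).as_posix(): file_digest(path)
        for path in sorted(folder.rglob("*"))
        if path.is_file() and path.name != manifest_name
    }
    (folder / manifest_name).write_text(json.dumps(entries, indent=2))
    return entries


def check_manifest(folder, manifest_name="MANIFEST.json"):
    """Return the files that are missing or whose contents have changed."""
    folder = Path(folder)
    recorded = json.loads((folder / manifest_name).read_text())
    problems = []
    for relative, digest in recorded.items():
        path = folder / relative
        if not path.is_file() or file_digest(path) != digest:
            problems.append(relative)
    return problems


def demo():
    """Write a manifest, then simulate one file being silently altered."""
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        (folder / "run01.csv").write_text("sample,value\nA,1.0\n")
        (folder / "run02.csv").write_text("sample,value\nB,2.0\n")
        write_manifest(folder)
        before = check_manifest(folder)
        (folder / "run01.csv").write_text("sample,value\nA,9.9\n")
        after = check_manifest(folder)
        return before, after


if __name__ == "__main__":
    print(demo())
```

Running the check whenever media are refreshed gives some assurance that what is being copied forward is still the data that was originally stored.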
  • Oxford has a central back-up and archiving service called HFS, provided via IT Services. (You may also sometimes hear people refer to this as TSM – this is the name of the client software used to run back-ups.) The service is free to University staff and postgraduates. You can set up the system to perform automated back-ups of computers connected to the University network (these usually happen overnight). If that’s not convenient, you can run a manual back-up. (If you’ve had trouble with automated back-ups, contact the HFS team and they should be able to help.) Three copies of your data will be made. One of these is stored outside Oxford, so even if there were to be a flood or a fire at IT Services, your data would still be safe.
  • It’s worth thinking about what you’re storing your data on. Storage media don’t last forever – they can degrade over time, get broken, or in the case of small, portable storage solutions like USB drives, simply get lost down the back of the sofa. Ultimately, they may become obsolete – when was the last time you saw a computer with a floppy disc drive? Think also about the file formats you’re using. Software changes rapidly, and over the long term, proprietary formats can also become obsolete: you might find yourself going back to an old file and not being able to open it. Some formats are better because they are open (i.e. not controlled by one particular software company), in widespread use, or conform to standards, and it’s best to use these for the long-term preservation of a file once you’re no longer working on it (even if you use a different format while you’re actually working on the data). Many proprietary software packages have a ‘Save As’ or ‘Export’ function that you can use to make a copy of your data which will be readable by other applications – in plain text or .csv format, for example – though be aware that this probably won’t preserve formatting. Slide adapted from PrePARe Project slideshow “Store it Safely”: http://www.lib.cam.ac.uk/dataman/training.html Additional image credits: CD Rom and floppy disk images are Microsoft clip art
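As a small illustration of what exporting to an open format involves, the Python sketch below writes a table to plain .csv text and reads it back. The column names and values are invented for the example, and `to_csv_text` is a hypothetical helper. Note what the round trip shows: the values survive, but everything comes back as plain text, which is one reason formatting and data types are not preserved.

```python
import csv
import io


def to_csv_text(rows, fieldnames):
    """Serialise a list of dictionaries as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def from_csv_text(text):
    """Read CSV text back into a list of dictionaries of strings."""
    return list(csv.DictReader(io.StringIO(text)))


# Hypothetical measurements, invented for this illustration.
ROWS = [
    {"sample": "A", "distance_m": 1.5},
    {"sample": "B", "distance_m": 2.25},
]

if __name__ == "__main__":
    text = to_csv_text(ROWS, ["sample", "distance_m"])
    print(text)
    # The numbers come back as strings: any application can read them,
    # but it has to be told (by documentation) that they are numbers.
    print(from_csv_text(text))
```

Because the exported file is just text, it can be opened in a text editor, a spreadsheet, or a statistics package, whatever happens to the software that originally produced the data.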
  • Discussion exercise for small groups, or for people to chat about over coffee. The length of this exercise can be varied depending on the time available. If time permits, it may be useful to ask the small groups to feed back to the group as a whole, and in particular to encourage sharing of hints, tips, and solutions to specific problems.
  • Documentation is an umbrella term covering the contextual information that a user would need to make sense of a dataset. Sometimes this will be given at the study level – perhaps a text document that accompanies the data, giving information about when, where, and by whom the study was conducted, what its aims were, the methods used, and so forth. Sometimes it will be at the data level – labels or other information which ensure data can be properly interpreted (this might include giving helpful variable or field names, or including the units for measurements).
  • First of all, because documentation should be thorough, it will contain a lot of information that might seem obvious. But will that same information still be obvious in a few months’, years’, decades’, or centuries’ time? It’s very easy to assume that you will remember everything, but in fact it’s all too easy to lose track of crucial information if it’s not recorded somewhere. Having that information accessible also means that other people can understand what you’ve done and why. It’s important to include context (why you did your research, how it fits into other contemporary research, or follows on from previous work), as well as explaining your methods and analytical techniques. This is related to the next point… Slide adapted from PrePARe Project slideshow “Explain It”: http://www.lib.cam.ac.uk/dataman/training.html
  • By providing documentation, you can record the methodology of how you generated, collected or produced your data (for example information about collection strategies, interview methods, survey techniques, algorithms, database searches), and how you reached your conclusions from your data (for example any statistical methods you used). This is useful for you if you need to replicate, adapt, or re-purpose an aspect of your research method later on. It is also important because it means that other people can reproduce your research, either to verify your conclusions or as a starting point to develop your work further. In many research groups, this could be a student or post-doc who continues work started by a previous group member. Replicating methodology can also be a useful training tool. Slide adapted from PrePARe Project slideshow “Explain It”: http://www.lib.cam.ac.uk/dataman/training.html
  • Proper documentation helps avoid this sort of situation… From http://twentytwowords.com/scientists-explain-their-processes-with-a-little-too-much-honesty-17-pictures/
  • Possible answers include the following. Data level: Units for the ‘Distance’ and ‘Speed’ columns need to be provided. What does ‘FTC?’ stand for? More helpful/informative column names generally would be useful – what’s actually being recorded here? Date information could also be provided – when did these experiments take place? Study level: Does the dataset have a title? What is the nature of the experiment being recorded here? Who did the work? When? Where? What were they setting out to achieve? What methods were used? Specifically, what do the four processes (control, wash, rinse, soak) denote? What does the sample name signify? What system was used to generate these strings of numbers? How were missing values indicated within the dataset? (On line 27, there’s a zero in the ‘Speed’ column – does this actually indicate a speed of 0, or is the value here missing?)
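The question about the zero in the ‘Speed’ column can be illustrated with a short Python sketch. Everything here is an assumption made for the example: the sample rows are invented (they are not the dataset from the slide), the column names with units are one possible convention, and "NA" is just one possible marker for a missing value. The point is that a documented marker lets a genuine zero be told apart from a missing measurement.

```python
import csv
import io

# Hypothetical data, invented for this illustration. Units are carried in
# the column names, and missing values are written as NA.
SAMPLE = (
    "sample,process,distance_m,speed_m_per_s\n"
    "S001,control,1.2,0.4\n"
    "S002,wash,1.5,NA\n"
    "S003,rinse,0.9,0\n"
)

# The markers that this (assumed) dataset documents as meaning "missing".
MISSING_MARKERS = {"", "NA"}


def parse_measurement(text):
    """Return a float, or None when the value is recorded as missing."""
    text = text.strip()
    return None if text in MISSING_MARKERS else float(text)


def load_speeds(csv_text):
    """Map each sample name to its speed, keeping missing values as None."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["sample"]: parse_measurement(row["speed_m_per_s"]) for row in reader}


if __name__ == "__main__":
    speeds = load_speeds(SAMPLE)
    for sample, speed in speeds.items():
        status = "missing" if speed is None else f"{speed} m/s"
        print(sample, status)
```

Whatever marker is chosen, it only helps if it is written down in the documentation that accompanies the dataset.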
  • Slide adapted from PrePARe Project slideshow “Explain It”: http://www.lib.cam.ac.uk/dataman/training.html
  • Metadata is a specific type of documentation – a formal description of a dataset which conforms to a particular structure. One typical use of metadata is to create a catalogue record for a dataset held in an archive. (Note: at time of writing, the data had not actually been published in the BMRB, although the researcher planned to deposit it there after the research based on the data had been published.) The image shows metadata for a dataset from the research project. It follows the Dublin Core metadata standard – a straightforward, widely-used structure which is not tied to any specific discipline. The metadata (in blue in this image) is enclosed in tags, much like HTML. This makes the metadata machine readable – by using a standard set of tags, an automatic system can tell where the information about the title, creator, description and so forth begins and ends. Dublin Core is only one of many metadata standards – others may be appropriate in specific disciplines where there is other information (location details, for example) that needs to be recorded. (Individual researchers may or may not need to create formal metadata of this sort for a given dataset. However, for all datasets, it’s important that researchers preserve all the contextual information that’s needed to enable proper interpretation of the data. If this is recorded, creating metadata if/when it’s needed should be straightforward.)
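To show what tagged, machine-readable metadata looks like in practice, here is a minimal Python sketch that produces a small record using Dublin Core element names. It is not the record shown on the slide: the field values are invented, the `metadata` wrapper element and the function name `dublin_core_record` are assumptions made for this example, and a real archive will specify its own wrapper and required fields.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"


def dublin_core_record(fields):
    """Build an XML string with one dc: element per (name, value) pair."""
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("metadata")  # wrapper element name is an assumption
    for name, value in fields:
        element = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        element.text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical field values, invented for this illustration.
FIELDS = [
    ("title", "Example NMR dataset"),
    ("creator", "A. Researcher"),
    ("date", "2014-02-26"),
    ("description", "Illustrative record for a research data management course."),
]

if __name__ == "__main__":
    record = dublin_core_record(FIELDS)
    print(record)
    # Because the tags are standard, software can pick out each field.
    parsed = ET.fromstring(record)
    print(parsed.find(f"{{{DC_NS}}}title").text)
```

The second half of the example is the part that matters for machine readability: a program that knows the standard tag names can find the title or creator without any knowledge of the particular dataset.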
  • “The open source ISA metadata tracking tools facilitate standards compliant collection, curation, local management and reuse of datasets in an increasingly diverse set of life-science domains.” Image from http://isa-tools.org/
  • This 18th-century painting by Maria Cosway is part of a collection on display at Chatsworth House in Derbyshire. The subject is Georgiana Cavendish, Duchess of Devonshire (portrayed by Keira Knightley in the 2008 film The Duchess). It shows her as Diana, the goddess of the moon. Some sources, however, say she’s depicted as Cynthia from Spenser’s Faerie Queene. (At time of writing, the Wikimedia Commons metadata is itself inconsistent: the image title says she’s Diana, but the image description says she’s Cynthia.) In fact, Diana and Cynthia are different names for the same figure, so this isn’t as much of a contradiction as it might appear. However, there’s plenty of potential for confusion here! If you look closely, you can see that Georgiana has six toes. There are various theories about why this is: perhaps she really did have six toes (though there’s a lack of other evidence to support this), perhaps it’s an artistic shorthand hinting that the subject had supernatural abilities or a sixth sense, or perhaps the artist simply couldn’t count! However, no one really knows why: there’s no surviving record of the artist’s intention in giving her subject this unusual feature. A symbolic message, or just a mistake? Without the relevant metadata, we’ll never know. Image credit: Wikimedia Commons: http://commons.wikimedia.org/wiki/File:Georgiana_Cavendish,_Duchess_of_Devonshire_as_Diana.jpg
  • As you’ve probably put a lot of effort into creating data in the course of your research, it’s worth thinking about how that data can be preserved for the long term after your project ends. As mentioned previously, many funders now require this. The best way to do this is to deposit it in an archive or repository. There may be an appropriate archive devoted to data in your discipline. For datasets that don’t have another natural home, Oxford will soon have its own data archive – more of that later. Ideally, data should be made available for others to re-use.
  • Sharing data can build your reputation in a number of ways. Laying your work open to scrutiny means that you will get credit for high quality research; it also increases understanding of your methods and allows your work to be verified by others. Sharing allows you to make a greater contribution to your community – and to be recognized for doing so. It can also help extend your reputation beyond that community. There is also substantial evidence that making your data openly available leads to increased citations – of the datasets themselves, and of the papers or other publications based on the data. Slide adapted from PrePARe Project slideshow “Share It”: http://www.lib.cam.ac.uk/dataman/training.html
  • Sharing your research allows it to be re-used; this might be within your field, for example using the data as a starting point for a complementary study, or as test data for new software and algorithms. It might be useful for teaching purposes. Sharing data means that someone else working in a similar area doesn’t have to waste time duplicating the work you’ve already done. If datasets can be used in multiple research projects, that means the funding that allowed them to be created is being used more effectively – a key reason that many funding bodies are now requiring that data be shared where possible. Data might even be re-used in contexts that can’t currently be envisaged – for example in new developments several years down the line, or in completely different fields. And you will get credit, as your work will be cited each time. Slide adapted from PrePARe Project slideshow “Share It”: http://www.lib.cam.ac.uk/dataman/training.html
  • A major change is happening within academia at the moment. Data outputs are being viewed as increasingly important, and this trend is only likely to continue – for example, major journals are increasingly looking to publish (or provide access to) datasets alongside the articles reporting on and interpreting the data. This provides an exciting opportunity for researchers: a chance to be at the forefront of a new movement. It’s well worth embracing this change – if you start getting your data out there in the public sphere now, then you’ll have a head start. Image credit: Microsoft clip art
  • Figshare is a free online data sharing platform – anyone can upload a dataset (or other research objects, such as charts, visualizations, or posters) and make it available on the Web, with a DOI to enable consistent citation. There are advantages and disadvantages to using a service like Figshare. It’s quick, convenient, and doesn’t require you to meet the sometimes extensive requirements of conventional repositories. On the other hand, your data won’t be as easily discoverable, and because of the lack of quality control, people who do find your data this way won’t have any assurance that the dataset is one they can trust. Figshare may, however, be useful in two kinds of situation: where you don’t have easy access to a conventional repository, or where no suitable repository exists for your discipline; and where you need to share some data rapidly – for example, you’re giving a conference presentation and want to be able to make the underlying data available as well. Figshare will let you do this, and you can then reference the data’s DOI in your presentation.
  • Link to video from http://www.youtube.com/watch?v=N2zK3sAtr-4. A data management horror story by Karen Hanson, Alisa Surkis and Karen Yacobucci, of NYU Health Sciences Libraries. This short and entertaining video highlights a few potential pitfalls in the process of data sharing, through a vivid example of how not to do it.
  • In some cases, there may be concerns about sharing data, or reasons why all or part of a dataset needs to be kept private. These may be ethical (the data is confidential), legal (the dataset includes third party material with restrictions on usage), or professional (you intend to publish the results, and don’t want someone to get there first). Image credit: Microsoft clip art
  • You can also redact material, for example 3rd party copyrighted material in a PhD thesis, or place embargoes so that it cannot be accessed for a certain period, for example because of publisher requirements or applying for a patent. Such measures may also be necessary with some confidential information. It’s worth noting that many difficulties or concerns about sharing data can be alleviated by advance planning. For example, ensuring you get proper permissions when data is collected can reduce problems with sharing personal data. If your dataset is a combination of third party data and new material, you may need to have a version of the data where these are kept separate. Proper documentation is also important here: this will help keep track of what you’re allowed to do with data, and what’s happened to it in the course of the project. Slide adapted from PrePARe Project slideshow “Share It”: http://www.lib.cam.ac.uk/dataman/training.html
  • A data management plan is, as the name suggests, a document which outlines how data will be managed over the course of a project. One may be created when a project is still in the initial planning stages, as part of a funding application (this may be a requirement), or when the project is in the process of getting underway. It’s common for there to be more than one version of a plan: an initial outline might be produced for the funding application, then fleshed out if the application is successful. The plan gives details of what sort of data the project expects to be dealing with, and what will be done with it. This might include: a description of the type of data that will be used and where it will come from – how it will be created, or where it will be obtained from if pre-existing datasets are being used; how the data will be stored and kept safe during the project; and what plans there are for preserving the data after the end of the project, and for sharing it with other researchers.
  • Practical exercise which can last a flexible amount of time. The resources available will include David Shotton’s ‘Twenty Questions for Research Data Management’, the DCC’s checklist leaflet, and a very basic data management plan template based on one developed by DataTrain. Participants can make use of whichever of these they find most helpful.If it seems appropriate, this may be followed by a brief discussion session, in which participants are invited to give feedback on their experience of trying to draft a data management plan.
  • The Digital Curation Centre is a national service providing advice and resources to researchers and their institutions. Although their primary focus is (as their name suggests) on longer-term curation and preservation of research data, they offer information relating to the whole data lifecycle.One particularly helpful resource is their online data management planning tool. When building a plan, you can select a template which reflects the requirements of your particular funding body.
  • A final thought on the subject of plans and planning.A research project isn’t – or shouldn’t be – a battle, but President Eisenhower’s words nevertheless have some relevance in this context. It is almost inevitable that unexpected events will arise – it’s very rare that everything goes exactly as anticipated. But although this means you may often have to adapt your plan on the fly, this makes having created a plan in the first place more essential, not less. If you’ve thought through all the relevant issues, you’re less likely to be taken by surprise – and you’ll be better placed to respond when the unexpected does crop up.Public domain image, from http://commons.wikimedia.org/wiki/File:Dwight_D._Eisenhower,_official_Presidential_portrait.jpg
  • ORA-Data (formerly known as DataBank) and DataFinder are two forthcoming University of Oxford services. They will be key parts of a larger research data management infrastructure that the University is in the process of developing. These services are being offered in part to enable researchers to comply with funder requirements and the demands of the new University policy. The launch date of these services is still to be determined: at the moment the plans are being reviewed by the relevant University committees. (The DataFinder screenshot is taken from the development version, and the ORA screenshot from the current repository home page; the final versions will look slightly different. It’s also still possible the names of the services will change.)
  • ORA-Data will be the University of Oxford’s institutional data archive. This is the new name for the planned service formerly known as DataBank. It is intended to provide a long-term preservation option for datasets without another natural home (where, for example, no suitable national or discipline-based repository is available). Once depositing DPhil data becomes a condition of award for the degree, ORA-Data may be a suitable place for some DPhil data to be deposited. DOIs (Digital Object Identifiers) can be assigned to datasets deposited in ORA-Data. A DOI is a unique, permanent identifier for an electronic object such as a document, web page, or dataset; it can be set to point to wherever the object is currently hosted. This means a DOI can be used to refer to the dataset in publications and so forth, and as long as the DOI metadata is kept updated, it will always send the reader to the right place. (This is preferable to using a URL, as these frequently change.) ORA-Data will operate in parallel with ORA-Publications, which is what the University’s existing archive for research publications will become known as. It will be possible to create a link between a publication in ORA and the underlying dataset. Researchers depositing datasets in ORA-Data will have control over the availability of their data. They may choose to make a dataset publicly available, or to embargo it for a fixed period (so, for example, the data might become available a year or three years after being placed in ORA-Data). Sensitive data may be kept hidden permanently; in this case the data owner may choose either to make a record for the data available (so others can see that it exists, and perhaps contact the data owner to ask questions about it), or to make both data and record invisible.
  • DataFinder is a catalogue of datasets held by the University of Oxford and elsewhere. DataFinder records will provide information about the nature of the dataset, where it is hosted, and (if details are given by the source) the availability of the data. Records for non-digital data can also be created in DataFinder: in this case, the record will include a description of the data and contact details for the data holder. DataFinder will harvest metadata about datasets from ORA-Data, and from other repositories or data stores that make their metadata available in a suitable form. These include ORDS, the database service mentioned earlier. This means that if a dataset is deposited in ORA-Data, a record for it will automatically be created in DataFinder (unless, of course, the ORA-Data record is set to be invisible). It will also be possible to add records to DataFinder manually, and researchers depositing data elsewhere are strongly encouraged to do this. The aim is for DataFinder to include a comprehensive listing of datasets created or owned by members of the University of Oxford. Once populated, DataFinder will be a substantial resource for researchers who want to find datasets they might be able to reuse in their own research, or who are looking for information about research that has already been conducted.
  • The University of Oxford has a central Research Data Management website, which brings together information on this subject. A copy of the University Policy on the Management of Research Data and Records can be downloaded from the site. The site was relaunched (with a new URL) in February 2014.
  • IT Services has a team of people who provide support to researchers. They can assist with various aspects of the technical side of a research project throughout the project lifecycle: planning, setting up, doing the work, and what happens at the end of the project. If you need some help setting up a database, building a website, or working out where and how to store your data, the Research Support Team may be able to help. The earlier in the research process you seek advice, the better, preferably while things are still in the planning stages. You can find more information on the team’s website, http://blogs.it.ox.ac.uk/acit-rs-team/about/, or by emailing researchsupport@it.ox.ac.uk
  • Research Data MANTRA is a series of free interactive online training modules covering key research data management issues. The modules are designed for postgraduates and early career researchers. The course describes itself as being particularly geared towards people working in geosciences, social and political sciences, and clinical psychology, but don’t be put off by this: much of the course material is relevant to all research disciplines.

Transcript

  • 1. Introduction to research data management Slides provided by the Research Support Team, IT Services, University of Oxford
  • 2. WHAT IS RESEARCH DATA MANAGEMENT?
  • 3. What is data? “A reinterpretable representation of information in a formalized manner suitable for communication, interpretation, or processing.” (Digital Curation Centre) Slide adapted from the PrePARe Project.
  • 4. What is data? Any information you use in your research. Slide adapted from the PrePARe Project.
  • 5. Introductions. What sort of data do you use? Where does it come from? Are you creating new data? Are you working with pre-existing data? Where is your data stored?
  • 6. What is data management? A general term covering how you organize, structure, store, and care for the information used or generated during a research project: how you deal with information on a day-to-day basis over the lifetime of a project, and what happens to data in the longer term (what you do with it after the project concludes).
  • 7. Carrots and sticks. Carrots: work efficiently and with minimum hassle now; save time and avoid problems in the future; make it easy to share your data. Sticks: University of Oxford Policy on the Management of Research Data and Records; funding body requirements.
  • 8. University of Oxford policy. Introduced July 2012.
  • 9. University of Oxford policy. The full policy can be viewed on the University of Oxford Research Data Management website. Research data is defined as the information needed ‘to support or validate a research project’s observations, findings or outputs’. Research data should be: accurate, complete, identifiable, retrievable, and securely stored; able to be made available to others.
  • 10. University of Oxford policy. Research data should be retained for ‘as long as they are of continuing value to the researcher and the wider research community’, but for a minimum of three years; specific requirements from funders take precedence. Researchers are responsible for: developing and documenting clear data management procedures; planning for the ongoing custodianship of their data; ensuring that legal, ethical, and funding body requirements are met. The policy applies to University staff and doctoral students; depositing relevant research data may ultimately become a condition of award for doctorates.
  • 11. Funders’ requirements. Funding bodies are taking an increasing interest in what happens to research data. You may be required to make your data publicly available at the end of a project. Check the small print in your grant conditions. Many funders require a data management plan as part of grant applications. Oxford’s RDM website provides a summary of requirements.
  • 12. DAY-TO-DAY DATA MANAGEMENT
  • 13. Can you find what you need, when you need it? Image: ‘What a mess’ by .pst, via Flickr: http://www.flickr.com/photos/psteichen/3915657914/
  • 14. Hierarchical systems vs. tagging. Hierarchical organization uses nested folders: the default option for most operating systems. Tagging allows more flexibility: items can be in multiple categories; some operating systems support tagging; file tagging software is also available. Sort… or search?
  • 15. Adding tags in Windows 7
  • 16. Hyperlinks and shortcuts. Hyperlinks are not just for websites: they can also lead to other files on your computer. Use shortcuts to avoid duplicating files. Create project folders as an easy way to access related material.
  • 17. File naming. Aim for concise but informative names: ideally, you should be able to tell what’s in a file without opening it. Think about the ordering of elements within a filename: YYYY-MM-DD dates allow chronological sorting, and you can force an order by adding a number at the beginning of the name. Consider including version information.
  • 18. File naming strategies: examples. Order by date: 2012-12-15_analysis_JARID1A.xlsx, 2012-12-15_raw-data_JARID1A.txt, 2013-04-12_analysis_ASPH.xlsx, 2013-04-12_raw-data_ASPH.txt. Order by type: Analysis_ASPH_2012-12-15.xlsx, Analysis_JARID1A_2013-04-12.xlsx, Raw-data_ASPH_2012-12-15.txt, Raw-data_JARID1A_2013-04-12.txt. Order by subject: ASPH_analysis_2012-12-15.xlsx, ASPH_raw-data_2012-12-15.txt, JARID1A_analysis_2013-04-12.xlsx, JARID1A_raw-data_2013-04-12.txt. Forced order with numbering: 01_JARID1A_raw-data_2013-04-12.txt, 02_JARID1A_analysis_2013-04-12.xlsx, 03_ASPH_raw-data_2012-12-15.txt, 04_ASPH_analysis_2012-12-15.xlsx.
  • 19. File naming strategies: examples. ‘In retrospect I am not very happy with the method I used for naming files. The biggest problem was with the newspaper articles I downloaded… I named the files only based on the topic of the article, without mentioning the name of the periodical and the year of publication, which would have been very useful later, when I began writing the thesis.’ (Doctoral student researching communication history)
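
The date-first naming strategy from the file naming slides could be sketched in Python roughly as follows. This is only an illustration: the function names `build_filename` and `with_sequence_number` are hypothetical helpers invented here, and the element order (date, type, subject) is just one of the orderings shown on the slide.

```python
from datetime import date


def build_filename(when: date, kind: str, subject: str, ext: str) -> str:
    """Build a date-first name such as 2013-04-12_analysis_ASPH.xlsx.

    Hypothetical helper: date.isoformat() gives YYYY-MM-DD, so a plain
    alphabetical sort of the names is also a chronological sort.
    """
    return f"{when.isoformat()}_{kind}_{subject}.{ext}"


def with_sequence_number(index: int, name: str) -> str:
    """Force an order by prefixing a zero-padded number (01_, 02_, ...)."""
    return f"{index:02d}_{name}"


# File names taken from the 'order by date' example on the slide.
names = [
    build_filename(date(2013, 4, 12), "analysis", "ASPH", "xlsx"),
    build_filename(date(2012, 12, 15), "raw-data", "JARID1A", "txt"),
    build_filename(date(2012, 12, 15), "analysis", "JARID1A", "xlsx"),
    build_filename(date(2013, 4, 12), "raw-data", "ASPH", "txt"),
]

# Sorting the strings puts the December 2012 files before the April 2013 ones.
sorted_names = sorted(names)
```

Zero-padding the sequence number matters once there are ten or more files: without it, `10_...` would sort before `2_...`.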
  • 20. Are you using the right tools for the job? Take time to assess whether your current software and methods are meeting your needs. Sticking with old familiars can be a false economy. Ask friends and colleagues for recommendations.
  • 21. Research Skills Toolkit. Website and hands-on workshops. A guide to software, University services, and other tools and resources for research. Requires SSO login. http://www.skillstoolkit.ox.ac.uk/
  • 22. IT Learning Programme. Over 200 different IT courses, covering software, skills, and new technologies: http://www.oucs.ox.ac.uk/itlp/ ITLP Portfolio offers course materials and other resources: http://portfolio.it.ox.ac.uk/
  • 23. ORDS (Online Research Database Service). Specifically designed for academic research data. Cloud-hosted and automatically backed up. Web interface makes collaboration straightforward. If desired, databases can easily be made public. Designed to permit easy archiving. Currently being used by a small group of test users; will become more widely available later in 2014. http://ords.ox.ac.uk/
  • 24. KEEPING YOUR DATA SAFE
  • 25. Backing up is easier than replacing lost data… http://blogs.ch.cam.ac.uk/pmr/2011/08/01/why-you-need-a-data-management-plan/ Slide adapted from the PrePARe Project.
  • 26. Make multiple copies… and keep them in different places. Automate the process if you can. Slide adapted from the PrePARe Project.
  • 27. Example back-up plan. Raw data from instruments are stored on the instrument PC, which is backed up every couple of months to DVDs. Much raw data is also transferred to desktop computers, usually stored on external hard drives. Analysed data (e.g. Excel spreadsheets and PowerPoint files) are stored in a shared folder on a departmental server which is backed up daily. Lab books are stored inside the laboratory in locked cupboards.
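
As a rough illustration of the advice to make multiple copies and automate the process, the sketch below copies one file into several folders and checks each copy against a checksum of the original. It is a minimal example under stated assumptions, not a replacement for a managed service such as the HFS: the function names `sha256_of` and `back_up` are hypothetical, and in practice the destination folders would need to sit on physically separate storage.

```python
import hashlib
import shutil
from pathlib import Path
from typing import Iterable, List


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source: Path, destinations: Iterable[Path]) -> List[Path]:
    """Copy `source` into each destination folder and verify every copy.

    Hypothetical helper: raises OSError if a copy does not match the original.
    """
    expected = sha256_of(source)
    copies = []
    for folder in destinations:
        folder.mkdir(parents=True, exist_ok=True)
        target = folder / source.name
        shutil.copy2(source, target)  # copy2 also preserves timestamps
        if sha256_of(target) != expected:
            raise OSError(f"Copy at {target} does not match the original")
        copies.append(target)
    return copies
```

A call such as `back_up(Path("raw-data.txt"), [Path("/mnt/external"), Path("/mnt/server")])` (both paths invented for illustration) could then be run on a schedule by whatever task scheduler the operating system provides.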
  • 28. IT Services: Data Back-up on the HFS. HFS is Oxford’s central back-up and archiving service. Free of charge to University staff and postgraduates. Automated back-ups of machines connected to University network. Copies kept in multiple places.
  • 29. Think about your storage media… and about file formats. Slide adapted from the PrePARe Project.
  • 30. For discussion. What data management challenges have you encountered? What strategies have you personally found useful? Be ready to feed back to the group.
  • 31. DOCUMENTATION AND METADATA
  • 32. Documentation and metadata. Documentation is the contextual information required to make data intelligible and aid interpretation: a users’ guide to your data; may be given at study level or data level. Metadata is similar, but usually more structured: conforms to set standards; machine readable.
  • 33. Make material understandable. What’s obvious now might not be in a few months, years, decades… MAKE SURE YOU CAN UNDERSTAND IT LATER. Adapted from ‘Clay Tablets with Linear B Script’ by Dennis, via Flickr: http://www.flickr.com/photos/archer10/5692813531/ Slide adapted from the PrePARe Project.
  • 34. Make material verifiable. Detailing your methods helps people understand what you did; reduces risk of misinterpretation; helps make your work reproducible; conclusions can be verified. Image by woodleywonderworks, via Flickr: http://www.flickr.com/photos/wwworks/4588700881/ Slide adapted from the PrePARe Project.
  • 35.
  • 36. Exercise. In small groups, look at the sample data sheet. Imagine you have just downloaded this dataset from an archive. What contextual or explanatory information is missing? What additional documentation would you like to see supplied, at the data level and at the study level?
  • 37. Documentation: what to include. Who created it, when and why; description of the item; methodology and methods; units of measurement; definitions of jargon, acronyms and code; references to related data. Slide adapted from the PrePARe Project.
  • 38. Metadata: data about data. A formal, structured description of a dataset. Used by archives to create catalogue records.
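
To make the idea of a structured, machine-readable description more concrete, here is a minimal Python sketch of a metadata record. The field names are hypothetical, loosely based on the ‘what to include’ documentation checklist; they do not follow any formal metadata standard, and every value in the example record is an invented placeholder.

```python
import json

# Hypothetical required fields, loosely following the documentation checklist
# (who, when, why, description, methodology, units). Not a formal standard.
REQUIRED_FIELDS = (
    "title",
    "creator",
    "date_created",
    "purpose",
    "description",
    "methodology",
    "units",
)


def missing_fields(record):
    """Return the required fields that are absent or empty in a record."""
    return [name for name in REQUIRED_FIELDS if not record.get(name)]


# An invented example record; all values are placeholders.
example_record = {
    "title": "Example interview transcripts",
    "creator": "A. Researcher",
    "date_created": "2014-02-26",
    "purpose": "Illustration only",
    "description": "Placeholder description of the dataset",
    "methodology": "Placeholder summary of how the data was collected",
    "units": "Not applicable",
    "related_data": [],
}

# Serialising to JSON gives a structured form that software can read back.
as_json = json.dumps(example_record, indent=2, sort_keys=True)
```

Running `missing_fields` on a draft record before deposit is one simple way to catch gaps early; an archive would normally supply its own schema in place of the invented field list used here.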
  • 39. ISA tools software suite. Open source metadata tracking tools for the life sciences: http://isa-tools.org/
  • 40. Missing metadata, or the riddle of the sixth toe. This painting shows Georgiana, Duchess of Devonshire as Diana… or maybe Cynthia. She has six toes, but no one knows why. Public domain image from Wikimedia Commons: http://commons.wikimedia.org/wiki/File:Georgiana_Cavendish,_Duchess_of_Devonshire_as_Diana.jpg
  • 41. WHAT HAPPENS AT THE END OF THE PROJECT?
  • 42. Data archiving. Data generated during a research project is valuable: don’t leave it languishing on your hard drive. Consider depositing it in an archive or repository: a number of national disciplinary archives exist; DataBib provides a catalogue (http://databib.org/); Oxford will soon have its own data archive. If possible, make it available for others to re-use.
  • 43. Why share data? Reputation. Get credit for high quality research. Recognition for contribution to research community. Open data leads to increased citations, both of the data itself and of associated papers. Slide adapted from the PrePARe Project.
  • 44. Why share data? Reuse. Reduces duplication of effort. Allows public research funding to be used more effectively. Extend research beyond your discipline, perhaps into contexts not currently envisaged. Slide adapted from the PrePARe Project.
  • 45. Why share data? Be a trailblazer! A paradigm shift in how research outputs are viewed is occurring. Data outputs are of increasing importance, and are likely to become even more so. Major journals are increasingly looking to publish datasets alongside articles. Be at the forefront of an important shift in the academic world.
  • 46. Figshare. Free online data sharing platform. Shared research is allocated a DataCite DOI. A possible alternative to conventional repositories: if no suitable repository is available; if you need a data sharing solution in a hurry.
  • 47. Video by NYU Health Sciences Libraries: http://www.youtube.com/watch?v=N2zK3sAtr-4
  • 48. Data sharing: concerns. Ethical concerns: confidential or sensitive data. Legal concerns: third party data. Professional concerns: intended publication; commercial issues (e.g. patent protection).
  • 49. Data sharing: concerns. Redact or embargo if there is good reason. Planning ahead can reduce difficulties. Slide adapted from the PrePARe Project.
  • 50. Data licensing. A licence clarifies the conditions for accessing and making use of a dataset. User knows what’s allowed without asking further permission. Doesn’t exclude possibility of specific requests to go beyond the terms of the licence. For databases, structure and content may be covered by separate rights.
  • 51. Data licences: examples. Creative Commons licences: six different flavours, plus CC0 public domain dedication; widely used and recognized; http://creativecommons.org/ Open Data Commons: specifically designed for datasets; recognizes the structure/content distinction; http://opendatacommons.org/
  • 52. Data licensing: guidance. ‘How to License Research Data’, a guide from the Digital Curation Centre: http://www.dcc.ac.uk/resources/how-guides/license-research-data
  • 53. DATA MANAGEMENT PLANNING
  • 54. Data management plans. A document which may be created in the early stages of a project: while planning, applying for funding, or setting up; an initial plan may be expanded later. Details plans and expectations for data: nature of data and its creation or acquisition; storage and security; preservation and sharing.
  • 55. Exercise. Using the resources available, have a go at drafting a data management plan for your own research. If there are questions you can’t answer at this stage, make a note of: what you need to find out; decisions you need to make.
  • 56. Digital Curation Centre. A national service providing advice and resources: http://www.dcc.ac.uk/ Create a data management plan using the DMP online tool: https://dmponline.dcc.ac.uk/
  • 57. ‘In preparing for battle, I have always found that plans are useless but planning is indispensable.’ (Dwight D. Eisenhower)
  • 58. UNIVERSITY SERVICES
  • 59. ORA-Data and DataFinder. Two forthcoming University of Oxford services. Launch date TBC.
  • 60. ORA-Data (formerly DataBank). University of Oxford’s institutional data archive. Long term preservation for datasets without another natural home; in some cases, may be a suitable home for DPhil data. Datasets will be assigned DOIs. Will work alongside ORA-Publications to form a composite University archive; possible to link publications and datasets in ORA. Depositors can opt to make datasets publicly available, embargoed for a fixed period, or hidden.
  • 61. DataFinder. A catalogue of datasets: information on the nature, location, and availability of the data. Will harvest metadata from ORA-Data and other compatible data stores, so anything in ORA-Data will have a record in DataFinder. Researchers depositing data elsewhere strongly encouraged to add a record to DataFinder. Should provide a substantial resource for researchers seeking datasets for reuse.
  • 62. FURTHER INFORMATION AND RESOURCES
  • 63. Research data management website. Oxford’s central advisory website. University policy is available. Questions? Email researchdata@ox.ac.uk http://researchdata.ox.ac.uk/
  • 64. IT Services: Research Support Team. Can assist with technical aspects of research projects at all stages of the project lifecycle: help with DMPs, selecting software or storage, etc. But the earlier you seek advice, the better. For more information, see our website: http://research.it.ox.ac.uk
  • 65. Research Data MANTRA. Free online interactive training modules. Aimed at postgraduates and early career researchers. http://datalib.edina.ac.uk/mantra/
  • 66. Any questions? Ask now, or email us at researchdata@ox.ac.uk
  • 67. Rights and re-use. This slideshow is part of a series of research data management training resources prepared by the DaMaRO Project at the University of Oxford. With the exception of clip art used with permission from Microsoft, commercial logos and trademarks, and images credited to other sources, the slideshow is made available under a Creative Commons Attribution Non-Commercial Share-Alike License. Parts of this slideshow draw on teaching materials produced by the PrePARe Project, DATUM for Health, and DataTrain Archaeology. Within the terms of this licence, we actively encourage sharing, adaptation, and re-use of this material.