Introduction to research
data management

Slides provided by the Research Support
Team, IT Services, University of Oxford
WHAT IS RESEARCH DATA
MANAGEMENT?
What is data?
“A reinterpretable representation of information in a formalized
manner suitable for communication, interpre...
What is data?

Any information you use in your
research

Slide adapted from
the PrePARe Project

What is data management?


How you organize, structure, store, and care
for the information used or generated during a
re...
Why spend time and effort on this?






So you can work efficiently and
effectively
Because your data is precious
To ...
University of Oxford policy

Introduced July 2012

University of Oxford policy





The full policy can be viewed on the University of
Oxford Research Data Management web...
University of Oxford policy


Research data should be retained for ‘as long as they
are of continuing value to the resear...
Funders’ requirements


• Funding bodies are taking an increasing interest in what happens to research data
• You may be re...
Thinking ahead is vital




It’s easy to think of long term data management as something only relevant to the end of a p...
DOCUMENTATION AND
METADATA
Documentation and metadata


Documentation is the contextual information
required to make data intelligible and aid
inter...
Make material understandable

What’s obvious now might not be in a few months, years, decades…

MAKE SURE
YOU CAN
UNDERSTA...
Make material verifiable and reusable

Image by woodleywonderworks, via Flickr:
http://www.flickr.com/photos/wwworks/4588...
Documentation – what to include
• Who created it, when and why
• Description of the item
• Methodology and methods
• U...
Metadata – data about data


• A formal, structured description of a dataset
• Used by archives to create catalogue records...
Missing metadata – or the riddle of the
sixth toe





This painting shows
Georgiana, Duchess of
Devonshire as Diana
… ...
For discussion


• What data management challenges have you encountered?
• What strategies have you personally found useful...
WHAT HAPPENS AT THE END
OF THE PROJECT?
Data archiving


The best way of ensuring long-term
preservation of your data is depositing it in an
archive or repositor...
Why share data? Reputation


• Get credit for high quality research
• Recognition for contribution to research community
• ...
Why share data? Reuse


• Reduces duplication of effort
• Allows public research funding to be used more effectively
• Cont...
Why share data? Be a trailblazer!


• A paradigm shift in how research outputs are viewed is occurring
• Data outputs are o...
Video by NYU Health Sciences Libraries: http://www.youtube.com/watch?v=N2zK3sAtr-4

Data sharing – concerns


Ethical concerns
• Confidential or sensitive data

Legal concerns
• Third party data

P...
Data sharing – concerns
• Redact or embargo if there is good reason
• Planning ahead can reduce difficulties

Slide adapte...
Data licensing


• A licence clarifies the conditions for accessing and making use of a dataset
• User knows what’s allowe...
Data licences - examples


Creative Commons licences



Six different flavours, plus CC0 public domain dedication




...
Data licensing - guidance

‘How to License Research Data’
• A guide from the Digital Curation Centre
http://www.dcc.ac....
DATA MANAGEMENT
PLANNING
Data management plans


• A document which may be created in the early stages of a project
• While planning, apply...
• An ...
Exercise


• Using the resources available, have a go at drafting a data management plan for your own research
• If there a...
Digital Curation Centre


• A national service providing advice and resources
• Create a data management plan using the DMP...

‘In preparing for battle, I have always found that plans are useless but planning is indispensable.’
Dwight D. Eisenhower
...
UNIVERSITY SERVICES
DataBank and DataFinder



Two forthcoming University of Oxford services
Launch date TBC

DataBank



University of Oxford’s institutional data archive
Long term preservation for datasets without another
natura...
DataFinder


A catalogue of datasets




Will harvest metadata from DataBank and other
compatible data stores





I...
ORDS – Online Research Database
Service


• Specifically designed for academic research data
• Cloud-hosted and automatical...
IT Learning Programme


• Over 200 different IT courses
• Covering software, skills, and new technologies
http://www.oucs.o...
IT Services: Data Back-up on the HFS
• HFS is Oxford’s central back-up and archiving service
• Free of charge to University ...
IT Services: Research Support Team


Can assist with technical aspects of research
projects at all stages of the project ...
FURTHER INFORMATION AND
RESOURCES
Research data management website





• Oxford’s central advisory website
• Covers data management planning, back-up and se...
Research Skills Toolkit


• Website and hands-on workshops
• A guide to software, University services, and other tools and r...
Research Data MANTRA


• Free online interactive training modules
• Aimed at postgraduates and early career researchers
htt...
Any questions?

Ask now, or email us on
researchsupport@it.ox.ac.uk

Rights and re-use


This slideshow was developed by the IT Services Research
Support Team, University of Oxford, based on...
Introduction to Research Data Management - 2014-01-27 - Social Sciences Division, University of Oxford

This slideshow was used in an Introduction to Research Data Management course taught in the Social Sciences Division, University of Oxford, on 2014-01-27. It provides an overview of some key issues, focusing on long-term data management, sharing, and curation.

Published in: Education, Technology
  • The first question to address is what the term ‘data’ actually refers to. Definitions vary, and to some extent, what counts as data will depend on the field of study. For many people, their initial association with the word ‘data’ will be numerical information (statistics, spreadsheets, or experimental results, for example), or perhaps the contents of highly structured information sources such as relational databases. However, data is far from being limited to these. Other examples include: textual sources (literary or historical works that are being analysed, or interview transcripts); websites (including social media sites as well as established academic sources); works of art and other images; audio files (e.g. oral history, recordings of interviews or focus groups); videos; emails; computer source code; books; papers; and catalogues, concordances and indexes. The Digital Curation Centre suggests that data is “A reinterpretable representation of information in a formalized manner suitable for communication, interpretation, or processing.” Image montage adapted from PrePARe Project slideshow ‘What is data?’: http://www.lib.cam.ac.uk/dataman/training.html
  • A very broad definition – such as ‘any information you use in your research’ – works well for thinking about data management: it helps make sure you don’t miss out something important! Whatever your area of research is, you will be dealing with data in one form or another. Bear in mind that not all data is digital: print resources, handwritten notes, tape recordings, and hard copies of images may also be important sources. In addition to the data you collect or generate and analyse as part of a research project, it’s also worth thinking about the data you will create. This might include very structured collections of information, such as a relational database – or it might be something much more informal, such as a file of your own notes, summaries you create for your own reference, or a list of items to be examined. Image montage adapted from PrePARe Project slideshow ‘What is data?’: http://www.lib.cam.ac.uk/dataman/training.html
  • Data management is a broad term covering a lot of aspects of the research process. It has two main strands – how you deal with information on a day-to-day basis during the active phase of a research project, and what happens to it in the longer term. Both of these are important, but because of limited time, we’ll deal mostly with the second today.
  • Most of us find that we have many calls on our time, and that packing everything that needs to be done into the week is often a challenge. That being the case, it’s easy to feel as though research data management is simply one more thing to add to an already endless to-do list – or worse, that it’s a distraction from real work. However, there are a number of key reasons that it’s worth paying some attention to it. Good data management does require an investment of effort – but ultimately it’s something that can actually save you time, by helping you work more efficiently. You want to complete your research project to the best of your ability, but with minimum stress – and good research data management is one of the tools that can help you to do that. Many of us are all too well acquainted with the frustration of trying to track down a fact or a document we know we have somewhere. Good research data management – setting up an organizational system that works for you, and ensuring everything is properly filed or labelled to enable re-identification and retrieval – can make life a lot easier. And it’s not just a matter of saving time and reducing unnecessary effort (though clearly that’s a major benefit): having everything well ordered can also help you get a better feel for the shape and scope of your research material, which in turn can enable you to spot patterns or connections that might otherwise get missed. It’s also well worth doing because the data you’re producing or working with is valuable. As well as this being true for your own research, the data might ultimately be of use to other researchers. Having everything well organized and properly labelled also has the potential to save you a lot of time at the end of a research project, when it comes to deciding what to do with your data – but more of that later. Finally, there may be requirements imposed by your funding body and/or the university which you need to meet. (The rest of the presentation fleshes out the points on this slide.)
  • As of summer 2012, the University of Oxford has an official policy on the management of research data and records.
  • Note that the policy uses a specific definition of research data as the information that supports or validates research outputs. The policy only applies explicitly to data in this category – however, it’s still well worth thinking about the management of data construed more broadly, both from the perspective of making life easier for yourself, and because you may produce data that isn’t needed to back up an output from this particular project, but which nevertheless might be of use if shared with other researchers. The policy outlines two broad types of responsibility that researchers have. The first of these is about data integrity – data should be correct and well stored. The second is about data sharing – as far as is reasonably possible, data should be made available for others to use.
  • It’s easy to put off thinking about long term data management – or even to regard it as something you don’t need to worry about until after a project concludes. However, doing that can often lead to problems – many aspects of data management need planning from the beginning of a project. More about the planning process later. Image credit: Microsoft clip art
  • First of all, because documentation should be thorough, it will contain a lot of information that might seem obvious. But will that same information still be obvious in a few months, years, decades, or centuries? It’s often tempting to assume that you will remember everything, but in fact it’s all too easy to forget crucial information. Good documentation also means that other people can understand what you’ve done and why. It’s important to include context (why you did your research, how it fits into other contemporary research, or follows on from previous work), as well as explaining your methods and analytical techniques. This is related to the next point… Slide adapted from PrePARe Project slideshow “Explain It”: http://www.lib.cam.ac.uk/dataman/training.html
  • By providing documentation, you can record the methodology you used to generate, collect or produce your data (for example information about collection strategies, interview methods, survey techniques, algorithms, database searches), and how you reached your conclusions from your data (for example any statistical methods you used). This isn’t purely for the benefit of other researchers – it may also be useful for you if you need to replicate, adapt or re-purpose an aspect of your research method later on. But it does also mean that people can reproduce your research, either to verify your conclusions or as a starting point to develop your work further. In many research groups, this could be a student or post-doc who continues work started by a previous group member. Replicating methodology can also be a useful training tool. It’s important to ensure all the relevant contextual information is provided to reduce the risk of data being misinterpreted. Additionally, producing good metadata means that it’s easier to find your data, as it highlights the important aspects in a machine-readable way. This makes computer-based searches, whether you’re searching your own hard drive or looking for something in an online database, work better for you – they’re more likely to find relevant files and information more quickly. Slide adapted from PrePARe Project slideshow “Explain It”: http://www.lib.cam.ac.uk/dataman/training.html
  • Slide adapted from PrePARe Project slideshow “Explain It”: http://www.lib.cam.ac.uk/dataman/training.html
  • Metadata is a specific type of documentation – a formal description of a dataset which conforms to a particular structure. One typical use of metadata is to create a catalogue record for a dataset held in an archive. The image shows metadata for a dataset from an anthropology project. It follows the Dublin Core metadata standard – a straightforward, widely used structure which is not tied to any specific discipline. The metadata (in blue in this image) is enclosed in tags, much like HTML. This makes the metadata machine readable – by using a standard set of tags, an automatic system can tell where the information about the title, creator, description and so forth begins and ends. Keeping proper records during a research project will make it easy to provide metadata when this is needed. As well as being a requirement for depositing data in many archives, there are benefits to providing clear, comprehensive metadata. For example, it makes it much easier for other researchers to find your data. In turn, that means they’re more likely to reuse it, and give you credit.
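To make the idea of tagged, machine-readable metadata concrete, here is a minimal Python sketch that builds a small Dublin Core style record. It is an illustration only: the wrapper element name `metadata` and all field values are invented placeholders, not the anthropology record shown on the slide.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core element set.
DC_NS = "http://purl.org/dc/elements/1.1/"


def build_dublin_core(fields):
    """Return a simple Dublin Core record as an XML string.

    `fields` maps Dublin Core element names (e.g. "title") to values.
    The <metadata> wrapper element is an assumption made for this sketch.
    """
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("metadata")
    for name, value in fields.items():
        element = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        element.text = value
    return ET.tostring(root, encoding="unicode")


# Placeholder values for illustration only.
record = build_dublin_core({
    "title": "Example interview transcripts",
    "creator": "A. Researcher",
    "date": "2014-01-27",
    "description": "Hypothetical dataset used to illustrate the record structure.",
})
print(record)
```

Because every value sits inside a named tag, a program can pick out the title or creator without guessing where each one starts and stops, which is what lets an archive build catalogue records automatically.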
  • This 18th-century painting by Maria Cosway is part of a collection on display at Chatsworth House in Derbyshire. The subject is Georgiana Cavendish, Duchess of Devonshire (portrayed by Keira Knightley in the 2008 film The Duchess). It shows her as Diana, the goddess of the moon. Some sources, however, say she’s depicted as Cynthia from Spenser’s Faerie Queene. (At time of writing, the Wikimedia Commons metadata is itself inconsistent: the image title says she’s Diana, but the image description says she’s Cynthia.) In fact, Diana and Cynthia are different names for the same figure, so this isn’t as much of a contradiction as it might appear. However, there’s plenty of potential for confusion here! If you look closely, you can see that Georgiana has six toes. There are various theories about why this is: perhaps she really did have six toes (though there’s a lack of other evidence to support this), perhaps it’s an artistic shorthand hinting that the subject had supernatural abilities or a sixth sense, or perhaps the artist simply couldn’t count! However, no one really knows why: there’s no surviving record of the artist’s intention in giving her subject this unusual feature. A symbolic message, or just a mistake? Without the relevant metadata, we’ll never know. Image credit: Wikimedia Commons: http://commons.wikimedia.org/wiki/File:Georgiana_Cavendish,_Duchess_of_Devonshire_as_Diana.jpg
  • Discussion exercise for small groups, or for people to chat about during the coffee break. The length of this exercise can be varied depending on the time available. If time permits, it may be useful to ask the small groups to feed back to the group as a whole, and in particular to encourage sharing of hints, tips, and solutions to specific problems.
  • As you’ve probably put a lot of effort into creating data in the course of your research, it’s worth thinking about how that data can be preserved for the long term after your project ends. As mentioned previously, many funders now require this. The best way to do this is to deposit it in an archive or repository. There may be an appropriate archive devoted to data in your discipline. For the social sciences, a major repository is the UK Data Archive, based at the University of Essex. Their website offers a host of useful information about best practice for data creation, storage, and sharing. For datasets that don’t have another natural home, Oxford will soon have its own data archive – more of that later. Ideally, data should be made available for others to re-use.
  • Sharing data can build your reputation in a number of ways. Laying your work open to scrutiny means that you can get credit for high quality research; it also increases understanding of your methods and allows your work to be verified by others. Sharing allows you to make a greater contribution to your community – and to be recognized for doing so. It can also help extend your reputation beyond that community. There is also substantial evidence that making your data openly available leads to increased citations – of the datasets themselves, and of the papers or other publications based on the data. Slide adapted from PrePARe Project slideshow “Share It”: http://www.lib.cam.ac.uk/dataman/training.html
  • Sharing your research allows it to be re-used; this might be within your field, for example using the data as a starting point for a complementary study, or as test data for new software and algorithms. It might be useful for teaching purposes. Sharing data means that someone else working in a similar area doesn’t have to waste time duplicating the work you’ve already done. If datasets can be used in multiple research projects, that means the funding that allowed them to be created is being used more effectively – a key reason that many funding bodies are now requiring that data be shared where possible. Data might even be re-used in contexts that can’t currently be envisaged – for example in new developments several years down the line, or in completely different fields. And you should get credit, as your work should be cited each time it is re-used. Slide adapted from PrePARe Project slideshow “Share It”: http://www.lib.cam.ac.uk/dataman/training.html
  • A major change is happening within academia at the moment. Data outputs are being viewed as increasingly important, and this trend is only likely to continue – for example, major journals are increasingly looking to publish (or provide access to) datasets alongside the articles reporting on and interpreting the data. This provides an exciting opportunity for researchers: a chance to be at the forefront of a new movement. It’s well worth embracing this change – if you start getting your data out there in the public sphere now, then you’ll have a head start. Image credit: Microsoft clip art
  • Link to video from http://www.youtube.com/watch?v=N2zK3sAtr-4. A data management horror story by Karen Hanson, Alisa Surkis and Karen Yacobucci, of NYU Health Sciences Libraries.
  • In some cases, there may be concerns about sharing data, or reasons why all or part of a dataset needs to be kept private. These may be ethical (the data is confidential), legal (the dataset includes third party material with restrictions on usage), or professional (you intend to publish the results, and don’t want someone to get there first). Image credit: Microsoft clip art
  • You can also redact material, for example third party copyrighted material in a PhD thesis, or place embargoes so that it cannot be accessed for a certain period, for example because of publisher requirements or a patent application. Such measures may also be necessary with some confidential information. It’s worth noting that many difficulties or concerns about sharing data can be alleviated by advance planning. For example, ensuring you get proper permissions when data is collected can reduce problems with sharing personal data. If your dataset is a combination of third party data and new material, you may need to have a version of the data where these are kept separate. Proper documentation is also important here: this will help keep track of what you’re allowed to do with data, and what’s happened to it in the course of the project. Slide adapted from PrePARe Project slideshow “Share It”: http://www.lib.cam.ac.uk/dataman/training.html
  • A data management plan is, as the name suggests, a document which outlines how data will be managed over the course of a project. One may be created when a project is still in the initial planning stages, as part of a funding application (this may be a requirement), or when the project is in the process of getting underway. It’s common for there to be more than one version of a plan: an initial outline might be produced for the funding application, then fleshed out if the application is successful. The plan gives details of what sort of data the project expects to be dealing with, and what will be done with it. This might include: a description of the type of data that will be used and where it will come from (how it will be created, or where it will be obtained from if pre-existing datasets are being used); how the data will be stored and kept safe during the project; and what plans there are for preserving the data after the end of the project, and for sharing it with other researchers.
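The three areas a plan might cover can be sketched as a simple checklist structure. This Python outline is hypothetical: the class and field names are invented for illustration and do not follow any particular funder's template.

```python
from dataclasses import dataclass


@dataclass
class DataManagementPlan:
    """Hypothetical outline of a plan; the sections mirror the note above."""
    project_title: str
    data_description: str = ""          # what data, and where it comes from
    storage_and_security: str = ""      # how data is stored and kept safe
    preservation_and_sharing: str = ""  # what happens after the project ends
    version: int = 1                    # plans are often revised over time

    def missing_sections(self):
        """List the sections that have not been filled in yet."""
        sections = {
            "data_description": self.data_description,
            "storage_and_security": self.storage_and_security,
            "preservation_and_sharing": self.preservation_and_sharing,
        }
        return [name for name, text in sections.items() if not text.strip()]


# An initial outline, as might accompany a funding application.
outline = DataManagementPlan(
    project_title="Example project",
    data_description="Interview recordings and transcripts created by the project.",
)
print(outline.missing_sections())
```

An outline like this can start almost empty and be fleshed out in a later version, which matches the way plans tend to evolve between application and project start.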
  • Practical exercise which can last a flexible amount of time. The resources available will include David Shotton’s ‘Twenty Questions for Research Data Management’, the DCC’s checklist leaflet, and a very basic data management plan template based on one developed by DataTrain. Participants can make use of whichever of these they find most helpful. If it seems appropriate, this may be followed by a brief discussion session, in which participants are invited to give feedback on their experience of trying to draft a data management plan.
  • The Digital Curation Centre is a national service providing advice and resources to researchers and their institutions. Although their primary focus is (as their name suggests) on longer-term curation and preservation of research data, they offer information relating to the whole data lifecycle. One particularly helpful resource is their online data management planning tool. When building a plan, you can select a template which reflects the requirements of your particular funding body.
  • A final thought on the subject of plans and planning. A research project isn’t – or shouldn’t be – a battle, but President Eisenhower’s words nevertheless have some relevance in this context. It is almost inevitable that unexpected events will arise – it’s very rare that everything goes exactly as anticipated. But although this means you may often have to adapt your plan on the fly, this makes having created a plan in the first place more essential, not less. If you’ve thought through all the relevant issues, you’re less likely to be taken by surprise – and you’ll be better placed to respond when the unexpected does crop up. Public domain image, from http://commons.wikimedia.org/wiki/File:Dwight_D._Eisenhower,_official_Presidential_portrait.jpg
  • DataBank and DataFinder are two forthcoming University of Oxford services. They will be key parts of a larger research data management infrastructure that the University is in the process of developing. These services are being offered in part to enable researchers to comply with funder requirements and the demands of the new University policy. The launch date of these services is still to be determined: at the moment the plans are being reviewed by the relevant University committees. (These screenshots are taken from the development versions – the final versions will look slightly different. It’s also possible the names of the services will change.)
  • DataBank will be the University of Oxford’s institutional data archive. It is intended to provide a long-term preservation option for datasets without another natural home – where, for example, no suitable national or discipline-based repository is available. Once depositing DPhil data becomes a condition of award for the degree, DataBank may be a suitable place for some DPhil data to be deposited. DOIs (Digital Object Identifiers) can be assigned to datasets deposited in DataBank. A DOI is a unique, permanent identifier for an electronic object such as a document, Web page, or dataset – it can be set to point to wherever the object is currently hosted. This means a DOI can be used to refer to the dataset in publications and so forth, and as long as the DOI metadata is kept updated, it will always send the reader to the right place. (This is preferable to using a URL, as these frequently change.) DataBank will operate in parallel with ORA, which is the University’s archive for research publications. It will be possible to create a link between a publication in ORA and the underlying dataset in DataBank. Researchers depositing datasets in DataBank will have control over the availability of their data. They may choose to make a dataset publicly available, or to embargo it for a fixed period (so, for example, the data might become available a year or three years after being placed in DataBank). Sensitive data may be kept hidden permanently; in this case the data owner may choose either to make a record for the data available (so others can see that it exists, and perhaps contact the data owner to ask questions about it), or to make both data and record invisible.
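As a small illustration of how a DOI is used in practice, the Python sketch below turns a DOI into a link via the public resolver at doi.org, which redirects to wherever the object is currently hosted. The DOI in the example is made up for illustration and is not a real DataBank identifier; the validity check is deliberately rough.

```python
from urllib.parse import quote

DOI_RESOLVER = "https://doi.org/"


def doi_to_url(doi):
    """Turn a DOI such as '10.1000/xyz123' into a resolver URL."""
    doi = doi.strip()
    if doi.lower().startswith("doi:"):
        doi = doi[4:].strip()
    # Rough sanity check only: DOIs start with "10." and contain a "/".
    if not doi.startswith("10.") or "/" not in doi:
        raise ValueError(f"not a DOI: {doi!r}")
    # Keep "/" as-is (it separates prefix and suffix); escape unsafe characters.
    return DOI_RESOLVER + quote(doi, safe="/")


# Made-up DOI, for illustration only.
print(doi_to_url("doi:10.1000/example.dataset"))
```

Citing the resolver link rather than the hosting URL means the citation keeps working even if the dataset later moves, provided the DOI record is updated.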
  • DataFinder is a catalogue of datasets held by the University of Oxford and elsewhere. DataFinder records will provide information about the nature of the dataset, where it is hosted, and (if details are given by the source) the availability of the data. Records for non-digital data can also be created in DataFinder: in this case, the record will include a description of the data and contact details for the data holder. DataFinder will harvest metadata about datasets from DataBank, and from other repositories or data stores that make their metadata available in a suitable form. These include ORDS, the Online Research Database Service. This means that if a dataset is deposited in DataBank, a record for it will automatically be created in DataFinder (unless, of course, the DataBank record is set to be invisible). It will also be possible to add records to DataFinder manually, and researchers depositing data elsewhere are strongly encouraged to do this. The aim is for DataFinder to include a comprehensive listing of datasets created or owned by members of the University of Oxford. Once populated, DataFinder will be a substantial resource for researchers who want to find datasets they might be able to reuse in their own research, or who are looking for information about research that has already been conducted.
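The harvesting idea can be sketched as follows. This is a toy Python model under stated assumptions, not a description of how DataFinder is implemented: the record fields, store names, and identifiers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetRecord:
    identifier: str       # e.g. a DOI or a local catalogue number
    title: str
    location: str         # where the data itself is held
    visible: bool = True  # False models a record its owner has hidden


def harvest(stores):
    """Build a catalogue from several stores, skipping hidden records.

    Records are keyed by identifier, so harvesting the same record
    twice does not create a duplicate entry.
    """
    catalogue = {}
    for records in stores:
        for record in records:
            if record.visible:
                catalogue[record.identifier] = record
    return catalogue


# Hypothetical stores and records, for illustration only.
archive = [
    DatasetRecord("ds-001", "Survey responses", "institutional archive"),
    DatasetRecord("ds-002", "Sensitive interviews", "institutional archive", visible=False),
]
database_service = [
    DatasetRecord("ds-003", "Parish register database", "database service"),
]

catalogue = harvest([archive, database_service])
print(sorted(catalogue))
```

The hidden record never reaches the catalogue, mirroring the point that an invisible archive record does not produce a catalogue entry.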
  • A new University service which will become available later this year is ORDS – the Online Research Database Service. It’s designed to allow academic researchers to create relational databases – so it’s a tool that might be used as an alternative to something like Microsoft Access or FileMaker Pro. The service uses cloud storage – so rather than your database being stored on your own computer, it’s hosted on a server, and you access it via a Web interface. This means you can access it from any computer with Internet access, and back-up is taken care of automatically, without you needing to worry about it. The system is also set up to make collaboration – with people both in and outside Oxford – easy. All members of a project team can access the same version of the database, so there are no worries about whether you’re working with the latest version. The service will also allow users to make their databases publicly available if they wish to do so. This might happen at the end of a project – or you might want to publish a specific sub-set of the data to accompany a research publication. If ORDS isn’t the most appropriate long-term home for your data, the system will be set up to allow easy transfer to the University’s new data archive (DataBank) or elsewhere. ORDS will ultimately be a paid-for service. The service is currently being tested by a group of early adopters, but will become more widely available later in 2014. If you’d be interested in finding out more, please visit the ORDS website.
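For readers unfamiliar with the term, the short Python sketch below shows what "relational" means in practice: two linked tables and a query that combines them. It uses SQLite from the Python standard library purely as a stand-in; it does not show the ORDS interface, and the table names and rows are invented.

```python
import sqlite3

# In-memory database: nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE interviewee (
        id INTEGER PRIMARY KEY,
        pseudonym TEXT NOT NULL
    );
    CREATE TABLE interview (
        id INTEGER PRIMARY KEY,
        interviewee_id INTEGER NOT NULL REFERENCES interviewee(id),
        held_on TEXT NOT NULL
    );
""")

# Invented example rows.
conn.execute("INSERT INTO interviewee (id, pseudonym) VALUES (1, 'Participant A')")
conn.executemany(
    "INSERT INTO interview (interviewee_id, held_on) VALUES (?, ?)",
    [(1, "2014-01-10"), (1, "2014-01-24")],
)

# Join the two tables to count interviews per participant.
rows = conn.execute("""
    SELECT interviewee.pseudonym, COUNT(interview.id)
    FROM interviewee
    JOIN interview ON interview.interviewee_id = interviewee.id
    GROUP BY interviewee.id
""").fetchall()
print(rows)
conn.close()
```

Storing each participant once and linking interviews to them avoids repeating the same details in every row, which is the main advantage over a single flat spreadsheet.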
  • The IT Learning Programme offers an extensive range of IT courses. These cover learning how to use specific pieces of software, IT-related skills (such as database design or programming), and how to make use of new technologies (such as social media or podcasting). The ITLP Portfolio website offers course materials which you can use for self-study, and access to a range of other related resources.
  • Oxford has a central back-up and archiving service called HFS, provided via IT Services. (You may also sometimes hear people refer to this as TSM – this is the name of the client software used to run back-ups.) The service is free to University staff and postgraduates. You can set up the system to perform automated back-ups of computers connected to the University network (these usually happen overnight). If that’s not convenient, you can run a manual back-up. (If you’ve had trouble with automated back-ups, contact the HFS team and they should be able to help.) Three copies of your data will be made. One of these is stored outside Oxford, so even if there were to be a flood or a fire at IT Services, your data would still be safe. HFS is the gold standard for back-up, and it’s a good idea to use it if you can. But the really important thing is that you have a good back-up routine of some description – multiple copies, regularly updated, and stored securely in multiple places.
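The "multiple copies, stored in multiple places" routine can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, with hypothetical function names; it is unrelated to the HFS or TSM client software and is no substitute for a managed back-up service.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 checksum of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source, destinations):
    """Copy `source` into each destination folder and verify every copy."""
    source = Path(source)
    expected = sha256_of(source)
    copies = []
    for folder in destinations:
        folder = Path(folder)
        folder.mkdir(parents=True, exist_ok=True)
        target = folder / source.name
        shutil.copy2(source, target)  # copy2 also preserves timestamps
        if sha256_of(target) != expected:
            raise IOError(f"copy at {target} does not match the original")
        copies.append(target)
    return copies
```

Calling `back_up("notes.txt", ["/path/to/external_drive", "/path/to/network_share"])` (both paths are placeholders) would place a verified copy in each location. Comparing checksums catches a corrupted copy at the moment it is made rather than when you need to restore from it.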
  • IT Services has a team of people who provide support to researchers. They can assist with various aspects of the technical side of a research project throughout the project lifecycle – planning, setting up, doing the work, and what happens at the end of the project. If you need some help setting up a database, building a website, or working out where and how to store your data, the Research Support Team may be able to help. The earlier in the research process you seek advice, the better – preferably while things are still in the planning stages. You can find more information on the team’s website, http://blogs.it.ox.ac.uk/acit-rs-team/about/, or by emailing researchsupport@it.ox.ac.uk
  • The University of Oxford has a central Research Data Management website, which brings together information on this subject. A copy of the University Policy on the Management of Research Data and Records can be downloaded from the site. At time of writing, the website is being redesigned – the new version should be launched shortly.
  • The Research Skills Toolkit website provides an overview of useful software, University services, and other tools and resources for researchers. It includes a substantial section on managing information. The Toolkit team also holds a series of hands-on workshops each year. The site is hosted on WebLearn, and you’ll need to log in using your SSO credentials – the same username and password you use for Nexus email.
  • Research Data MANTRA is a series of free interactive online training modules covering key research data management issues. The modules are designed for postgraduates and early career researchers. The course describes itself as being particularly geared towards people working in geosciences, social and political sciences, and clinical psychology, but don’t be put off by this – in fact much of the course material is relevant to all research disciplines.
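The back-up routine described in the notes above (multiple copies, stored in more than one place) could be partly automated for a single file. The following is a minimal Python sketch and is not part of the HFS service: the function names `sha256_of` and `back_up` are hypothetical, and the destination folders are assumed to be locations you already have, such as an external drive and a network share. Each copy is checked against the original using a SHA-256 checksum.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source: Path, destinations: list[Path]) -> list[Path]:
    """Copy `source` into each destination folder and verify every copy.

    Hypothetical helper for illustration: it raises an error if any copy
    does not match the checksum of the original file.
    """
    expected = sha256_of(source)
    copies = []
    for folder in destinations:
        folder.mkdir(parents=True, exist_ok=True)
        # copy2 also preserves timestamps where the platform allows it
        copy = Path(shutil.copy2(source, folder / source.name))
        if sha256_of(copy) != expected:
            raise IOError(f"Copy in {folder} does not match the original")
        copies.append(copy)
    return copies
```

Calling `back_up(Path("survey.csv"), [Path("/mnt/external"), Path("/mnt/share")])` (all three paths are placeholders) would leave two verified copies. A script like this only covers the "multiple copies" part of the routine; it still has to be run regularly, and the destinations need to be in physically separate places.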
  • Introduction to Research Data Management - 2014-01-27 - Social Sciences Division, University of Oxford

    1. Introduction to research data management. Slides provided by the Research Support Team, IT Services, University of Oxford
    2. WHAT IS RESEARCH DATA MANAGEMENT?
    3. What is data? “A reinterpretable representation of information in a formalized manner suitable for communication, interpretation, or processing.” (Digital Curation Centre) Slide adapted from the PrePARe Project
    4. What is data? Any information you use in your research. Slide adapted from the PrePARe Project
    5. What is data management? How you organize, structure, store, and care for the information used or generated during a research project. It includes: how you deal with information on a day-to-day basis over the lifetime of a project; what happens to data in the longer term – what you do with it after the project concludes.
    6. Why spend time and effort on this? So you can work efficiently and effectively; because your data is precious; to enable data re-use and sharing; to meet funders’ and institutional requirements.
    7. University of Oxford policy: Introduced July 2012.
    8. University of Oxford policy: The full policy can be viewed on the University of Oxford Research Data Management website. Research data is the information needed ‘to support or validate a research project’s observations, findings or outputs’. Research data should be: accurate, complete, identifiable, retrievable, and securely stored; able to be made available to others.
    9. University of Oxford policy: Research data should be retained for ‘as long as they are of continuing value to the researcher and the wider research community’ – but a minimum of three years. Specific requirements from funders take precedence. Researchers are responsible for: developing and documenting clear data management procedures; planning for the ongoing custodianship of their data; ensuring that legal, ethical, and funding body requirements are met. Policy applies to University staff and doctoral students. Depositing relevant research data may ultimately become a condition of award for doctorates.
    10. Funders’ requirements: Funding bodies are taking an increasing interest in what happens to research data. You may be required to make your data publicly available at the end of a project. Check the small print in your grant conditions. Many funders require a data management plan as part of grant applications. Oxford’s RDM website provides a summary of requirements.
    11. Thinking ahead is vital: It’s easy to think of long-term data management as something only relevant to the end of a project. But many aspects of it need planning from the beginning.
    12. DOCUMENTATION AND METADATA
    13. Documentation and metadata: Documentation is the contextual information required to make data intelligible and aid interpretation: a users’ guide to your data. Metadata is similar, but usually more structured: it conforms to set standards and is machine readable.
    14. Make material understandable: What’s obvious now might not be in a few months, years, decades… MAKE SURE YOU CAN UNDERSTAND IT LATER. Image adapted from ‘Clay Tablets with Linear B Script’ by Dennis, via Flickr: http://www.flickr.com/photos/archer10/5692813531/ Slide adapted from the PrePARe Project
    15. Make material verifiable and reusable: Detailing methods helps people understand what you did, and helps make your work reproducible. Provide context to minimize the risk of misunderstanding or misuse. Image by woodleywonderworks, via Flickr: http://www.flickr.com/photos/wwworks/4588700881/ Slide adapted from the PrePARe Project
    16. Documentation – what to include: who created it, when and why; description of the item; methodology and methods; units of measurement; definitions of jargon, acronyms and code; references to related data. Slide adapted from the PrePARe Project
    17. Metadata – data about data: A formal, structured description of a dataset. Used by archives to create catalogue records.
    18. Missing metadata – or the riddle of the sixth toe: This painting shows Georgiana, Duchess of Devonshire as Diana … or maybe Cynthia. She has six toes – but no one knows why. Public domain image from Wikimedia Commons: http://commons.wikimedia.org/wiki/File:Georgiana_Cavendish,_Duchess_of_Devonshire_as_Diana.jpg
    19. For discussion: What data management challenges have you encountered? What strategies have you personally found useful? Be ready to feed back to the group.
    20. WHAT HAPPENS AT THE END OF THE PROJECT?
    21. Data archiving: The best way of ensuring long-term preservation of your data is depositing it in an archive or repository. A number of national disciplinary archives exist – e.g. the UK Data Archive: http://www.dataarchive.ac.uk/ DataBib provides a catalogue: http://databib.org/ Oxford will soon have its own data archive. If possible, make your data available for re-use.
    22. Why share data? Reputation: Get credit for high quality research. Recognition for contribution to research community. Open data leads to increased citations: of the data itself; of associated papers. Slide adapted from the PrePARe Project
    23. Why share data? Reuse: Reduces duplication of effort. Allows public research funding to be used more effectively. Contexts not currently envisaged. Extend research beyond your discipline. Slide adapted from the PrePARe Project
    24. Why share data? Be a trailblazer! A paradigm shift in how research outputs are viewed is occurring. Data outputs are of increasing importance – and are likely to become even more so. Major journals are increasingly looking to publish datasets alongside articles. Be at the forefront of an important shift in the academic world.
    25. Video by NYU Health Sciences Libraries: http://www.youtube.com/watch?v=N2zK3sAtr-4
    26. Data sharing – concerns: Ethical concerns: confidential or sensitive data. Legal concerns: third party data. Professional concerns: intended publication; commercial issues (e.g. patent protection).
    27. Data sharing – concerns: Redact or embargo if there is good reason. Planning ahead can reduce difficulties. Slide adapted from the PrePARe Project
    28. Data licensing: A licence clarifies the conditions for accessing and making use of a dataset. User knows what’s allowed without asking further permission. Doesn’t exclude possibility of specific requests to go beyond the terms of the licence. For databases, structure and content may be covered by separate rights.
    29. Data licences – examples: Creative Commons licences: six different flavours, plus CC0 public domain dedication; widely used and recognized; http://creativecommons.org/ Open Data Commons: specifically designed for datasets; recognizes the structure/content distinction; http://opendatacommons.org/
    30. Data licensing – guidance: ‘How to License Research Data’, a guide from the Digital Curation Centre: http://www.dcc.ac.uk/resources/how-guides/license-research-data
    31. DATA MANAGEMENT PLANNING
    32. Data management plans: A document which may be created in the early stages of a project: while planning, applying for funding, or setting up. An initial plan may be expanded later. Details plans and expectations for data: nature of data and its creation or acquisition; storage and security; preservation and sharing.
    33. Exercise: Using the resources available, have a go at drafting a data management plan for your own research. If there are questions you can’t answer at this stage, make a note of: what you need to find out; decisions you need to make.
    34. Digital Curation Centre: A national service providing advice and resources: http://www.dcc.ac.uk/ Create a data management plan using the DMP online tool: https://dmponline.dcc.ac.uk/
    35. ‘In preparing for battle, I have always found that plans are useless but planning is indispensable.’ Dwight D. Eisenhower
    36. UNIVERSITY SERVICES
    37. DataBank and DataFinder: Two forthcoming University of Oxford services. Launch date TBC.
    38. DataBank: University of Oxford’s institutional data archive. Long term preservation for datasets without another natural home. In some cases, may be a suitable home for DPhil data. Datasets will be assigned DOIs. Will work alongside ORA, the University archive for research publications. Possible to link publications in ORA to datasets in DataBank. Depositors can opt to make datasets publicly available, embargoed for a fixed period, or hidden.
    39. DataFinder: A catalogue of datasets: information on the nature, location, and availability of the data. Will harvest metadata from DataBank and other compatible data stores, so anything in DataBank will have a record in DataFinder. Researchers depositing data elsewhere are strongly encouraged to add a record to DataFinder. Should provide a substantial resource for researchers seeking datasets for reuse.
    40. ORDS – Online Research Database Service: Specifically designed for academic research data. Cloud-hosted and automatically backed up. Web interface makes collaboration straightforward. If desired, databases can easily be made public. Designed to permit easy archiving. Currently being used by a small group of test users – will become more widely available later in 2014. http://ords.ox.ac.uk/
    41. IT Learning Programme: Over 200 different IT courses, covering software, skills, and new technologies: http://www.oucs.ox.ac.uk/itlp/ ITLP Portfolio offers course materials and other resources: http://portfolio.it.ox.ac.uk/
    42. IT Services: Data Back-up on the HFS. HFS is Oxford’s central back-up and archiving service. Free of charge to University staff and postgraduates. Automated back-ups of machines connected to University network. Copies kept in multiple places.
    43. IT Services: Research Support Team. Can assist with technical aspects of research projects at all stages of the project lifecycle. But the earlier you seek advice, the better. For more information, see our website: http://blogs.it.ox.ac.uk/acit-rs-team/about/ Or email us on researchsupport@it.ox.ac.uk
    44. FURTHER INFORMATION AND RESOURCES
    45. Research data management website: Oxford’s central advisory website. Covers data management planning, back-up and security, data sharing and archiving, funder requirements, etc. University policy is available. http://www.admin.ox.ac.uk/rdm/
    46. Research Skills Toolkit: Website and hands-on workshops. A guide to software, University services, and other tools and resources for research. Requires SSO login. http://www.skillstoolkit.ox.ac.uk/
    47. Research Data MANTRA: Free online interactive training modules. Aimed at postgraduates and early career researchers. http://datalib.edina.ac.uk/mantra/
    48. Any questions? Ask now, or email us on researchsupport@it.ox.ac.uk
    49. Rights and re-use: This slideshow was developed by the IT Services Research Support Team, University of Oxford, based on a presentation originally prepared as part of the DaMaRO Project. With the exception of clip art used with permission from Microsoft, the slideshow is made available under a Creative Commons Attribution Non-Commercial Share-Alike License. Parts of this slideshow draw on teaching materials produced by the PrePARe Project, DATUM for Health, and DataTrain Archaeology. Within the terms of this licence, we actively encourage sharing, adaptation, and re-use of this material.
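The items listed on the slide ‘Documentation – what to include’ could also be captured in a structured, machine-readable form, which is the distinction the slides draw between documentation and metadata. The following Python sketch is illustrative only: the class `DatasetRecord`, its field names, and the example values are all hypothetical, and it does not follow any formal metadata standard. An archive would normally specify its own schema.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class DatasetRecord:
    """Hypothetical record mirroring the documentation checklist."""
    title: str
    creator: str            # who created it
    created: str            # when, as an ISO 8601 date
    purpose: str            # why it was created
    description: str        # description of the item
    methodology: str        # methodology and methods
    units: dict[str, str] = field(default_factory=dict)     # units of measurement
    glossary: dict[str, str] = field(default_factory=dict)  # jargon, acronyms, codes
    related_data: list[str] = field(default_factory=list)   # references to related data

    def to_json(self) -> str:
        """Serialize the record so that software can read it back."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)


# Made-up values, for illustration only
example = DatasetRecord(
    title="River temperature readings",
    creator="A. Researcher",
    created="2014-01-27",
    purpose="Pilot study for a doctoral project",
    description="Hourly water temperature at one site",
    methodology="Automated logger, readings exported as CSV",
    units={"temperature": "degrees Celsius"},
    glossary={"NA": "reading missing"},
)
```

Saving the output of `example.to_json()` in a file alongside the data is one way to make sure the context travels with the dataset.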
