2. Data management planning – for whom?
For yourself – to make it easier to find things again and
know what they are
For your colleagues
For the SUSPLACE program
For your funder
4. What data should you store?
Raw data
Final data
Papers
but also
Intermediate data
Drafts of papers
Methods
Equipment and materials
Research notes
...
5. What do you choose to store?
Everything you need to be able to do your work
Everything your colleagues need to do their work
Everything required by your funding organisation
Everything required by your journal
Everything necessary to reproduce your results
6. Short term storage – what are the issues?
Space
Access
● From where?
● By whom?
Versioning
Backups
Finding it again!
7. Storage: where?
Storage solutions: advantages, disadvantages and what each is suitable for
Personal computer/laptop
● Advantages: always available; portable
● Disadvantages: what if it breaks or is stolen? What if you are ill or away?
● Suitable for: temporary storage
Network drive/managed file servers
● Advantages: regularly backed up and maintained; stored securely; stored centrally
● Disadvantages: costs; may not be accessible from everywhere or by everyone
● Suitable for: master copy (if enough space is provided)
External storage devices – USB, flash etc.
● Advantages: low cost; portable
● Disadvantages: easily damaged or lost; insecure
● Suitable for: temporary storage
Cloud services – Dropbox, Figshare, SkyDrive etc.
● Advantages: automatic sync (some services); easy access
● Disadvantages: is it secure? No control over backup procedure
● Suitable for: data sharing
Question: are there agreements for SUSPLACE?
8. Storage during research: basic tips
Versioning
● Use a file in one (online) location as the “master”, and do
all your modifications and processing on copies of that
master
● When you have consolidated your changes and do not
want to lose them, replace the master file with the
consolidated file
● Keep track of ‘milestone files’
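The versioning tips above can be sketched in a few lines of Python. This is only an illustration of the master/copy/milestone idea, not a prescribed tool: the function names, the folder layout and the milestone naming pattern (`stem_YYYYMMDD_label`) are assumptions made for this example.

```python
import shutil
from datetime import date
from pathlib import Path


def checkout_copy(master: Path, workdir: Path) -> Path:
    """Copy the master file into a working folder and return the copy's path."""
    workdir.mkdir(parents=True, exist_ok=True)
    working_copy = workdir / master.name
    shutil.copy2(master, working_copy)  # copy2 also keeps file timestamps
    return working_copy


def consolidate(working_copy: Path, master: Path) -> None:
    """Replace the master file with the consolidated working copy."""
    shutil.copy2(working_copy, master)


def save_milestone(master: Path, milestone_dir: Path, label: str, day: date) -> Path:
    """Keep a dated, labelled snapshot of the master (hypothetical naming pattern)."""
    milestone_dir.mkdir(parents=True, exist_ok=True)
    name = f"{master.stem}_{day:%Y%m%d}_{label}{master.suffix}"
    milestone = milestone_dir / name
    shutil.copy2(master, milestone)
    return milestone
```

All edits happen on the copy returned by `checkout_copy`; the master only changes when `consolidate` is called, and `save_milestone` keeps a snapshot of, for example, the version submitted with a paper.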
9. Folder structure
DO:
Stable and scalable
Think about the interaction with filenames: should a piece of
information be a folder, or an element in the filename?
Figure: two folder layouts for a project file that is a picture.
Project_Files and Pictures side by side: where does UB_users_mktproj_01032015.tif go??
Pictures nested inside Project_Files: Project_Files/Pictures/UB_users_mktproj_20150103.tif
taken from: Data management Workshop For Researchers
by Tessa Pronk (Utrecht University Library)
If you use, for example, ATLAS.ti or
NVivo for qualitative data, the software
takes care of some of this
10. Folder structure
DON'T:
Too flat or deep structure
Folders with overlapping content
11. Example: folder structure
From: ‘Setting up an Organised Folder Structure for Research Projects’,
blog post by Nikola Vukovic, posted June 4, 2014
Don't forget the folder with your
literature (and EndNote or
Mendeley libraries)!
12. Filename conventions
DO:
Note in a separate document what the element codes in your
filename mean
Keep filenames short and relevant, about 25 characters.
Go from generic to specific (handy for sorting and
finding)
Use ‘_’ or ‘-’ to separate elements
Use fixed elements in your filename:
version number, date, description of content, project
number, name of researcher/team.
13. How would you name the file?
a. MA_NTC023_20141031.xls
b. MA@NTC#23~20141031.xls
c. MicroArrayData_NetherlandsToxicogenomicsCentreProject023_20141031.xls
d. microarrayntc02320141031.xls
e. MA_NTC023_31102014.xls
f. MA/NTC/Project23/OCT31st/data.xls
15. Filename conventions
DON'T:
Use special characters (&%$#), full stops or whitespace.
Name your files 'new_version', 'newer_version',
'newest_version'.
Duplicate files in different folders
Trust your computer's own file metadata (e.g. file dates) to describe your file
TIP: batch renaming software exists
for most operating systems
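As a rough illustration of the DO and DON'T lists for filenames, the sketch below builds a name from fixed elements and checks an existing name against some of the conventions. The function names, the `_v01` version suffix, the regular expression and the hard 25-character limit are assumptions chosen for this example; the slides only say "about 25 characters".

```python
import re
from datetime import date

# Letters, digits, '_' and '-' only, followed by a single extension.
SAFE_NAME = re.compile(r"^[A-Za-z0-9_-]+\.[A-Za-z0-9]+$")
MAX_STEM_LENGTH = 25  # assumption: "about 25 characters" treated as a limit


def build_filename(content: str, project: str, day: date, version: int, ext: str) -> str:
    """Join fixed elements with '_', writing the date as YYYYMMDD so it sorts well."""
    return f"{content}_{project}_{day:%Y%m%d}_v{version:02d}.{ext}"


def check_filename(name: str) -> list[str]:
    """Return a list of convention problems; an empty list means none were found."""
    problems = []
    if not SAFE_NAME.match(name):
        problems.append("contains special characters, whitespace, or extra full stops")
    stem = name.rsplit(".", 1)[0]
    if len(stem) > MAX_STEM_LENGTH:
        problems.append(f"stem is longer than {MAX_STEM_LENGTH} characters")
    if not re.search(r"[_-]", stem):
        problems.append("no '_' or '-' separating the elements")
    return problems
```

For example, `check_filename("MA_NTC023_20141031.xls")` reports no problems, while `check_filename("microarrayntc02320141031.xls")` reports the missing separators. The check cannot tell whether a date is in YYYYMMDD or DDMMYYYY order, so that still needs a human eye.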
16. Very good vs. less good
a. MA_NTC023_20141031.xls
b. MA@NTC#23~20141031.xls
c. MicroArrayData_NetherlandsToxicogenomicsCentreProject023_20141031.xls
d. microarrayntc02320141031.xls
e. MA_NTC023_31102014.xls
f. MA/NTC/Project23/OCT31st/data.xls
17. Long term or .....
For WUR: contact our data librarian
(datamanagement.support@wur.nl)
● support with storage in DANS-EASY and 3TU
● advice on other repositories
Find a suitable discipline-specific repository
● provided by journal (e.g. Dryad)
● search re3data.org
Use a free generic repository
● figshare
● Mendeley Data
● Harvard Dataverse
● Zenodo
Help! I need a DOI for my
manuscript!
19. Documentation
Document your dataset at project, file and parameter
level
Add a readme file
● describe the data that each file contains;
● define column headings and row labels, data codes
(including missing data) and measurement units for
tabular data;
● list whether associated data files are available and if so,
where they're available;
● list whom to contact with questions
Describe the data collection process/method in a
methodology file (or refer to the publication)
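One way to make the readme checklist above routine is to fill in a fixed template. The sketch below is an assumption about what such a template could look like: the section titles, the function name `render_readme` and all example values are made up for illustration.

```python
README_TEMPLATE = """\
README for dataset: {title}

Contact for questions: {contact}

Files
{file_lines}

Columns, codes and units (tabular data)
{column_lines}
Missing data code: {missing_code}

Associated data files and where to find them
{associated}

Method
{method}
"""


def render_readme(title, contact, files, columns, missing_code, associated, method):
    """Fill the template.

    files:   dict mapping file name -> what the file contains
    columns: dict mapping column heading -> (meaning, measurement unit)
    """
    file_lines = "\n".join(f"- {name}: {desc}" for name, desc in files.items())
    column_lines = "\n".join(
        f"- {col}: {meaning} [{unit}]" for col, (meaning, unit) in columns.items()
    )
    return README_TEMPLATE.format(
        title=title,
        contact=contact,
        file_lines=file_lines,
        column_lines=column_lines,
        missing_code=missing_code,
        associated=associated,
        method=method,
    )


# Hypothetical example values, loosely based on a diet study.
example = render_readme(
    title="Effects of diet on health",
    contact="lisa@example.org",
    files={"weights_2016.csv": "body weight per participant per week"},
    columns={"weight": ("body weight", "kg")},
    missing_code="NA",
    associated="none",
    method="see methodology.txt",
)
```

Saving the returned text as a readme file next to the data keeps the description and the dataset together.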
20. For yourself
For data processing and analysis
Help in writing reports and papers
Reference for the future
● Will you still understand it in 2 months, 6 months, 2
years..?
24. Thank you for
your participation!
More info?
Go to: Wageningen UR Data
Management Support Hub
Or contact us via:
datamanagement.support@wur.nl
And say you're from the WUR-coordinated SUSPLACE program
27. Example
Study to examine the effects of diet on health
- Conducted over 3 years by 3 researchers – Peter, Lisa
and Anna
There are many ways to organise the data. We will look at
three:
- By researcher
- By year
- By activity
28. Example
It is now the summer holidays in 2016. Peter and Anna
are on holiday, and Lisa has received some urgent
questions from the reviewers. They need to know:
● the procedure used to produce the high protein diet
● which bureau measured the data
● what sort of preprocessing was carried out on the data
30. Example – Organising by activity
Easy to navigate through, for each question you
quickly find the right folder
- even if you had no prior knowledge.
31. Example – Organising by activity
Still need to do quite a lot of detective work to find the
information
– have to rely on good names, guesswork, and
reading through the content of the files.
32. Descriptions and links
Enter a brief description for each activity (folder)
It may help to identify types of files (e.g. dataset,
procedure, sample, document)
Linking to items produced in other activities allows you
to:
● follow the workflow
● reuse items
● avoid problems due to multiple copies
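A simple folder-based version of "descriptions and links" could look like the sketch below. It is only one possible layout: the function name `create_activity`, the `DESCRIPTION.txt` file name and the idea of recording links as relative paths are assumptions for this example, not part of any particular tool.

```python
from pathlib import Path
from typing import Dict, Optional


def create_activity(
    root: Path,
    name: str,
    description: str,
    links: Optional[Dict[str, str]] = None,
) -> Path:
    """Create a folder for one activity and write a short description file in it.

    links maps a label to the relative path of an item produced in another
    activity, so the item is referenced instead of copied.
    """
    folder = root / name
    folder.mkdir(parents=True, exist_ok=True)
    lines = [f"Activity: {name}", f"Description: {description}"]
    if links:
        lines.append("Links to items produced in other activities:")
        for label, relative_path in links.items():
            lines.append(f"- {label}: {relative_path}")
    (folder / "DESCRIPTION.txt").write_text("\n".join(lines) + "\n", encoding="utf-8")
    return folder
```

For instance, a hypothetical `03_preprocessing` activity could link to `../02_measurement/raw_data.csv` rather than keeping its own copy, which keeps the workflow traceable and avoids diverging duplicates.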
33. Example – Organising by activity plus
descriptions and links
Easy to navigate through, for each question you
quickly find the right folder
- even if you had no prior knowledge.
Descriptions help you to find and understand the
data
Links make the whole process traceable
Editor's Notes
But you don’t want to go too far in what you store: it takes time and leads to information overload. So what should you choose?
Increasingly, this data is stored electronically
A note on the cloud: your data is subject to the laws of the country in which the server is physically located. Some cloud providers now offer the option to choose the location.
Often you will end up using a combination – for example copying the data from the network drive to your laptop to work on at home. Then you end up with different versions!
Milestone files – for example the version which you submitted with a paper, or which you sent to your partners. You could consider using a version control system.
Good: a
b: contains symbols
c: too long
d: hard to distinguish the different parts of the file name, but everything is there
e: OK, but the date is not in international format (YYYYMMDD), which is better for sorting
f: relies on the folder structure; the year is missing, and you need the folder structure to understand what is in the file
Colleague: Research goes in 7 year cycles. Recently we got a request for a new project, and he pulled out a file documenting something almost identical.... from 21 years ago!
Research is ideas, hypotheses, materials, methods, data, conclusions... Some or all of these can be present in a single lab notebook, but they may also be stored in other documents. All are necessary. Your lab notebook should allow another researcher to replicate your work – if they need other information than what is in your lab notebook, then your notebook needs to be linked to this other information.
Need to be able to follow the process to understand the work – so need some structure to see how they relate to one another.
For provenance, you also need to know who did what; this is also useful so that others know who they should come to with questions.
Other project information such as budgets, resource planning etc falls outside this remit, but it may be useful to you or your colleagues (especially the project leader or supervisor) if you can also link to this information.
Whether you use one of these, or something else, or no specialised software at all, one thing remains – the structure has to come from you. Experience with Tiffany users has shown us that this – not inputting data into the software itself – is the hardest part.
This structure is not only useful for others after your research is done, it will help you and your colleagues to find and understand your research. Also, the structure is best created while you are doing the research, not afterwards (although that is also possible, just harder).
Although good tools and software will help you, you don’t need specialised lab notebook software to produce good, well-structured data and documentation. For this example we simply use files stored in folders. With a little time and effort, even such a simple system will help you a great deal.
Organising by time or by person are both very logical ways of organising the data. However, both require knowledge of when something was done and by whom. They are very inaccessible to anyone who wasn’t involved, and quite awkward for someone who was involved. To understand the data you can always read the lab notes of the people involved, but how long will this take? Will they be understandable to an outsider? Will they still be understandable to the researcher?
This extra structure and metadata doesn’t demand much extra time.
Much more logical – when you are looking for interesting papers, what do you look at – the date, the person who wrote it – or the title and abstract?