A basic course on research data management for PhD students. The course consists of 4 parts and was given at Eindhoven University of Technology (TU/e) on 24-01-2017.
A basic course on Research data management, part 1: what and why
1. A basic course on Research data management
part 1: what and why
PROOF course Information Literacy and
Research Data Management
TU/e, 24-01-2017
l.osinski@tue.nl, TU/e IEC/Library
Available under CC BY-SA license, which permits copying
and redistributing the material in any medium or format &
adapting the material for any purpose, provided the original
author and source are credited & you distribute the
adapted material under the same license as the original
2. Research data management [RDM]
what #1
Essence of RDM: “… tracking back to what you did 7
years ago and recovering it (...) immediately in a
re-usable manner.” (Henry Rzepa)
3. Research data management [RDM]
what #2
RDM: caring for your data with the purpose to:
1. share them with others
a. for reasons of reuse - in the same context or in a different context
b. for reasons of reproducibility checks (scientific integrity)
2. protect their mere existence
RDM = good data practices [1-6] that make your data understandable, easy
to work with, and available to other scientists
1. Dynamic ecology (2016), Ten commandments for good data management.
https://dynamicecology.wordpress.com/2016/08/22/ten-commandments-for-good-data-management/
2. Borer, E.T., Seabloom, E.W., Jones, M.B., et al. (2009) Some simple guidelines for effective data management, Bulletin of the Ecological Society of America,
90(2), p. 205-214. doi: 10.1890/0012-9623-90.2.205
3. Hook, L.A., Santhana Vannan, S.K., Beaty, T.W., et al. Best practices for preparing environmental data sets to share and archive. Available online:
http://daac.ornl.gov/PI/BestPractices-2010.pdf. doi: 10.3334/ORNLDAAC/BestPractices-2010
4. White, E.P., Baldridge, E., Brym, T., et al. (2013) Nine simple ways to make it easier to (re)use your data, Ideas in Ecology and Evolution, 6(2), p. 1-10.
doi: 10.4033/iee.2013.6b.6.f
5. Goodman, A., Pepe, A., Blocker, A.W., et al. (2014) Ten simple rules for the care and feeding of scientific data, PLOS Computational Biology, 10(4),
e1003542. doi: 10.1371/journal.pcbi.1003542
6. Sandve, G.K., et al. (2013) Ten simple rules for reproducible computational research, PLOS Computational Biology, 9(10), e1003285.
doi: 10.1371/journal.pcbi.1003285
4. Source: Research Data
Netherlands / Marina Noordegraaf
Topics
1. Research data management [RDM]: what and why
a. data management plan
2. Sharing your data, or making your data findable and
accessible
a. data protection: file naming, organizing data, back up…
b. data sharing: via collaboration platforms, data archives
3. Caring for your data, or making your data re-usable and
interoperable
a. metadata, tidy data, licenses
5. During your research
Because you work together with other researchers: collaborative
science
After your research
Because of re-using results: data-driven science, open science
Because of scientific integrity: validating data analysis by
reproducibility checks requires data and the code that is used to
clean, process and analyze the data and to produce the final
outputs
Because your data are unique / not easily repeatable (long term
observational data)
Because you benefit from it: increases your visibility and enhances
the trustworthiness / credibility of your research
Why share research data? #1
6. Data sharing is increasingly required by:
+ Journals [data availability policies, e.g. American Economic Review, PLoS, Nature]
+ Professional organizations [VSNU, KNAW]
+ Universities, including TU/e
+ Research funders [NWO, ZonMW, EC]
data management plan
Why share research data? #2
7. EC: Horizon 2020 #1
Open research data (ORD) pilot
“The ORD pilot aims to improve and maximise access to and re-use of
research data generated by Horizon 2020…”
“The ORD pilot applies primarily to the data needed to validate the results
presented in scientific publications. Other data can also be provided…”
“A data management plan (DMP) is required for all projects participating in
the extended ORD pilot…”
“Participating in the ORD pilot does not necessarily mean opening up all your
research data. Rather, the ORD Pilot follows the principle “as open as possible,
as closed as necessary” and focuses on encouraging sound data management
as an essential part of research best practice.”
8. Source: Research Data Netherlands /
Marina Noordegraaf
EC: Horizon 2020 #2
sound research data management
Sound research data management is data management following
the FAIR principles. All research data should be:
Findable: easy to find by both humans and computer systems;
Accessible: stored for long term with well-defined license and access
conditions (open access when possible);
Interoperable: ready to be combined with other datasets by humans as well as
computer systems;
Reusable: ready to be used for future research and to be processed further
using computational methods.
9. Source: Research Data Netherlands /
Marina Noordegraaf
EC: Horizon 2020 #3
requirements
The conditions set by Horizon 2020 with regard to research data
management come down to two requirements:
1. Formulate a data management plan; and
2. Deposit research data in a data repository
10. The DMP is a set of questions along the FAIR guidelines about:
1. The handling of research data during and after the project
2. What data sets the project will collect, process and/or generate
3. Whether and how the data sets will be findable/discoverable, re-usable
and shared/made open access
4. How data will be curated and preserved
5. What measures are taken to safeguard and protect (sensitive) data
EC Horizon 2020 #4
data management plan
DMP template Horizon 2020 (via DMPOnline): recommended but voluntary
DMP template by 4TU.Centre for Research Data
Examples of H2020 DMPs:
http://www.dcc.ac.uk/resources/data-management-plans/guidance-examples
11. Research data management
discussion topics and questions
Storage and back-up
What sort of data do you use? Are you creating new data or are you working with pre-existing
data?
Where do you store your research data? Is there a back-up? Where?
Are data selections made? Not everything is to be stored but…?
Metadata and documentation (information to let you find, use and understand the data)
Do you describe your research data? Who measured or collected what, when, how? Other
context information?
Are you content with the way you document or describe your research data? Do you succeed
in finding the right (version of your) research data?
Can other researchers understand and (re-)use your research data (during and after
research)? Should they be able to?
Access and re-use
Who can access your research data?
What will happen to your research data when you leave TU/e?
Would you consider publishing your research data, i.e. making them publicly available?
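One way to act on the metadata questions above (who measured or collected what, when, how) is to keep a small description file next to each data file. The following is a minimal Python sketch, not a prescribed TU/e or 4TU method. The function name `write_metadata_sidecar`, the `.metadata.json` suffix, the field names and the example values are all assumptions made for illustration; a discipline-specific metadata standard may be a better fit for real data.

```python
import json
import tempfile
from datetime import date
from pathlib import Path


def write_metadata_sidecar(data_file, who, what, when, how):
    """Write a JSON file next to data_file that describes the data.

    The field names (who, what, when, how) mirror the questions on the slide;
    they are illustrative and not taken from any formal metadata standard.
    """
    data_path = Path(data_file)
    record = {
        "data_file": data_path.name,
        "who": who,
        "what": what,
        "when": when.isoformat(),
        "how": how,
    }
    # Hypothetical naming convention: <data file name>.metadata.json
    sidecar = data_path.with_name(data_path.name + ".metadata.json")
    sidecar.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return sidecar


if __name__ == "__main__":
    # Example with made-up values, written to a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        data_file = Path(tmp) / "temperature_run01.csv"
        data_file.write_text("time_s,temp_c\n0,21.5\n", encoding="utf-8")
        sidecar = write_metadata_sidecar(
            data_file,
            who="A. Researcher",
            what="Air temperature in lab 1 (degrees Celsius)",
            when=date(2017, 1, 24),
            how="Logged every second with a calibrated thermocouple",
        )
        print(sidecar.read_text(encoding="utf-8"))
```

Keeping the description in a plain-text file beside the data means it travels with the data when files are copied, backed up or archived.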
12. Research data management
which of these statements is true?
Storage and back-up
1. My research data is stored safely and securely, including regular back-ups
Metadata and documentation
2. I keep metadata with my data: who measured/collected what, when, how
Access and re-use
3. My colleagues are able to access and use my data
4. Other researchers are able to access and use my data
5. My nearest colleagues and I are the only ones who can understand my
data
6. Anyone should be able to use my data when I have finished with it
13. Reasons not to share your data
Preparing my data for sharing takes time and effort
But research data management also increases your research efficiency
My data are confidential
But you can anonymize or pseudonymize your data
My data still need to yield publications
But you can publish your data under an embargo and by publishing your data you
establish priority and you can get credits for it
My data can be misused or misinterpreted
But the best defense against malicious use is to refer to an archival copy of your
data, which is guaranteed to be exactly as you mean it to be
My data are only interesting for me
But sharing your data may be required by a funder /
journal or your data may be requested to validate your
results
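The reply to "My data are confidential" mentions pseudonymizing your data. As an illustration only, the Python sketch below replaces a direct identifier with a keyed hash (HMAC-SHA256 from the standard library). The function names, the 16-character pseudonym length, the field name `id` and the example values are assumptions, not part of the course material. Pseudonymized data are not anonymous: the secret key must be stored separately from the data, and other columns may still identify a person.

```python
import hashlib
import hmac


def pseudonymize(identifier, secret_key):
    """Return a stable pseudonym for identifier using keyed hashing (HMAC-SHA256).

    The same identifier and key always give the same pseudonym, so records
    can still be linked without exposing the original identifier.
    """
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256)
    # Truncated to 16 hex characters for readability (an arbitrary choice).
    return digest.hexdigest()[:16]


def pseudonymize_rows(rows, id_field, secret_key):
    """Return copies of rows (dicts) with id_field replaced by its pseudonym."""
    return [
        {**row, id_field: pseudonymize(row[id_field], secret_key)}
        for row in rows
    ]


if __name__ == "__main__":
    # Hypothetical key and records; in practice, keep the key out of the dataset.
    key = b"replace-with-a-secret-key"
    records = [
        {"id": "participant-001", "score": 7},
        {"id": "participant-002", "score": 9},
    ]
    for row in pseudonymize_rows(records, "id", key):
        print(row)
```

A keyed hash is used here rather than a plain hash because, without the key, an outsider cannot recompute the pseudonyms from a list of known identifiers.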
14. 1. Website IEC/Library [TU/e]: https://www.tue.nl/en/university/library/
2. Figshare support, The importance of data management for research: https://youtu.be/Ae205CNrk6w
3. Henry Rzepa, Collaborative FAIR data sharing: http://www.ch.imperial.ac.uk/rzepa/blog/?p=16292
4. Dynamic ecology (2016), ten commandments for good data management.
https://dynamicecology.wordpress.com/2016/08/22/ten-commandments-for-good-data-management/
5. Borer, E.T., Seabloom, E.W., Jones, M.B., et al. (2009) Some simple guidelines for effective data
management, Bulletin of the Ecological Society of America, 90(2), p. 205-214.
doi: 10.1890/0012-9623-90.2.205
6. Hook, L.A., Santhana Vannan, S.K., Beaty, T.W., et al. Best practices for preparing environmental data sets
to share and archive. doi: 10.3334/ORNLDAAC/BestPractices-2010
7. White, E.P., Baldridge, E., Brym, T., et al. (2013) Nine simple ways to make it easier to (re)use your data,
Ideas in Ecology and Evolution, 6(2), p. 1-10. doi: 10.4033/iee.2013.6b.6.f
8. Goodman, A., Pepe, A., Blocker, A.W., et al. (2014) Ten simple rules for the care and feeding of scientific
data, PLOS Computational Biology, 10(4), e1003542. doi: 10.1371/journal.pcbi.1003542
9. Sandve, G.K., et al. (2013) Ten simple rules for reproducible computational research, PLOS Computational
Biology, 9(10), e1003285. doi: 10.1371/journal.pcbi.1003285
10. Data sharing increases visibility: http://dx.doi.org/10.7717/peerj.175
11. Data sharing enhances trustworthiness: http://dx.doi.org/10.1371/journal.pone.0026828
URLs of mentioned webpages
in order of appearance #1
15. 12. Data availability policy journals: http://www.nap.edu/openbook.php?record_id=10613&page=33
13. Data availability policy American Economic Review: https://www.aeaweb.org/aer/data.php
15. Data availability policy PLoS: http://journals.plos.org/plosone/s/data-availability
16. Data availability policy Nature: http://www.nature.com/authors/policies/availability.html
17. VSNU Code of Scientific Conduct (Dutch, revision 2014):
http://www.vsnu.nl/files/documenten/Domeinen/Onderzoek/Code_wetenschapsbeoefening_2004_(2014).pdf
18. KNAW responsible research data management:
https://www.knaw.nl/en/news/publications/responsible-research-data-management-and-the-prevention-of-scientific-misconduct?set_language=en
19. Radboud University research data policy:
http://www.ru.nl/research-information-services/institutional-policy/policy-research-data-management/
20. TU/e Code of Scientific Conduct:
http://www.tue.nl/en/university/about-the-university/integrity/scientific-integrity/
21. NWO and research data: http://www.nwo.nl/en/policies/open+science/data+management
21. ZonMW Toegang tot data:
http://www.zonmw.nl/nl/programmas/programma-detail/toegang-tot-data-ttdata/algemeen/
22. Horizon 2020 Guidelines on data management:
http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-data-mgt_en.pdf
URLs of mentioned webpages
in order of appearance #2
16. 23. Data management plan Horizon 2020: https://dmponline.dcc.ac.uk/
24. Data management plan template (4TU.ResearchData):
http://researchdata.4tu.nl/en/planning-research/data-management-plan/
25. Emilio M. Bruna (04-09-2014), The opportunity cost of my #OpenScience was 36 hours + $690 (UPDATED) .
http://brunalab.org/blog/2014/09/04/the-opportunity-cost-of-my-openscience-was-35-hours-690/
26. Rouder, Jeffrey N., The what, why, and how of born-open data, Behavior Research Methods, vol. 48 (2016),
p. 1062-1069. http://dx.doi.org/10.3758/s13428-015-0630-z
URLs of mentioned webpages
in order of appearance #3