A basic course on Research data management
part 1: what and why
PROOF course Information Literacy and
Research Data Management
TU/e, 19-09-2017
l.osinski@tue.nl, TU/e IEC/Library
Available under CC BY-SA license, which permits copying
and redistributing the material in any medium or format &
adapting the material for any purpose, provided the original
author and source are credited & you distribute the
adapted material under the same license as the original
Research data management [RDM]
what #1
Essence of RDM: “… tracking back to what you did 7
years ago and recovering it (...) immediately in a re-usable
manner.” (Henry Rzepa)
Research data management [RDM]
what #2
RDM: caring for your data with the purpose to:
1. protect their mere existence: data loss, data authenticity (RDM basics)
2. share them with others
a. for reasons of reuse: in the same context or in a different context; during
research and after research
b. for reasons of reproducibility checks → scientific integrity; data quality
RDM = good data practices [1-6] that make your data understandable, easy
to work with, and available to other scientists
1. Dynamic ecology (2016), Ten commandments for good data management. https://dynamicecology.wordpress.com/2016/08/22/ten-commandments-for-good-data-management/
2. Borer, E.T., Seabloom, E.W., Jones, M.B., et al. (2009) Some simple guidelines for effective data management, Bulletin of the Ecological Society of America, 90(2), p. 205-214. doi: 10.1890/0012-9623-90.2.205
3. Hook, L.A., Santhana Vannan, S.K., Beaty, T.W. et al. Best practices for preparing environmental data sets to share and archive. Available online http://daac.ornl.gov/PI/BestPractices-2010.pdf . doi: 10.3334/ORNLDAAC/BestPractices-2010
4. White, E.P., Baldridge, E., Brym, T. et al. (2013) Nine simple ways to make it easier to (re)use your data, Ideas in Ecology and Evolution, 6(2), p. 1-10. doi: 10.4033/iee.2013.6b.6.f
5. Goodman, A., Pepe, A., Blocker, A.W., et al. (2014) Ten simple rules for the care and feeding of scientific data, PLOS Computational Biology, 10(4), e1003542. doi: 10.1371/journal.pcbi.1003542
6. Sandve, G.K., et al. (2013), Ten simple rules for reproducible computational research, PLOS Computational Biology, 9(10), e1003285. doi: 10.1371/journal.pcbi.1003285
Source: Research Data
Netherlands / Marina
Noordegraaf
Outline
1. Research data management [RDM]: what and why
a. data management plan
b. discussion
2. Sharing your data, or making your data findable and accessible
a. data protection: back up, file naming, organizing data
b. data sharing: via collaboration platforms, data archives
3. Caring for your data, or making your data usable and interoperable
a. tidy data
b. metadata/documentation
c. licenses
d. open data formats
Why share research data? #1
 Because you work together with other researchers → collaborative science
 Because of re-using results: data-driven science → open science
 Because of scientific integrity: validating data analysis by reproducibility checks
requires data and the code that is used to clean, process and analyze the data and
to produce the final outputs
Additional reasons
 Because your data are unique / not easily repeatable
(long term observational data)
 Because you benefit from it: it increases your visibility and
enhances the trustworthiness / credibility of your
research
Why share research data? #2
because you have to…
 Data sharing is increasingly required by:
+ Journals [for example American Economic Review, PLoS, Nature]
+ Professional organizations [VSNU, KNAW]
+ Universities, including TU/e
+ Research funders [NWO, ZonMW, EC]
data management plan
EC: Horizon 2020 #1
Open research data (ORD) pilot: why?
 “The ORD pilot aims to improve and maximise access to and re-use of
research data generated by Horizon 2020…”
 “The ORD pilot applies primarily to the data needed to validate the results
presented in scientific publications. Other data can also be provided…”
 “A data management plan (DMP) is required for all projects participating in
the extended ORD pilot…”
“Participating in the ORD pilot does not necessarily mean opening up all your
research data. Rather, the ORD Pilot follows the principle “as open as possible,
as closed as necessary” and focuses on encouraging sound data management
as an essential part of research best practice.” (my underlining)
EC: Horizon 2020 #2
how? sound research data management
Sound research data management is data management following
the FAIR principles. All research data should be:
Findable: easy to find by both humans and computer systems;
Accessible: stored for long term with well-defined license and access
conditions (open access when possible);
Interoperable: ready to be combined with other datasets by humans as well as
computer systems;
Reusable: ready to be used for future research and to be processed further
using computational methods.
Source: Research Data Netherlands /
Marina Noordegraaf
EC: Horizon 2020 #3
requirements
The conditions set by Horizon 2020 with regard to research data
management, come down to two requirements:
1. Formulate a data management plan, and;
2. Deposit research data in a data repository
The DMP is a set of questions along the FAIR principles about:
1. What research data sets the project will collect, process and/or generate
2. The handling of these data sets during and after the project
3. Whether and how data sets will be findable/discoverable, re-useable and
shared/made open access
4. How data will be curated and preserved
5. What measures are taken to safeguard and protect (sensitive) data
EC Horizon 2020 #4
data management plan
 DMP template Horizon 2020 (via DMPonline): recommended but voluntary
 ZonMw template (via DMPonline)
 DMP template by 4TU.Centre for Research Data
 Examples of H2020 DMPs: http://www.dcc.ac.uk/resources/data-management-plans/guidance-examples
Research data management
discussion topics and questions
Storage and back-up
 What sort of data do you use? Are you creating new data or are you working with pre-existing
data?
 Where do you store your research data? Is there a back-up? Where?
 Are data selections made? Not everything is to be stored but…?
Metadata and documentation (information to let you find, use and understand the data)
 Do you describe your research data? Who measured or collected what, when, how? Other
context information?
 Are you content with the way you document or describe your research data? Do you succeed
in finding the right (version of your) research data?
 Can other researchers understand and (re-)use your research data (during and after
research)? Should they be able to?
Access and re-use
 Who can access your research data?
 What will happen to your research data when you leave TU/e?
 Would you consider publishing your research data, i.e. making them publicly available?
Research data management
which of these statements is true?
Storage and back-up
1. My research data are stored safely and securely, including regular back-ups
Metadata and documentation
2. I keep metadata with my data: who measured/collected what, when, how
Access and re-use
3. My colleagues are able to access and use my data
4. Other researchers are able to access and use my data
5. My nearest colleagues and I are the only ones who can understand my
data
6. Anyone should be able to use my data when I have finished with it
Reasons not to share your data
 Preparing my data for sharing takes time and effort
But research data management also increases your research efficiency
 My data are confidential
But you can anonymize or pseudonymize your data
 My data still need to yield publications
But you can publish your data under an embargo and by publishing your data you
establish priority and you can get credits for it
 My data can be misused or misinterpreted
But the best defense against malicious use is to refer to an archival copy of your
data which is guaranteed exactly as you mean it to be
 My data are only interesting for me
But sharing your data may be required by a funder /
journal or your data may be requested to validate your
results
1. Website IEC/Library [TU/e]: https://www.tue.nl/en/university/library/
2. Figshare support, The importance of data management for research: https://youtu.be/Ae205CNrk6w
3. Henry Rzepa, Collaborative FAIR data sharing: http://www.ch.imperial.ac.uk/rzepa/blog/?p=16292
4. Dynamic ecology (2016), ten commandments for good data management.
https://dynamicecology.wordpress.com/2016/08/22/ten-commandments-for-good-data-management/
5. Borer, E.T., Seabloom, E.W., Jones, M.B., et al. (2009) Some simple guidelines for effective data
management, Bulletin of the Ecological Society of America, 90(2), p. 205-214. doi: 10.1890/0012-9623-
90.2.205
6. Hook, L.A., Santhana Vannan, S.K., Beaty, T.W. et al. Best practices for preparing environmental data sets
to share and archive. doi: 10.3334/ORNLDAAC/BestPractices-2010
7. White, E.P., Baldridge, E., Brym, T. et al. (2013) Nine simple ways to make it easier to (re)use your data,
Ideas in Ecology and Evolution, 6(2), p. 1-10. doi: 10.4033/iee.2013.6b.6.f
8. Goodman, A., Pepe, A., Blocker, A.W., et al. (2014) Ten simple rules for the care and feeding of scientific
data, PLOS Computational Biology, 10(4), e1003542. doi: 10.1371/journal.pcbi.1003542
9. Sandve, G.K., et al. (2013), Ten simple rules for reproducible computational research, PLOS Computational
Biology, 9(10), e1003285. doi: 10.1371/journal.pcbi.1003285
10. Data sharing increases visibility: http://dx.doi.org/10.7717/peerj.175
11. Data sharing enhances trustworthiness: http://dx.doi.org/10.1371/journal.pone.0026828
URL’s of mentioned webpages
in order of appearance #1
12. Data availability policy journals: http://www.nap.edu/openbook.php?record_id=10613&page=33
13. Data availability policy American Economic Review: https://www.aeaweb.org/aer/data.php
15. Data availability policy PLoS: http://journals.plos.org/plosone/s/data-availability
16. Data availability policy Nature: http://www.nature.com/authors/policies/availability.html
17. VSNU Code of Scientific Conduct (Dutch, revision 2014):
http://www.vsnu.nl/files/documenten/Domeinen/Onderzoek/Code_wetenschapsbeoefening_2004_(2014)
.pdf
18. KNAW responsible research data management: https://www.knaw.nl/en/news/publications/responsible-
research-data-management-and-the-prevention-of-scientific-misconduct?set_language=en
19. Radboud University research data policy: http://www.ru.nl/research-information-services/institutional-
policy/policy-research-data-management/
20. TU/e Code of Scientific Conduct: http://www.tue.nl/en/university/about-the-university/integrity/scientific-
integrity/
21. NWO and research data: http://www.nwo.nl/en/policies/open+science/data+management
21. ZonMW Toegang tot data: https://www.zonmw.nl/en/research-and-results/access-to-data/
22. Horizon 2020 Guidelines on data management:
http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-data-
mgt_en.pdf
URL’s of mentioned webpages
in order of appearance #2
23. About FAIR: Mons, B. et al., Cloudy, increasingly FAIR: revisiting the FAIR Data guiding principles for the
European Open Science Cloud: http://dx.doi.org/10.3233/ISU-170824
24. Template data management plan Horizon 2020: https://dmponline.dcc.ac.uk/
25. ZonMW data management plan template: https://www.zonmw.nl/en/research-and-results/access-to-
data/format-data-management-plan/
26. Data management plan template (4TU.ResearchData): http://researchdata.4tu.nl/en/planning-
research/data-management-plan/
27. Examples of Horizon 2020 data management plans: http://www.dcc.ac.uk/resources/data-management-
plans/guidance-examples
28. Emilio M. Bruna (04-09-2014), The opportunity cost of my #OpenScience was 36 hours + $690 (UPDATED) .
http://brunalab.org/blog/2014/09/04/the-opportunity-cost-of-my-openscience-was-35-hours-690/
28. Rouder, Jeffrey N., The what, why, and how of born-open data, Behavior Research Methods, vol. 48(2016),
p. 1062-1069. http://dx.doi.org/10.3758/s13428-015-0630-z (see p. 1063: “It was a pain to document the
data; it was a pain to format the data”)
URL’s of mentioned webpages
in order of appearance #2
A basic course on Research data management
part 2: protecting and organizing
your data
PROOF course Information Literacy and
Research Data Management
TU/e, 07-03-2017
l.osinski@tue.nl, TU/e IEC/Library
Research data management
 Sharing your data, or making your data findable and accessible
with good data practices
→ protecting your data: back up, access control; file naming, organizing
data, versioning
+ sharing your data via collaboration platforms and archives
 Caring for your data, or making your data usable and
interoperable with good data practices
+ tidy data
+ metadata/documentation
+ licenses
+ open data formats
Research data management
what was it again
Be safe
+ storage, backup  data safety, protecting against loss: use local
ICT infrastructure (departmental servers, including SURFdrive) as
much as possible
+ access control  data security, protecting against unauthorized
use: with DataverseNL for example
Be organized, or: you (and others) should be able to tell what’s in
a file without opening it
+ file-naming, organizing data in folders, versioning
Protecting your data
good data practices during your research
“…we can copy everything and do not manage it well.” (Indra Sihar)
File-naming #1
be consistent and aim for concise but informative names
How you organize and name your files has a big impact on your
ability to find those files later and to understand what they contain.
Good file names are consistent (use file-naming conventions), unique
(distinguishes a file from files with similar subjects as well as different
versions of the file) and meaningful (use descriptive names).
File-naming conventions help you find your data, help others to find
your data and help track which version of a file is most current
 Avoid using special characters in a file name:  / : * ? < > | [ ] & $
 Use hyphens or underscores instead of periods or spaces to
separate logical elements in a file name
 Avoid very long names: usually 25 characters is sufficient length
 Names should include all necessary descriptive information:
initials researcher, project number, procedure/method…
 A name should be independent of where the file is stored (do not use the
same name in different folders)
 Include dates (format YYYYMMDD) and a version number on files
 Add a readme.txt to each folder in which the file naming and its
meaning is explained
Source: Best practices for file naming (Stanford University Libraries)
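The conventions above can be turned into a small helper that builds names and flags problems. This is a minimal sketch, not a TU/e or Stanford tool: the element order (date, initials, project, description, version), the function names and the `v01` version style are assumptions chosen for illustration, and the backslash is added to the list of characters to avoid as a further assumption.

```python
import re
from datetime import date

# Characters the slide advises against (backslash added as an assumption),
# plus spaces and periods, which should not separate elements in a name.
FORBIDDEN = re.compile(r'[\\/:*?<>|\[\]&$\s.]')
MAX_STEM_LENGTH = 25  # the "usually sufficient" length mentioned above


def build_name(initials, project, description, day, version, extension):
    """Join the elements with underscores: date, initials, project, description, version."""
    parts = [day.strftime("%Y%m%d"), initials, project, description, f"v{version:02d}"]
    for part in parts:
        if FORBIDDEN.search(part):
            raise ValueError(f"element {part!r} contains a character to avoid")
    return "_".join(parts) + "." + extension.lstrip(".")


def check_name(filename):
    """Return a list of warnings for a file name; an empty list means none were found."""
    stem = filename.rsplit(".", 1)[0]
    warnings = []
    if FORBIDDEN.search(stem):
        warnings.append("special character, space or period in name")
    if len(stem) > MAX_STEM_LENGTH:
        warnings.append(f"longer than {MAX_STEM_LENGTH} characters")
    if not re.search(r"\d{8}", stem):
        warnings.append("no YYYYMMDD date")
    return warnings


name = build_name("LO", "P12", "raw", date(2017, 9, 19), 1, "csv")
```

Here `name` is `20170919_LO_P12_raw_v01.csv`, for which `check_name` returns no warnings. A name such as `my data final (2).xlsx` is flagged twice, once for the spaces and once for the missing date.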
File naming #2
think about the ordering of elements within a filename
 Order by date:
2013-04-12_interview-recording_THD.mp3
2013-04-12_interview-transcript_THD.docx
2012-12-15_interview-recording_MBD.mp3
2012-12-15_interview-transcript_MBD.docx
 Order by subject:
MBD_interview-recording_2012-12-15.mp3
MBD_interview-transcript_2012-12-15.docx
THD_interview-recording_2013-04-12.mp3
THD_interview-transcript_2013-04-12.docx
 Order by type:
Interview-recording_MBD_2012-12-15.mp3
Interview-recording_THD_2013-04-12.mp3
Interview-transcript_MBD_2012-12-15.docx
Interview-transcript_THD_2013-04-12.docx
 Forced order with numbering:
01_THD_interview-recording_2013-04-12.mp3
02_THD_interview-transcript_2013-04-12.docx
03_MBD_interview-recording_2012-12-15.mp3
04_MBD_interview-transcript_2012-12-15.docx
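Why the order of elements matters can be seen by sorting the names. The sketch below uses the example file names from this slide; the helper `subject_first` is hypothetical and only rearranges the three elements.

```python
names = [
    "2013-04-12_interview-recording_THD.mp3",
    "2012-12-15_interview-transcript_MBD.docx",
    "2013-04-12_interview-transcript_THD.docx",
    "2012-12-15_interview-recording_MBD.mp3",
]

# Plain alphabetical sorting: because the date comes first and is written
# year-month-day, the result is also in chronological order.
by_date = sorted(names)


def subject_first(name):
    """Rewrite date_type_subject.ext as subject_type_date.ext."""
    stem, extension = name.rsplit(".", 1)
    day, kind, subject = stem.split("_")
    return f"{subject}_{kind}_{day}.{extension}"


# With the subject code in front, the same sort groups files per subject.
by_subject = sorted(subject_first(n) for n in names)
```

A file browser sorts names the same way, so the first element of the name decides how the files are grouped on screen.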
File organization
Beatriz Ramirez, Data management plan for the PhD project:
development and application of a monitoring system to assess the
impacts of climate and land cover changes on eco-hydrological
processes in an eastern Andes catchment area
Source: Haselager, dr. G.J.T.
(Radboud University Nijmegen);
Aken, prof. dr. M.A.G. van (Utrecht
University) (2000): Personality and
Family Relationships. DANS.
http://dx.doi.org/10.17026/dans-
xk5-y7vc .
Organizing your data in folders #1
based on the TIER documentation protocol (http://www.projecttier.org/)
Guiding principles of TIER documentation protocol
1. keep your raw or original data raw
+ save your raw data read-only in its original format in a separate folder
+ make a working copy of your raw data (input data, used for
processing and analysis)
2. keep the command files (files containing code written in the syntax of the
(statistical) software you use for the study) apart from the data
3. keep the analysis files (the fully cleaned and processed data files that you
use to generate the results reported in your paper) in a separate folder
4. store the metadata (codebook, description of variables, etc.) in a separate
folder, apart from the data itself
Organizing your data in folders #2
based on the TIER documentation protocol (http://www.projecttier.org/)
1. Main project folder (name of your research project/working title of your
paper)
1.1. Original data and metadata
1.1.1. Original data
1.1.2. Metadata
1.1.2.1. Supplements
1.2. Processing and analysis files
1.2.1. Importable data files
1.2.2. Command files
1.2.3. Analysis files
1.3. Documents
1.4. Literature
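The folder hierarchy above can be created in one go at the start of a project. The following is a minimal sketch that assumes hyphenated folder names derived from the outline; the TIER protocol itself does not prescribe these exact names, and `create_project` is a hypothetical helper.

```python
from pathlib import Path

# Folder names follow the outline above, with spaces replaced by hyphens
# (an assumption, in line with the file-naming advice in this course).
TIER_FOLDERS = [
    "original-data-and-metadata/original-data",
    "original-data-and-metadata/metadata/supplements",
    "processing-and-analysis-files/importable-data-files",
    "processing-and-analysis-files/command-files",
    "processing-and-analysis-files/analysis-files",
    "documents",
    "literature",
]


def create_project(root, project_name):
    """Create the folder skeleton under root/project_name and return the project path."""
    project = Path(root) / project_name
    for folder in TIER_FOLDERS:
        (project / folder).mkdir(parents=True, exist_ok=True)
    return project
```

Calling `create_project(".", "my-project")` creates the skeleton in the current directory. Existing folders are left untouched, so the call is safe to repeat.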
Organizing your data in folders #3
based on the TIER documentation protocol
1. Main project folder (name of your research project/working title of your
paper)
1.1. Original data and metadata
1.1.1. Original data (raw data, obtained/gathered data)
Any data that were necessary for any part of the processing and/or
analysis you reported in your paper.
Copies of all your original data files, saved in exactly the format they were in
when you first obtained them. The name of an original data file may be
changed.
Keep these data read-only!
1.1.2. Metadata
1.1.2.1. Supplements
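Keeping the original data read-only while working on a copy can be scripted. This is a sketch under assumptions: the helper `protect_and_copy` and the `_importable` suffix are made up for illustration, and file permissions are only a first line of protection (an administrator, or anyone who changes the permissions back, can still overwrite the file).

```python
import shutil
import stat
from pathlib import Path


def protect_and_copy(raw_file, working_folder):
    """Copy a raw data file to the working folder, then mark the original read-only."""
    raw_file = Path(raw_file)
    working_folder = Path(working_folder)
    working_folder.mkdir(parents=True, exist_ok=True)

    # The working copy gets a different name than the original file.
    working_copy = working_folder / f"{raw_file.stem}_importable{raw_file.suffix}"
    shutil.copyfile(raw_file, working_copy)  # copies the content, not the permissions

    # Remove write permission for everyone on the original file.
    raw_file.chmod(stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)
    return working_copy
```

All processing then happens on the returned working copy, so the original file stays exactly as it was obtained.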
Organizing your data in folders #4
based on the TIER documentation protocol
1. Main project folder (name of your research project/working title of your paper)
1.1. Original data and metadata
1.1.1. Original data
1.1.2. Metadata
The Metadata Guide: document that provides information about each of your
original data files. Applies especially to obtained data files
 A bibliographic citation of the original data files, including the date you
downloaded or obtained the original data files and unique identifiers that
have been assigned to the original data files.
 Information about how to obtain a copy of the original data file
 Whatever additional information is needed to understand and use the data in the
original data file
1.1.2.1. Supplements
Additional information about an original data file that’s not written by
yourself but that is found in existing supplementary documents, such as
users’ guides and code books that accompany the original data file
Organizing your data in folders #5
based on the TIER documentation protocol
1. Main project folder (name of your research project/working title of your paper)
1.1. Original data and metadata
1.1.1. Original data
1.1.2. Metadata
1.1.2.1. Supplements
1.2. Processing and analysis files
1.2.1. Importable data files (the data you work with, input data, suitable for
processing and analysis)
A corresponding version for each of the original data files. This version can be identical
to the original version, or in some cases it will be a modified version.
For example modifications required to allow your software to read the file (converting
the file to another format, removing unusable data or explanatory notes from a table)
 The original and importable versions of a data file should be given different names
 The importable data file should be as nearly identical as possible to the original
 The changes you make to your original data files to create the corresponding
importable data files should be described in a Readme file
1.2.2. Command files
1.2.3. Analysis files
Organizing your data in folders #6
based on the TIER documentation protocol
1. Main project folder (name of your research project/working title of your paper)
1.1. Original data and metadata
1.1.1. Original data
1.1.2. Metadata
1.1.2.1. Supplements
1.2. Processing and analysis files
1.2.1. Importable data files
1.2.2. Command files
One or more files containing code written in the syntax of the (statistical) software you use
for the study
 Importing phase: commands to import or read the files and save them in a format that
suits your software
 Processing phase: commands that execute all the processing required to transform the
importable version of your files into the final data files that you will use in your analysis
(i.e. cleaning, recoding, joining two or more data files, dropping variables or cases,
generating new variables)
 Generating the results: commands that open the analysis data file(s), and then
generate the results reported in your paper.
1.2.3. Analysis files
Organizing your data in folders #7
based on the TIER documentation protocol
1. Main project folder (name of your research project/working title of your paper)
1.1. Original data and metadata
1.1.1. Original data
1.1.2. Metadata
1.1.2.1. Supplements
1.2. Processing and analysis files
1.2.1. Importable data files
1.2.2. Command files
1.2.3. Analysis files
 The fully cleaned and processed data files that you use to generate the
results reported in your paper
 The Data Appendix: codebook for your analysis data files: brief description
of the analysis data file(s), a complete definition of each variable (including
coding and/or units of measurement), the name of the original data files
from which the variable was extracted, the number of valid observations for
the variable, and the number of cases with missing values
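The counts that the Data Appendix asks for (valid observations and missing values per variable) can be generated from the analysis file itself. The sketch below is illustrative only: the example table, the variable names and the set of codes treated as missing are all assumptions.

```python
import csv
import io

MISSING = {"", "NA"}  # assumed codes for a missing value


def count_observations(csv_text):
    """Return {variable: (valid, missing)} for every column of a CSV table."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = {name: [0, 0] for name in reader.fieldnames}
    for row in reader:
        for name in reader.fieldnames:
            value = (row[name] or "").strip()
            counts[name][1 if value in MISSING else 0] += 1
    return {name: tuple(pair) for name, pair in counts.items()}


# Hypothetical analysis file with one missing age and one missing weight.
example = "patient_id,age_years,weight_kg\n1,34,70.5\n2,NA,82.0\n3,51,\n"
summary = count_observations(example)
```

Here `summary["age_years"]` is `(2, 1)`: two valid observations and one missing value. The definitions, units and source files of each variable still have to be written by hand.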
Organizing your data in folders #8
based on the TIER documentation protocol
1. Main project folder (name of your research project/working title of your paper)
1.1. Original data and metadata
1.1.1. Original data
1.1.2. Metadata
1.1.2.1. Supplements
1.2. Processing and analysis files
1.2.1. Importable data files
1.2.2. Command files
1.2.3. Analysis files
1.3. Documents
 An electronic copy of your complete final paper
 The Readme-file for your replication documentation
 What statistical software or other computer programs are needed to run the
command files
 Explain the structure of the hierarchy of folders in which the documentation is
stored
 Describe precisely any changes you made to your original data files to create
the corresponding importable data files
 Step-by-step instructions for using your documentation to replicate the
statistical results reported in your paper
1.4. Literature
 Retrieved relevant literature
1. Storage, back up of data: http://www.data-archive.ac.uk/create-manage/storage
2. Local ICT infrastructure: https://intranet.tue.nl/en/university/services/ict-services/ict-service-
catalog/management-services/data-management-storage/ (TU/e intranet)
3. SURFdrive (at TU/e): https://intranet.tue.nl/en/university/services/ict-services/ict-service-
catalog/management-services/data-management-surfdrive
4. DataverseNL: https://dataverse.nl/dvn/
5. Version control: http://www.data-archive.ac.uk/create-manage/format/versions
6. Best practices for file naming: http://library.stanford.edu/research/data-management-services/data-best-
practices/best-practices-file-naming
8. File organization: Haselager, dr. G.J.T. , Aken, prof. dr. M.A.G. van (2000): Personality and Family
Relationships. DANS. http://dx.doi.org/10.17026/dans-xk5-y7vc (Data guide, p. 24-26)
9. Best practices: file names and folder structures (Leiden example):
http://blogs.library.leiden.edu/researchdata/2016/06/03/best-practices-file-names-and-folder-
structures/#more-284
10. Beatriz Ramirez, Data management plan for the PhD project: development and application of a monitoring
system to assess the impacts of climate and land cover changes on eco-hydrological processes in an
eastern Andes catchment area: http://www.wageningenur.nl/web/file?uuid=3f974938-79a0-421f-b1ad-
95eef49d777c&owner=c057b578-4a6a-4449-881b-17fff17e2f1a (see Figure 1 for folder structure)
11. TIER documentation protocol: http://www.projecttier.org/
URL’s of mentioned webpages
in order of appearance
A basic course on Research data management
part 3: sharing your data
PROOF course Information Literacy and
Research Data Management
TU/e, 07-03-2017
l.osinski@tue.nl, TU/e IEC/Library
Research data management
 Sharing your data, or making your data findable and accessible
with good data practices
+ protecting your data: back up, access control; file naming, organizing
data, versioning
→ sharing your data via collaboration platforms and archives
 Caring for your data, or making your data usable and
interoperable with good data practices
+ tidy data
+ metadata/documentation
+ licenses
+ open data formats
Research data management
what was it again
Overview research data sharing and storage services
[Figure: overview of services by phase (during research, after research) and level (institution, discipline), including local ICT services]
Data sharing per se is pretty straightforward
General data sharing platforms:
 SURFdrive [TU/e only]: Dutch academic Dropbox, 100 Gb, maximum data transfer 16 Gb;
every TU/e employee can use SURFdrive
 Google Drive, Dropbox, Beehub…
DataverseNL [TU/e only]: data sharing platform for active research data [based on Harvard’s
Dataverse Project] where you may:
 store your data in an organized and safe way
 clearly describe your data
 version control of your data
 arrange access to your data
 get recognition for your data
 [collaborate on your data]
Various disciplinary initiatives: Open Science Framework, OpenML, RodRep, CRCNS…
SURF Filesender [secure data transfer up to 500 Gb!, WeTransfer up to 2 Gb]
Sharing your data
collaboration or sharing platforms (during your research)
Storage and backup of data through DANS [Dutch
Archiving and Networking Services]
Data transfer: up to 2 Gb per dataset
Dataverse via 4TU.ResearchData: up to 50 Gb free
How to create an account:
 Go to: https://dataverse.nl/
 Click ‘Log in’ (at the top right); under Institutional account click SURFconext
 Select Eindhoven University of Technology and log on with your TU/e username and
password
 When asked, give permission to share your data by answering Yes or clicking this
Tab
 When asked to create an account, answer Yes or click this Tab.
 Once you have created an account, your username is the prefix of your
email address
You now have a user account with DataverseNL.
If you click 4TU dataverse → Eindhoven dataverse → Add data you can create and
publish data sets, upload files and assign access rights to data sets or files.
However, before you proceed, contact me (for more options) or first use the demo
version: https://act.dataverse.nl
Sharing your data
DataverseNL
If you are interested in using DataverseNL, please contact me (Leon Osinski)
On request
“I'd like to thank E.J. Masicampo and Daniel LaLande for sharing and allowing me to share
their data…”
Daniël Lakens (2014), What p-hacking really looks like: A comment on Masicampo & LaLande (2012)
On a (personal) website
“Let me start by saying that the reason why I put all excel files online, including all the
detailed excel formulas about data constructions and adjustments, is precisely because I
want to promote an open and transparent debate about these important and sensitive
measurement issues.”
Thomas Piketty, My response to the Financial Times, HuffPost The Blog, 29-05-2014 ;
originally published as Addendum: Response to FT, 28-05-2014
A data journal
Journal of open psychology data, Geoscience data journal,
Data in brief, Scientific data, Data reports
Sharing your data
after your research has ended
Source: www.aukeherrema.nl
Choose a repository where other researchers in your discipline are sharing their data, for example
LXcat (for plasma data), TurBase (for turbulence data) or GenBank (for genetic sequence data)
Overview of research data repositories: Re3data.org
Use a repository that at least assigns a persistent identifier to your data (DOI) and requires that
you provide adequate metadata
 General or multidisciplinary repositories: Zenodo, Figshare, DANS, Dryad, B2SHARE
 4TU.ResearchData
+ small to medium-sized data sets, long tail data
+ static data, ‘frozen’ data sets, ‘milestone’ data sets
+ preferably nonproprietary data formats suitable for long term preservation
+ DOI’s [ persistent identifier for citability and retrievability ]
+ open access
+ long-term availability, Data Seal of Approval
+ Data Citation Index (Thomson Reuters)
+ self-upload (single data sets < 3Gb)
+ special collections of related data sets
Sharing your data
after your research has ended, by publishing and archiving them in an established
repository
Link your data to your publication
Sharing your data
link your data to your publication
1. Overview research data storage and sharing services: http://dataservices.silk.co/
2. DataverseNL: https://www.dataverse.nl/dvn/
3. Harvard’s Dataverse Project: http://dataverse.org/
4. Open Science Framework: https://cos.io/osf/
5. OpenML: http://www.openml.org
6. RodRep: http://www.rodrep.com/
7. CRCNS: http://crcns.org/
8. SURFdrive: https://www.surfdrive.nl/
9. Google Drive: https://www.google.com/drive/
10. Dropbox: https://www.dropbox.com/
11. Beehub: https://beehub.nl/system/
12. SURF filesender: https://filesender.surfnet.nl/
12. Data on request (blog post Daniel Lakens): http://daniellakens.blogspot.nl/2014/09/what-p-hacking-really-
looks-like.html
13. Data on personal website (Thomas Piketty): http://piketty.pse.ens.fr/en/capital21c2
14. Overview of (better known) data journals: http://proj.badc.rl.ac.uk/preparde/blog/DataJournalsList
URL’s of mentioned webpages
in order of appearance #1
15. Data journal: Journal of Open Psychology Data: http://openpsychologydata.metajnl.com/
16. Data journal: Geoscience Data Journal: http://onlinelibrary.wiley.com/journal/10.1002/(ISSN)2049-6060
17. Data journal: Data in brief: http://www.journals.elsevier.com/data-in-brief
18. Data journal: Scientific data: http://www.nature.com/sdata/
19. Data journal: Data reports: http://www.frontiersin.org/news/Data_Reports_a_new_type_of_peer-
reviewed_article_in_Frontiers_journals/1051?utm_source=FRN&utm_medium=ECOM&utm_campaign=T
WT_FRN_1502_datareport
20. Research data catalogue: Re3data.org: http://service.re3data.org/search/results?term=
21. Publishing data: Zenodo: http://www.zenodo.org/
22. Publishing data: Figshare: http://www.figshare.com
23. Publishing data: DANS: http://www.dans.knaw.nl/en
23. Publishing data: Dryad: http://datadryad.org/
24. Publishing data: B2SHARE: https://b2share.eudat.eu/
25. Publishing data: 4TU.ResearchData: https://data.4tu.nl/
26. Long tail research data: http://www.nature.com/neuro/journal/v17/n11/fig_tab/nn.3838_F1.html
URL’s of mentioned webpages
in order of appearance #2
27. Preferred data formats 4TU.ResearchData: http://researchdata.4tu.nl/en/publishing-research/data-
description-and-formats/
28. Data Seal of Approval: http://www.datasealofapproval.org
29. Data Citation Index (Thomson Reuters): http://wokinfo.com/products_tools/multidisciplinary/dci/
30. Self upload 4TU.ResearchData: https://data.4tu.nl/account/login/?next=/upload/
31. Data sets underlying PhD thesis Joos Buijs: http://dx.doi.org/10.4121/uuid:26aba40d-8b2d-435b-b5af-
6d4bfbd7a270
32. PhD thesis Joos Buijs: http://dx.doi.org/10.6100/IR780920
URL’s of mentioned webpages
in order of appearance #3
A basic course on Research data management
part 4: caring for your data, or
making data usable
PROOF course Information Literacy and
Research Data Management
TU/e, 07-03-2017
l.osinski@tue.nl, TU/e IEC/Library
Available under CC BY-SA license, which permits copying
and redistributing the material in any medium or format &
adapting the material for any purpose, provided the original
author and source are credited & you distribute the
adapted material under the same license as the original
Research data management
what was it again
✓ Sharing your data, or making your data findable and accessible with good data practices
+ protecting your data: backup, access control; file naming, organizing data, versioning
+ sharing your data via collaboration platforms and archives
→ Caring for your data, or making your data usable and interoperable with good data practices
+ tidy data
+ metadata/documentation
+ licenses
+ open data formats
Tidy data
making your data easy to handle for computers
Before data can be reusable, it first has to be usable.
Tidy data is about the structure of a table / data set.
Tidy data ≠ clean data; it is a step towards clean data.
+ Each variable you measure is in one column
+ Column headers are variable names
+ Each observation is in a different row
+ Every cell contains only one piece of information
Tidy data
why
Tidy data allow your data to be easily:
+ imported by data management systems
+ analyzed by analysis software
+ visualized, modelled, transformed
+ combined with other data (interoperability)
Tidy data versus messy data
Tidy data
1. Each variable you measure is in one column
2. Column headers are variable names
3. Each observation is in a different row
4. Every cell contains only one piece of information
Messy data
1. More than one variable in a single column (‘clumped data’)
2. Column headers are values, or: one variable over many columns (‘wide data’)
3. Variables are in rows and columns
4. More pieces of information in one cell (cells are highlighted or colored; values and measurement units in one cell)
Tidy data versus messy data
example
‘Wide’ data: one variable over many columns
patient_id  drug_a  drug_b
1           67      56
2           80      90
3           64      50
4           85      75
Tidy data
patient_id  drug  heart_rate
1           a     67
2           a     80
3           a     64
4           a     85
1           b     56
2           b     90
3           b     50
4           b     75
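The reshaping from the wide table to the tidy table above can be scripted. The following is a minimal sketch in Python using only the standard library; the function name `wide_to_tidy` and its parameters are illustrative choices, not part of the course material. In R, the tidyr package mentioned in this course offers ready-made functions for the same job.

```python
def wide_to_tidy(rows, id_column, value_columns, variable_name, value_name):
    """Turn 'wide' rows (one column per category) into tidy rows.

    rows: list of dicts, one dict per wide row.
    value_columns: maps each wide column header to the category value it
    stands for, e.g. {"drug_a": "a", "drug_b": "b"}.
    """
    tidy = []
    # Loop over the wide columns first so the output order matches the slide.
    for column, category in value_columns.items():
        for row in rows:
            tidy.append({
                id_column: row[id_column],
                variable_name: category,
                value_name: row[column],
            })
    return tidy


# The wide table from the slide: heart rate per patient, one column per drug.
wide = [
    {"patient_id": 1, "drug_a": 67, "drug_b": 56},
    {"patient_id": 2, "drug_a": 80, "drug_b": 90},
    {"patient_id": 3, "drug_a": 64, "drug_b": 50},
    {"patient_id": 4, "drug_a": 85, "drug_b": 75},
]

tidy = wide_to_tidy(
    wide,
    id_column="patient_id",
    value_columns={"drug_a": "a", "drug_b": "b"},
    variable_name="drug",
    value_name="heart_rate",
)

for record in tidy:
    print(record["patient_id"], record["drug"], record["heart_rate"])
```

Each wide row becomes two tidy rows, one per drug, so adding a third drug later means adding rows rather than a new column.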
What is the nature of the “unusual episode” to which this table refers?
Different columns contain
measurements of the same variable:
easier to read and interpret but
difficult to add data (columns) to the
records (rows)
The same data in a tidy structure (variables in columns and observations in rows)
    Class  Sex     Age    Survived  Freq
 1  1st    Male    Child  No           0
 2  2nd    Male    Child  No           0
 3  3rd    Male    Child  No          35
 4  Crew   Male    Child  No           0
 5  1st    Female  Child  No           0
 6  2nd    Female  Child  No           0
 7  3rd    Female  Child  No          17
 8  Crew   Female  Child  No           0
 9  1st    Male    Adult  No         118
10  2nd    Male    Adult  No         154
11  3rd    Male    Adult  No         387
12  Crew   Male    Adult  No         670
13  1st    Female  Adult  No           4
14  2nd    Female  Adult  No          13
15  3rd    Female  Adult  No          89
16  Crew   Female  Adult  No           3
17  1st    Male    Child  Yes          5
18  2nd    Male    Child  Yes         11
19  3rd    Male    Child  Yes         13
20  Crew   Male    Child  Yes          0
21  1st    Female  Child  Yes          1
22  2nd    Female  Child  Yes         13
“The problem is that people like to view data in a totally different way than
a computer likes to process it.” (Kien Leong)
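Tidy data can be stored once and still be shown in the layout people prefer. The sketch below, in plain Python, builds a small cross-table from a few rows of the tidy table above (the children who did not survive); the helper name `cross_tab` is an assumption made for this illustration.

```python
from collections import defaultdict

# Rows 1-8 of the tidy table above (Age = Child, Survived = No).
tidy_rows = [
    {"Class": "1st", "Sex": "Male", "Age": "Child", "Survived": "No", "Freq": 0},
    {"Class": "2nd", "Sex": "Male", "Age": "Child", "Survived": "No", "Freq": 0},
    {"Class": "3rd", "Sex": "Male", "Age": "Child", "Survived": "No", "Freq": 35},
    {"Class": "Crew", "Sex": "Male", "Age": "Child", "Survived": "No", "Freq": 0},
    {"Class": "1st", "Sex": "Female", "Age": "Child", "Survived": "No", "Freq": 0},
    {"Class": "2nd", "Sex": "Female", "Age": "Child", "Survived": "No", "Freq": 0},
    {"Class": "3rd", "Sex": "Female", "Age": "Child", "Survived": "No", "Freq": 17},
    {"Class": "Crew", "Sex": "Female", "Age": "Child", "Survived": "No", "Freq": 0},
]


def cross_tab(rows, row_var, col_var, value_var):
    """Sum value_var for every combination of row_var and col_var."""
    table = defaultdict(lambda: defaultdict(int))
    for row in rows:
        table[row[row_var]][row[col_var]] += row[value_var]
    # Convert to plain dicts so the result is easy to inspect and compare.
    return {key: dict(columns) for key, columns in table.items()}


by_class_and_sex = cross_tab(tidy_rows, "Class", "Sex", "Freq")

# A compact 'human' view: one line per class, one column per sex.
print(f"{'Class':<6}{'Male':>6}{'Female':>8}")
for passenger_class, counts in by_class_and_sex.items():
    print(f"{passenger_class:<6}{counts['Male']:>6}{counts['Female']:>8}")
```

The tidy rows stay the single source; the wide view is generated on demand and can be regenerated with different variables in the rows and columns.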
Tools for tidying data
OpenRefine
▪ download OpenRefine: http://openrefine.org/download.html
▪ runs on your own computer (not in the cloud), inside the Firefox browser (not in IE); no web connection is needed
▪ captures all steps applied to your raw data; the original data set is not modified; steps are easily reversed
R, tidyr package
▪ a scripting language (R (free), Matlab, SAS, …) lets you process data (tidying, cleaning, etc.), run the analysis and produce the final outputs
versus
▪ Excel: processing data through a graphical user interface is bad for data provenance and documentation because it doesn’t leave a record of the steps
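To illustrate why a script leaves a record where a graphical interface does not, here is a minimal Python sketch that fixes one of the messy-data problems named earlier (values and measurement units in one cell). The input values with the unit "bpm", the function name `split_value_and_unit` and the `processing_log` list are all assumptions made for this example.

```python
import re


def split_value_and_unit(cell):
    """Split a cell such as '67 bpm' into a number and a unit string."""
    match = re.fullmatch(r"\s*(-?\d+(?:\.\d+)?)\s*([^\d\s].*?)?\s*", cell)
    if match is None:
        raise ValueError(f"cannot parse cell: {cell!r}")
    number, unit = match.groups()
    return float(number), (unit or "")


# Hypothetical raw data: value and unit share one cell.
raw = [
    {"patient_id": "1", "heart_rate": "67 bpm"},
    {"patient_id": "2", "heart_rate": "80 bpm"},
]

processing_log = []  # plain-text record of every processing step

cleaned = []
for row in raw:
    value, unit = split_value_and_unit(row["heart_rate"])
    cleaned.append({
        "patient_id": row["patient_id"],
        "heart_rate": value,
        "heart_rate_unit": unit,
    })
processing_log.append(
    "split 'heart_rate' into 'heart_rate' (number) and 'heart_rate_unit'"
)

print(cleaned)
print(processing_log)
```

The raw rows are never changed; the cleaned rows are a new object, and the script itself plus the log document what was done.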
Documentation / metadata
making your data understandable for humans #1
The table or data set itself
+ columns: use clear, descriptive variable names (no hard-to-understand abbreviations), avoid special characters (they can cause problems with some software)
+ rows: if possible, use standard names within cells (derived from a taxonomy, for example: standard species name, CAS registry for chemical substances, standard date formats, …)
+ try to avoid coding categorical or ordinal data as numbers
+ missing data: use NA
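The advice on categorical values and missing data can be sketched in Python with the standard `csv` module. The rows below are made up for illustration (the `sex` column and the missing heart rate are not from the course data), and the helper names `to_csv_text` and `from_csv_text` are assumptions.

```python
import csv
import io

# Hypothetical observations: categories are written as words, not as codes
# like 1/2, and a value that was not measured is None inside the program.
observations = [
    {"patient_id": 1, "sex": "female", "heart_rate": 67},
    {"patient_id": 2, "sex": "male", "heart_rate": None},
]


def to_csv_text(rows, fieldnames, missing="NA"):
    """Write rows as CSV text, marking missing values explicitly as NA."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow(
            {name: (missing if row[name] is None else row[name]) for name in fieldnames}
        )
    return buffer.getvalue()


def from_csv_text(text, missing="NA"):
    """Read CSV text back, turning the NA marker into None again."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {name: (None if value == missing else value) for name, value in row.items()}
        for row in reader
    ]


csv_text = to_csv_text(observations, ["patient_id", "sex", "heart_rate"])
print(csv_text)
print(from_csv_text(csv_text))
```

An explicit NA marker avoids the ambiguity of an empty cell or a 0, which a reader cannot tell apart from a real value.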
Documentation / metadata
making your data understandable for humans #2
The table or data set as a whole
A description (documentation) that at least mentions:
+ size of the data set: number of observations and variables
+ information about the variables and their measurement units (code book)
+ what’s included and excluded in the data set, why data are missing
+ description of how you collected the data (study design), data manipulation steps (provenance)
+ when your data consist of multiple files organized in a folder structure, an explanation of the structure and naming of the files
“Research outputs that are poorly documented are like canned goods with the label
removed (…)” (Carly Strasser)
Documentation / metadata
metadata standards
Sometimes there are metadata standards for the
documentation of your data set but where no standard
exists, a simple readme file can be good enough
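A simple readme of this kind can be produced alongside the data. The sketch below, in plain Python, assembles the minimum items listed earlier (size, variables with units, collection, missing data). The function name `build_readme`, the layout of the text and the descriptions in the code book are assumptions for illustration, not a metadata standard.

```python
def build_readme(title, rows, codebook, collection_note, missing_note):
    """Return README text covering size, variables, collection and gaps.

    codebook maps each variable name to a (description, unit) pair.
    """
    variables = list(rows[0].keys())
    lines = [
        title,
        "",
        f"Size: {len(rows)} observations, {len(variables)} variables",
        "",
        "Variables (code book):",
    ]
    for name in variables:
        description, unit = codebook[name]
        lines.append(f"- {name}: {description} (unit: {unit})")
    lines.append("")
    lines.append(f"Data collection: {collection_note}")
    lines.append(f"Missing data: {missing_note}")
    return "\n".join(lines) + "\n"


# Two rows from the tidy heart-rate example; descriptions are hypothetical.
rows = [
    {"patient_id": 1, "drug": "a", "heart_rate": 67},
    {"patient_id": 1, "drug": "b", "heart_rate": 56},
]
codebook = {
    "patient_id": ("identifier of the patient", "none"),
    "drug": ("drug administered, 'a' or 'b'", "none"),
    "heart_rate": ("heart rate after administration", "beats per minute"),
}

readme = build_readme(
    "Example heart-rate data set",
    rows,
    codebook,
    collection_note="hypothetical example, not a real study",
    missing_note="none; missing values would be coded as NA",
)
print(readme)
```

Because the size is computed from the data, the readme does not drift out of date when rows are added.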
Raw data: https://www.amstat.org/publications/jse/datasets/titanic.dat.txt
Documentation accompanying the data: https://www.amstat.org/publications/jse/datasets/titanic.txt
▪ Size (number of observations and variables)
▪ Description
▪ Provenance
▪ Variable descriptions
Based on: The "Unusual Episode" Data Revisited / by Robert J. MacG. Dawson, in: Journal of Statistics Education vol. 3 (1995), issue 3
1. Morphological Measurements of Galapagos Finches: http://dx.doi.org/10.5061/dryad.152
▪ Use of standard names (taxonomy, species)
▪ Variable names clear enough? WingL must be wing length but what is N.Ubkl?
▪ Units of measurement?
Based on: Looking after datasets / by Antony Unwin, 01-09-2015, http://blog.revolutionanalytics.com/2015/09/looking-after-datasets.html
Documentation / metadata
making your data findable for humans and search engines
Descriptive metadata, mainly for discovery and identification of your data
+ creator
+ title
+ short description + key words
+ date(s) of data collection
+ publication year
+ related publications
+ DOI (assigned by data archive)
+ etc.
When uploading your data to a data archive like 4TU.ResearchData, you will be asked to enter these metadata
A DOI is assigned by the data archive
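Before uploading, the descriptive metadata listed above can be collected in one machine-readable record. The Python sketch below is only an illustration: the field names, the `REQUIRED_FIELDS` list and all values are assumptions, not the input form or schema of 4TU.ResearchData or any other archive.

```python
import json

# Hypothetical field names based on the list above; a real archive
# defines its own form and vocabulary.
REQUIRED_FIELDS = [
    "creator",
    "title",
    "description",
    "keywords",
    "collection_dates",
    "publication_year",
]


def missing_fields(record):
    """Return the required fields that are absent or empty in the record."""
    return [name for name in REQUIRED_FIELDS if not record.get(name)]


record = {
    "creator": "Example, A.",
    "title": "Example heart-rate data set",
    "description": "Heart rate of four patients under two drugs (made-up data).",
    "keywords": ["heart rate", "example"],
    "collection_dates": ["2017-01-01/2017-01-31"],
    "publication_year": 2017,
    "related_publications": [],
    "doi": None,  # left empty: the DOI is assigned by the data archive
}

print(json.dumps(record, indent=2))
print("missing:", missing_fields(record))
```

Checking the record before upload makes it less likely that a field needed for discovery is forgotten.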
User license
making clear that other people are allowed to use your data
Let other people know in advance what they are
allowed to do with your data by attaching a user license
to it
+ Creative Commons license for data sets
+ GNU General Public License (GPL) for software
+ License selector
Open data formats
ensuring the ‘longevity’ of your data
+ open (non-proprietary) data formats best ensure that your data will remain usable and ‘legible’ for computers in the future
+ open formats are easy to use in a variety of software, like .csv for tabular data
+ check the data formats that are supported by a data archive like 4TU.ResearchData
Usable data
recommended reading
These 3 papers give a good summary of this module
+ Eugene Barsky (2017), Good enough research data management: a very brief guide
+ Shannon E. Ellis, Jeffrey T. Leek (2017), How to share data for collaboration
+ Greg Wilson, et al. (2017), Good enough practices in scientific computing
Support
Data Coach [ website ]
TU/e data librarians (rdmsupport@tue.nl)
Leon Osinski, Sjef Öllers
Recommended reading
Van den Eynden, Veerle et al. (2011), Managing and sharing data: best practice for researchers, UK Data Archive
Strasser, Carly (2015), Research data management, NISO
Recommended online course
Essentials 4 data support [English & Dutch]
1. Tidy data: https://www.jstatsoft.org/article/view/v059i10
2. The “Unusual Episode” Data revisited: https://www.amstat.org/publications/jse/v3n3/datasets.dawson.html
3. OpenRefine: http://openrefine.org
4. tidyr: http://tidyr.tidyverse.org/
5. R: https://www.r-project.org/
6. Metadata standards: http://rd-alliance.github.io/metadata-directory/
7. Raw Titanic data: https://www.amstat.org/publications/jse/datasets/titanic.dat.txt
8. Documentation to Titanic data: https://www.amstat.org/publications/jse/datasets/titanic.txt
9. Morphological Measurements of Galapagos Finches: http://dx.doi.org/10.5061/dryad.152
10. Looking after data sets: http://blog.revolutionanalytics.com/2015/09/looking-after-datasets.html
11. Descriptive metadata 4TU.ResearchData: http://researchdata.4tu.nl/en/publishing-research/uploading-data/
12. Creative Commons licenses: https://creativecommons.org/
13. GNU General Public License: https://www.gnu.org/licenses/gpl-3.0.en.html
URLs of mentioned webpages
in order of appearance #1
14. License selector: https://ufal.github.io/public-license-selector/
15. Preferred data formats of 4TU.ResearchData: http://researchdata.4tu.nl/en/publishing-research/data-description-and-formats/
16. Eugene Barsky (2017), Good enough research data management: a very brief guide
17. Shannon E. Ellis, Jeffrey T. Leek (2017), How to share data for collaboration
18. Greg Wilson, et al. (2017), Good enough practices in scientific computing
19. TU/e Data Coach: http://www.tue.nl/datacoach
20. Van den Eynden, Veerle et al. (2011), Managing and sharing data: best practice for researchers, UK Data Archive
21. Carly Strasser, Research data management: http://www.niso.org/apps/group_public/download.php/15375/PrimerRDM-2015-0727.pdf
22. Online course ‘Essentials for data support’: http://datasupport.researchdata.nl/en/
URLs of mentioned webpages
in order of appearance #2

More Related Content

What's hot

MANTRA Research Data Lifecycle
MANTRA Research Data LifecycleMANTRA Research Data Lifecycle
MANTRA Research Data Lifecycle
EDINA, University of Edinburgh
 
Managing data throughout the research lifecycle
Managing data throughout the research lifecycleManaging data throughout the research lifecycle
Managing data throughout the research lifecycle
Marieke Guy
 
Enriching Scholarship 2014 Beyond the Journal Article: Publishing and Citing ...
Enriching Scholarship 2014 Beyond the Journal Article: Publishing and Citing ...Enriching Scholarship 2014 Beyond the Journal Article: Publishing and Citing ...
Enriching Scholarship 2014 Beyond the Journal Article: Publishing and Citing ...
Natsuko Nicholls
 
Research data life cycle
Research data life cycleResearch data life cycle
Research data life cycle
University of Arizona
 
A basic course on Research data management, part 4: caring for your data, or ...
A basic course on Research data management, part 4: caring for your data, or ...A basic course on Research data management, part 4: caring for your data, or ...
A basic course on Research data management, part 4: caring for your data, or ...
Leon Osinski
 
Introduction to Research Data Management - 2015-05-27 - Social Sciences Divis...
Introduction to Research Data Management - 2015-05-27 - Social Sciences Divis...Introduction to Research Data Management - 2015-05-27 - Social Sciences Divis...
Introduction to Research Data Management - 2015-05-27 - Social Sciences Divis...
Research Support Team, IT Services, University of Oxford
 
Research Data Management and Librarians
Research Data Management and LibrariansResearch Data Management and Librarians
Research Data Management and Librarians
Johann van Wyk
 
The Donders Repository
The Donders RepositoryThe Donders Repository
The Donders Repository
Robert Oostenveld
 
Introduction to data management
Introduction to data managementIntroduction to data management
Introduction to data management
Cunera Buys
 
University of Bath Research Data Management training for researchers
University of Bath Research Data Management training for researchersUniversity of Bath Research Data Management training for researchers
University of Bath Research Data Management training for researchers
Jez Cope
 
Using Open Science to advance science - advancing open data
Using Open Science to advance science - advancing open data Using Open Science to advance science - advancing open data
Using Open Science to advance science - advancing open data
Robert Oostenveld
 
Research Data Management: An Overview - 2014-05-12 - Humanities Division, Uni...
Research Data Management: An Overview - 2014-05-12 - Humanities Division, Uni...Research Data Management: An Overview - 2014-05-12 - Humanities Division, Uni...
Research Data Management: An Overview - 2014-05-12 - Humanities Division, Uni...
Research Support Team, IT Services, University of Oxford
 
IDCC Workshop: Analysing DMPs to inform research data services: lessons from ...
IDCC Workshop: Analysing DMPs to inform research data services: lessons from ...IDCC Workshop: Analysing DMPs to inform research data services: lessons from ...
IDCC Workshop: Analysing DMPs to inform research data services: lessons from ...
Amanda Whitmire
 
Research data management : Open Research Data pilot, data management (plans),...
Research data management : Open Research Data pilot, data management (plans),...Research data management : Open Research Data pilot, data management (plans),...
Research data management : Open Research Data pilot, data management (plans),...
Leon Osinski
 
DataONE Education Module 03: Data Management Planning
DataONE Education Module 03: Data Management PlanningDataONE Education Module 03: Data Management Planning
DataONE Education Module 03: Data Management Planning
DataONE
 
Introduction to RDM for Geoscience PhD Students
Introduction to RDM for Geoscience PhD StudentsIntroduction to RDM for Geoscience PhD Students
Introduction to RDM for Geoscience PhD Students
EDINA, University of Edinburgh
 
Data Literacy: Creating and Managing Reserach Data
Data Literacy: Creating and Managing Reserach DataData Literacy: Creating and Managing Reserach Data
Data Literacy: Creating and Managing Reserach Data
cunera
 
DataONE Education Module 01: Why Data Management?
DataONE Education Module 01: Why Data Management?DataONE Education Module 01: Why Data Management?
DataONE Education Module 01: Why Data Management?
DataONE
 
Introduction to research data management; Lecture 01 for GRAD521
Introduction to research data management; Lecture 01 for GRAD521Introduction to research data management; Lecture 01 for GRAD521
Introduction to research data management; Lecture 01 for GRAD521
Amanda Whitmire
 
The Brain Imaging Data Structure and its use for fNIRS
The Brain Imaging Data Structure and its use for fNIRSThe Brain Imaging Data Structure and its use for fNIRS
The Brain Imaging Data Structure and its use for fNIRS
Robert Oostenveld
 

What's hot (20)

MANTRA Research Data Lifecycle
MANTRA Research Data LifecycleMANTRA Research Data Lifecycle
MANTRA Research Data Lifecycle
 
Managing data throughout the research lifecycle
Managing data throughout the research lifecycleManaging data throughout the research lifecycle
Managing data throughout the research lifecycle
 
Enriching Scholarship 2014 Beyond the Journal Article: Publishing and Citing ...
Enriching Scholarship 2014 Beyond the Journal Article: Publishing and Citing ...Enriching Scholarship 2014 Beyond the Journal Article: Publishing and Citing ...
Enriching Scholarship 2014 Beyond the Journal Article: Publishing and Citing ...
 
Research data life cycle
Research data life cycleResearch data life cycle
Research data life cycle
 
A basic course on Research data management, part 4: caring for your data, or ...
A basic course on Research data management, part 4: caring for your data, or ...A basic course on Research data management, part 4: caring for your data, or ...
A basic course on Research data management, part 4: caring for your data, or ...
 
Introduction to Research Data Management - 2015-05-27 - Social Sciences Divis...
Introduction to Research Data Management - 2015-05-27 - Social Sciences Divis...Introduction to Research Data Management - 2015-05-27 - Social Sciences Divis...
Introduction to Research Data Management - 2015-05-27 - Social Sciences Divis...
 
Research Data Management and Librarians
Research Data Management and LibrariansResearch Data Management and Librarians
Research Data Management and Librarians
 
The Donders Repository
The Donders RepositoryThe Donders Repository
The Donders Repository
 
Introduction to data management
Introduction to data managementIntroduction to data management
Introduction to data management
 
University of Bath Research Data Management training for researchers
University of Bath Research Data Management training for researchersUniversity of Bath Research Data Management training for researchers
University of Bath Research Data Management training for researchers
 
Using Open Science to advance science - advancing open data
Using Open Science to advance science - advancing open data Using Open Science to advance science - advancing open data
Using Open Science to advance science - advancing open data
 
Research Data Management: An Overview - 2014-05-12 - Humanities Division, Uni...
Research Data Management: An Overview - 2014-05-12 - Humanities Division, Uni...Research Data Management: An Overview - 2014-05-12 - Humanities Division, Uni...
Research Data Management: An Overview - 2014-05-12 - Humanities Division, Uni...
 
IDCC Workshop: Analysing DMPs to inform research data services: lessons from ...
IDCC Workshop: Analysing DMPs to inform research data services: lessons from ...IDCC Workshop: Analysing DMPs to inform research data services: lessons from ...
IDCC Workshop: Analysing DMPs to inform research data services: lessons from ...
 
Research data management : Open Research Data pilot, data management (plans),...
Research data management : Open Research Data pilot, data management (plans),...Research data management : Open Research Data pilot, data management (plans),...
Research data management : Open Research Data pilot, data management (plans),...
 
DataONE Education Module 03: Data Management Planning
DataONE Education Module 03: Data Management PlanningDataONE Education Module 03: Data Management Planning
DataONE Education Module 03: Data Management Planning
 
Introduction to RDM for Geoscience PhD Students
Introduction to RDM for Geoscience PhD StudentsIntroduction to RDM for Geoscience PhD Students
Introduction to RDM for Geoscience PhD Students
 
Data Literacy: Creating and Managing Reserach Data
Data Literacy: Creating and Managing Reserach DataData Literacy: Creating and Managing Reserach Data
Data Literacy: Creating and Managing Reserach Data
 
DataONE Education Module 01: Why Data Management?
DataONE Education Module 01: Why Data Management?DataONE Education Module 01: Why Data Management?
DataONE Education Module 01: Why Data Management?
 
Introduction to research data management; Lecture 01 for GRAD521
Introduction to research data management; Lecture 01 for GRAD521Introduction to research data management; Lecture 01 for GRAD521
Introduction to research data management; Lecture 01 for GRAD521
 
The Brain Imaging Data Structure and its use for fNIRS
The Brain Imaging Data Structure and its use for fNIRSThe Brain Imaging Data Structure and its use for fNIRS
The Brain Imaging Data Structure and its use for fNIRS
 

Viewers also liked

Auteursrecht in academische omgeving: DPO Professionaliseringsbijeenkomst, 23...
Auteursrecht in academische omgeving: DPO Professionaliseringsbijeenkomst, 23...Auteursrecht in academische omgeving: DPO Professionaliseringsbijeenkomst, 23...
Auteursrecht in academische omgeving: DPO Professionaliseringsbijeenkomst, 23...
Leon Osinski
 
Csci360 08-subprograms
Csci360 08-subprogramsCsci360 08-subprograms
Csci360 08-subprograms
Boniface Mwangi
 
Philosophy of man 11
Philosophy of man 11Philosophy of man 11
Philosophy of man 11
CD Balubayan
 
Generics in .NET, C++ and Java
Generics in .NET, C++ and JavaGenerics in .NET, C++ and Java
Generics in .NET, C++ and Java
Sasha Goldshtein
 
Oops
OopsOops
9 subprograms
9 subprograms9 subprograms
9 subprograms
jigeno
 
Collections in-csharp
Collections in-csharpCollections in-csharp
Collections in-csharp
Lakshmi Mareddy
 
Raspuns MS Subprogram FIV 2016
Raspuns MS Subprogram FIV 2016Raspuns MS Subprogram FIV 2016
Raspuns MS Subprogram FIV 2016
Asociatia SOS Infertilitatea - www.vremcopii.ro
 
Research data management
Research data managementResearch data management
Research data management
Leon Osinski
 
3963066 pl-sql-notes-only
3963066 pl-sql-notes-only3963066 pl-sql-notes-only
3963066 pl-sql-notes-only
Ashwin Kumar
 
Research Data Management: Part 1, Principles & Responsibilities
Research Data Management: Part 1, Principles & ResponsibilitiesResearch Data Management: Part 1, Principles & Responsibilities
Research Data Management: Part 1, Principles & Responsibilities
AmyLN
 
16 exception handling - i
16 exception handling - i16 exception handling - i
16 exception handling - i
Ravindra Rathore
 
Oracle database 12c sql worshop 1 activity guide
Oracle database 12c sql worshop 1 activity guideOracle database 12c sql worshop 1 activity guide
Oracle database 12c sql worshop 1 activity guide
Otto Paiz
 
16 logical programming
16 logical programming16 logical programming
16 logical programming
jigeno
 
A basic course on Reseach data management, part 2: protecting and organizing ...
A basic course on Reseach data management, part 2: protecting and organizing ...A basic course on Reseach data management, part 2: protecting and organizing ...
A basic course on Reseach data management, part 2: protecting and organizing ...
Leon Osinski
 
C# Generics
C# GenericsC# Generics
C# Generics
Rohit Vipin Mathews
 
Compiler Components and their Generators - Lexical Analysis
Compiler Components and their Generators - Lexical AnalysisCompiler Components and their Generators - Lexical Analysis
Compiler Components and their Generators - Lexical Analysis
Guido Wachsmuth
 
Delegates and events
Delegates and events   Delegates and events
Delegates and events
Gayathri Ganesh
 
Building Surveys in Qualtrics for Efficient Analytics
Building Surveys in Qualtrics for Efficient AnalyticsBuilding Surveys in Qualtrics for Efficient Analytics
Building Surveys in Qualtrics for Efficient Analytics
Shalin Hai-Jew
 
48742447 11g-sql-fundamentals-ii-additional-practices-and-solutions
48742447 11g-sql-fundamentals-ii-additional-practices-and-solutions48742447 11g-sql-fundamentals-ii-additional-practices-and-solutions
48742447 11g-sql-fundamentals-ii-additional-practices-and-solutions
Ashwin Kumar
 

Viewers also liked (20)

Auteursrecht in academische omgeving: DPO Professionaliseringsbijeenkomst, 23...
Auteursrecht in academische omgeving: DPO Professionaliseringsbijeenkomst, 23...Auteursrecht in academische omgeving: DPO Professionaliseringsbijeenkomst, 23...
Auteursrecht in academische omgeving: DPO Professionaliseringsbijeenkomst, 23...
 
Csci360 08-subprograms
Csci360 08-subprogramsCsci360 08-subprograms
Csci360 08-subprograms
 
Philosophy of man 11
Philosophy of man 11Philosophy of man 11
Philosophy of man 11
 
Generics in .NET, C++ and Java
Generics in .NET, C++ and JavaGenerics in .NET, C++ and Java
Generics in .NET, C++ and Java
 
Oops
OopsOops
Oops
 
9 subprograms
9 subprograms9 subprograms
9 subprograms
 
Collections in-csharp
Collections in-csharpCollections in-csharp
Collections in-csharp
 
Raspuns MS Subprogram FIV 2016
Raspuns MS Subprogram FIV 2016Raspuns MS Subprogram FIV 2016
Raspuns MS Subprogram FIV 2016
 
Research data management
Research data managementResearch data management
Research data management
 
3963066 pl-sql-notes-only
3963066 pl-sql-notes-only3963066 pl-sql-notes-only
3963066 pl-sql-notes-only
 
Research Data Management: Part 1, Principles & Responsibilities
Research Data Management: Part 1, Principles & ResponsibilitiesResearch Data Management: Part 1, Principles & Responsibilities
Research Data Management: Part 1, Principles & Responsibilities
 
16 exception handling - i
16 exception handling - i16 exception handling - i
16 exception handling - i
 
Oracle database 12c sql worshop 1 activity guide
Oracle database 12c sql worshop 1 activity guideOracle database 12c sql worshop 1 activity guide
Oracle database 12c sql worshop 1 activity guide
 
16 logical programming
16 logical programming16 logical programming
16 logical programming
 
A basic course on Reseach data management, part 2: protecting and organizing ...
A basic course on Reseach data management, part 2: protecting and organizing ...A basic course on Reseach data management, part 2: protecting and organizing ...
A basic course on Reseach data management, part 2: protecting and organizing ...
 
C# Generics
C# GenericsC# Generics
C# Generics
 
Compiler Components and their Generators - Lexical Analysis
Compiler Components and their Generators - Lexical AnalysisCompiler Components and their Generators - Lexical Analysis
Compiler Components and their Generators - Lexical Analysis
 
Delegates and events
Delegates and events   Delegates and events
Delegates and events
 
Building Surveys in Qualtrics for Efficient Analytics
Building Surveys in Qualtrics for Efficient AnalyticsBuilding Surveys in Qualtrics for Efficient Analytics
Building Surveys in Qualtrics for Efficient Analytics
 
48742447 11g-sql-fundamentals-ii-additional-practices-and-solutions
48742447 11g-sql-fundamentals-ii-additional-practices-and-solutions48742447 11g-sql-fundamentals-ii-additional-practices-and-solutions
48742447 11g-sql-fundamentals-ii-additional-practices-and-solutions
 

Similar to A basic course on Research data management: part 1 - part 4

Research data management : [part of] PROOF course Finding and controlling sci...
Research data management : [part of] PROOF course Finding and controlling sci...Research data management : [part of] PROOF course Finding and controlling sci...
Research data management : [part of] PROOF course Finding and controlling sci...
Leon Osinski
 
Managing and Sharing Research Data: Good practices for an ideal world...in th...
Managing and Sharing Research Data: Good practices for an ideal world...in th...Managing and Sharing Research Data: Good practices for an ideal world...in th...
Managing and Sharing Research Data: Good practices for an ideal world...in th...
Martin Donnelly
 
Simon hodson
Simon hodsonSimon hodson
Data about data management
Data about data managementData about data management
Data about data management
Caroline Ondracek
 
Martin Donnelly - Digital Data Curation at the Digital Curation Centre (DH2016)
Martin Donnelly - Digital Data Curation at the Digital Curation Centre (DH2016)Martin Donnelly - Digital Data Curation at the Digital Curation Centre (DH2016)
Martin Donnelly - Digital Data Curation at the Digital Curation Centre (DH2016)
dri_ireland
 
Managing and Sharing Research Data - Workshop at UiO - December 04, 2017
Managing and Sharing Research Data - Workshop at UiO - December 04, 2017Managing and Sharing Research Data - Workshop at UiO - December 04, 2017
Managing and Sharing Research Data - Workshop at UiO - December 04, 2017
Michel Heeremans
 
Research data management during and after your research ; an introduction / L...
Research data management during and after your research ; an introduction / L...Research data management during and after your research ; an introduction / L...
Research data management during and after your research ; an introduction / L...
Leon Osinski
 
Research-Data-Management-and-your-PhD
Research-Data-Management-and-your-PhDResearch-Data-Management-and-your-PhD
Research-Data-Management-and-your-PhD
University of Liverpool Library
 
INCLUSION OF DATA ARCHIVES IN DATA MANAGEMENT PLAN
INCLUSION OF DATA ARCHIVES IN DATA MANAGEMENT PLANINCLUSION OF DATA ARCHIVES IN DATA MANAGEMENT PLAN
INCLUSION OF DATA ARCHIVES IN DATA MANAGEMENT PLAN
Arhiv družboslovnih podatkov
 
Research Data Management at the University of Edinburgh
Research Data Management at the University of EdinburghResearch Data Management at the University of Edinburgh
Research Data Management at the University of Edinburgh
EDINA, University of Edinburgh
 
Research Data Management: Policy Development
Research Data Management: Policy DevelopmentResearch Data Management: Policy Development
Research Data Management: Policy Development
Robin Rice
 
Open Data: Strategies for Research Data Management (and Planning)
Open Data: Strategies for Research Data  Management (and Planning)Open Data: Strategies for Research Data  Management (and Planning)
Open Data: Strategies for Research Data Management (and Planning)
Martin Donnelly
 
Survey of research data management practices up2010digschol2011
Survey of research data management practices up2010digschol2011Survey of research data management practices up2010digschol2011
Survey of research data management practices up2010digschol2011
heila1
 
Edin casestudy-ou-rr-2011
Edin casestudy-ou-rr-2011Edin casestudy-ou-rr-2011
Edin casestudy-ou-rr-2011
Robin Rice
 
Data management planning in the Australian funding landscape by Sarah Olesen
Data management planning in the Australian funding landscape by Sarah OlesenData management planning in the Australian funding landscape by Sarah Olesen
Data management planning in the Australian funding landscape by Sarah Olesen
Marta Ribeiro
 
Research Data Management Services at UWA (November 2015)
Research Data Management Services at UWA (November 2015)Research Data Management Services at UWA (November 2015)
Research Data Management Services at UWA (November 2015)
Katina Toufexis
 
Research Data Management and your PhD
Research Data Management and your PhDResearch Data Management and your PhD
Research Data Management and your PhD
University of Liverpool Library
 
Data Management for Postgraduate students by Lynn Woolfrey
Data Management for Postgraduate students by Lynn WoolfreyData Management for Postgraduate students by Lynn Woolfrey
Data Management for Postgraduate students by Lynn Woolfrey
pvhead123
 
Research Data Management Services at UWA (July 2015)
Research Data Management Services at UWA (July 2015)Research Data Management Services at UWA (July 2015)
Research Data Management Services at UWA (July 2015)
Katina Toufexis
 
Research Data Management in practice, RIA Data Management Workshop Brisbane 2017
Research Data Management in practice, RIA Data Management Workshop Brisbane 2017Research Data Management in practice, RIA Data Management Workshop Brisbane 2017
Research Data Management in practice, RIA Data Management Workshop Brisbane 2017
ARDC
 

A basic course on Research data management: part 1 - part 4

  • 1. A basic course on Research data management part 1: what and why PROOF course Information Literacy and Research Data Management TU/e, 19-09-2017 l.osinski@tue.nl, TU/e IEC/Library Available under CC BY-SA license, which permits copying and redistributing the material in any medium or format & adapting the material for any purpose, provided the original author and source are credited & you distribute the adapted material under the same license as the original
  • 2. Research data management [RDM] what #1 Essence of RDM: “… tracking back to what you did 7 years ago and recovering it (...) immediately in a re-usable manner.” (Henry Rzepa)
  • 3. Research data management [RDM] what #2 RDM: caring for your data with the purpose to: 1. protect their mere existence: data loss, data authenticity (RDM basics) 2. share them with others a. for reasons of reuse: in the same context or in a different context; during research and after research b. for reasons of reproducibility checks → scientific integrity; data quality RDM = good data practices [1-6] that make your data understandable, easy to work with, and available to other scientists 1. Dynamic ecology (2016), Ten commandments for good data management. https://dynamicecology.wordpress.com/2016/08/22/ten-commandments-for-good-data-management/ 2. Borer, E.T., Seabloom, E.W., Jones, M.B., et al. (2009) Some simple guidelines for effective data management, Bulletin of the Ecological Society of America, 90(2), p. 205-214. doi: 10.1890/0012-9623-90.2.205 3. Hook, L.A., Santhana Vannan, S.K., Beaty, T.W. et al. Best practices for preparing environmental data sets to share and archive. Available online http://daac.ornl.gov/PI/BestPractices-2010.pdf . doi: 10.3334/ORNLDAAC/BestPractices-2010 4. White, E.P., Baldridge, E., Brym, T. et al. (2013) Nine simple ways to make it easier to (re)use your data, Ideas in Ecology and Evolution, 6(2), p. 1-10. doi: 10.4033/iee.2013.6b.6.f 5. Goodman, A., Pepe, A., Blocker, A.W., et al. (2014) Ten simple rules for the care and feeding of scientific data, PLOS Computational Biology, 10(4), e1003542. doi: 10.1371/journal.pcbi.1003542 6. Sandve, G.K., et al. (2013), Ten simple rules for reproducible computational research, PLOS Computational Biology, 9(10), e1003285. doi: 10.1371/journal.pcbi.1003285
  • 4. Source: Research Data Netherlands / Marina Noordegraaf Outline 1. Research data management [RDM]: what and why a. data management plan b. discussion 2. Sharing your data, or making your data findable and accessible a. data protection: back up, file naming, organizing data b. data sharing: via collaboration platforms, data archives 3. Caring for your data, or making your data usable and interoperable a. tidy data b. metadata/documentation c. licenses d. open data formats
  • 5. Why share research data? #1 Because you work together with other researchers → collaborative science; because of re-using results: data-driven science → open science; because of scientific integrity: validating data analysis by reproducibility checks requires the data and the code that is used to clean, process and analyze the data and to produce the final outputs. Additional reasons: because your data are unique / not easily repeatable (long-term observational data); because you benefit from it: it increases your visibility and enhances the trustworthiness / credibility of your research
  • 6. Why share research data? #2 because you have to… Data sharing is increasingly required by: journals; professional organizations [VSNU, KNAW]; universities, including TU/e; research funders [NWO, ZonMw, EC] → data management plan
  • 7. EC: Horizon 2020 #1 Open research data (ORD) pilot: why? “The ORD pilot aims to improve and maximise access to and re-use of research data generated by Horizon 2020…” “The ORD pilot applies primarily to the data needed to validate the results presented in scientific publications. Other data can also be provided…” “A data management plan (DMP) is required for all projects participating in the extended ORD pilot…” “Participating in the ORD pilot does not necessarily mean opening up all your research data. Rather, the ORD Pilot follows the principle “as open as possible, as closed as necessary” and focuses on encouraging sound data management as an essential part of research best practice.” (my underlining)
  • 8. EC: Horizon 2020 #2 how? sound research data management Sound research data management is data management following the FAIR principles. All research data should be: Findable: easy to find by both humans and computer systems; Accessible: stored for long term with well-defined license and access conditions (open access when possible); Interoperable: ready to be combined with other datasets by humans as well as computer systems; Reusable: ready to be used for future research and to be processed further using computational methods.
  • 9. Source: Research Data Netherlands / Marina Noordegraaf EC: Horizon 2020 #3 requirements The conditions set by Horizon 2020 with regard to research data management come down to two requirements: 1. Formulate a data management plan; and 2. Deposit research data in a data repository
  • 10. EC Horizon 2020 #4 data management plan The DMP is a set of questions along the FAIR principles about: 1. What research data sets the project will collect, process and/or generate 2. The handling of these data sets during and after the project 3. Whether and how data sets will be findable/discoverable, re-usable and shared/made open access 4. How data will be curated and preserved 5. What measures are taken to safeguard and protect (sensitive) data. Templates and examples: DMP template Horizon 2020 (via DMPonline): recommended but voluntary; ZonMw template (via DMPonline); DMP template by 4TU.Centre for Research Data; examples of H2020 DMPs: http://www.dcc.ac.uk/resources/data-management-plans/guidance-examples
  • 11. Research data management discussion topics and questions Storage and back-up: What sort of data do you use? Are you creating new data or are you working with pre-existing data? Where do you store your research data? Is there a back-up? Where? Are data selections made? Not everything is to be stored but…? Metadata and documentation (information to let you find, use and understand the data): Do you describe your research data? Who measured or collected what, when, how? Other context information? Are you content with the way you document or describe your research data? Do you succeed in finding the right (version of your) research data? Can other researchers understand and (re-)use your research data (during and after research)? Should they be able to? Access and re-use: Who can access your research data? What will happen to your research data when you leave TU/e? Would you consider publishing your research data, i.e. making them publicly available?
  • 12. Research data management which of these statements is true? Storage and back-up 1. My research data is stored safely and securely, including regular back-ups Metadata and documentation 2. I keep metadata with my data: who measured/collected what, when, how Access and re-use 3. My colleagues are able to access and use my data 4. Other researchers are able to access and use my data 5. My nearest colleagues and I are the only ones who can understand my data 6. Anyone should be able to use my data when I have finished with it
  • 13. Reasons not to share your data Preparing my data for sharing takes time and effort. But research data management also increases your research efficiency. My data are confidential. But you can anonymize or pseudonymize your data. My data still need to yield publications. But you can publish your data under an embargo, and by publishing your data you establish priority and can get credit for it. My data can be misused or misinterpreted. But the best defense against malicious use is to refer to an archival copy of your data, which is guaranteed to be exactly as you mean it to be. My data are only interesting for me. But sharing your data may be required by a funder or journal, or your data may be requested to validate your results.
  • 14. URLs of mentioned webpages in order of appearance #1 1. Website IEC/Library [TU/e]: https://www.tue.nl/en/university/library/ 2. Figshare support, The importance of data management for research: https://youtu.be/Ae205CNrk6w 3. Henry Rzepa, Collaborative FAIR data sharing: http://www.ch.imperial.ac.uk/rzepa/blog/?p=16292 4. Dynamic ecology (2016), Ten commandments for good data management. https://dynamicecology.wordpress.com/2016/08/22/ten-commandments-for-good-data-management/ 5. Borer, E.T., Seabloom, E.W., Jones, M.B., et al. (2009) Some simple guidelines for effective data management, Bulletin of the Ecological Society of America, 90(2), p. 205-214. doi: 10.1890/0012-9623-90.2.205 6. Hook, L.A., Santhana Vannan, S.K., Beaty, T.W. et al. Best practices for preparing environmental data sets to share and archive. doi: 10.3334/ORNLDAAC/BestPractices-2010 7. White, E.P., Baldridge, E., Brym, T. et al. (2013) Nine simple ways to make it easier to (re)use your data, Ideas in Ecology and Evolution, 6(2), p. 1-10. doi: 10.4033/iee.2013.6b.6.f 8. Goodman, A., Pepe, A., Blocker, A.W., et al. (2014) Ten simple rules for the care and feeding of scientific data, PLOS Computational Biology, 10(4), e1003542. doi: 10.1371/journal.pcbi.1003542 9. Sandve, G.K., et al. (2013), Ten simple rules for reproducible computational research, PLOS Computational Biology, 9(10), e1003285. doi: 10.1371/journal.pcbi.1003285 10. Data sharing increases visibility: http://dx.doi.org/10.7717/peerj.175 11. Data sharing enhances trustworthiness: http://dx.doi.org/10.1371/journal.pone.0026828
  • 15. URLs of mentioned webpages in order of appearance #2: 12. Data availability policy journals: http://www.nap.edu/openbook.php?record_id=10613&page=33 13. Data availability policy American Economic Review: https://www.aeaweb.org/aer/data.php 15. Data availability policy PLoS: http://journals.plos.org/plosone/s/data-availability 16. Data availability policy Nature: http://www.nature.com/authors/policies/availability.html 17. VSNU Code of Scientific Conduct (Dutch, revision 2014): http://www.vsnu.nl/files/documenten/Domeinen/Onderzoek/Code_wetenschapsbeoefening_2004_(2014).pdf 18. KNAW responsible research data management: https://www.knaw.nl/en/news/publications/responsible-research-data-management-and-the-prevention-of-scientific-misconduct?set_language=en 19. Radboud University research data policy: http://www.ru.nl/research-information-services/institutional-policy/policy-research-data-management/ 20. TU/e Code of Scientific Conduct: http://www.tue.nl/en/university/about-the-university/integrity/scientific-integrity/ 21. NWO and research data: http://www.nwo.nl/en/policies/open+science/data+management 21. ZonMW access to data: https://www.zonmw.nl/en/research-and-results/access-to-data/ 22. Horizon 2020 Guidelines on data management: http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-data-mgt_en.pdf
  • 16. URLs of mentioned webpages in order of appearance #3: 23. About FAIR: Mons, B. et al., Cloudy, increasingly FAIR: revisiting the FAIR Data guiding principles for the European Open Science Cloud: http://dx.doi.org/10.3233/ISU-170824 24. Template data management plan Horizon 2020: https://dmponline.dcc.ac.uk/ 25. ZonMW data management plan template: https://www.zonmw.nl/en/research-and-results/access-to-data/format-data-management-plan/ 26. Data management plan template (4TU.ResearchData): http://researchdata.4tu.nl/en/planning-research/data-management-plan/ 27. Examples of Horizon 2020 data management plans: http://www.dcc.ac.uk/resources/data-management-plans/guidance-examples 28. Emilio M. Bruna (04-09-2014), The opportunity cost of my #OpenScience was 36 hours + $690 (UPDATED): http://brunalab.org/blog/2014/09/04/the-opportunity-cost-of-my-openscience-was-35-hours-690/ 28. Rouder, Jeffrey N., The what, why, and how of born-open data, Behavior Research Methods, vol. 48(2016), p. 1062-1069. http://dx.doi.org/10.3758/s13428-015-0630-z (see p. 1063: “It was a pain to document the data; it was a pain to format the data”)
  • 17. A basic course on Research data management part 2: protecting and organizing your data PROOF course Information Literacy and Research Data Management TU/e, 07-03-2017 l.osinski@tue.nl, TU/e IEC/Library Available under CC BY-SA license, which permits copying and redistributing the material in any medium or format & adapting the material for any purpose, provided the original author and source are credited & you distribute the adapted material under the same license as the original
  • 18. Research data management: what was it again? (1) Sharing your data, or making your data findable and accessible with good data practices: → protecting your data (backup, access control; file naming, organizing data, versioning); + sharing your data via collaboration platforms and archives. (2) Caring for your data, or making your data usable and interoperable with good data practices: + tidy data; + metadata/documentation; + licenses; + open data formats.
  • 19. Protecting your data: good data practices during your research. Be safe: + storage and backup (data safety, protecting against loss): use the local ICT infrastructure (departmental servers, including SURFdrive) as much as possible; + access control (data security, protecting against unauthorized use): with DataverseNL, for example. Be organized, or: you (and others) should be able to tell what’s in a file without opening it: + file naming, organizing data in folders, versioning. “…we can copy everything and do not manage it well.” (Indra Sihar)
  • 20. File naming #1: be consistent and aim for concise but informative names. How you organize and name your files has a big impact on your ability to find those files later and to understand what they contain. Good file names are consistent (they follow a file-naming convention), unique (they distinguish a file from files on similar subjects as well as from different versions of the same file) and meaningful (they are descriptive). File-naming conventions help you find your data, help others find your data and help track which version of a file is the most current. Guidelines: (1) avoid special characters in a file name: / : * ? < > | [ ] & $; (2) use hyphens or underscores instead of periods or spaces to separate logical elements in a file name; (3) avoid very long names: 25 characters is usually long enough; (4) include all necessary descriptive information: initials of the researcher, project number, procedure/method…; (5) make a name independent of where the file is stored (do not reuse the same name in different folders); (6) include a date (format YYYYMMDD) and a version number; (7) add a readme.txt to each folder that explains the file naming and its meaning. Source: Best practices for file naming (Stanford University Libraries)
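The guidelines on this slide can be turned into a small helper that builds and checks file names. The following is only a minimal sketch in Python: the function names, the order of the elements and the `v01` version style are assumptions made for illustration, not part of the Stanford guidelines.

```python
import re
from datetime import date

# Characters the slide advises against, plus the space and period separators.
FORBIDDEN = set("/:*?<>|[]&$ .")
MAX_STEM_LENGTH = 25  # "25 characters is usually long enough"


def build_name(day, initials, project, description, version, extension):
    """Join the descriptive elements with underscores: date first, version last."""
    elements = [day.strftime("%Y%m%d"), initials, project, description, f"v{version:02d}"]
    return "_".join(elements) + "." + extension


def check_name(filename):
    """Return a list of problems found in a file name (an empty list means none)."""
    stem = filename.rsplit(".", 1)[0]  # everything before the last period
    problems = []
    bad = sorted(set(stem) & FORBIDDEN)
    if bad:
        problems.append(f"special characters, spaces or periods: {bad}")
    if len(stem) > MAX_STEM_LENGTH:
        problems.append(f"name is longer than {MAX_STEM_LENGTH} characters")
    if not re.search(r"(?<!\d)\d{8}(?!\d)", stem):
        problems.append("no YYYYMMDD date found")
    return problems


name = build_name(date(2017, 3, 7), "LO", "p7", "survey", 1, "csv")
print(name)                                # 20170307_LO_p7_survey_v01.csv
print(check_name(name))                    # []
print(check_name("my data: final?.xlsx"))  # reports two problems
```

A check like this could be run over a whole project folder before data are shared, so that naming problems surface early rather than at deposit time.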
  • 21. File naming #2: think about the ordering of elements within a filename. Order by date: 2013-04-12_interview-recording_THD.mp3; 2013-04-12_interview-transcript_THD.docx; 2012-12-15_interview-recording_MBD.mp3; 2012-12-15_interview-transcript_MBD.docx. Order by subject: MBD_interview-recording_2012-12-15.mp3; MBD_interview-transcript_2012-12-15.docx; THD_interview-recording_2013-04-12.mp3; THD_interview-transcript_2013-04-12.docx. Order by type: Interview-recording_MBD_2012-12-15.mp3; Interview-recording_THD_2013-04-12.mp3; Interview-transcript_MBD_2012-12-15.docx; Interview-transcript_THD_2013-04-12.docx. Forced order with numbering: 01_THD_interview-recording_2013-04-12.mp3; 02_THD_interview-transcript_2013-04-12.docx; 03_MBD_interview-recording_2012-12-15.mp3; 04_MBD_interview-transcript_2012-12-15.docx
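Why the order of elements matters can be shown with a plain alphabetical sort, which is what most file browsers apply by default. The sketch below uses the example names from the slide; the `reorder` helper is hypothetical and assumes names of the form `date_type_subject.ext`.

```python
from pathlib import PurePath

files = [
    "2013-04-12_interview-recording_THD.mp3",
    "2013-04-12_interview-transcript_THD.docx",
    "2012-12-15_interview-recording_MBD.mp3",
    "2012-12-15_interview-transcript_MBD.docx",
]


def reorder(filename, order):
    """Rebuild a 'date_type_subject.ext' name with its elements in a new order."""
    path = PurePath(filename)
    day, kind, subject = path.stem.split("_")
    elements = {"date": day, "type": kind, "subject": subject}
    return "_".join(elements[key] for key in order) + path.suffix


# Year-first dates sort chronologically under a plain alphabetical sort.
by_date = sorted(files)

# Putting the subject first groups all files of one interviewee together.
by_subject = sorted(reorder(f, ("subject", "type", "date")) for f in files)

for name in by_date + by_subject:
    print(name)
```

The same alphabetical sort produces a different grouping purely because a different element leads the name, which is the point of choosing the element order deliberately.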
  • 22. File organization. Beatriz Ramirez, Data management plan for the PhD project: development and application of a monitoring system to assess the impacts of climate and land cover changes on eco-hydrological processes in an eastern Andes catchment area. Source: Haselager, dr. G.J.T. (Radboud University Nijmegen); Aken, prof. dr. M.A.G. van (Utrecht University) (2000): Personality and Family Relationships. DANS. http://dx.doi.org/10.17026/dans-xk5-y7vc
  • 23. Organizing your data in folders #1 based on the TIER documentation protocol (http://www.projecttier.org/) Guiding principles of TIER documentation protocol 1. keep your raw or original data raw + save your raw data read-only in its original format in a separate folder + make a working copy of your raw data (input data, used for processing and analysis) 2. keep the command files (files containing code written in the syntax of the (statistical) software you use for the study) apart from the data 3. keep the analysis files (the fully cleaned and processed data files that you use to generate the results reported in your paper) in a separate folder 4. store the metadata (codebook, description of variables, etc.) in a separate folder, apart from the data itself
  • 24. Organizing your data in folders #2 based on the TIER documentation protocol (http://www.projecttier.org/) 1. Main project folder (name of your research project/working title of your paper) 1.1. Original data and metadata 1.1.1. Original data 1.1.2. Metadata 1.1.2.1. Supplements 1.2. Processing and analysis files 1.2.1. Importable data files 1.2.2. Command files 1.2.3. Analysis files 1.3. Documents 1.4. Literature
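The folder hierarchy on this slide can be created in one go at the start of a project. The following is a minimal sketch using Python's standard `pathlib`; the lower-case, hyphenated folder names are an assumption made for illustration (the TIER protocol prescribes the structure, not these exact spellings).

```python
from pathlib import Path

# Leaf folders of the hierarchy shown on the slide; parent folders are created implicitly.
TIER_FOLDERS = [
    "original-data-and-metadata/original-data",
    "original-data-and-metadata/metadata/supplements",
    "processing-and-analysis-files/importable-data-files",
    "processing-and-analysis-files/command-files",
    "processing-and-analysis-files/analysis-files",
    "documents",
    "literature",
]


def create_project_skeleton(project_root):
    """Create the folder hierarchy under project_root and return the created paths."""
    root = Path(project_root)
    created = []
    for relative in TIER_FOLDERS:
        folder = root / relative
        folder.mkdir(parents=True, exist_ok=True)  # safe to run more than once
        created.append(folder)
    return created
```

Calling `create_project_skeleton("my-project")` creates the folders in the current working directory; because of `exist_ok=True`, running it again on an existing project leaves the existing folders and files alone.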
  • 25. Organizing your data in folders #3, based on the TIER documentation protocol. 1. Main project folder (name of your research project/working title of your paper) 1.1. Original data and metadata 1.1.1. Original data (raw data, obtained/gathered data): any data that were necessary for any part of the processing and/or analysis you reported in your paper. Keep copies of all your original data files, saved in exactly the format they were in when you first obtained them. The name of an original data file may be changed. Keep these data read-only! 1.1.2. Metadata 1.1.2.1. Supplements
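One way to keep original data read-only is to remove the write permissions from each raw file and to record a checksum so that later changes can be detected. This is a sketch under the assumption that files are protected one by one from a script; the function names are hypothetical, and on Windows `chmod` only toggles the read-only flag.

```python
import hashlib
import stat
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 checksum of a file as a hexadecimal string."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def protect_raw_file(path):
    """Record a checksum, then remove all write permissions from the file."""
    checksum = sha256_of(path)
    Path(path).chmod(stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)  # mode 0o444
    return checksum


def is_unchanged(path, recorded_checksum):
    """Check that the file still matches the checksum recorded earlier."""
    return sha256_of(path) == recorded_checksum
```

The recorded checksums could be stored in the metadata folder, so that anyone who later receives the raw data can verify that they are identical to what was originally obtained.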
  • 26. Organizing your data in folders #4, based on the TIER documentation protocol. 1. Main project folder (name of your research project/working title of your paper) 1.1. Original data and metadata 1.1.1. Original data 1.1.2. Metadata: the Metadata Guide, a document that provides information about each of your original data files. It applies especially to obtained data files and contains: (a) a bibliographic citation of the original data files, including the date you downloaded or obtained them and any unique identifiers that have been assigned to them; (b) information about how to obtain a copy of the original data file; (c) whatever additional information is needed to understand and use the data in the original data file. 1.1.2.1. Supplements: additional information about an original data file that is not written by yourself but is found in existing supplementary documents, such as users’ guides and code books that accompany the original data file
  • 27. Organizing your data in folders #5 based on the TIER documentation protocol 1. Main project folder (name of your research project/working title of your paper) 1.1. Original data and metadata 1.1.1. Original data 1.1.2. Metadata 1.1.2.1. Supplements 1.2. Processing and analysis files 1.2.1. Importable data files (the data you work with, input data, suitable for processing and analysis): a corresponding version for each of the original data files. This version can be identical to the original version, or in some cases it will be a modified version, for example with the modifications required to allow your software to read the file (converting the file to another format, removing unusable data or explanatory notes from a table). (a) The original and importable versions of a data file should be given different names; (b) the importable data file should be as nearly identical as possible to the original; (c) the changes you make to your original data files to create the corresponding importable data files should be described in a Readme file. 1.2.2. Command files 1.2.3. Analysis files
  • 28. Organizing your data in folders #6 based on the TIER documentation protocol 1. Main project folder (name of your research project/working title of your paper) 1.1. Original data and metadata 1.1.1. Original data 1.1.2. Metadata 1.1.2.1. Supplements 1.2. Processing and analysis files 1.2.1. Importable data files 1.2.2. Command files One or more files containing code written in the syntax of the (statistical) software you use for the study  Importing phase: commands to import or read the files and save them in a format that suits your software  Processing phase: commands that execute all the processing required to transform the importable version of your files into the final data files that you will use in your analysis (i.e. cleaning, recoding, joining two or more data files, dropping variables or cases, generating new variables)  Generating the results: commands that open the analysis data file(s), and then generate the results reported in your paper. 1.2.3. Analysis files
  • 29. Organizing your data in folders #7 based on the TIER documentation protocol 1. Main project folder (name of your research project/working title of your paper) 1.1. Original data and metadata 1.1.1. Original data 1.1.2. Metadata 1.1.2.1. Supplements 1.2. Processing and analysis files 1.2.1. Importable data files 1.2.2. Command files 1.2.3. Analysis files: (a) the fully cleaned and processed data files that you use to generate the results reported in your paper; (b) the Data Appendix, a codebook for your analysis data files: a brief description of the analysis data file(s), a complete definition of each variable (including coding and/or units of measurement), the name of the original data files from which the variable was extracted, the number of valid observations for the variable, and the number of cases with missing values
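Two of the Data Appendix items, the number of valid observations and the number of missing values per variable, can be counted automatically. The sketch below uses only Python's standard `csv` module; treating empty cells and `NA` as missing is an assumption, and the three-row sample is made up for illustration.

```python
import csv
import io

MISSING = {"", "NA"}  # assumption: empty cells and "NA" mark missing values


def summarize_variables(csv_text):
    """Count valid and missing observations for every column of a CSV table."""
    reader = csv.DictReader(io.StringIO(csv_text))
    summary = {name: {"valid": 0, "missing": 0} for name in reader.fieldnames}
    for row in reader:
        for name in reader.fieldnames:
            value = row[name]
            is_missing = value is None or value.strip() in MISSING
            summary[name]["missing" if is_missing else "valid"] += 1
    return summary


# Made-up sample data, for illustration only.
sample = "patient_id,drug,heart_rate\n1,a,67\n2,a,NA\n3,b,50\n"
for variable, counts in summarize_variables(sample).items():
    print(variable, counts)
```

The definitions, units and source files of each variable still have to be written by hand; only the counts lend themselves to automation.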
  • 30. Organizing your data in folders #8 based on the TIER documentation protocol 1. Main project folder (name of your research project/working title of your paper) 1.1. Original data and metadata 1.1.1. Original data 1.1.2. Metadata 1.1.2.1. Supplements 1.2. Processing and analysis files 1.2.1. Importable data files 1.2.2. Command files 1.2.3. Analysis files 1.3. Documents  An electronic copy of your complete final paper  The Readme-file for your replication documentation  What statistical software or other computer programs are needed to run the command files  Explain the structure of the hierarchy of folders in which the documentation is stored  Describe precisely any changes you made to your original data files to create the corresponding importable data files  Step-by-step instructions for using your documentation to replicate the statistical results reported in your paper 1.4. Literature  Retrieved relevant literature
  • 31. URLs of mentioned webpages in order of appearance: 1. Storage, backup of data: http://www.data-archive.ac.uk/create-manage/storage 2. Local ICT infrastructure: https://intranet.tue.nl/en/university/services/ict-services/ict-service-catalog/management-services/data-management-storage/ (TU/e intranet) 3. SURFdrive (at TU/e): https://intranet.tue.nl/en/university/services/ict-services/ict-service-catalog/management-services/data-management-surfdrive 4. DataverseNL: https://dataverse.nl/dvn/ 5. Version control: http://www.data-archive.ac.uk/create-manage/format/versions 6. Best practices for file naming: http://library.stanford.edu/research/data-management-services/data-best-practices/best-practices-file-naming 8. File organization: Haselager, dr. G.J.T., Aken, prof. dr. M.A.G. van (2000): Personality and Family Relationships. DANS. http://dx.doi.org/10.17026/dans-xk5-y7vc (Data guide, p. 24-26) 9. Best practices: file names and folder structures (Leiden example): http://blogs.library.leiden.edu/researchdata/2016/06/03/best-practices-file-names-and-folder-structures/#more-284 10. Beatriz Ramirez, Data management plan for the PhD project: development and application of a monitoring system to assess the impacts of climate and land cover changes on eco-hydrological processes in an eastern Andes catchment area: http://www.wageningenur.nl/web/file?uuid=3f974938-79a0-421f-b1ad-95eef49d777c&owner=c057b578-4a6a-4449-881b-17fff17e2f1a (see Figure 1 for folder structure) 11. TIER documentation protocol: http://www.projecttier.org/
  • 32. A basic course on Research data management part 3: sharing your data PROOF course Information Literacy and Research Data Management TU/e, 07-03-2017 l.osinski@tue.nl, TU/e IEC/Library Available under CC BY-SA license, which permits copying and redistributing the material in any medium or format & adapting the material for any purpose, provided the original author and source are credited & you distribute the adapted material under the same license as the original
  • 33. Research data management: what was it again? (1) Sharing your data, or making your data findable and accessible with good data practices: + protecting your data (backup, access control; file naming, organizing data, versioning); → sharing your data via collaboration platforms and archives. (2) Caring for your data, or making your data usable and interoperable with good data practices: + tidy data; + metadata/documentation; + licenses; + open data formats.
  • 34. Overview of research data sharing and storage services [figure: services arranged by phase (during research, after research) and by provider (institution, discipline, local ICT services)]. Data sharing per se is pretty straightforward
  • 35. Sharing your data: collaboration or sharing platforms (during your research). General data sharing platforms: SURFdrive [TU/e only]: the Dutch academic Dropbox, 100 GB, maximum data transfer 16 GB; every TU/e employee can use SURFdrive. Google Drive, Dropbox, Beehub… DataverseNL [TU/e only]: data sharing platform for active research data [based on Harvard’s Dataverse Project] where you may: store your data in an organized and safe way; clearly describe your data; keep versions of your data under control; arrange access to your data; get recognition for your data; [collaborate on your data]. Storage and backup of data through DANS [Data Archiving and Networked Services]; data transfer: up to 2 GB per dataset; Dataverse via 4TU.ResearchData: up to 50 GB free. Various disciplinary initiatives: Open Science Framework, OpenML, RodRep, CRCNS… SURF Filesender [secure data transfer up to 500 GB!; WeTransfer: up to 2 GB]
  • 36. Sharing your data: DataverseNL. How to create an account: 1. Go to https://dataverse.nl/ 2. Click ‘Log in’ (at the top right); under Institutional account click SURFconext 3. Select Eindhoven University of Technology and log on with your TU/e username and password 4. When asked, give permission to share your data by answering Yes or clicking this tab 5. When asked to create an account, answer Yes or click this tab 6. Once you have created an account, your username is the prefix of your email address. You now have a user account with DataverseNL. If you click 4TU dataverse → Eindhoven dataverse → Add data, you can create and publish data sets, upload files and assign access rights to data sets or files. However, before you proceed, contact me (Leon Osinski) for more options, or first use the demo version: https://act.dataverse.nl
  • 37. Sharing your data after your research has ended. On request: “I'd like to thank E.J. Masicampo and Daniel LaLande for sharing and allowing me to share their data…” (Daniël Lakens (2014), What p-hacking really looks like: A comment on Masicampo & LaLande (2012)). On a (personal) website: “Let me start by saying that the reason why I put all excel files online, including all the detailed excel formulas about data constructions and adjustments, is precisely because I want to promote an open and transparent debate about these important and sensitive measurement issues.” (Thomas Piketty, My response to the Financial Times, HuffPost The Blog, 29-05-2014; originally published as Addendum: Response to FT, 28-05-2014). In a data journal: Journal of Open Psychology Data, Geoscience Data Journal, Data in Brief, Scientific Data, Data Reports. Source: www.aukeherrema.nl
  • 38. Sharing your data after your research has ended, by publishing and archiving them in an established repository. Choose a repository where other researchers in your discipline are sharing their data, for example LXcat (for plasma data), TurBase (for turbulence data) or GenBank (for genetic sequence data). Overview of research data repositories: Re3data.org. Use a repository that at least assigns a persistent identifier (DOI) to your data and requires that you provide adequate metadata. General or multidisciplinary repositories: Zenodo, Figshare, DANS, Dryad, B2SHARE. 4TU.ResearchData: + small to medium-sized data sets, long tail data + static data, ‘frozen’ data sets, ‘milestone’ data sets + preferably non-proprietary data formats suitable for long-term preservation + DOIs [persistent identifiers for citability and retrievability] + open access + long-term availability, Data Seal of Approval + Data Citation Index (Thomson Reuters) + self-upload (single data sets < 3 GB) + special collections of related data sets
  • 39. Sharing your data: link your data to your publication
  • 40. URLs of mentioned webpages in order of appearance #1: 1. Overview research data storage and sharing services: http://dataservices.silk.co/ 2. DataverseNL: https://www.dataverse.nl/dvn/ 3. Harvard’s Dataverse Project: http://dataverse.org/ 4. Open Science Framework: https://cos.io/osf/ 5. OpenML: http://www.openml.org 6. RodRep: http://www.rodrep.com/ 7. CRCNS: http://crcns.org/ 8. SURFdrive: https://www.surfdrive.nl/ 9. Google Drive: https://www.google.com/drive/ 10. Dropbox: https://www.dropbox.com/ 11. Beehub: https://beehub.nl/system/ 12. SURF filesender: https://filesender.surfnet.nl/ 12. Data on request (blog post Daniel Lakens): http://daniellakens.blogspot.nl/2014/09/what-p-hacking-really-looks-like.html 13. Data on personal website (Thomas Piketty): http://piketty.pse.ens.fr/en/capital21c2 14. Overview of (better known) data journals: http://proj.badc.rl.ac.uk/preparde/blog/DataJournalsList
  • 41. URLs of mentioned webpages in order of appearance #2: 15. Data journal: Journal of Open Psychology Data: http://openpsychologydata.metajnl.com/ 16. Data journal: Geoscience Data Journal: http://onlinelibrary.wiley.com/journal/10.1002/(ISSN)2049-6060 17. Data journal: Data in brief: http://www.journals.elsevier.com/data-in-brief 18. Data journal: Scientific data: http://www.nature.com/sdata/ 19. Data journal: Data reports: http://www.frontiersin.org/news/Data_Reports_a_new_type_of_peer-reviewed_article_in_Frontiers_journals/1051?utm_source=FRN&utm_medium=ECOM&utm_campaign=TWT_FRN_1502_datareport 20. Research data catalogue: Re3data.org: http://service.re3data.org/search/results?term= 21. Publishing data: Zenodo: http://www.zenodo.org/ 22. Publishing data: Figshare: http://www.figshare.com 23. Publishing data: DANS: http://www.dans.knaw.nl/en 23. Publishing data: Dryad: http://datadryad.org/ 24. Publishing data: B2SHARE: https://b2share.eudat.eu/ 25. Publishing data: 4TU.ResearchData: https://data.4tu.nl/ 26. Long tail research data: http://www.nature.com/neuro/journal/v17/n11/fig_tab/nn.3838_F1.html
  • 42. URLs of mentioned webpages in order of appearance #3: 27. Preferred data formats 4TU.ResearchData: http://researchdata.4tu.nl/en/publishing-research/data-description-and-formats/ 28. Data Seal of Approval: http://www.datasealofapproval.org 29. Data Citation Index (Thomson Reuters): http://wokinfo.com/products_tools/multidisciplinary/dci/ 30. Self upload 4TU.ResearchData: https://data.4tu.nl/account/login/?next=/upload/ 31. Data sets underlying PhD thesis Joos Buijs: http://dx.doi.org/10.4121/uuid:26aba40d-8b2d-435b-b5af-6d4bfbd7a270 32. PhD thesis Joos Buijs: http://dx.doi.org/10.6100/IR780920
  • 43. A basic course on Research data management part 4: caring for your data, or making data usable PROOF course Information Literacy and Research Data Management TU/e, 07-03-2017 l.osinski@tue.nl, TU/e IEC/Library Available under CC BY-SA license, which permits copying and redistributing the material in any medium or format & adapting the material for any purpose, provided the original author and source are credited & you distribute the adapted material under the same license as the original
  • 44. Research data management: what was it again? (1) Sharing your data, or making your data findable and accessible with good data practices: + protecting your data (backup, access control; file naming, organizing data, versioning); + sharing your data via collaboration platforms and archives. → (2) Caring for your data, or making your data usable and interoperable with good data practices: + tidy data; + metadata/documentation; + licenses; + open data formats. Before data can be reusable, it first has to be usable
  • 45. Tidy data: making your data easy to handle for computers. Tidy data is about the structure of a table / data set. Tidy data ≠ clean data; it is a step towards clean data. + Each variable you measure is in one column + Column headers are variable names + Each observation is in a different row + Every cell contains only one piece of information
  • 46. Tidy data: why. Tidy data allow your data to be easily: + imported by data management systems + analyzed by analysis software + visualized, modelled, transformed + combined with other data (interoperability)
  • 47. Tidy data versus messy data. Tidy data: 1. Each variable you measure is in one column 2. Column headers are variable names 3. Each observation is in a different row 4. Every cell contains only one piece of information. Messy data: 1. More than one variable in a single column (‘clumped data’) 2. Column headers are values, or: one variable over many columns (‘wide data’) 3. Variables are in rows and columns 4. More than one piece of information in one cell (cells are highlighted or colored; values and measurement units in one cell)
  • 48. Tidy data versus messy data: example
      ‘Wide’ data: one variable over many columns
        patient_id  drug_a  drug_b
        1           67      56
        2           80      90
        3           64      50
        4           85      75
      Tidy data
        patient_id  drug  heart_rate
        1           a     67
        2           a     80
        3           a     64
        4           a     85
        1           b     56
        2           b     90
        3           b     50
        4           b     75
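The conversion from the ‘wide’ table to the tidy table on this slide can be scripted. The following is a minimal sketch in plain Python, with a hypothetical helper function; in practice a package such as tidyr (R) offers the same reshaping ready-made.

```python
wide = [
    {"patient_id": 1, "drug_a": 67, "drug_b": 56},
    {"patient_id": 2, "drug_a": 80, "drug_b": 90},
    {"patient_id": 3, "drug_a": 64, "drug_b": 50},
    {"patient_id": 4, "drug_a": 85, "drug_b": 75},
]


def wide_to_tidy(rows, id_column, value_columns, variable_name, value_name):
    """Turn one column per category into one row per observation.

    value_columns maps each wide column name to the category label it stands for.
    """
    tidy = []
    for column, label in value_columns.items():
        for row in rows:
            tidy.append({
                id_column: row[id_column],
                variable_name: label,
                value_name: row[column],
            })
    return tidy


tidy = wide_to_tidy(
    wide,
    id_column="patient_id",
    value_columns={"drug_a": "a", "drug_b": "b"},
    variable_name="drug",
    value_name="heart_rate",
)
for row in tidy:
    print(row["patient_id"], row["drug"], row["heart_rate"])
```

The column headers `drug_a` and `drug_b` were values of the variable "drug" in disguise; after reshaping, every row is a single observation and adding a third drug means adding rows, not columns.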
  • 49. What is the nature of the “unusual episode” to which this table refers?
  • 50. What is the nature of the “unusual episode” to which this table refers? Different columns contain measurements of the same variable: easier to read and interpret but difficult to add data (columns) to the records (rows)
  • 51. The same data in a tidy structure (variables in columns and observations in rows)
           Class  Sex     Age    Survived  Freq
        1  1st    Male    Child  No        0
        2  2nd    Male    Child  No        0
        3  3rd    Male    Child  No        35
        4  Crew   Male    Child  No        0
        5  1st    Female  Child  No        0
        6  2nd    Female  Child  No        0
        7  3rd    Female  Child  No        17
        8  Crew   Female  Child  No        0
        9  1st    Male    Adult  No        118
       10  2nd    Male    Adult  No        154
       11  3rd    Male    Adult  No        387
       12  Crew   Male    Adult  No        670
       13  1st    Female  Adult  No        4
       14  2nd    Female  Adult  No        13
       15  3rd    Female  Adult  No        89
       16  Crew   Female  Adult  No        3
       17  1st    Male    Child  Yes       5
       18  2nd    Male    Child  Yes       11
       19  3rd    Male    Child  Yes       13
       20  Crew   Male    Child  Yes       0
       21  1st    Female  Child  Yes       1
       22  2nd    Female  Child  Yes       13
      “The problem is that people like to view data in a totally different way than a computer likes to process it.” (Kien Leong)
  • 52. Tools for tidying data. OpenRefine: download OpenRefine from http://openrefine.org/download.html; it runs on your computer (not in the cloud), inside the Firefox browser (not in IE), and no web connection is needed; it captures all steps done to your raw data; the original data set is not modified; steps are easily reversed. R, tidyr package: use a scripted language (R (free), Matlab, SAS…) to process data (tidying, cleaning, etc.), run the analysis and produce the final outputs. Versus Excel: processing data through a graphical user interface is bad for data provenance and documentation, because it doesn’t leave a record of the steps taken
  • 53. Documentation / metadata: making your data understandable for humans #1. The table or data set itself: + columns: use clear, descriptive variable names (no hard-to-understand abbreviations) and avoid special characters (they can cause problems with some software) + rows: if possible, use standard names within cells (derived from a taxonomy, for example: standard species names, CAS registry numbers for chemical substances, standard date formats, …) + try to avoid coding categorical or ordinal data as numbers + missing data: use NA
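Two of these recommendations, replacing numeric codes by readable labels and marking missing data as NA, can be applied with a small cleaning function. This is only a sketch: the coding scheme and the list of missing-value markers below are hypothetical and would have to be taken from your own codebook.

```python
# Hypothetical markers that different sources might use for "no value".
MISSING_MARKERS = {"", "-999", "n/a"}

# Hypothetical coding scheme, as it might appear in a codebook.
SEX_LABELS = {"1": "male", "2": "female"}


def clean_cell(value, labels=None):
    """Return 'NA' for missing values and a readable label for coded values."""
    text = value.strip()
    if text.lower() in MISSING_MARKERS:
        return "NA"
    if labels is not None:
        return labels.get(text, text)  # unknown codes are kept unchanged
    return text


print(clean_cell(" 2 ", SEX_LABELS))  # female
print(clean_cell("-999"))             # NA
print(clean_cell("67"))               # 67
```

Keeping unknown codes unchanged rather than guessing a label makes unexpected values visible, so they can be looked up and documented.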
  • 54. Documentation / metadata: making your data understandable for humans #2. The table or data set as a whole: a description (documentation) that at least mentions: + the size of the data set: number of observations and variables + information about the variables and their measurement units (code book) + what is included in and excluded from the data set, and why data are missing + a description of how you collected the data (study design) and of the data manipulation steps (provenance) + when your data consist of multiple files organized in a folder structure, an explanation of the structure and naming of the files. “Research outputs that are poorly documented are like canned goods with the label removed (…)” (Carly Strasser)
  • 55. Documentation / metadata: metadata standards. Sometimes there are metadata standards for the documentation of your data set, but where no standard exists, a simple readme file can be good enough
  • 56. Raw data: https://www.amstat.org/publications/jse/datasets/titanic.dat.txt Documentation accompanying the data: https://www.amstat.org/publications/jse/datasets/titanic.txt The documentation covers: size (number of observations and variables); description; provenance; variable descriptions. Based on: The "Unusual Episode" Data Revisited / by Robert J. MacG. Dawson, in: Journal of Statistics Education vol. 3(1995), issue 3
  • 57. Morphological Measurements of Galapagos Finches: http://dx.doi.org/10.5061/dryad.152 Use of standard names (taxonomy, species). Variable names clear enough? WingL must be wing length, but what is N.Ubkl? Units of measurement? Based on: Looking after datasets / by Antony Unwin, 01-09-2015, http://blog.revolutionanalytics.com/2015/09/looking-after-datasets.html
  • 58. Documentation / metadata: making your data findable for humans and search engines. Descriptive metadata serve mainly for the discovery and identification of your data: + creator + title + short description + keywords + date(s) of data collection + publication year + related publications + DOI (assigned by the data archive) + etc. When uploading your data to a data archive like 4TU.ResearchData, you will be asked to enter these metadata
  • 59. User license making clear that other people are allowed to use your data Let other people know in advance what they are allowed to do with your data by attaching a user license to it + Creative Commons license for data sets + GNU General Public License (GPL) for software + License selector
  • 60. Open data formats: ensuring the ‘longevity’ of your data. + Open (non-proprietary) data formats best ensure that the data will remain usable and ‘legible’ for computers in the future + open formats are easy to use in a variety of software, like .csv for tabular data + check the data formats that are supported by a data archive like 4TU.ResearchData
  • 61. Usable data: recommended reading. These 3 papers give a good summary of this module: + Eugene Barsky (2017), Good enough research data management: a very brief guide + Shannon E. Ellis, Jeffrey T. Leek (2017), How to share data for collaboration + Greg Wilson, et al. (2017), Good enough practices in scientific computing
  • 62. Support. Data Coach [website]. TU/e data librarians (rdmsupport@tue.nl): Leon Osinski, Sjef Öllers. Recommended reading: Van den Eynden, Veerle et al. (2011), Managing and sharing data: best practice for researchers, UK Data Archive; Strasser, Carly (2015), Research data management, NISO. Recommended online course: Essentials 4 data support [English & Dutch]
  • 63. URLs of mentioned webpages in order of appearance #1: 1. Tidy data: https://www.jstatsoft.org/article/view/v059i10 2. The “Unusual Episode” Data revisited: https://www.amstat.org/publications/jse/v3n3/datasets.dawson.html 3. OpenRefine: http://openrefine.org 4. tidyr: http://tidyr.tidyverse.org/ 5. R: https://www.r-project.org/ 6. Metadata standards: http://rd-alliance.github.io/metadata-directory/ 7. Raw Titanic data: https://www.amstat.org/publications/jse/datasets/titanic.dat.txt 8. Documentation to Titanic data: https://www.amstat.org/publications/jse/datasets/titanic.txt 9. Morphological Measurements of Galapagos Finches: http://dx.doi.org/10.5061/dryad.152 10. Looking after data sets: http://blog.revolutionanalytics.com/2015/09/looking-after-datasets.html 11. Descriptive metadata 4TU.ResearchData: http://researchdata.4tu.nl/en/publishing-research/uploading-data/ 12. Creative Commons licenses: https://creativecommons.org/ 13. GNU General Public License: https://www.gnu.org/licenses/gpl-3.0.en.html
  • 64. URLs of mentioned webpages in order of appearance #2: 14. License selector: https://ufal.github.io/public-license-selector/ 15. Preferred data formats of 4TU.ResearchData: http://researchdata.4tu.nl/en/publishing-research/data-description-and-formats/ 16. Eugene Barsky (2017), Good enough research data management: a very brief guide 17. Shannon E. Ellis, Jeffrey T. Leek (2017), How to share data for collaboration 18. Greg Wilson, et al. (2017), Good enough practices in scientific computing 19. TU/e Data Coach: http://www.tue.nl/datacoach 20. Van den Eynden, Veerle et al. (2011), Managing and sharing data: best practice for researchers, UK Data Archive 21. Carly Strasser, Research data management: http://www.niso.org/apps/group_public/download.php/15375/PrimerRDM-2015-0727.pdf 22. Online course ‘Essentials for data support’: http://datasupport.researchdata.nl/en/

Editor's Notes

  1. Introducing myself and IEC/Library
  2. Question: what do you think of this video? What is, according to Henry Rzepa, the essence of research data management? Does that correspond with your idea of RDM? Besides sharing your data for re-use, RDM is also about reproducibility. A reproducibility check of your results requires the raw data and an overview of all the steps you have taken with your data (from raw data to cleaned data to processed data to analysed data to published data, i.e. the figures and tables in your paper). Where did the data come from? Re-use of data is future oriented; reproducibility is past oriented and concerns the quality of your data. This course is especially about 1: making data available to others. Data sharing requires research data management! RDM is especially about data sharing, not only after your research but also during your research: your promotor wants to take a quick look at your data, your colleague needs some of your data, etc. Quality control or quality assurance of your data: a. protecting against data loss; b. protecting data authenticity (ensuring that data has not changed after its creation).
  3. Open data = full provenance of where the data comes from + clear copyright statements, licences and/or waiver (Egon Willighagen). RDM has a knowledge-sharing side (sharing) and an activity side (caring). Sharing = mainly through archiving/preservation. Caring = making data usable and traceable. Scientific integrity/reproducibility: how did you arrive at your results? Tracing from final outputs like a graph or a figure back to the original raw data set. Quality control or quality assurance of your data: a. protecting against data loss; b. protecting data authenticity (ensuring that data has not changed after its creation). This course is especially about 1: making data available to others. Data sharing requires research data management! RDM is especially about data sharing, not only after your research but also during your research: your promotor wants to take a quick look at your data, your colleague needs some of your data, etc.
  4. The definition or description of RDM leads to the topics of this training
  5. The first three reasons follow from the description or definition of RDM. “Access to raw data is important for follow-up research, replication research and integrity investigations.” During your research: RDM enables data sharing, and also merging another person’s data with your data, which allows collaboration. But: [DMP Driessen]: “There is nothing worse than dig through the data of someone else”! So be clear and use standard, good-quality metadata. Don’t think: only I have to understand it. After your research: this doesn’t necessarily mean open access; open access when possible; usability of data precedes openness of data. Because data are an asset, worth sharing in order to be reused or built on by others: data-driven science progresses not only by building on the same data but especially by combining or merging data from different sources. Not all data are useful for re-use. Because data provide the evidence for a published paper: others can ask for the data in order to verify or replicate your results (scientific integrity). Validating results by replicating them requires data. UPSIDE: Uniform Principle of Sharing Integral Data and Materials Expeditiously. Reproducibility = being able to go from data to figures/results! Because data are unique and/or valuable. 4 kinds of data: (1) observational data (from sensors but also from surveys or field counts): one-time phenomena. In many cases these data cannot be replicated and should therefore be retained. (2) Experimental data: data from clinical trials, pharmaceutical testing, psychological experiments but also from high-throughput machines like an accelerator. In some cases it is not feasible or ethical to repeat the data collection. Preservation is particularly important for these experimental data. (3) Computational data generated from (computational) simulations. Data can be regenerated by rerunning the simulation.
Nevertheless, preservation over the medium term can be needed: for subsequent analysis (visualization, data mining), and because computer time for very large-scale computations can be expensive and/or not available within a short time frame. (4) [Reference data sets: mapping the human genome, documenting proteins, longitudinal data on economic and social status.] Trustworthiness and credibility of science: “We found the reluctance to share data to be associated with weaker evidence (against the null hypothesis of no effect) and a higher prevalence of apparent errors in the reporting of statistical results.” The weaker the evidence, the greater the reluctance to share data; the stronger the evidence, the greater the willingness to share data.
  6. Because journals, funders, universities or codes of conduct demand that data be accessible and reusable. If research funders set conditions with regard to data management, this often comes down to the requirement of a data management plan.
  7. ‘Take measures’ = best effort, a best-efforts obligation. http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-data-mgt_en.pdf [Guidelines on data management in Horizon 2020]. Open research data pilot: also covers reuse of data; implemented mainly through a DMP [DMP as an early deliverable within the first six months of the project]. Scope: 7 areas of Horizon 2020; €3 billion [20% of the overall Horizon 2020 budget 2014-2015]: (1) Future and emerging technologies; (2) Research infrastructures (part e-infrastructures); (3) Leadership in enabling and industrial technologies (Information and communication technologies); (4) Societal challenge: ‘Secure, clean and efficient energy’ (part Smart cities and communities); (5) Societal challenge: ‘Climate action, environment, resource efficiency and raw materials’ (except raw materials); (6) Societal challenge: ‘Europe in a changing world: inclusive, innovative and reflective societies’; (7) Science with and for society. At the proposal submission stage, the information provided is not part of the evaluation. Costs relating to the implementation of the pilot will be eligible. 3054 proposals: opt out in core areas = 24%; opt in in other areas = 27%. Guidelines on open access to scientific publications and research data in Horizon 2020 (version 1.0, 11 December 2013). Guidelines on data management in Horizon 2020 (version 1.0, 11 December 2013): open research data pilot. Open research data pilot / Data management plan [DMP]: What types of data will the project generate/collect? What standards will be used? How will this data be exploited and/or shared/made accessible for verification and re-use? If data cannot be made available, explain why. How will this data be curated and preserved?
  8. FAIR data is what Horizon 2020 wants! FAIR data implies sound research data management. Research data management prepares for FAIR. Why is sound research data management important? Because it is a key conduit to FAIR data. FAIR data leads to knowledge discovery and innovation.
  9. A single DMP for your project to cover its overall approach. However, where there are specific issues for individual datasets (e.g. regarding openness), you should clearly spell this out. The DMP is a set of questions about: the handling of research data during and after the project; what data sets the project will collect, process and/or generate; whether and how the data sets will be shared/made open access; how data will be curated and preserved; what measures are taken to safeguard and protect sensitive data.
  10. Quotation from: H2020 programme, Guidelines on FAIR data management in Horizon 2020, version 3.0 (26 July 2016), p. 5: “A data management plan (DMP) is required for all projects participating in the extended ORD pilot…” Participating projects will be required to develop a Data Management Plan (DMP), in which they will specify what data will be open: detailing what data the project will generate, whether and how it will be exploited or made accessible for verification and re-use, and how it will be curated and preserved. The DMP needs to be updated over the course of the project whenever significant changes arise, such as (but not limited to): new data; changes in consortium policies (new innovation potential, decision to file for a patent); changes in consortium composition and external factors (new members joining or old members leaving). The DMP should be updated as a minimum in time with the periodic evaluation/assessment of the project.
  11. File naming, organizing data, versioning: this is about being able to find your own data again. If a data file can no longer be found, it is ‘lost’.
  12. This is also about being able to find your own data yourself! Be organized: design naming schemes for your files and folders. Data classification and retention: see DMP Indra Sihar. Without data classification and retention, data volumes and their costs keep growing unchecked. When will which data no longer be useful, so that they can be discarded? Maintaining the integrity of data: this implies protecting the mere existence of data, maintaining the quality of data and ensuring that data are accessed only by those authorized to do so. RDM consists of these parts: minimize the risk of data loss or deletion; protect your data from unauthorized use; use the correct data. Especially when you edit your data often or collect data through various experiments or tests, identifying the correct data may pose a problem. RDM enhances the efficiency of your research.
  13. Avoid special characters in file names, because data files may be read by a script! File names should be descriptive, reflect the content and be unique (independent of where, i.e. in which folder, the file is stored).
  14. See also (still to be incorporated): http://blogs.library.leiden.edu/researchdata/2016/06/03/best-practices-file-names-and-folder-structures/#more-284
  15. “The first step in making a research project reproducible is to make sure that the files associated with it are organized” [https://tomwallis.info/2014/01/16/setting-up-a-project-directory/] See also: https://nicercode.github.io/blog/2013-04-05-projects/ and http://dx.doi.org/10.1371/journal.pcbi.1000424 Organizational scheme on the basis of file formats: should all .csv files be grouped together?
  16. Other ‘protocols’: https://tomwallis.info/2014/01/16/setting-up-a-project-directory/
  17. Read only ; “keep your raw data raw!”
  18. Subfolders: importing phase, processing phase, generating phase
  19. Perhaps add a subfolder in this folder (Analysis files) for the visualizations themselves (figures, tables). Raw data cannot be understood without processing; processing gives meaning to raw data. https://twitter.com/TrevorABranch/status/648987799648014336 : “My rule of thumb: every analysis you do on a dataset will have to be redone 10–15 times before publication. Plan accordingly”
  20. Introducing myself and IEC/Library
  21. File naming, organizing data, versioning: this is about being able to find your own data again. If a data file can no longer be found, it is ‘lost’.
  22. Dataverse Network: 2 Gb
  23. Informal peer-to-peer sharing makes it difficult to know which data can be obtained where, requires the right contact, makes managing data access a burden and does not ensure availability of the data in the long-term. Project websites can offer easy immediate storage and dissemination, but will offer less sustainability and it is difficult to control who uses your data and how they use it unless administrative procedures are in place.
  24. 4TU.RD is not about storage! TurBase is not yet on the slide. Figshare: free up to 1 GB. DANS: Dutch, social sciences and humanities. Dryad: not free (90 euro for 10 GB), only data underlying publications. Who knows DOIs?
  25. Are these data complete? What is missing?
  26. Other = crew
  27. N.Ubkl: distance from nose to upper beak = maxilla = upper jaw
  28. Fig. S2 as an example (click Save and then Open)
  29. One piece of information per cell: not in a single cell (column Address: Piuslaan 50, 5614 CM Eindhoven) but in separate columns (cells): House number: 50; Street: Piuslaan; City: Eindhoven; Postal code: 5614CM.
  30. In Excel: constrain data entries so that, for example, only names from a taxonomy can be entered. Categorical or ordinal data: the value for sex should be male or female; use ‘poor’, ‘fair’, ‘good’, not 1, 2, 3.
  32. Anonymization: https://research-data-network.readme.io/v1.03/docs/personal-data-resources
  33. R datasets: http://www.inside-r.org/r-doc/datasets R for data science: http://r4ds.had.co.nz/ Quick R: http://www.statmethods.net/index.html
  34. R datasets: http://www.inside-r.org/r-doc/datasets R for data science: http://r4ds.had.co.nz/ Quick R: http://www.statmethods.net/index.html
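
The file-naming advice in note 13 (avoid special characters, keep names unique regardless of folder) could be checked automatically. The Python sketch below is only one possible illustration, not part of the course material: the allowed character set (lowercase letters, digits, hyphens, underscores) is an assumed convention, and the function names `is_script_friendly`, `suggest_name` and `duplicate_names` are hypothetical.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumed convention (not prescribed by the slides): lowercase letters,
# digits, hyphens and underscores, followed by a single extension.
SAFE_NAME = re.compile(r"[a-z0-9_-]+\.[a-z0-9]+")


def is_script_friendly(filename: str) -> bool:
    """Return True if the file name contains no spaces or special characters."""
    return SAFE_NAME.fullmatch(filename) is not None


def suggest_name(filename: str) -> str:
    """Propose a script-friendly variant of a file name."""
    path = Path(filename)
    # Replace every run of disallowed characters in the stem by one underscore.
    stem = re.sub(r"[^a-z0-9_-]+", "_", path.stem.lower()).strip("_")
    # Keep only the dot and alphanumerics in the extension.
    suffix = re.sub(r"[^a-z0-9.]+", "", path.suffix.lower())
    return f"{stem}{suffix}"


def duplicate_names(paths):
    """Map each file name that occurs in more than one folder to its locations."""
    seen = defaultdict(list)
    for p in paths:
        seen[Path(p).name].append(str(p))
    return {name: locations for name, locations in seen.items() if len(locations) > 1}


if __name__ == "__main__":
    print(is_script_friendly("finch_beak_2017-09-19_v02.csv"))
    print(suggest_name("Finch data (final)!.csv"))
    print(duplicate_names(["raw/a.csv", "clean/a.csv", "raw/b.csv"]))
```

Running such a check over a project folder before sharing it is one way to catch names that a script cannot read, or that are ambiguous once files are moved out of their folders.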