3TU.Datacentrum: presentation for OpenML Workshop (III) at Eindhoven, 22-10-2014 / Leon Osinski
1. 3TU.Datacentrum
OpenML Workshop (III) @ Eindhoven
TU/e, 22-10-2014
l.osinski@tue.nl, TU/e IEC/Library
Available under CC BY license, which permits
unrestricted use, distribution, and reproduction in
any medium, provided the original author and
source are credited
2. Sharing research data
Why?
It’s expected by research funders, journals, professional organizations and
research evaluators
Because of scientific integrity: reproducibility of results
Because of re-using results: data-driven science
You benefit from it: increases your visibility and enhances the trustworthiness of
your research
How?
On request
Personal website
Publishing / archiving in a repository
International Open Access Week
3. Re-using research data
To be re-used, data should be
Findable: DOI; metadata (to allow discovery)
Accessible: ≠ open access; licenses to use; to humans and machines
Intelligible, assessable: metadata (to allow understandability)
Interoperable: combining across multiple sources
Preserved: long-term availability
Source: Research Data Netherlands /
Marina Noordegraaf
4. 3TU.Datacentrum #1
Findability + citability: 3TU.DC assigns DOIs; discovery metadata are
mandatory; data sets are indexed by DataCite, Google, Data Citation Index
Accessibility: 3TU.DC = open access; embargoes (6 months) are allowed
Intelligible, assessable, interoperable: up to the researcher
Preservation: 3TU.DC has quality mark Data Seal of Approval
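To illustrate what findability and citability through a DOI look like in practice, the sketch below turns a DOI into a resolver URL after a loose shape check. The function name `doi_to_url`, the pattern and the default resolver address are assumptions made for this illustration, not part of 3TU.Datacentrum's tooling; the example DOI is the Reinhart-Rogoff one from the URL list.

```python
import re

# Loose shape check: "10.", a registrant code of 4-9 digits, "/", then a non-empty suffix.
# This is an illustrative assumption, not an official validation rule.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def doi_to_url(doi: str, resolver: str = "https://doi.org/") -> str:
    """Return a resolver URL for a bare DOI or for a DOI already written as a URL."""
    doi = doi.strip()
    # Strip a leading resolver prefix such as http://dx.doi.org/ if present.
    doi = re.sub(r"^https?://(dx\.)?doi\.org/", "", doi, flags=re.IGNORECASE)
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a DOI-shaped string: {doi!r}")
    return resolver + doi


if __name__ == "__main__":
    print(doi_to_url("http://dx.doi.org/10.1257/aer.100.2.573"))
```

Because the citation carries only the DOI, the landing page behind it can move without breaking the reference.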
5. 3TU.Datacentrum #2
File format support levels
Self-upload of simple data sets (≤ 4 GB)
Tailor-made solutions
Upload and download statistics
Collections of data sets
6. DOIs and OpenML #1
DataCite Netherlands: assigns and distributes DOIs on behalf of DataCite
to research organizations and data centers in NL
Organizations can register DOIs for their objects by applying for an account
at DataCite Netherlands
+ Objects need to be persistent, long-term available
+ Objects are preferably open access; restricted access is allowed
+ Objects should be citable (metadata added)
+ Objects must have a public landing page
7. DOIs and OpenML #2
Organizations must ensure maintenance and supply of metadata
A contract will be signed to ensure the above-mentioned points; after that,
the organization will receive its own DOI prefix
Costs: € 1,000 (one-time, subject to change)
Creating DOIs: manually via web forms ↔ uploading XML resource files
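As a rough idea of what an uploaded XML resource file could contain, the sketch below builds a minimal DataCite-style metadata record with Python's standard library. It assumes DataCite Metadata Schema version 3 and includes only the identifier, creator, title, publisher and publication year properties; the function name `build_record`, the DOI `10.1234/example-dataset` and all field values are hypothetical placeholders. The schema version and required fields should be checked against what DataCite Netherlands actually asks for.

```python
import xml.etree.ElementTree as ET

# Namespace of DataCite Metadata Schema 3 (an assumption for this sketch;
# use the schema version your DOI registration account requires).
NS = "http://datacite.org/schema/kernel-3"


def build_record(doi, creators, title, publisher, year):
    """Return a minimal DataCite-style XML record as a string."""
    # Serialize the DataCite namespace as the default namespace (no prefix).
    ET.register_namespace("", NS)
    root = ET.Element(f"{{{NS}}}resource")

    identifier = ET.SubElement(root, f"{{{NS}}}identifier", identifierType="DOI")
    identifier.text = doi

    creators_el = ET.SubElement(root, f"{{{NS}}}creators")
    for name in creators:
        creator = ET.SubElement(creators_el, f"{{{NS}}}creator")
        ET.SubElement(creator, f"{{{NS}}}creatorName").text = name

    titles_el = ET.SubElement(root, f"{{{NS}}}titles")
    ET.SubElement(titles_el, f"{{{NS}}}title").text = title

    ET.SubElement(root, f"{{{NS}}}publisher").text = publisher
    ET.SubElement(root, f"{{{NS}}}publicationYear").text = str(year)

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # All values below are hypothetical placeholders.
    print(build_record(
        doi="10.1234/example-dataset",
        creators=["Doe, Jane"],
        title="Example data set",
        publisher="Example Repository",
        year=2014,
    ))
```

Generating the record from code rather than a web form pays off mainly when many objects need DOIs at once, since the same function can be run over a whole catalogue.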
8. URLs of mentioned webpages
(in order of appearance)
1. OpenML Workshop (III) @ Eindhoven: http://eindhoven2014.openml.org/
2. Website IEC/Library [TU/e]: http://w3.tue.nl/nl/diensten/bib/
3. Data on request (Reinhart-Rogoff paper): http://dx.doi.org/10.1257/aer.100.2.573
4. Data on personal website (Thomas Piketty): http://piketty.pse.ens.fr/en/capital21c2
5. Publishing data (3TU.Datacentrum): http://data.3tu.org
6. International Open Access Week: http://www.openaccessweek.org
7. DataCite metadata search: http://search.datacite.org/ui
8. Data Citation Index (Thomson Reuters): http://wokinfo.com/products_tools/multidisciplinary/dci/
9. Data Seal of Approval: http://www.datasealofapproval.org
10. File format support levels: http://datacentrum.3tu.nl/fileadmin/editor_upload/File_formats/Digital_Preservation_Support_levels.pdf
11. DataCite Netherlands: http://datacite.tudelft.nl/info/home/
Editor's Notes
Introducing myself and IEC/Library
Open access week
Because others can ask for the data that provide the evidence for a published paper in order to verify or replicate your results (scientific integrity)
Because a journal, funder or code of conduct requires the data to be accessible
Because data are unique and / or valuable (non-repeatable observations)
Because data are an asset, worth sharing in order to be reused or built on by others
UPSIDE: Uniform Principle of Sharing Integral Data and Materials Expeditiously
Findable + citable
Accessibility doesn’t necessarily mean open access
Findable: easy to find both by humans and computers based on mandatory description of the metadata that allow researchers to track and trace interesting datasets;
Accessible: stored long term such that they can be easily accessed and/or downloaded with well-defined license and access conditions (Open Access when possible), whether at the level of metadata, or at the level of the actual data content;
Interoperable: ready to be combined (across multiple sources) by humans as well as computers;
Re-Usable: ready to be used for future research and to be processed further using computational methods.
Different levels of accessibility: not accessible, after request, made available on a personal website, published with a DOI; by machines