Research data management: course OGO Quantitative research (21-11-2018) - Leon Osinski
Research data management involves three key aspects: 1) protecting data through organized file naming and folder structures, 2) sharing data via collaboration platforms or archives to enable reproducibility and reuse, and 3) caring for data through tidy formatting, thorough metadata and documentation, and use of open standards to ensure understandability and usability.
A basic course on Research data management, part 2: protecting and organizing ... - Leon Osinski
A basic course on research data management for PhD students. The course consists of 4 parts. The course was given at Eindhoven University of Technology (TUe), 24-01-2017
A basic course on Research data management, part 3: sharing your data - Leon Osinski
A basic course on research data management for PhD students. The course consists of 4 parts. The course was given at Eindhoven University of Technology (TUe), 24-01-2017
Good (enough) research data management practices - Leon Osinski
Slides of a lecture on research data management (RDM), given for 3rd year students (Eindhoven University of Technology, major Psychology & Technology), as part of the course 0HV90 Quantitative Research. At the end of the slides a handy summary 'Research data management basics in a nutshell' is added.
Data Management in the context of Open Science.
Because open access has become mandatory for publications and for project-funded research data, it is the responsibility of each researcher to be informed about and trained in these new practices.
This document provides guidance on managing research data. It discusses planning ahead by considering data needs, formats, volume and ethics. It also covers organizing data through file naming, metadata, references, remote access and safekeeping. Preserving data involves determining what to keep/delete and using long-term storage such as repositories. Reasons for sharing data include scientific integrity, funding mandates and increasing impact, while reasons for not sharing include financial or sensitive personal information.
The document discusses perspectives on metadata from web resources and database systems. It describes how metadata comes in many forms and serves various purposes, such as supporting discovery and identification of information resources on the web (resource metadata), and ensuring consistency and analysis of structured data in databases (metadata in database systems). Resource metadata commonly follows standards and is stored separately from the resources it describes, while database metadata includes both structural metadata describing data organization and content metadata in the form of data dictionaries.
What funders want you to do with your data - Leon Osinski
Funders want researchers to 1) deposit the relevant data from their research in an approved repository to make it FAIR (Findable, Accessible, Interoperable, Reusable), 2) make the data openly available whenever possible, and 3) write a Data Management Plan describing how they will manage their data during and after the project. Funders require depositing data in repositories to enable reuse, making data open access "as open as possible, as closed as necessary", and having a Data Management Plan that addresses reuse according to FAIR principles.
The Brain Imaging Data Structure and its use for fNIRS - Robert Oostenveld
These slides were prepared for the NIRS toolkit course at the Donders, which has been postponed due to the Corona crisis. The slides present BIDS, explain how fNIRS often involves multiple signals, and relate the two to synchronization and data management.
The document provides an overview of the Donders Repository, which aims to securely store original research data, document the research process, and make data accessible to researchers and the public. It describes the procedural design including different roles, collection types, and states. The technical architecture is based on IRODS software and scalable storage. The repository fits into researchers' workflows and supports the timeline of projects from initiation to data sharing. Standards like BIDS help make neuroimaging data FAIR (Findable, Accessible, Interoperable, Reusable).
This document provides an overview of a workshop on good practice in research data management held at the University of Tartu, Estonia. The workshop covered various topics including defining research data, research data management and data management plans, organizing and documenting data, file formats and storage, metadata, security, and sharing and preserving data. The workshop was led by Stuart Macdonald from the University of Edinburgh and included presentations, introductions, and discussions around each of these research data management topics.
Brain Imaging Data Structure and Center for Reproducible Neuroscience - Krzysztof Gorgolewski
This document introduces the Brain Imaging Data Structure (BIDS), a new standardized format for organizing and describing neuroimaging data. BIDS aims to address the heterogeneity in how researchers currently organize their brain imaging data, which causes problems in data sharing and combining data from multiple studies. The key principles of BIDS include adopting existing file formats like NIfTI and JSON, capturing the majority of experimental designs while allowing for extensions, and making the format simple to implement through file naming conventions and folder structures. Tools are being developed to help with validation and conversion of data to the BIDS format to promote its adoption.
This document introduces the Brain Imaging Data Structure (BIDS) standard for organizing neuroimaging data. BIDS aims to standardize how neuroimaging data is organized to facilitate data sharing, processing, and analysis. It specifies a common file and folder structure that encodes metadata in filenames and separate metadata files. Key principles are adopting existing standards where possible, including enough metadata for most experiments while allowing extensions, and making the format simple to implement.
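The BIDS summaries above note that metadata is encoded directly in file names via key-value entities. A minimal sketch of checking that convention, using a deliberately simplified pattern (an assumption for illustration, not the full BIDS specification):

```python
import re

# Simplified BIDS-like pattern: a required subject entity, optional
# session/task/run entities, then a suffix and a known extension,
# e.g. 'sub-01_task-rest_bold.nii.gz'. Real BIDS defines many more
# entities and rules than this sketch covers.
BIDS_LIKE = re.compile(
    r"^sub-[0-9A-Za-z]+"                 # required subject entity
    r"(_(ses|task|run)-[0-9A-Za-z]+)*"   # optional entities, in any number
    r"_[a-z]+\.(nii|nii\.gz|json|tsv)$"  # suffix and extension
)

def looks_like_bids(name: str) -> bool:
    """Return True if a file name follows the simplified BIDS-like pattern."""
    return BIDS_LIKE.match(name) is not None
```

Encoding metadata in names this way means a plain directory listing already tells a reader which subject, task and modality each file belongs to.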
Using Open Science to advance science - advancing open data - Robert Oostenveld
This document discusses using open science practices like open data to advance science. It notes the benefits of open data like improved reproducibility and opportunities for data mining. However, sharing neuroimaging and other human subject data presents challenges regarding data size, sensitivity, and privacy regulations. The document promotes using the Brain Imaging Data Structure (BIDS) format to organize data in an open, standardized way. It also discusses the gradient between personal/identifiable data that requires protection and de-identified research data that can be shared, as well as legal constraints and appropriate repositories for sharing data responsibly.
This document discusses best practices for organizing, managing, and publishing research data. It recommends using standardized file naming and folder structures, documenting data through code books and metadata, selecting open formats, and considering issues like data security, versions, and citations. FAIR principles of findable, accessible, interoperable and reusable data are presented. Options in Finland for publishing and archiving research data include repositories like FSD Tietoarkisto and Zenodo. Adopting these practices helps ensure well-organized, documented data that can enable reproducibility and reuse.
This document summarizes a course on data archiving and processing. The course covers archiving theory and practices, data structures and processing, survey documentation, and user guides. It discusses different types of survey designs including cross-sectional, panel, cohort, and retrospective designs. It provides examples of deriving variables, creating analytic files, data linkage for hierarchical and longitudinal data, and exercises for participants. The intended outcomes are understanding the need to archive data, differentiating data types and structures, using software to process and analyze data, and appreciating user guides.
University of Bath Research Data Management training for researchers - Jez Cope
Slides from a workshop on Research Data Management for research staff and students at the University of Bath.
Part of the Research360 project (http://blogs.bath.ac.uk/research360).
Authors: Cathy Pink and Jez Cope, University of Bath
Data carving using artificial headers (info sec conference) - Robert Daniel
This document proposes a new approach to data carving called File Recovery using Artificial Headers (FRAH) that can recover files with corrupted or missing headers. An evaluation of existing data carving tools found they have difficulty recovering fragmented files. FRAH works by inserting an artificial header onto files to circumvent missing headers. Testing showed FRAH could successfully recover files that standard tools could not. However, FRAH has limitations in recovering files where payload data is also missing. Further research is needed to make FRAH more robust.
DataONE Education Module 01: Why Data Management? - DataONE
Lesson 1 in a set of 10 created by DataONE on Best Practices for Data Management. The full module can be downloaded from the DataONE.org website at: http://www.dataone.org/educaiton-modules. Released under a CC0 license, attribution and citation requested.
DataONE Education Module 03: Data Management Planning - DataONE
Lesson 3 in a set of 10 created by DataONE on Best Practices for Data Management. The full module can be downloaded from the DataONE.org website at: http://www.dataone.org/educaiton-modules. Released under a CC0 license, attribution and citation requested.
Donders Repository - removing barriers for management and sharing of research... - Robert Oostenveld
This is the presentation I gave at the monthly meeting of the Donders Institute PhD council. It briefly explains the Donders Repository, but mainly addresses how to deal with directly and indirectly identifying personal data, with anonymization, pseudonymization and de-identification, and with blurring of research data prior to sharing.
The document discusses methods for tracking the reuse of data from scientific repositories through citation analysis. It outlines initial questions around how data is currently cited and levels of reuse. Methods tested include searching repositories like TreeBASE, Pangaea and ORNL DAAC, as well as databases like ISI Web of Science, Scirus and Google Scholar. Preliminary findings suggest search terms like repository name, DOI and author name had varying effectiveness across sources. Further analysis is needed to solidify conclusions and examine additional repositories, search terms and databases.
Introduction to Research Data Management for postgraduate students - Marieke Guy
The document provides an introduction to research data management for postgraduate students, outlining what research data is, the research process, what research data management involves and why it is important, and how students can start thinking about good research data management practices. It discusses defining and organizing data, storage and security, and maintaining findable and understandable data throughout the research lifecycle. The goal is to explain the importance of research data management and the roles students play in effective data management.
This document discusses metadata, including what it is, why it is important, common components of metadata records, examples of metadata standards, and tips for writing good metadata. Metadata captures key details about data, such as who created it, when, how, and why, to facilitate discovering, understanding, and reusing the data. Standards provide consistency for computer interpretation and searching. Good metadata includes specific, accurate, and complete details to fully document data.
Research Data Management: What is it and why is the Library & Archives Servic... - Gareth Knight
This document summarizes research data management and the library and archives service's involvement. It defines research data, explains why data needs to be managed, and outlines the key drivers for data management and publication. It then describes the library and archives service's knowledge of data management, the research data management support service being established, and the guidance, training, and tools being developed to help researchers with data management.
http://kulibrarians.g.hatena.ne.jp/kulibrarians/20170222
Presentation by Cuna Ekmekcioglu (The University of Edinburgh)
- Creating and Managing Digital Research Data in Creative Arts: An overview (2016)
CC BY-NC-SA 4.0
Documentation and Metadata - VA DM Bootcamp - Sherry Lake
This document discusses documentation and metadata for research data. It begins with an overview of why documentation is important at different stages of the research data lifecycle from collection through archiving. Key elements to document include how the data was created, its content and structure, who created and maintains it, and how it can be accessed and cited. The document then discusses common documentation formats like readmes, data dictionaries, and codebooks. It also introduces metadata as structured information that describes resources and explains common metadata standards and tools for creating structured metadata files. Exercises guide creating documentation in these formats for a weather dataset example.
This document provides information about a course on research data management. The course covers topics like data management planning, writing data management plans, data citation, data archiving, and legal issues related to data management. It includes an agenda with presentations, activities, and discussions on these topics led by data librarians and legal advisors. The goal is to teach researchers the importance of proper research data management practices.
This poster presents guidelines for researchers to improve reproducibility in scientific research by better documenting the key entities of research: data, software, workflow, and research output. It recommends documenting data sources and processing steps, writing descriptive code with examples, and using tools like Docker, Jupyter notebooks, LaTeX, and data repositories to capture the experimental environment and research process. Following these guidelines helps researchers communicate and verify their work, allowing others to build on their research findings.
The data science process document outlines the typical steps involved in a data science project including: 1) setting research goals, 2) retrieving data from internal or external sources, 3) preparing data through cleansing and transformation, 4) performing exploratory data analysis, 5) building models using techniques like machine learning or statistics, and 6) presenting and automating results. It also discusses challenges in working with different file formats and the importance of understanding various formats as a data scientist.
This document summarizes a seminar on data management for undergraduate researchers. It discusses what data is, why it needs to be managed, and key aspects of the data management process such as data organization, metadata, storage, and archiving. Topics covered include file naming best practices, version control, documentation, metadata standards, storage options, and long-term archiving. The goal is to help researchers organize and document their data so it can be understood, preserved, and reused.
The document provides guidance on early planning for data management, including becoming familiar with funder requirements, planning for the types and formats of data that will be created, designing a system for taking notes, organizing files through consistent naming schemes and use of folders, adding metadata to files to aid in documentation and discovery, and using RSS feeds to organize web-based information. It also touches on issues like plagiarism, data protection, intellectual property rights, and remote access to and backup of data.
The natural world holds an enormous number and diversity of species, roughly two thirds of which live in water, both fresh and marine. Knowledge of this diversity, and well-managed files recording that knowledge, are very important for fisheries science.
Fresh and marine waters contain huge numbers of vertebrate and invertebrate animals and plants. Identifying them and knowing their uses is very important for fisheries and aquaculture, so proper file management plays a useful role in both.
Without proper file management it is not possible to keep track of all of these plants and animals.
Knowledge of culturable species and their predators, and good file management of that knowledge, is very important for aquaculture. The habitats and food habits of culturable species matter for successful aquaculture, as do data on breeding seasons, behavior and fast-growing fish. Proper management of these data, and of study documents for fisheries students, is essential; file management is therefore very important in fisheries science.
A basic course on Research data management, part 4: caring for your data, or ... - Leon Osinski
A basic course on research data management for PhD students. The course consists of 4 parts. The course was given at Eindhoven University of Technology (TUe), 24-01-2017
Research data management during and after your research ; an introduction / L... - Leon Osinski
This document outlines a workshop on research data management for PhD students. The workshop covers managing data during research to ensure integrity and allow replication, as well as archiving or publishing data after research. During the workshop, presentations will discuss scientific integrity and data management during research, and data management after research. Discussions will explore topics like dealing with failed experiments, accessibility of data during research, and archiving data after a project is finished. The goal is to provide insight on responsible data practices during and after research.
How to best manage your data to make the most of it for your research - with the ODAM Framework (Open Data for Access and Mining): give open access to your data and make it ready to be mined.
Managing data throughout the research lifecycle - Marieke Guy
This document summarizes a presentation about managing data throughout the research lifecycle. It discusses the stages of the research lifecycle, including planning, data creation, documentation, storage, sharing, and preservation. It provides examples of research lifecycle models and addresses key questions to consider at each stage, such as what formats to use, how to document data, where to store it, and how to share and preserve it. The presentation emphasizes making informed decisions about data management and talking to colleagues for support and advice.
This document discusses best practices for data organization, documentation, and metadata. It recommends using open standard file formats that will remain readable over time, consistent file naming conventions with descriptive names, and version control for files. Metadata should include descriptive, technical, and administrative information to document the data and ensure it can be understood and managed. Good documentation involves information on the data collection process and dataset structure.
This presentation discusses managing research data through the data life cycle. It begins with an overview of the research life cycle and embedding the data life cycle within it. Key aspects of data management are then covered, including why manage data, ethical and legal issues, requirements for data sharing and retention, and creating a data management plan. The rest of the presentation delves into each stage of the data life cycle, providing best practices for data collection, organization, security, storage, documentation, processing, analysis, and long-term preservation or sharing. File formats, metadata, repositories, and bibliographic resources are also addressed.
This document provides guidance on managing research data. It discusses planning ahead to consider data needs, formats, and volume. It emphasizes organizing data through file naming, metadata, references, email, and remote access. It stresses preserving data by determining what to keep/delete, using long-term storage such as repositories or archives. Finally, it examines reasons to share data such as scientific integrity, funding mandates, and increasing impact and collaboration.
Similar to Research data management: course 0HV90, Behavioral Research Methods (20)
PROOF course Writing articles and abstracts in English, part: Copyright in ac...Leon Osinski
For this presentation students need to have seen 5 web lectures on copyright. During the presentation, the knowledge gained by the students by looking at the web lectures will be tested on the basis of a number of practical questions.
Research data management at TU EindhovenLeon Osinski
The document discusses research data management at TU Eindhoven. It outlines the long process of developing RDM practices since 2008. It describes the current organization and governance structure for RDM. Key external requirements for RDM from funders, regulations, and integrity standards are also summarized. The document concludes by outlining RDM support services available and the benefits of good RDM practices.
The document discusses the use of Creative Commons licenses for research data. It notes that funders and universities are pushing for open access to research articles and data. Before applying a license such as CC BY, researchers must ensure they own the copyright to the data and are authorized to license it. More restrictive licenses like CC BY-NC allow commercial reuse only with separate permission. The document debates finding a balance between open access and allowing researchers to control dissemination and potential rewards from their data.
Be open: what funders want you to do with your publications and research dataLeon Osinski
Research funders want researchers to:
1. Publish research articles through open access to make the articles widely available.
2. Deposit the underlying research data in repositories to make the data findable, accessible, interoperable, and reusable (FAIR).
3. Attach open licenses like CC BY to both publications and data to allow for commercial reuse when possible.
A basic course on Research data management: part 1 - part 4Leon Osinski
Slides belonging to a basic course on research data management. The course consists of 4 parts:
Part 1: what and why
1.1 data management plans
Part 2: protecting and organizing your data
2.1 data safety and data security
2.2 file naming, organizing data (TIER documentation protocol)
Part 3: sharing your data
3.1 via collaboration platforms (during research)
3.2 via data archives (after your research)
Part 4: caring for your data, or making data usable
4.1 tidy data
4.2 documentation/metadata
4.3 licenses
4.4 open data formats
A basic course on Research data management, part 1: what and whyLeon Osinski
A basic course on research data management for PhD students. The course consists of 4 parts. The course was given at Eindhoven University of Technology (TUe), 24-01-2017
3TU.Datacentrum: presentation for OpenML Workshop (III) at Eindhoven, 22-10-2...Leon Osinski
This document discusses sharing and reusing research data. It explains that sharing data is expected by funders and important for reproducibility, reusing results, and increasing visibility. To be reusable, data should be findable, accessible, intelligible, interoperable, and preserved. The 3TU.Datacentrum assists with assigning DOIs for citation, makes data openly accessible with some embargo options, and ensures long-term preservation. DOIs are assigned through DataCite Netherlands, which research organizations can register with for a fee.
Horizon 2020 and research data : info meeting Horizon 2020 @ TUe, 07-10-2014 ...Leon Osinski
This document discusses research data management (RDM) and the open data pilot program in Horizon 2020. It provides information on why RDM is important, noting key stakeholders that expect data sharing, and how RDM enables data re-use and integrity. The Horizon 2020 open data pilot program is described, including the seven research areas included in the pilot and funder requirements for a Data Management Plan and depositing data in repositories. Guidance and support resources for participating in the open data pilot are also listed.
Copyright and citation issues : PROOF course Writing articles and abstracts /...Leon Osinski
As an author of scholarly papers, you will use material (text fragments, pictures, tables, figures) created by other people. In most cases this material is copyright-protected, which means that usually, though not always, you have to ask permission to re-use it and to attribute the source. This is the first topic of this lecture: you as a user of copyright-protected material.
Secondly, when you're done writing you want to publish your paper in a journal. In most cases, though not always, this involves transferring the copyright that you initially own to a publisher. Transfer of copyright has consequences, and this is the second topic of this presentation: you as a producer of copyright-protected material.
Onderzoeksdata-bepalingen van financiers van universitair onderzoek in NL: Ma...Leon Osinski
Research data requirements of funders of university research in the Netherlands: presentation at the Master Class Research Data Management in Nederland, Maastricht, 3/4 April 2014.
UKB Working Group Data Management: Funders' Requirements.
Maarten van Bentum, Henk van den Hoogen, Leon Osinski
End-to-end pipeline agility - Berlin Buzzwords 2024Lars Albertsson
We describe how we achieve high change agility in data engineering by eliminating the fear of breaking downstream data pipelines through end-to-end pipeline testing, and by using schema metaprogramming to safely eliminate boilerplate involved in changes that affect whole pipelines.
A quick poll on agility in changing pipelines from end to end indicated a huge span in capabilities. For the question "How long does it take for all downstream pipelines to be adapted to an upstream change?", the median response was 6 months, but some respondents could do it in less than a day. When quantitative data engineering differences between the best and worst are measured, the span is often 100x-1000x, sometimes even more.
A long time ago, we suffered at Spotify from fear of changing pipelines due to not knowing what the impact might be downstream. We made plans for a technical solution to test pipelines end-to-end to mitigate that fear, but the effort failed for cultural reasons. We eventually solved this challenge, but in a different context. In this presentation we will describe how we test full pipelines effectively by manipulating workflow orchestration, which enables us to make changes in pipelines without fear of breaking downstream.
Making schema changes that affect many jobs also involves a lot of toil and boilerplate. Using schema-on-read mitigates some of it, but has drawbacks since it makes it more difficult to detect errors early. We will describe how we have rejected this tradeoff by applying schema metaprogramming, eliminating boilerplate but keeping the protection of static typing, thereby further improving agility to quickly modify data pipelines without fear.
Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...Kaxil Naik
Navigating today's data landscape isn't just about managing workflows; it's about strategically propelling your business forward. Apache Airflow has stood out as the benchmark in this arena, driving data orchestration forward since its early days. As we dive into the complexities of our current data-rich environment, where the sheer volume of information and its timely, accurate processing are crucial for AI and ML applications, the role of Airflow has never been more critical.
In my journey as the Senior Engineering Director and a pivotal member of Apache Airflow's Project Management Committee (PMC), I've witnessed Airflow transform data handling, making agility and insight the norm in an ever-evolving digital space. At Astronomer, our collaboration with leading AI & ML teams worldwide has not only tested but also proven Airflow's mettle in delivering data reliably and efficiently—data that now powers not just insights but core business functions.
This session is a deep dive into the essence of Airflow's success. We'll trace its evolution from a budding project to the backbone of data orchestration it is today, constantly adapting to meet the next wave of data challenges, including those brought on by Generative AI. It's this forward-thinking adaptability that keeps Airflow at the forefront of innovation, ready for whatever comes next.
The ever-growing demands of AI and ML applications have ushered in an era where sophisticated data management isn't a luxury—it's a necessity. Airflow's innate flexibility and scalability are what makes it indispensable in managing the intricate workflows of today, especially those involving Large Language Models (LLMs).
This talk isn't just a rundown of Airflow's features; it's about harnessing these capabilities to turn your data workflows into a strategic asset. Together, we'll explore how Airflow remains at the cutting edge of data orchestration, ensuring your organization is not just keeping pace but setting the pace in a data-driven future.
Session in https://budapestdata.hu/2024/04/kaxil-naik-astronomer-io/ | https://dataml24.sessionize.com/session/667627
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...Aggregage
This webinar will explore cutting-edge, less familiar but powerful experimentation methodologies which address well-known limitations of standard A/B Testing. Designed for data and product leaders, this session aims to inspire the embrace of innovative approaches and provide insights into the frontiers of experimentation!
Codeless Generative AI Pipelines
(GenAI with Milvus)
https://ml.dssconf.pl/user.html#!/lecture/DSSML24-041a/rate
Discover the potential of real-time streaming in the context of GenAI as we delve into the intricacies of Apache NiFi and its capabilities. Learn how this tool can significantly simplify the data engineering workflow for GenAI applications, allowing you to focus on the creative aspects rather than the technical complexities. I will guide you through practical examples and use cases, showing the impact of automation on prompt building. From data ingestion to transformation and delivery, witness how Apache NiFi streamlines the entire pipeline, ensuring a smooth and hassle-free experience.
Timothy Spann
https://www.youtube.com/@FLaNK-Stack
https://medium.com/@tspann
https://www.datainmotion.dev/
milvus, unstructured data, vector database, zilliz, cloud, vectors, python, deep learning, generative ai, genai, nifi, kafka, flink, streaming, iot, edge
Build applications with generative AI on Google CloudMárton Kodok
We will explore Vertex AI - Model Garden powered experiences, we are going to learn more about the integration of these generative AI APIs. We are going to see in action what the Gemini family of generative models are for developers to build and deploy AI-driven applications. Vertex AI includes a suite of foundation models, these are referred to as the PaLM and Gemini family of generative ai models, and they come in different versions. We are going to cover how to use via API to: - execute prompts in text and chat - cover multimodal use cases with image prompts. - finetune and distill to improve knowledge domains - run function calls with foundation models to optimize them for specific tasks. At the end of the session, developers will understand how to innovate with generative AI and develop apps using the generative ai industry trends.
Predictably Improve Your B2B Tech Company's Performance by Leveraging DataKiwi Creative
Harness the power of AI-backed reports, benchmarking and data analysis to predict trends and detect anomalies in your marketing efforts.
Peter Caputa, CEO at Databox, reveals how you can discover the strategies and tools to increase your growth rate (and margins!).
From metrics to track to data habits to pick up, enhance your reporting for powerful insights to improve your B2B tech company's marketing.
- - -
This is the webinar recording from the June 2024 HubSpot User Group (HUG) for B2B Technology USA.
Watch the video recording at https://youtu.be/5vjwGfPN9lw
Sign up for future HUG events at https://events.hubspot.com/b2b-technology-usa/
Research data management: course 0HV90, Behavioral Research Methods
1. Research data management
part 1: what and why
Course 0HV90
TU/e, 22-11-2017
l.osinski@tue.nl, TU/e IEC/Library
Available under CC BY-SA license, which permits copying
and redistributing the material in any medium or format &
adapting the material for any purpose, provided the original
author and source are credited & you distribute the
adapted material under the same license as the original
2. Research data management [RDM]
what #1
Essence of RDM: “… tracking back to what you did 7 years ago and recovering it (...) immediately in a re-usable manner.” (Henry Rzepa)
3. Research data management [RDM]
what #2
RDM: caring for your data with the purpose to:
1. protect their mere existence: data loss, data authenticity
(RDM basics)
2. share them with others
a. for reasons of reuse: in the same context or in a different context; during
research and after research
b. for reasons of reproducibility checks: scientific integrity, data quality
RDM = good data practices that make your data easy to work with,
understandable, and available to other scientists
4. Source: Research Data
Netherlands / Marina
Noordegraaf
Outline
1. Research data management [RDM]: what and why
2. Sharing your data, or making your data findable and accessible
a. data protection: back up, file naming, organizing data
b. data sharing: via collaboration platforms, data archives
3. Caring for your data, or making your data usable and interoperable
a. tidy data
b. metadata/documentation
c. licenses
d. open data formats
5. Because you work together with other researchers → collaborative science
Because of re-using results: data-driven science → open science
Because of scientific integrity: validating data analysis by reproducibility checks
requires the data and the code used to clean, process and analyze the data and
to produce the final outputs → reproducible science
Additional reasons
Because your data are unique / not easily repeatable
(long term observational data)
Because it’s required… by funders like NWO and EC and
your university
Why share research data? #1
7. Reasons not to share your data
Preparing my data for sharing takes time and effort
But research data management also increases your research efficiency
My data are confidential
But you can anonymize or pseudonymize your data
My data still need to yield publications
But you can publish your data under an embargo and by publishing your data you
establish priority and you can get credits for it
My data can be misused or misinterpreted
But the best defense against malicious use is to refer to an archival copy of your
data which is guaranteed to be exactly as you intended
My data are only interesting for me
But sharing your data may be required by a funder /
journal or your data may be requested to validate your
results
9. RDM: caring for your data with the purpose to:
1. protect their mere existence: data loss, data authenticity
(RDM basics)
2. share them with others
a. for reasons of reuse: in the same context or in a different context;
during research and after research
b. for reasons of reproducibility checks: scientific integrity, data quality
Research data management
10. Be safe
storage, backup → data safety: protecting against loss
+ 3-2-1 rule: save 3 copies of your data, on 2 different devices,
and 1 copy off site
+ use local ICT infrastructure (university network servers) if
possible
access control → data security: protecting against unauthorized use;
with SURFdrive, Open Science Framework or DataverseNL,
for example, you can arrange access to your data
Protecting your data #1
good data practices during your research
11. Be organized, or: you (and others) should be able to tell
what’s in a file without opening it
file-naming
organizing data in folders
Protecting your data #2
good data practices during your research
“…we can copy everything and do not manage it well.” (Indra Sihar)
12. File naming #1
File naming
Good file names are:
consistent (use file-naming conventions)
unique (distinguishes a file from files
with similar subjects as well as different
versions of the file), and
meaningful (use descriptive names).
+ Avoid using special characters in file
names
+ Include dates (format YYYYMMDD) and
a version number on file names
Source: Best practices for file naming (Stanford University Libraries)
13. File naming #2
Order by date:
2013-04-12_interview-recording_THD.mp3
2013-04-12_interview-transcript_THD.docx
2012-12-15_interview-recording_MBD.mp3
2012-12-15_interview-transcript_MBD.docx
Order by subject:
MBD_interview-recording_2012-12-15.mp3
MBD_interview-transcript_2012-12-15.docx
THD_interview-recording_2013-04-12.mp3
THD_interview-transcript_2013-04-12.docx
Order by type:
Interview-recording_MBD_2012-12-15.mp3
Interview-recording_THD_2013-04-12.mp3
Interview-transcript_MBD_2012-12-15.docx
Interview-transcript_THD_2013-04-12.docx
Forced order with numbering:
01_THD_interview-recording_2013-04-12.mp3
02_THD_interview-transcript_2013-04-12.docx
03_MBD_interview-recording_2012-12-15.mp3
04_MBD_interview-transcript_2012-12-15.docx
Think about the ordering of elements in a filename
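The conventions above can be captured in a small helper. A minimal sketch in Python (the function name and element order are illustrative, following the date-first examples on this slide):

```python
from datetime import date

def make_filename(subject, content, ext, when=None, version=None):
    """Build a consistent, sortable file name: YYYY-MM-DD_subject_content[_vN].ext"""
    when = when or date.today()
    parts = [when.isoformat(), subject, content]  # date first: files sort chronologically
    if version is not None:
        parts.append(f"v{version}")               # explicit version number
    return "_".join(parts) + "." + ext            # no spaces or special characters

print(make_filename("THD", "interview-recording", "mp3", date(2013, 4, 12)))
# 2013-04-12_THD_interview-recording.mp3
```

Reordering `parts` (subject or type first) gives the order-by-subject and order-by-type variants shown above.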
14. Organizing your data in folders #1
Source: Haselager, G.J.T., Aken, M.A.G. van
(2000): Personality and Family
Relationships. DANS.
http://dx.doi.org/10.17026/dans-xk5-y7vc.
15. Organizing your data in folders #2
TIER documentation protocol
Guiding principles of TIER documentation protocol
1. keep your raw or original data raw
+ save your raw data read-only in its original format in a separate folder
+ make a working copy of your raw data (input data, used for
processing and analysis)
2. keep the command files (files containing code written in the syntax of the
(statistical) software you use for the study) apart from the data
3. keep the analysis files (the fully cleaned and processed data files that you
use to generate the results reported in your paper) in a separate folder
4. store the metadata (codebook, description of variables, etc.) in a separate
folder, apart from the data itself
16. Organizing your data in folders #3
TIER documentation protocol
1. Main project folder (name of your research project/working title of your
paper)
1.1. Original data and metadata
1.1.1. Original data
1.1.2. Metadata
1.1.2.1. Supplements
1.2. Processing and analysis files
1.2.1. Importable data files
1.2.2. Command files
1.2.3. Analysis files
1.3. Documents
1.4. Literature
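The skeleton above can be created in one go. A minimal sketch in Python (the folder names are an illustrative rendering of the outline; the TIER protocol itself does not prescribe these exact strings):

```python
from pathlib import Path

# Illustrative folder names mirroring the TIER outline above.
TIER_FOLDERS = [
    "1_original-data-and-metadata/1_original-data",
    "1_original-data-and-metadata/2_metadata/supplements",
    "2_processing-and-analysis/1_importable-data-files",
    "2_processing-and-analysis/2_command-files",
    "2_processing-and-analysis/3_analysis-files",
    "3_documents",
    "4_literature",
]

def create_tier_project(root):
    """Create the TIER folder hierarchy under the main project folder."""
    root = Path(root)
    for sub in TIER_FOLDERS:
        (root / sub).mkdir(parents=True, exist_ok=True)
    return root

create_tier_project("my-project-working-title")
```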
17. 1. Main project folder (name of your research project/working title of your
paper)
1.1. Original data and metadata
1.1.1. Original data (raw data, obtained/gathered data)
Any data that were necessary for any part of the processing and/or
analysis you reported in your paper.
Copies of all your original data files, saved in exactly the format it was
when you first obtained it. The name of the original data file may be
changed
Keep these data read-only!
1.1.2. Metadata
1.1.2.1. Supplements
Organizing your data in folders #4
TIER documentation protocol
18. 1. Main project folder (name of your research project/working title of your paper)
1.1. Original data and metadata
1.1.1. Original data
1.1.2. Metadata
The Metadata Guide: document that provides information about each of your
original data files. Applies especially to obtained data files
A bibliographic citation of the original data files, including the date you
downloaded or obtained the original data files and unique identifiers that
have been assigned to the original data files.
Information about how to obtain a copy of the original data file
Whatever additional information is needed to understand and use the data
in the original data file
1.1.2.1. Supplements
Additional information about an original data file that’s not written by
yourself but that is found in existing supplementary documents, such as
users’ guides and code books that accompany the original data file
Organizing your data in folders #5
TIER documentation protocol
19. Organizing your data in folders #6
TIER documentation protocol
1. Main project folder (name of your research project/working title of your paper)
1.1. Original data and metadata
1.1.1. Original data
1.1.2. Metadata
1.1.2.1. Supplements
1.2. Processing and analysis files
1.2.1. Importable data files (the data you work with, input data, suitable for
processing and analysis)
A corresponding version for each of the original data files. This version can be identical
to the original version, or in some cases it will be a modified version.
For example modifications required to allow your software to read the file (converting
the file to another format, removing unusable data or explanatory notes from a table)
The original and importable versions of a data file should be given different names
The importable data file should be as nearly identical as possible to the original
The changes you make to your original data files to create the corresponding
importable data files should be described in a Readme file
1.2.2. Command files
1.2.3. Analysis files
20. Organizing your data in folders #7
TIER documentation protocol
1. Main project folder (name of your research project/working title of your paper)
1.1. Original data and metadata
1.1.1. Original data
1.1.2. Metadata
1.1.2.1. Supplements
1.2. Processing and analysis files
1.2.1. Importable data files
1.2.2. Command files
One or more files containing code written in the syntax of the (statistical) software you use
for the study
Importing phase: commands to import or read the files and save them in a format that
suits your software
Processing phase: commands that execute all the processing required to transform the
importable version of your files into the final data files that you will use in your analysis
(e.g. cleaning, recoding, joining two or more data files, dropping variables or cases,
generating new variables)
Generating the results: commands that open the analysis data file(s), and then
generate the results reported in your paper.
1.2.3. Analysis files
21. Organizing your data in folders #8
TIER documentation protocol
1. Main project folder (name of your research project/working title of your paper)
1.1. Original data and metadata
1.1.1. Original data
1.1.2. Metadata
1.1.2.1. Supplements
1.2. Processing and analysis files
1.2.1. Importable data files
1.2.2. Command files
1.2.3. Analysis files
The fully cleaned and processed data files that you use to generate the
results reported in your paper
The Data Appendix: codebook for your analysis data files: brief description
of the analysis data file(s), a complete definition of each variable (including
coding and/or units of measurement), the name of the original data files
from which the variable was extracted, the number of valid observations for
the variable, and the number of cases with missing values
22. Organizing your data in folders #9
TIER documentation protocol
1. Main project folder (name of your research project/working title of your paper)
1.1. Original data and metadata
1.1.1. Original data
1.1.2. Metadata
1.1.2.1. Supplements
1.2. Processing and analysis files
1.2.1. Importable data files
1.2.2. Command files
1.2.3. Analysis files
1.3. Documents
An electronic copy of your complete final paper
The Readme-file for your replication documentation
What statistical software or other computer programs are needed to run the
command files
Explain the structure of the hierarchy of folders in which the documentation is
stored
Describe precisely any changes you made to your original data files to create
the corresponding importable data files
Step-by-step instructions for using your documentation to replicate the
statistical results reported in your paper
1.4. Literature
Retrieved relevant literature
24. RDM: caring for your data with the purpose to:
1. protect their mere existence: data loss, data authenticity
(RDM basics)
2. share them with others
a. for reasons of reuse: in the same context or in a different context;
during research and after research
b. for reasons of reproducibility checks: scientific integrity, data quality
Research data management
25. Overview of research data sharing and storage services
[diagram: services for during vs. after research, at institutional, disciplinary and local ICT level]
Data sharing per se is pretty straightforward
26. Sharing your data
during your research versus after your research
During your research via collaboration or sharing platforms
Data sharing is (more) aimed at collaboration, at working
together on data
Being able to control access to data is crucial
After your research via data archives or repositories
Publishing and keeping an archival copy of data is essential
Long term preservation of your data is important
27. General data sharing platforms:
SURFdrive [TU/e only]: Dutch academic Dropbox, 250 GB,
maximum data transfer 16 GB
Students can request a SURFdrive account
Google Drive, Dropbox: don’t use these to store sensitive data
Sharing your data
during your research: general data sharing platforms
28. DataverseNL [TU/e only]: data sharing platform for active
research data [based on Harvard’s Dataverse Project] where you
may:
store your data in an organized and safe way
clearly describe your data
keep your data under version control
arrange access to your data
Sharing your data
during your research: DataverseNL
Students may use DataverseNL
Go to: https://dataverse.nl/
Click ‘Log in’ (at the top right); under Institutional
account click SURFconext
Select Eindhoven University of Technology and log
on with your TU/e username and password
When asked for it, give permission to share your
data by answering Yes or click this Tab
When asked to create an account, answer Yes or
click this Tab.
Once you have successfully created an account, your
username is the prefix of your email address
Click 4TU dataverse → Eindhoven dataverse → Add
data: you can now create and publish data sets,
upload files and assign access rights to data sets or
files.
30. On request
“I'd like to thank E.J. Masicampo and Daniel LaLande for sharing and allowing me to share
their data…”
Daniël Lakens (2014), What p-hacking really looks like: A comment on Masicampo & LaLande (2012)
On a (personal) website
“Let me start by saying that the reason why I put all excel files online, including all the
detailed excel formulas about data constructions and adjustments, is precisely because I
want to promote an open and transparent debate about these important and sensitive
measurement issues.”
Thomas Piketty, My response to the Financial Times, HuffPost The Blog, 29-05-2014 ;
originally published as Addendum: Response to FT, 28-05-2014
A data journal
Journal of Open Psychology Data, Geoscience Data Journal,
Data in Brief, Scientific Data, Data Reports
Sharing your data
after your research
Source: www.aukeherrema.nl
31. Choose a repository where other researchers in your discipline are sharing
their data
If not available, use a multidisciplinary or general repository that at least
assigns a persistent identifier to your data (DOI) and requires that you provide
adequate metadata
for example Zenodo, Figshare, DANS, B2SHARE, or:
4TU.ResearchData
+ DOIs [persistent identifiers for citability and retrievability]
+ open access
+ long-term availability, Data Seal of Approval
+ self-upload (single data sets < 3 GB)
Sharing your data
after your research has ended, via an archive or repository
32. Link your data to your publication
Sharing your data
link your data to your publication
33. The Human Technology Interaction group of the department of
Industrial Engineering & Innovation Sciences requires that an
archival copy of the results (‘replication package’) will be
deposited in ARCHIE
the copy cannot be changed
the copy is only accessible to the researcher
Archiving your data
after your research has ended
35. The welfare consequences… / by Jonathan J. Cooper et al.,
http://dx.doi.org/10.1371/journal.pone.0102722
Word.doc
Before data can be reusable, it first has to be usable
36. Making your data usable and interoperable with good data
practices
tidy data
metadata/documentation
licenses
open data formats
Research data management
37. Tidy data is about the structure of a table / data set.
Tidy data ≠ clean data: it's a step towards clean data
+ Each variable you measure is in one column
+ Column headers are variable names
+ Each observation is in a different row
+ Every cell contains only one piece of information
Tidy data
what
38. Tidy data allow your data to be easily:
+ imported by data management systems
+ analyzed by analysis software
+ visualized, modelled, transformed
+ combined with other data (interoperability)
Tidy data
why: making your data easy to handle for computers
39. Tidy data versus messy data
1. More than one variable in a
single column (‘clumped data’)
2. Column headers are values, or:
one variable over many columns
(‘wide data’)
3. Variables are in rows and
columns
4. More pieces of information in
one cell (cells are highlighted or
colored; values and
measurement units in one cell)
1. Each variable you measure
is in one column
2. Column headers are
variable names
3. Each observation is in a
different row
4. Every cell contains only one
piece of information
Tidy data Messy data
40. Tidy data versus messy data: example

‘Wide’ data: one variable over many columns
patient_id  drug_a  drug_b
1           67      56
2           80      90
3           64      50
4           85      75

Tidy data
patient_id  drug  heart_rate
1           a     67
2           a     80
3           a     64
4           a     85
1           b     56
2           b     90
3           b     50
4           b     75
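Reshaping the ‘wide’ table above into the tidy form is exactly the kind of step slide 41 suggests scripting rather than doing by hand. A minimal dependency-free sketch in Python (in practice you would typically use R's tidyr or Python's pandas; all names here are illustrative):

```python
# 'Wide' data: one row per patient, one column per drug.
wide = [
    {"patient_id": 1, "drug_a": 67, "drug_b": 56},
    {"patient_id": 2, "drug_a": 80, "drug_b": 90},
    {"patient_id": 3, "drug_a": 64, "drug_b": 50},
    {"patient_id": 4, "drug_a": 85, "drug_b": 75},
]

# Tidy data: one observation per row, one variable per column.
tidy = [
    {"patient_id": row["patient_id"], "drug": drug, "heart_rate": row[f"drug_{drug}"]}
    for drug in ("a", "b")
    for row in wide
]

print(tidy[0])  # {'patient_id': 1, 'drug': 'a', 'heart_rate': 67}
```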
41. Tools for tidying data
OpenRefine
download OpenRefine: http://openrefine.org/download.html
runs on your computer (not in the cloud), inside the Firefox browser (not in
IE); no web connection is needed
captures all steps applied to your raw data; the original dataset is not modified;
steps are easily reversed
R, TidyR package
a scripted language (R (free), Matlab, SAS, …) to process data (tidying,
cleaning, etc.), run the analysis, and produce the final outputs
versus
Excel: bad for data provenance and documentation of data processing,
because working through a graphical user interface doesn't leave a record
42. The table or data set itself
+ columns: use clear, descriptive variable names (no hard to
understand abbreviations), avoid special characters (can cause
problems with some software)
+ rows: if possible, use standard names within cells (derived
from a taxonomy, for example: standard species name, CAS
registry for chemical substances, standard date formats, …)
+ try to avoid coding categorical or ordinal data as numbers
+ missing data: use NA
Documentation / metadata
making your data understandable for humans #1
43. The table or data set as a whole
A description (documentation) that at least mentions:
+ size of the data set: number of observations and variables
+ information about the variables and their measurement units
(code book)
+ what’s included and excluded in the data set, why data are
missing
+ description of how you collected the data (study design), data
manipulation steps (provenance)
+ when your data consists of multiple files organized in a folder
structure, an explanation of the structure and naming of the
files
Documentation / metadata
making your data understandable for humans #2
“Research outputs that are poorly documented are like canned goods with the label
removed (…)” (Carly Strasser)
44. Documentation / metadata
metadata standards
Sometimes there is a metadata standard for documenting your data set,
but where no standard exists, a simple readme file can be good enough
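A sketch of what such a simple readme could contain, tying together the documentation points from the previous slides (the data set, file names, and values are invented placeholders):

```python
# Sketch: a minimal README covering size, code book, missing data,
# and provenance. All names and values below are illustrative only.
readme = """\
README for data set: heart_rate_trial
Creator: (your name); data collected 2018-11-01 to 2018-11-21

Contents
- data/heart_rate.csv : 8 observations, 3 variables
  * patient_id : integer, anonymised patient identifier
  * drug       : category, "a" or "b"
  * heart_rate : integer, beats per minute (bpm)
- Missing values are coded as NA (sensor failure during collection).

Provenance
- Raw data exported from the monitoring device;
  cleaned and reshaped with scripts/tidy.py.
"""
print(readme)
```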
46. Documentation / metadata
making your data findable for humans and search engines
Descriptive metadata serve mainly the discovery and identification of
your data:
+ creator
+ title
+ short description + key words
+ date(s) of data collection
+ publication year
+ related publications
+ DOI (assigned by the data archive)
+ etc.
When uploading your data to a data archive like 4TU.ResearchData, you
will be asked to enter these metadata. A DOI is assigned by the data
archive.
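The descriptive metadata fields listed above can be collected in a simple machine-readable record, here serialised as JSON (the field names and values are illustrative, not a formal metadata standard):

```python
# Sketch: the descriptive metadata fields above as a simple JSON record.
import json

metadata = {
    "creator": "A. Researcher",
    "title": "Heart rate under drug a versus drug b",
    "description": "Heart rate measurements for four patients under two drugs.",
    "keywords": ["heart rate", "clinical trial", "tidy data"],
    "collection_dates": "2018-11-01/2018-11-21",
    "publication_year": 2018,
    "related_publications": [],
    "doi": None,  # assigned by the data archive (e.g. 4TU.ResearchData) on upload
}

print(json.dumps(metadata, indent=2))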
47. User license
making clear that other people are allowed to use your data
Let other people know in advance what they are allowed to do with your
data by attaching a user license to it:
+ a Creative Commons license for data sets
+ the GNU General Public License (GPL) for software
+ a license selector tool can help you choose
48. Open data formats
ensuring the ‘longevity’ of your data
+ open (non-proprietary) data formats best ensure that the data will
remain usable and ‘legible’ for computers in the future
+ open formats are easy to use in a variety of software, for example
.csv for tabular data
+ check which data formats are supported by a data archive like
4TU.ResearchData
49. Research data management
recommended reading on ‘good data practices’
1. Goodman, A., Pepe, A., Blocker, A.W., et al. (2014) Ten simple rules for the care
and feeding of scientific data. PLOS Computational Biology, 10(4), e1003542.
https://doi.org/10.1371/journal.pcbi.1003542
2. Barsky, E. (2017) Good enough research data management: a very brief
guide
3. Dynamic Ecology (2016) Ten commandments for good data management.
https://dynamicecology.wordpress.com/2016/08/22/ten-commandments-for-
good-data-management/
4. Ellis, S.E., Leek, J.T. (2017) How to share data for collaboration. PeerJ
Preprints, 5:e3139v1. https://doi.org/10.7287/peerj.preprints.3139v1
5. Wilson, G., Bryan, J., Cranston, K., Kitzes, J., Nederbragt, L., Teal, T.K. (2017)
Good enough practices in scientific computing. PLOS Computational Biology, 13(6):
e1005510. https://doi.org/10.1371/journal.pcbi.1005510