This document provides an overview of data management basics for graduate students. It discusses why managing data is important, including requirements from funders and for responsible research. It then covers topics like organizing data through file naming, versioning, backup and storage strategies, and post-project activities. Resources for developing data management plans and tools are also listed. The overall message is that planning is key to prevent data loss and enable efficient and ethical research.
DataCite and Campus Data Services
Paul Bracke, Associate Dean for Digital Programs and Information Services, Purdue University
Research libraries are increasingly interested in developing data services for their campuses. There are many perspectives, however, on how to develop services that are responsive to the many needs of scientists; sensitive to the concerns of scientists who are not always accustomed to sharing their data; and that are attractive to campus administrators. This presentation will discuss the development of campus-based data services programs, the centrality of data citation to these efforts, and the ways in which engagement with DataCite can enhance local programs.
Presentation from a University of York Library workshop on research data management. The workshop provides an introduction to research data management, covering best practice for the successful organisation, storage, documentation, archiving, and sharing of research data.
University of Bath Research Data Management training for researchers
Jez Cope
Slides from a workshop on Research Data Management for research staff and students at the University of Bath.
Part of the Research360 project (http://blogs.bath.ac.uk/research360).
Authors: Cathy Pink and Jez Cope, University of Bath
This presentation was delivered at the Elsevier Library Connect Seminar on 6 October 2014 in Johannesburg, 7 October 2014 in Durban, and 9 October 2014 in Cape Town, and gives an overview of the potential role that librarians can play in research data management.
Data Equivalence
Mark Parsons, Lead Project Manager, Senior Associate Scientist, National Snow and Ice Data Center
Data citation, especially using persistent identifiers like Digital Object Identifiers (DOIs), is an increasingly accepted scientific practice. Recently, several respected organizations have developed guidelines for data citation. The different guidelines are largely congruent in that they agree on the basic practice and elements of data citation, especially for relatively static, whole data collections. There is less agreement on the more subtle nuances of data citation that are sometimes necessary to ensure precise reference and scientific reproducibility--the core purpose of data citation. We need to be sure that if you follow a data reference you get to the precise data that were used, or at least their scientific equivalent. Identifiers such as DOIs are necessary but not sufficient for the precise, detailed references required. This talk discusses issues around data set versioning, micro-citation, and scientific equivalence. I propose some interim solutions and suggest research strategies for the future.
The document provides information about MANTRA, a free online course for research data management created by the University of Edinburgh. MANTRA teaches best practices for managing research data through open educational modules aligned with the research data lifecycle. It is available for reuse and repurposing under an open license. The course covers topics like data planning, organization, documentation, storage, security, and sharing.
Scientific discovery and innovation in an era of data-intensive science
William (Bill) Michener, Professor and Director of e-Science Initiatives for University Libraries, University of New Mexico; DataONE Principal Investigator
The scope and nature of biological, environmental and earth sciences research are evolving rapidly in response to environmental challenges such as global climate change, invasive species and emergent diseases. Scientific studies are increasingly focusing on long-term, broad-scale, and complex questions that require massive amounts of diverse data collected by remote sensing platforms and embedded environmental sensor networks; collaborative, interdisciplinary science teams; and new tools that promote scientific data preservation, discovery, and innovation. This talk describes the challenges facing scientists as they transition into this new era of data intensive science, presents current solutions, and lays out a roadmap to the future where new information technologies significantly increase the pace of scientific discovery and innovation.
A basic course on Research data management: part 1 - part 4
Leon Osinski
Slides belonging to a basic course on research data management. The course consists of 4 parts:
Part 1: what and why
1.1 data management plans
Part 2: protecting and organizing your data
2.1 data safety and data security
2.2 file naming, organizing data (TIER documentation protocol)
Part 3: sharing your data
3.1 via collaboration platforms (during research)
3.2 via data archives (after your research)
Part 4: caring for your data, or making data usable
4.1 tidy data
4.2 documentation/metadata
4.3 licenses
4.4 open data formats
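The "tidy data" idea in Part 4 can be illustrated with a short sketch: each variable becomes a column and each observation a row, so a "wide" table with one column per year is reshaped into "long" form. A minimal, hypothetical example in plain Python (no external libraries; the site/year field names are invented for illustration):

```python
# Convert a "wide" table (one column per year) into tidy "long" form:
# one row per (site, year) observation. Field names are illustrative.
wide_rows = [
    {"site": "A", "2013": 4.1, "2014": 4.7},
    {"site": "B", "2013": 3.9, "2014": 4.2},
]

tidy_rows = [
    {"site": row["site"], "year": int(year), "value": value}
    for row in wide_rows
    for year, value in row.items()
    if year != "site"
]

for r in tidy_rows:
    print(r)
```

In the tidy form, every row states one observation ("site A measured 4.1 in 2013"), which makes the data far easier to filter, merge, and share than the wide layout.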
Going Full Circle: Research Data Management @ University of Pretoria
Johann van Wyk
Presentation delivered at the eResearch Africa Conference, held 23-27 November 2014, at the University of Cape Town, Cape Town, South Africa. Various approaches to Research Data Management at Higher Education Institutions focus on only one or two aspects of the research data cycle. At the University of Pretoria the approach has been to support researchers throughout the research process, covering the whole research data cycle. The idea is to facilitate and capture the research data throughout the research cycle, which gives context to the data and adds provenance. The University of Pretoria uses the UK Data Archive's research data cycle model to align its Research Data Management project development. This model identifies the stages of a research data cycle as: creating data, processing data, analysing data, preserving data, giving access to data, and reusing data. This paper will give a short overview of the chronological development of research data management at the University of Pretoria. The overview will also highlight findings of two surveys done at the University, one in 2009 and one in 2013. This will be followed by a discussion of a number of pilot projects at the University, and how the needs of researchers involved in these projects are being addressed in a number of the stages of the research data cycle. The discussion will also give a short overview of how the University plans to support those stages not currently being addressed. The second part of the presentation will focus on the projects and technology (software and hardware) used. The University of Pretoria has adopted an Enterprise Content Management (ECM) approach to manage its Research Data. ECM is not a singular platform or system but rather a set of strategies, tools and methodologies that interoperate with each other to create a comprehensive management tool. Together, these create an all-encompassing process addressing document, web, records and digital asset management.
At the University of Pretoria we address all these processes with different software suites and tools to create a complete management system. Each process presented its own technical challenges. These had to be addressed while keeping in mind the end objective of supporting researchers throughout the whole research process and data life cycle. Various platforms and standards have been adopted to meet the University of Pretoria's criteria. To date, three processes have been addressed, namely the capturing of data during the research process, the dissemination of data, and the preservation of data.
Managing data throughout the research lifecycle
Marieke Guy
This document summarizes a presentation about managing data throughout the research lifecycle. It discusses the stages of the research lifecycle, including planning, data creation, documentation, storage, sharing, and preservation. It provides examples of research lifecycle models and addresses key questions to consider at each stage, such as what formats to use, how to document data, where to store it, and how to share and preserve it. The presentation emphasizes making informed decisions about data management and talking to colleagues for support and advice.
Data Management for the Digital Humanities
Thea Atwood
This document provides an overview of key concepts and best practices for data management in the digital humanities. It defines data and discusses its generation. Guidelines for developing a data management plan from funders like NEH and NSF are examined. The importance of data management is explained in terms of meeting requirements, increasing visibility, saving time and money, and facilitating new discoveries. Elements of an effective data management plan, such as roles and responsibilities, expected data, data formats and dissemination, and long-term storage and access are also outlined.
This document provides guidance on creating a data management plan (DMP). It explains that DMPs are required by many funders to help researchers better organize, document, and preserve their data. The key parts of a DMP include describing the data, metadata standards, data security, archiving and preservation, and access. The presenter provides tips for addressing each part, such as using open formats and partnering with repositories. Resources for creating a DMP at the University of Wisconsin-Milwaukee are also listed.
Data Literacy: Creating and Managing Research Data
cunera
This document discusses best practices for creating and managing research data. It covers defining data, the importance of data management, developing a data management plan, file naming conventions, metadata, data sharing and preservation. Key points include making a data management plan addressing types of data, standards, access and sharing policies; using descriptive file names with dates; storing multiple versions of data; and including metadata to explain the data. Resources for data management support are provided.
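The naming advice above (descriptive file names with dates, keeping multiple versions) can be sketched in a few lines. A hypothetical helper, assuming an ISO 8601 date and a zero-padded version suffix; the naming scheme and field names are illustrative, not a standard:

```python
from datetime import date

def data_filename(project, description, version, ext="csv", when=None):
    """Build a descriptive, sortable file name, e.g.
    'soilsurvey_2024-05-01_phreadings_v02.csv'.
    The pattern (project_date_description_vNN.ext) is illustrative."""
    when = when or date.today()
    return f"{project}_{when.isoformat()}_{description}_v{version:02d}.{ext}"

name = data_filename("soilsurvey", "phreadings", 2, when=date(2024, 5, 1))
print(name)  # soilsurvey_2024-05-01_phreadings_v02.csv
```

Because the date is in ISO 8601 (YYYY-MM-DD) and versions are zero-padded, a plain alphabetical sort of the directory also sorts the files chronologically and by version.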
This two-part course is a collaboration between CU Libraries/Information Services and the Office of Research Compliance & Training. The purpose of this course is to familiarize you with the various aspects of research data management (RDM)
Part 1: Why RDM is both recommended and required
What research data are
Who is responsible for RDM
Part 2:
When RDM activities occur
How you can carry out RDM activities
This document provides an introduction to data management. It discusses why data management is important, covering key aspects like developing data management plans, file organization, documentation and metadata, storage and backup, legal and ethical considerations, sharing and reuse, and preservation. Effective data management is critical for research success as it supports reproducibility, sharing, and preventing data loss. The document outlines best practices and resources like the library that can help with developing strong data management strategies.
This document summarizes Rob Grim's presentation on e-Science, research data, and the role of libraries. It discusses the Open Data Foundation's work in promoting metadata standards like DDI and SDMX. It also outlines the research data lifecycle and how metadata management can help libraries support research through services like data registration, archiving, discovery and access. Finally, it provides examples of how Tilburg University library supports research data through services aligned with data availability, discovery, access and delivery.
Brad Houston presented information on data management plans (DMPs) required by the National Science Foundation (NSF) for grant proposals. He explained that DMPs must describe the data to be collected or generated, how it will be organized and formatted, and how it will be preserved and shared. He emphasized using open standards and preparing metadata to help others understand and find the data. Researchers were advised to consider long-term preservation and to partner with libraries or repositories to ensure access over time. Contact information was provided for those needing assistance developing their DMP.
Research Data Management: Part 1, Principles & Responsibilities
AmyLN
This two-part course is a collaboration between CU Libraries/Information Services and the Office of Research Compliance & Training. The purpose of this course is to familiarize you with the various aspects of research data management (RDM)
Part 1: Why RDM is both recommended and required
What research data are
Who is responsible for RDM
Part 2:
When RDM activities occur
How you can carry out RDM activities
This document provides an introduction to data management. It discusses the importance of data management and introduces best practices. These include making a data management plan, properly organizing and naming files, adding descriptive metadata, securely storing and backing up data, considering legal and ethical issues, enabling sharing and reuse, and ensuring long-term preservation. Effective data management is important across all disciplines and throughout the entire data lifecycle from creation to archiving.
This document discusses the importance of research data management. It covers the data lifecycle and components of a data management plan. The data lifecycle includes collecting, processing, analyzing, storing, preserving, and sharing data. A data management plan outlines how data will be managed and preserved during and after a research project. It includes information about the data, metadata, data sharing policies, long-term storage, and budget. Developing a data management plan helps keep data organized, track processes, control versions, prepare data for sharing and reuse, and ensure long-term access.
Research Data Management and Sharing for the Social Sciences and Humanities
Rebekah Cummings
This document summarizes a presentation on research data management for social and behavioral sciences and humanities. The presentation covered topics such as what data management is, why it is important to manage and share data, how to create data management plans, organize data files through naming conventions and folder structures, describe data through metadata and codebooks, issues around data ownership, and data storage, archiving and sharing options. The presentation was aimed at providing guidance to researchers at the University of Utah on best practices for managing and sharing their research data.
Presentation for Northwestern University's first Computational Research Day, April 22, 2014. http://www.it.northwestern.edu/research/about/campus-events/research-day/agenda.html . By Cunera Buys, e-Science Librarian, and Claire Stewart, Director, Center for Scholarly Communication and Digital Curation and Head, Digital Collections
This slideshow was used in an Introduction to Research Data Management course taught for the Mathematical, Physical and Life Sciences Division, University of Oxford, on 2015-02-09. It provides an overview of some key issues, looking at both day-to-day data management, and longer term issues, including sharing, and curation.
NIH Data Policy or: How I Learned to Stop Worrying and Love the Data Manageme...
Kristin Briney
This document summarizes the key points from a presentation about NIH data management and sharing plan requirements. It discusses why these plans are now required for grants over $500,000, how to write an effective plan including what data to share, when, where, who will access it, and how it will be prepared. It also provides tips for effective long-term data management practices like file organization, documentation, backup plans, and security. Resources for creating data management plans and getting help from librarians and tools are also mentioned.
Opening Keynote: The Many and the One: BCE themes in 21st century data curation
Allen Renear, Professor and Interim Dean, Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign
Two scientists can be using "the same data" even though the computer files involved appear to be quite different. This is familiar enough, and for the most part, in small communities with shared practices and familiar datasets, raises few problems. But these informal understandings do not scale to 21st century data curation. To get full value from cyberinfrastructure we must support huge quantities of heterogeneous data developed by diverse communities and used by diverse communities -- often with widely varying methods, tools, and purposes. To accomplish this our informal practices and understandings must be replaced, or at least supplemented, by a shared framework of standard terminology for describing complex cascades of representational levels and relationships. Fundamental problems in data curation -- and in particular problems involving provenance, identifiers, and data citation -- cannot be fully resolved without such a framework. Although the deepest problems here have ancient origins, useful practical measures are now within reach. Some recent work toward this end that is being carried out at the Center for Informatics Research in Science and Scholarship (CIRSS) at the Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign will be described.
Responsible Conduct of Research: Data Management
Kristin Briney
This presentation was given by myself and Brad Houston (http://www.slideshare.net/herodotusjr), for UWM's Responsible Conduct of Research (RCR) series in Fall of 2013. It covers data management plans and practical data management tips. The corresponding handout is also available on Slideshare: http://www.slideshare.net/kbriney/rcr-data-management-handout
- The document summarizes a workshop on research data management given by Stephanie Simms from the California Digital Library.
- It discusses an overview of research data management and the "Support Your Data" program, which aims to help researchers better organize, save, document, and share the outputs of their work.
- The workshop covered assessing current data management practices, accessing tools and resources, and data-related services available at Kyoto University.
This talk was given by Brianna Marshall, Digital Curation Coordinator, at the UW-Madison Digital Humanities Research Network meeting on December 2, 2014.
A basic course on Research data management: part 1 - part 4Leon Osinski
Slides belonging to a basic course on research data management. The course consists of 4 parts:
Part 1: what and why
1.1 data management plans
Part 2: protecting and organizing your data
2.1 data safety and data security
2.2 file naming, organizing data (TIER documentation protocol)
Part 3: sharing your data
3.1 via collaboration platforms (during research)
3.2 via data archives (after your research)
Part 4: caring for your data, or making data usable
4.1 tidy data
4.2 documentation/metadata
4.3 licenses
4.4 open data formats
Going Full Circle: Research Data Management @ University of PretoriaJohann van Wyk
Presentation delivered at the eResearch Africa Conference, held 23-27 November 2014, at the University of Cape Town, Cape Town, South Africa. Various approaches to Research Data Management at Higher Education Institutions focus on an aspect or two of the research data cycle. At the University of Pretoria the approach has been to support researchers throughout the research process covering the whole research data cycle. The idea is to facilitate/capture the research data throughout the research cycle. This will give context to the data and will add provenance to the data. The University of Pretoria uses the UK Data Archive’s research data cycle model, to align its Research Data Management project-development. This model identifies the stages of a research data cycle as: creating data, processing data, analysing data, preserving data, giving access to data, and reusing data. This paper will give a short overview of the chronological development of research data management at the University of Pretoria. The overview will also highlight findings of two surveys done at the University, one in 2009 and one in 2013. This will be followed by a discussion of a number of pilot projects at the University, and how the needs of researchers involved in these projects are being addressed in a number of the stages of the research data cycle. The discussion will also give a short overview of how the University plans to support those stages not currently being addressed. The second part of the presentation will focus on the projects and technology (software and hardware) used. The University of Pretoria has adopted an Enterprise Content Management (ECM) approach to manage its Research Data. ECM is not a singular platform or system but rather a set of strategies, tools and methodologies that interoperate with each other to create a comprehensive management tool. These sets create an all-encompassing process addressing document, web, records and digital asset management. 
At the University of Pretoria we address all these processes with different software suites and tools to create a complete management system. Each process presented its own technical challenges. These had to be addressed, while keeping in mind the end objective of supporting researchers throughout the whole research process and data life cycle. Various platforms and standards have been adopted to meet the University of Pretoria’s criteria. To date three processes have been addressed namely, the capturing of data during the research process, the dissemination of data and the preservation of data.
Managing data throughout the research lifecycleMarieke Guy
This document summarizes a presentation about managing data throughout the research lifecycle. It discusses the stages of the research lifecycle, including planning, data creation, documentation, storage, sharing, and preservation. It provides examples of research lifecycle models and addresses key questions to consider at each stage, such as what formats to use, how to document data, where to store it, and how to share and preserve it. The presentation emphasizes making informed decisions about data management and talking to colleagues for support and advice.
Data Management for the Digital HumanitiesThea Atwood
This document provides an overview of key concepts and best practices for data management in the digital humanities. It defines data and discusses its generation. Guidelines for developing a data management plan from funders like NEH and NSF are examined. The importance of data management is explained in terms of meeting requirements, increasing visibility, saving time and money, and facilitating new discoveries. Elements of an effective data management plan, such as roles and responsibilities, expected data, data formats and dissemination, and long-term storage and access are also outlined.
This document provides guidance on creating a data management plan (DMP). It explains that DMPs are required by many funders to help researchers better organize, document, and preserve their data. The key parts of a DMP include describing the data, metadata standards, data security, archiving and preservation, and access. The presenter provides tips for addressing each part, such as using open formats and partnering with repositories. Resources for creating a DMP at the University of Wisconsin-Milwaukee are also listed.
Data Literacy: Creating and Managing Reserach Datacunera
This document discusses best practices for creating and managing research data. It covers defining data, the importance of data management, developing a data management plan, file naming conventions, metadata, data sharing and preservation. Key points include making a data management plan addressing types of data, standards, access and sharing policies; using descriptive file names with dates; storing multiple versions of data; and including metadata to explain the data. Resources for data management support are provided.
This two-part course is a collaboration between CU Libraries/Information Services and the Office of Research Compliance & Training. The purpose of this course is to familiarize you with the various aspects of research data management (RDM)
Part 1: Why RDM is both recommended and required
What research data are
Who is responsible for RDM
Part 2:
When RDM activities occur
How you can carry out RDM activities
This document provides an introduction to data management. It discusses why data management is important, covering key aspects like developing data management plans, file organization, documentation and metadata, storage and backup, legal and ethical considerations, sharing and reuse, and preservation. Effective data management is critical for research success as it supports reproducibility, sharing, and preventing data loss. The document outlines best practices and resources like the library that can help with developing strong data management strategies.
This document summarizes Rob Grim's presentation on e-Science, research data, and the role of libraries. It discusses the Open Data Foundation's work in promoting metadata standards like DDI and SDMX. It also outlines the research data lifecycle and how metadata management can help libraries support research through services like data registration, archiving, discovery and access. Finally, it provides examples of how Tilburg University library supports research data through services aligned with data availability, discovery, access and delivery.
Brad Houston presented information on data management plans (DMPs) required by the National Science Foundation (NSF) for grant proposals. He explained that DMPs must describe the data to be collected or generated, how it will be organized and formatted, and how it will be preserved and shared. He emphasized using open standards and preparing metadata to help others understand and find the data. Researchers were advised to consider long-term preservation and to partner with libraries or repositories to ensure access over time. Contact information was provided for those needing assistance developing their DMP.
Research Data Management: Part 1, Principles & ResponsibilitiesAmyLN
This two-part course is a collaboration between CU Libraries/Information Services and the Office of Research Compliance & Training. The purpose of this course is to familiarize you with the various aspects of research data management (RDM)
Part 1: Why RDM is both recommended and required
What research data are
Who is responsible for RDM
Part 2:
When RDM activities occur
How you can carry out RDM activities
This document provides an introduction to data management. It discusses the importance of data management and introduces best practices. These include making a data management plan, properly organizing and naming files, adding descriptive metadata, securely storing and backing up data, considering legal and ethical issues, enabling sharing and reuse, and ensuring long-term preservation. Effective data management is important across all disciplines and throughout the entire data lifecycle from creation to archiving.
This document discusses the importance of research data management. It covers the data lifecycle and components of a data management plan. The data lifecycle includes collecting, processing, analyzing, storing, preserving, and sharing data. A data management plan outlines how data will be managed and preserved during and after a research project. It includes information about the data, metadata, data sharing policies, long-term storage, and budget. Developing a data management plan helps keep data organized, track processes, control versions, prepare data for sharing and reuse, and ensure long-term access.
Research Data Management and Sharing for the Social Sciences and Humanities (Rebekah Cummings)
This document summarizes a presentation on research data management for social and behavioral sciences and humanities. The presentation covered topics such as what data management is, why it is important to manage and share data, how to create data management plans, organize data files through naming conventions and folder structures, describe data through metadata and codebooks, issues around data ownership, and data storage, archiving and sharing options. The presentation was aimed at providing guidance to researchers at the University of Utah on best practices for managing and sharing their research data.
Presentation for Northwestern University's first Computational Research Day, April 22, 2014. http://www.it.northwestern.edu/research/about/campus-events/research-day/agenda.html . By Cunera Buys, e-Science Librarian, and Claire Stewart, Director, Center for Scholarly Communication and Digital Curation and Head, Digital Collections
This slideshow was used in an Introduction to Research Data Management course taught for the Mathematical, Physical and Life Sciences Division, University of Oxford, on 2015-02-09. It provides an overview of some key issues, looking at both day-to-day data management, and longer term issues, including sharing, and curation.
NIH Data Policy or: How I Learned to Stop Worrying and Love the Data Manageme... (Kristin Briney)
This document summarizes the key points from a presentation about NIH data management and sharing plan requirements. It discusses why these plans are now required for grants over $500,000, how to write an effective plan including what data to share, when, where, who will access it, and how it will be prepared. It also provides tips for effective long-term data management practices like file organization, documentation, backup plans, and security. Resources for creating data management plans and getting help from librarians and tools are also mentioned.
Opening Keynote: The Many and the One: BCE themes in 21st century data curation
Allen Renear, Professor and Interim Dean, Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign
Two scientists can be using "the same data" even though the computer files involved appear to be quite different. This is familiar enough, and for the most part, in small communities with shared practices and familiar datasets, raises few problems. But these informal understandings do not scale to 21st century data curation. To get full value from cyberinfrastructure we must support huge quantities of heterogeneous data developed by diverse communities and used by diverse communities -- often with widely varying methods, tools, and purposes. To accomplish this our informal practices and understandings much be replaced, or at least supplemented, by a shared framework of standard terminology for describing complex cascades of representational levels and relationships. Fundamental problems in data curation -- and in particular problems involving provenance, identifiers, and data citation — cannot be fully resolved without such a framework. Although the deepest problems here have ancient origins, useful practical measures are now within reach. Some recent work toward this end that is being carried out at the Center for Informatics Research in Science and Scholarship (CIRSS) at the Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign will be described.
Responsible Conduct of Research: Data Management (Kristin Briney)
This presentation was given by myself and Brad Houston (http://www.slideshare.net/herodotusjr), for UWM's Responsible Conduct of Research (RCR) series in Fall of 2013. It covers data management plans and practical data management tips. The corresponding handout is also available on Slideshare: http://www.slideshare.net/kbriney/rcr-data-management-handout
- The document summarizes a workshop on research data management given by Stephanie Simms from the California Digital Library.
- It discusses an overview of research data management and the "Support Your Data" program, which aims to help researchers better organize, save, document, and share the outputs of their work.
- The workshop covered assessing current data management practices, accessing tools and resources, and data-related services available at Kyoto University.
This talk was given by Brianna Marshall, Digital Curation Coordinator, at the UW-Madison Digital Humanities Research Network meeting on December 2, 2014.
Getting to grips with research data management (Wendy Mears)
This document provides an overview of research data management. It defines research data management and discusses its importance. It also outlines the data lifecycle model and provides guidance on sharing data, working with data, planning for data management, and useful resources for research data management. The document aims to help researchers effectively manage the data created throughout the research process.
http://kulibrarians.g.hatena.ne.jp/kulibrarians/20170222
Presentation by Cuna Ekmekcioglu (The University of Edinburgh)
- Creating and Managing Digital Research Data in Creative Arts: An overview (2016)
CC BY-NC-SA 4.0
Presentation given at the Indiana University School of Medicine's Ruth Lilly Medical Library. Contains information and resources specific to Indiana University Purdue University Indianapolis (IUPUI). For full class materials, see LYD17_IUPUIWorkshop folder here: https://osf.io/r8tht/.
The document provides guidance on writing a data management plan (DMP). It explains that DMPs are now required by many funders to accompany grant applications. A DMP outlines how research data will be managed and shared during and after a project. It should address issues like the type of data being collected, documentation, storage and backup plans, data sharing and reuse, legal and ethical concerns, and long-term preservation. Writing a DMP helps ensure good data management practices and that a project is compliant with funder policies supporting open access to research data.
The state of global research data initiatives: observations from a life on th... (Projeto RCAAP)
The document discusses research data management and provides guidance on best practices. It defines research data management as the active management of data over its lifecycle. It recommends writing a data management plan to document how data will be created, stored, shared, and preserved. It also provides tips for making data accessible and reusable through use of metadata standards, documentation, open licensing, and depositing data in repositories with persistent identifiers. The goal is to help researchers manage and share their data effectively to increase access and reuse.
This document provides an overview of research data management and outlines the steps for creating a data management plan. It discusses why research data management is important, including enabling data reuse and sharing and meeting funder requirements. The document then walks through creating a data management plan, covering topics like the types and formats of data that will be generated, ethical and intellectual property issues, how data will be stored and backed up, and long-term preservation and deposition of data. It emphasizes that planning early helps ensure accurate, complete and secure data, and avoids problems down the line.
Prerequisites of DBMS
Course Objectives of DBMS
Syllabus
What is the meaning of data and database
DBMS
History of DBMS
Different Databases available in Market
Storage areas
Why to Learn DBMS?
People who work with Databases
Applications of DBMS
This is the PowerPoint for my "Data Management for Undergraduate Researchers" workshop for the Office of Undergraduate Research Seminar and Workshop Series. Major topics include motivations behind good data management, file naming, version control, metadata, storage, and archiving.
This document provides guidance on research data management and developing data management plans. It discusses why managing research data is important, including making research easier to conduct, avoiding accusations of fraud or bad science, and getting credit for data produced. The document outlines what is involved in research data management and considerations for sharing and preserving data, such as file formats, documentation, and standards. It emphasizes the importance of data management planning and provides tips on developing plans to meet funder requirements.
Presentation given at the VADS4R training event in Glasgow on 16th June. VADS4R is a project training PhD students and early career researchers in the visual and performing arts about research data management.
Introduction to Research Data Management for postgraduate students (Marieke Guy)
The document provides an introduction to research data management for postgraduate students, outlining what research data is, the research process, what research data management involves and why it is important, and how students can start thinking about good research data management practices. It discusses defining and organizing data, storage and security, and maintaining findable and understandable data throughout the research lifecycle. The goal is to explain the importance of research data management and the roles students play in effective data management.
This document summarizes a seminar on data management for undergraduate researchers. It discusses what data is, why it needs to be managed, and key aspects of the data management process such as data organization, metadata, storage, and archiving. Topics covered include file naming best practices, version control, documentation, metadata standards, storage options, and long-term archiving. The goal is to help researchers organize and document their data so it can be understood, preserved, and reused.
The document discusses the importance of digital data preservation and ethics. It notes that data loss is worse than commonly known, with estimates that 1 in 1,500 files become corrupt on average and 3-500 files corrupt on each hard drive. Proper data preservation is important for responsibilities to colleagues, research subjects, and the public. Key aspects of preservation include having backup copies stored in different locations and formats, as well as periodically checking the integrity of stored data. University libraries can assist with data management planning, curation, archiving, and publishing to help researchers properly preserve their important digital data.
The document provides an introduction and overview of databases and database management systems. It outlines the course curriculum which includes an introduction to databases and database concepts, Oracle relational databases and tools, SQL and PL/SQL implementation, data modeling using ER diagrams, normalization, and transaction management. It also includes knowledge sharing sessions and a project. The document further defines data, information, and data management approaches like files, XML, and databases. It describes the key aspects and advantages/disadvantages of each approach.
This document discusses creating a data management plan. It explains that a data management plan is a comprehensive plan for managing research data throughout a project's lifecycle and briefly describing how data will be shared per a funder's policy. It provides an overview of key elements to include in a plan such as file formats, organization, sharing, and preservation. The document also reviews funder requirements and available tools to create plans, noting they can be tailored to different funders' guidelines.
Similar to Data managementbasics issr_20130301 (20)
3. 1. Funders Require It
Data Management Basics, 3/01/13
• National Institutes of Health: Data Sharing Policy (2003)
  • All grants funded at $500K or above must include a Data Sharing Plan
• National Science Foundation: Data Management Plan Requirement (2011)
  • All proposals must submit a 2-page supplementary “Data Management Plan” describing how the project will comply with NSF data sharing policy
• National Endowment for the Humanities: Sustainability and Data Management Plans Requirement (2012)
  • Digital Humanities Implementation Grants must include a plan discussing how data will be managed, disseminated, and preserved
• OSTP Directive to Funding Agencies (2013)
  • Federal agencies with more than $100M in R&D expenditures must ensure that published results of federally funded research are freely available to the public within one year of publication, including data
4. National Science Foundation
• Data Management Plan Requirement
  • How projects will conform to NSF data sharing policy
  • Flexible: “The plan should reflect best practices in your area of research, and should be appropriate to the data you generate.”
• Directorate for Social, Behavioral and Economic Sciences
  • Discipline-specific guidelines
    • Archeology (Digital Archeological Record)
    • Economics (American Economic Association)
    • Universals (for the NSF Universe)
• What data are generated by your research?
• What is your plan for managing the data?
5. 2. It Makes Life Easier
• For you…
  • Increases efficiency
  • Easier to understand the data collected throughout the life cycle of the project
  • Easier to find the data that you need throughout the life cycle of the project
  • Satisfies applicable legal obligations
  • Addresses preservation, documentation, and verification issues
  • Helps reviewers understand the characteristics of your data
  • Increases citation rates for articles
• For others…
  • Provides continuity – other researchers can build on your data
  • Enhances longevity and usability
  • Facilitates new discoveries
  • Supports open access
6. 3. It’s the Right Thing To Do
Responsible Conduct of Research/Research Ethics
• Data Acquisition, Management, Sharing and Ownership
  • Using the appropriate research method
  • Providing attention to detail
  • Obtaining appropriate permissions
  • Recording data accurately and securely
  • Maintaining data to allow it to confirm research findings, establish priority, and be reanalyzed by other researchers
  • Storing data to protect confidentiality, be secure from physical and electronic damage, destruction or theft, and be maintained for the appropriate time frame dictated by sponsor and University policies
Compliance
• Research using Human Subjects (Institutional Review Board)
7. Smart Data Practices
• Naming Your Files
• Organizing Your Data
• Backup and Storage
• Post-Project Considerations
8. Organizing Your Data
• Getting Started
  • Consider your goals
    • What do you want to get out of managing your data?
    • What is the most efficient way to organize your data?
  • Figure out your criteria for keeping data
  • Think about where you want your data to end up
10. File Naming and Labeling
Three things to consider: Organization, Consistency, Context
11. Some potential components for your file naming strategy
• Version number
• Date of creation
• Name of creator
• Description of content
• Name of individual/research team/department
• Publication date
• Project number
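Components like these can be combined mechanically once you fix an order. A minimal sketch in Python (the `make_filename` helper and its field order are hypothetical; pick your own convention and then apply it consistently):

```python
def make_filename(stamp: str, description: str, creator: str,
                  version: int, ext: str) -> str:
    """Build a filename from common naming components.

    The YYYYMMDD date goes first so that files sort chronologically;
    underscores separate fields because spaces and special characters
    are parsed differently across systems.
    """
    return f"{stamp}_{description}_{creator}_v{version}.{ext}"

print(make_filename("20120925", "credo_du_bois", "rrz", 1, "jpg"))
# 20120925_credo_du_bois_rrz_v1.jpg
```

Which fields you include matters less than generating them the same way every time.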
12. Organizing Your Data
W. E. B. Du Bois, Niagara delegate meeting, Boston, 1907. W. E. B. Du Bois Papers (MS 312). Special Collections and University Archives, University Libraries, University of Massachusetts Amherst
13. Organizing Your Data
• Let’s Clean Up Those File Names
  • abcdefghijklmnopqrstuvwxyz.jpg
    • doesn’t make much sense, does it?
  • How about:
    • 20120925_credo_du_bois_rrz_001.jpg
  • And I put it in a directory called:
    • credo_du_bois
14. Organizing Your Data
• Why this structure?
  • Oh, I just made it up! But I’m going to be consistent
  • 20120925 = date I found the image
  • credo = database/collection where I found the image
  • du_bois = image subject
  • rrz = my initials (I am working in a group!)
  • 001 = an accession number (I made that up, too, but I’ll continue to use that schema)
15. BAD naming practices
• Using generic data file names that may conflict when moved from one location to another
• Failing to think about scale
• Using special characters in a filename such as: &*%$£]{!@
16. Versioning
• Use ordinal numbers (1, 2, 3) for major version changes and decimals for minor changes: v1, v1.1, v2.6
• Beware of using confusing labels: revision, final, final2, definitive_copy
• Discard or delete obsolete versions
• Use an auto-backup facility (if available) rather than saving or archiving multiple versions
• Turn on versioning or tracking in collaborative documents or storage utilities such as wikis, Google Docs, etc.
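The v1 / v1.1 / v2.6 labels above can be maintained mechanically rather than by hand. A sketch (the `bump_version` helper is hypothetical, not part of any named tool):

```python
import re

def bump_version(label: str, major: bool = False) -> str:
    """Increment a version label of the form v<major> or v<major>.<minor>.

    A minor bump turns v1.1 into v1.2; a major bump turns v2.6 into v3.
    Raises ValueError for confusing labels like 'final2' or 'revision'.
    """
    m = re.fullmatch(r"v(\d+)(?:\.(\d+))?", label)
    if m is None:
        raise ValueError(f"unrecognised version label: {label!r}")
    maj, minor = int(m.group(1)), int(m.group(2) or 0)
    return f"v{maj + 1}" if major else f"v{maj}.{minor + 1}"

print(bump_version("v1.1"))              # v1.2
print(bump_version("v2.6", major=True))  # v3
```

Rejecting anything that is not a vN or vN.N label is the point: it forces the consistent scheme the slide recommends.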
17. Quiz! File naming by date
What is the best filename?
A. 2012-09-25_Attachment
B. 25 September 2012 Attachment
C. 25092012attch
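The reason the YYYY-MM-DD form works so well: with year first and zero-padded fields, plain alphabetical sorting is also chronological sorting. A quick check:

```python
names = [
    "2012-09-25_Attachment",
    "2011-12-31_Attachment",
    "2012-01-03_Attachment",
]

# Lexicographic sort of ISO-style dated filenames gives chronological order
print(sorted(names))
# ['2011-12-31_Attachment', '2012-01-03_Attachment', '2012-09-25_Attachment']
```

Day-first or month-name formats break this property, which is why any file browser's default sort scrambles them.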
18. Quiz! File naming by description
What is the best filename?
A. dubois_great_barrington_recent_20120925_old version.docx
B. 2012-09-25_dubois_great_barrington_V1.docx
C. FFTX_2365498_old.docx
19. Organizing Your Data
• Organizational methods
  • Hierarchical
  • Tag-based
• Retrieval
  • Location-based
  • Search-based
“Very little skill is needed to actually be organized and efficient… just the consciousness to put this file or folder in the right place.”
20. Organizing Your Data
Use folders!
DuBois
DuBois_Images
DuBois_Images/1868-1898/
DuBois_Images/1898-1928/
DuBois_Letters
DuBois_Letters/1868-1898/
DuBois_Letters/1898-1928/
DuBois_Newspapers/
etc.
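A folder tree like this can be scaffolded in a few lines rather than clicked together, which also makes the structure reproducible. A sketch using Python's standard `pathlib` (the DuBois names are the example above; substitute your own):

```python
from pathlib import Path

subfolders = [
    "DuBois_Images/1868-1898", "DuBois_Images/1898-1928",
    "DuBois_Letters/1868-1898", "DuBois_Letters/1898-1928",
    "DuBois_Newspapers",
]

root = Path("DuBois")
for sub in subfolders:
    # parents=True creates intermediate folders; exist_ok=True makes
    # the script safe to re-run without errors
    (root / sub).mkdir(parents=True, exist_ok=True)
```

Keeping the list in a script doubles as documentation of the organizational scheme.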
21. Archive what you don’t or won’t need
• Decide what your final data sets are
• Once your project is over, weed out obsolete data and decide what you want to keep for the long term
• Move files and folders to an ‘Archive’ or ‘Old files’ folder
  • z_archive
22. Backup and Storage
January 2011: “Stolen laptop contains cancer cure data”
23. Backup and Storage
• Backup is an essential component of data management
  • Protects against accidental or malicious data loss
  • Lets you restore the original data
• Keep 3 copies: the original, an external local copy, and an external remote copy
• Consider
  • How much?
  • How frequently?
  • Which media?
  • Synchronization
• Test your system
24. Backup and Storage
• Accessibility of data depends on storage media and file format
  • Vulnerable to deterioration
  • Become obsolete over time
• Plan for disruption
• Consider
  • Non-proprietary file formats
  • Different media types in your storage strategy
  • Migrating data
  • Unencrypted, uncompressed copies
25. Backup and Storage
• Security
  • Encryption can be used for safely moving or storing files
  • Encrypting files on storage devices (flash drives)
  • Encryption during file transfer (e.g., WinSCP)
  • Encrypted storage services
• Deleting Data
  • Weed out obsolete data and decide what you want to keep for the long term
  • Deleting files does not actually erase the data; it can often still be recovered
• Other things to consider
  • How will the data be used?
  • Who pays for storage?
27. Data Management is About Planning
Data management will:
• Prevent bad things from happening to your data;
• Make you a more efficient researcher; and
• Prepare you for grant management.
(Slide diagram: a data lifecycle of Collection, Description, Storage, Backup, and Access)
28. Data Management Plans
NSF plans should address:
• The types of data;
• The standards to be used for data and metadata format and content;
• The policies for access and sharing;
• The policies and provisions for re-use, re-distribution, and the production of derivatives; and
• The plans for archiving and for preservation of access.
30. Planning
• Data Working Group (email datamanagement@library.umass.edu)
  • Digital projects
  • Long-term preservation
  • Assessment
  • Web resources
• UMass Amherst Libraries: General Resources (http://guides.library.umass.edu/datamanagement)
• Discipline-specific resources
  • Your faculty
  • Your mentors
  • Your professional associations
  • Industry partners
  • Public engagement
31. Backup and Storage
• Storage
  • Udrive (http://www.oit.umass.edu/udrive)
  • Departmental servers
  • CDs/DVDs/external hard drives
  • Filesharing (see http://chronicle.com/blogs/profhacker/protecting-your-data/37350)
    • Dropbox
    • Google Docs
  • Cloud storage
    • Amazon Web Services
    • Rackspace
    • Microsoft Azure
    • SugarSync
• Additional Information
  • MIT on Backups and Security: http://libraries.mit.edu/guides/subjects/data-management/backups.html
  • UK Data Archive on Data Storage: http://www.data-archive.ac.uk/create-manage/storage
  • UK Preservation Office, “Caring for CDs and DVDs”: http://www.bl.uk/blpac/pdf/cd.pdf
33. Sources
• MIT Data Management (http://libraries.mit.edu/guides/subjects/data-management/)
• UK Data Archive (http://www.data-archive.ac.uk/)
• MANTRA (http://datalib.edina.ac.uk/mantra/organisingdata.html)
• Creating Order from Chaos: 9 Great Ideas for Managing Your Computer Files (http://www.makeuseof.com/tag/creating-order-chaos-9-great-ideas-managing-computer-files/)
• Research Information Management: Tools for the Humanities (http://sudamih.oucs.ox.ac.uk/docs/Generic%20Courses/Tools%20for%20the%20Humanities%20course%20book.docx)
34. Questions/contact
datamanagement@library.umass.edu
Editor's Notes
Starting in January 2011, NSF is requiring that grant proposals have a Data Management Plan. The DMP is described as no more than two pages, specifying the types of data, the standards to be used for data and metadata format and content, and policies for accessing and sharing the data. They do state that a valid plan may include only the statement that no detailed plan is needed, but you have to justify that statement. The DMP will be reviewed as an integral part of the proposal, coming under the Intellectual Merit or Broader Impacts sections or both. Grant Proposal Guide (GPG), Chapter II.C.2.j. NSF Directorates and Programs have additional requirements: http://www.nsf.gov/bfa/dias/policy/dmp.jsp. The Biological Sciences, Engineering, Geosciences, and Social, Behavioral and Economic Sciences Directorates are examples having additional requirements for their DMPs. The National Institutes of Health expect researchers to include data sharing plans in their proposals as well; this appears to be a trend for other funding agencies. NIH data sharing policy: data should be made as widely and freely available as possible while safeguarding the privacy of participants and protecting confidential and proprietary data. NSF data sharing policy: investigators are expected to share with other researchers, at no more than incremental cost and within a reasonable time, the primary data, samples, physical collections and other supporting materials created or gathered in the course of work under NSF grants. NEH: all proposals will be required to include both a sustainability plan that discusses long-term support for the project and a data management plan that discusses how research data will be preserved.
The National Science Foundation recognizes the need for flexibility. Different types of data require different plans. The NSF documentation points researchers to some specific sites for more specific protocols; there are numerous others for the rest of the social sciences, and you can contact your scholarly associations for more details. You must demonstrate to funding agencies that you know what your data are and how you will manage it.
For you: since a number of grants are multi-year, renewable propositions, building good data management practices into proposals is crucial. By doing so, problems associated with lab turnover can be addressed (one professor noted that 3-4 more papers would have come out of his lab if not for this type of issue). It also assists you in remembering relevant details and procedures relating to your data and data collection over the long haul. Developing a good data archiving plan safeguards your investment of time and money and makes recovery from disaster possible and, hopefully, faster and more complete. It addresses your documentation and verification issues: the type of data to be produced, a description of the methodology, and the standards that will be applied. It satisfies many legal obligations, such as security measures to protect confidentiality or IP considerations. A good DMP helps reviewers understand your work and increases its visibility: easily accessible and clearly understood, it preserves your unique contribution to your field. For others: although provisions are made for restrictions or embargos on data, particularly those having commercial implications, there is an underlying assumption that data should be shared, distributed, and built upon. A data management plan gets you to think about and plan for how that will happen, promoting new discoveries and minimizing duplication of effort. The open access movement (Science Commons, PubChem, et al.) fosters the development of knowledge. Science Commons is an organization that promotes legal and technical mechanisms to remove barriers to sharing scientific information; one way they are looking at that is through the Open Knowledge Definition, which sets out to define openness in relation to content and data.
RCR covers a range of topics that speak to the conduct of investigators and the integrity of the research university (where an investigator is defined in UMass COI policy as the principal investigator and any other person who is responsible for the design, conduct, or reporting of funded research). It is a philosophy of creating an environment for research that encourages quality and ethical principles. Topics include Mentor/Trainee Responsibilities; Publication Practices and Responsible Authorship; Peer Review; Collaborative Science; Communication and Difficult Conversations; and Data Acquisition, Management, Sharing and Ownership. Many of the practices and constraints will be dictated by the discipline, by the lab, and by the funding conditions, but there are generally accepted standards that investigators should be aware of and adhere to relative to data ownership, data collection, data protection, and data sharing. By following good data practices (or RCR), an investigator can avoid the risk of misconduct and comply with policies and regulations regarding intellectual property and animal or human research subjects. Examples of compliance include protocols for doing research with animals, for biological and environmental safety, and for export control. Research using human subjects involves having the project reviewed by the University's IRB (a federally mandated body which reviews all sponsored research involving human subjects), obtaining consent, and maintaining the confidence of data collected. Research with human subjects is the domain where privacy (for sensitive data), confidentiality, and security will be major concerns when managing data. Examples of ethical concerns include Conflicts of Interest (where financial or intellectual property rights concerns influence the design, conduct, or reporting of research), Faculty Consulting, and Whistleblowing. Research Misconduct is also in this category.
Misconduct means fabrication, falsification, or plagiarism in proposing, performing, reporting, or reviewing research (not including honest error or difference of opinion), or misrepresentation of the procedures and outcomes of research to gain some advantage. Policies to investigate and determine misconduct include fact finding, which means examination of data.
Data = research
Organizing your data is about keeping good records, namely planning file naming conventions and organizing file directories to your advantage. What are your goals? Based on those goals: how should you organize your data? Are there key themes, categories, people, dates, formats, etc.? You might document/store/organize data differently for different outcomes like sharing, preserving, or sharing a small subset. What is important to save? If you plan well, you can put your research anywhere.
The most basic part of organizing your data is to consider your filenames. Most computers use filenames to index content (e.g. Windows Search). Clear names will help in retrieving files and should fit with your overall organizational approach for your project.
There are three things to consider when naming files: organization, context, and consistency. Organization is important for future access and retrieval. Context could include content-specific or descriptive information. Consistency: choose a naming convention and ensure that the rules are followed systematically by always including the same information (such as date and time) in the same order (YYYYMMDD).
How would we name this image file – found in the University Archives?
File naming conventions.
Consistency is key. Use underscores instead of full stops or spaces because, like special characters, these are parsed differently on different systems. The filename should include as much descriptive information as will assist identification independent of where it is stored. If including dates, format them consistently.
Scale: if you want to include a project number, don't limit your project number to 2 digits, or you can only have ninety-nine projects. Special characters: these are often used for specific tasks in a digital environment.
It is important to identify and distinguish versions of research data files consistently. This ensures that a clear audit trail exists for tracking the development of a data file and identifying earlier versions when needed. Thus you will need to establish a method that makes sense to you that will indicate the version of your data files.http://datalib.edina.ac.uk/mantra/organisingdata.html
A – correct. Files using this naming convention are easy to distinguish from one another, and easier to browse and locate chronologically. B – incorrect: the file is not easy to browse and locate chronologically. C – incorrect: the file is not easy to browse and locate chronologically, and the filename is not immediately intuitive. Tip! If using a date, use the format year-month-day: YYYY-MM-DD, or YYYY-MM, or YYYY-YYYY. This will maintain the chronological order of your files.
A – incorrect: the date is ambiguous, and there could be several 'old' versions. B – correct: the date is in a uniform format and easy to distinguish/sort from files using the same date convention; the filename represents the content more accurately; and using a version number convention also makes it easier to distinguish from other versions of the same file. C – incorrect: this is an application-generated filename lacking descriptive or context-specific information.
Hierarchical – most common operating systems default to this way of organizing files. An item can only go into one place or folder (unless there are duplicates), so you must choose a system for categorizing files; it is well-adapted to location-based finding. Tag-based – electronic labels or keywords applied to files in a flat system. An item can have many tags, giving more flexibility in how a file is categorized, but tags must be applied consistently. Plan and then follow the plan. Implement. File things immediately; put things in the right place according to your plan as they are created.
Example of a well-organized file structure with consistent naming conventions: a major heading with logical subheadings, and individual files under subheadings distinguished by date of analysis, or collection, etc. – but be consistent. Organize by category – for example, if you are studying multiple individuals and are collecting many types of documents about them, you could organize first by the individual, then by type of coverage (image, letter, newspaper), then by date. One place for everything – you need a place where you know that you can access your files and folders. The My Documents folder is the logical and perfect place for this: it is a home for your folders, which contain your files. Think of it in the sense that you wouldn't put your folders in the yard, nor would you put your filing cabinet in the yard; you put both of them in the house. Your My Documents folder is your "house" of sorts. Plan and implement. File things immediately; put things in the right place according to your plan as they are created.
Personally, I recommend still keeping the archive in the My Documents folder so things stay easy to remember and consistent. With a name like "Archive" it will likely sit near the top of whatever folder you put it in. To change this, you can add a "z" and a period to the beginning of the name, so the folder looks something like "z.Archive". This puts it at the bottom of the list so you won't have to worry about it being in the way all the time. http://www.makeuseof.com/tag/creating-order-chaos-9-great-ideas-managing-computer-files/
A very important component of data management: backup. A University of Oklahoma researcher lost years of research due to theft. A PC Advisor poll from November 2010 indicates that roughly 1 in 13 respondents never back up important data (30% back up important data daily; 25% weekly; 21% monthly; 16% rarely back up data; 8% never). http://www.pcadvisor.co.uk/news/security/3248400/poll-30-percent-back-up-data-every-day/
Backup ensures that the most recent data will always be accessible, and concerns the procedures for saving and synchronizing data. Accidental or malicious data loss can be due to: hardware faults or failure; software or media faults; virus infection or malicious hacking; power failure; human error in changing or deleting files. Recommended practice is to keep 3 copies of your data.
How much: What will you need to restore in the event of data loss? Are there backup policies already established for the institutional/network computers you are using, and will they be sufficient for your project?
How frequently: How critical are the changes being made or the new data being generated? Back up after every change, or at regular intervals, and use automated backup processes.
Which media: Depends on quantity, file type, and project needs. Options include removable media (hard or flash drives), recordable CD/DVD, or a network drive.
Synchronization: Ensures consistency between backup copies. Use the same or compatible naming conventions as the original project files, and label removable media!
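As a concrete illustration of an automated backup whose name ties each copy back to the original, here is a minimal Python sketch. The source and backup paths are placeholders; in practice a project would more likely rely on an OS backup tool, a network service, or rsync:

```python
import shutil
from datetime import datetime
from pathlib import Path

def backup(source: Path, backup_root: Path) -> Path:
    """Copy a project folder into a timestamped backup folder,
    preserving file metadata, and return the new copy's path."""
    stamp = datetime.now().strftime("%Y-%m-%d_%H%M%S")
    dest = backup_root / f"{source.name}_backup_{stamp}"
    shutil.copytree(source, dest)  # recursive copy, creates dest
    return dest

# Scheduled via cron or Task Scheduler at whatever interval the
# project needs, with backup_root pointing at e.g. a network drive.
```

Because the folder name embeds both the project name and a sortable timestamp, the copies follow the same naming conventions as the originals, as recommended above.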
Storage concerns the location and media for housing data, and is important because digital media are inherently unstable and change rapidly. Media currently available for storing data files are optical media (CDs and DVDs) and magnetic media (hard drives and tapes); both are vulnerable to physical degradation. A storage strategy, even for short-term projects, should include two different forms of media.
Prefer non-proprietary file types (those that follow an open, documented standard; use ASCII or Unicode; are community-supported; unencrypted; uncompressed):
PDF/A, not Word
ASCII, not Excel
MPEG-4, not QuickTime
TIFF or JPEG 2000, not GIF or JPG
XML or RDF, not RDBMS
Which media: Portable hard drive? Cloud? Department server? Subject data repository? The UK Data Archive recommends using at least two different media types (optical/magnetic) in your storage strategy, in addition to local and remote backup copies. Unencrypted is ideal for storing your data because it will be most easily read by you and others in the future. (MIT) Uncompressed is also ideal for storage, but if you need to compress to conserve space, limit compression to your 3rd backup copy. (MIT)
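The open-format recommendations above can be turned into a quick audit before archiving. A sketch, assuming a small extension map based on the slide's examples (the mapping and function names are mine, and a real audit would cover many more formats):

```python
from pathlib import Path

# Proprietary-to-open suggestions drawn from the list above.
SUGGESTED = {
    ".doc": "PDF/A", ".docx": "PDF/A",
    ".xls": "ASCII/CSV", ".xlsx": "ASCII/CSV",
    ".mov": "MPEG-4",
    ".gif": "TIFF or JPEG 2000", ".jpg": "TIFF or JPEG 2000",
}

def audit(folder: Path) -> list:
    """List (file, suggested open format) pairs for every file
    in the folder tree that is in a closed/proprietary format."""
    return [(p.name, SUGGESTED[p.suffix.lower()])
            for p in sorted(folder.rglob("*"))
            if p.suffix.lower() in SUGGESTED]
```

Running something like this over a project folder at the end of a project gives a checklist of files worth converting before deposit.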
Secure data storage will prevent unauthorized access, changes, disclosure, or destruction of data, and includes physical security as well as network security (passwords, firewalls, anti-virus and anti-malware software), plus security when sharing or moving files. Encryption is the easiest and most practical method of protecting data stored or transmitted electronically, and is particularly essential with sensitive data (ECU), for example when moving files, or when storing back-ups on mobile devices. Individual files can be encrypted, as can entire storage devices or spaces. http://www.ecu.edu/cs-itcs/itsecurity/DataEncryption.cfm
Weeding: Determined by project requirements. How will the data be used? In-house? Outside users? Restricted? Is it live or "archived"?
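Alongside encryption, a simple way to detect unauthorized or accidental changes to stored files is to record checksums at backup time and re-check them later (a standard fixity-checking practice; the function names below are mine). A minimal sketch using only Python's standard library:

```python
import hashlib
from pathlib import Path

def checksum(path: Path) -> str:
    """SHA-256 digest of a file, read in chunks so that
    large data files do not have to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def verify(path: Path, recorded: str) -> bool:
    """True if the file still matches the digest recorded earlier."""
    return checksum(path) == recorded
```

Storing the recorded digests alongside (or separately from) the backups lets you confirm later that an "archived" copy has not silently changed.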
These may be things that you will get to toward the end of a project, but they are good to think about now. The traditional outcomes of research are published papers (much of what tenure and promotion are based on), and it is a growing practice to submit supplemental data files along with manuscripts at the point of publication. Know what your intellectual property is, what your copyrights are, and how they apply to data and databases. Much of what is created is considered an "exempt scholarly work": the university automatically waives ownership of this class of IP. Under the UMass policy, the creator owns IP that is created or discovered here.
Copyright provides legal protection for "original works of authorship". Facts and ideas cannot be copyrighted, but their expression can. Data sets and databases can be protected under copyright as literary works, which include "tables" and "compilations".
Expectations of sharing have also created an environment where datasets are shared within communities. It has been recognized, by Creative Commons specifically, that sharing data sets is fundamentally different from sharing textual documents, and that the benefits of data sharing outweigh the constraints of applying copyright. Creative Commons has endorsed a Database Protocol which encourages the unfettered sharing of data through the use of a CC0 license; this essentially puts data into the public domain. Venues for data sharing include institutional and disciplinary repositories.
Data citation means providing a reference to data in the same way that researchers routinely provide a bibliographic reference to printed resources. It is an important part of validating datasets as a primary research output rather than a by-product of research.
Who owns copyright of data? The creator of the data. Under the UMass IP Policy, the creator owns IP that is made, discovered, or created here unless it involves: significant use of University resources; University-commissioned work; IP subject to contractual obligations (i.e., sponsored research); or student work (except "exempt scholarly work").
University resources: university funds, time, and facilities; not use of the library, facilities available to the public, or occasional use of office equipment.
"Exempt scholarly work" includes: instructional materials, including textbooks and class notes; research articles, monographs, and proposals; theses and dissertations; dramatic works and performances; drawings and sculpture; musical compositions and performances; poetry, fiction, and non-fiction. Students sign a participation agreement (prior to hire as research assistants, for example).
http://www.umass.edu/research/system/files/Intellectual_Propery_Policy_UMA.pdf
Stop for questions.
These are the elements of data management – things that you should think about. Data management will have positive benefits.
You will need somewhere to store your data as you are working.
UDrive: you get 1 GB and can share files with anyone through the UDrive.
Third party: there are many cloud storage providers. Amazon gives you 5 GB, Dropbox gives you 2 GB, and Google Docs gives you 1 GB, but you can purchase more space (400 GB for $100/year; 1 TB for $256/year). Cloud options provide a nearly infinitely scalable tier of storage for archiving very large datasets; prices can range from $0.14/GB to $0.55/GB.
The OIT security pages have links and instructions for downloading anti-virus and anti-malware software, and tips for protecting your personal computer from unauthorized access.