The document provides an overview of the CISER Data Archive at Cornell University and introduces key concepts of research data management (RDM).
The CISER Data Archive is a collection of over 27,000 numeric datasets to support quantitative research in various social science fields. It provides consulting services to help users find, access, and use data. It also maintains the Cornell research data repository.
The document defines research data and outlines the research data lifecycle. It discusses best practices for organizing, documenting, storing, and securing research data. Key aspects of RDM include developing data management plans, using appropriate file formats, and ensuring long-term preservation and sharing of research data.
1. CISER Data Archive &
Introduction to RDM
Stuart Macdonald
CISER Data Services Librarian
srm262@cornell.edu
Research Design CRP-7201, Stone Laboratory, Cornell Univ. 19 March 2014
2. • CISER Data Archive
• What is Research Data Management (RDM)
• Research Data Defined
• Data Management Planning
• Organising Data
• File Formats & Transformations
• Documentation & Metadata
• Storage & Security
• Data protection & Rights
• Preservation & Sharing
• Research Data MANTRA
3. CISER Data Archive: Collection and Services
Established over 30 years ago
Collection of numeric datasets to support quantitative
research
c. 27,000 online files in addition to thousands of studies on CD/DVD
Emphasis on demography (state/federal censuses),
economics, health, labor, election studies, attitudinal and
behavioral studies, family life etc.
4. • Consulting services to match user needs with appropriate data
and statistical analysis software
• Finding, accessing and using data
• Current Cornell researchers can download archive files from the
online catalog (search & browse) in formats compatible with
statistical software
• Data files are identified by a ‘traffic light’ icon that indicates
usage level:
• Green – downloadable by anyone
• Yellow – downloadable from links in the catalog with CUWebAuth
authentication (for use within the CISER research computing
environment - CISERRSCH) – Cornell researchers can apply for a
computing account
• Red – data to be used under restriction (via CRADC or conditions
imposed by the data provider)
6. CISER Data Archive maintains links to a range of social science
data resources including:
• Data Distributors and Producers: U.S. Government e.g. Dept. Agriculture,
Dept. Commerce, Dept. Energy, Dept. Justice, Dept. Labor, Federal Agencies
• Data Distributors and Producers: Other U.S. Sources
• Data Distributors and Producers: International e.g. Eurostat, FAOSTAT, ILO,
OECD, UN Statistics Division, World Bank
• Data Libraries and Archives e.g. Harvard-MIT Data Center, UKDA, DANS, CESSDA
• Social Science Research Institutes e.g. Odum Institute, Survey Research
Institute
• Online Reference Tools e.g. Boundary files, geocoding tools, SIC codes, data
citation tools
• State and Local Government data and statistical sources e.g. NY State
Depts. Education, Health, Labor, State Data Center
See URL: http://ciser.cornell.edu/ASPs/datasource.asp
7. • Provides Cornell social science researchers with a
repository for sharing and providing long-term preservation
of their numeric/statistical research data
• Participates in Cornell’s Research Data Management
Service Group
• Assists Cornell social science researchers with Research
Data Management (RDM) plans
• Provides Cornell social science researchers with support
and expertise in obtaining and using restricted data
8. Other social science research data resources:
• Inter-University Consortium for Political and Social Research
(ICPSR)
• National Archive of Criminal Justice Data
• Minority Data Resource Center
• National Archive of Computerized Data on Aging
• Roper Center for Public Opinion Research
• International Data Archives
• CESSDA, UKDA, Eurostat
• CESSDA catalog (DDI) provides a multi-lingual interface to datasets from
member social science data archives across Europe
• Non-Governmental Organizations
• National / Governmental Statistical Agencies
9. URLs:
• CISER Data Archive Catalog:
http://ciser.cornell.edu/ASPs/search.asp
• ICPSR:
www.icpsr.umich.edu/
• Roper Center for Public Opinion Research:
http://www.ropercenter.uconn.edu/
• CESSDA:
http://www.cessda.org/
• Eurostat:
http://www.epp.eurostat.ec.europa.eu/
10. Location & hours:
CISER Data Archive is located at 391 Pine Tree Road,
Ithaca
CISER is open 8.30am – 4.30pm (Mon-Fri). Walk-in
assistance is not always available, so appointments are
recommended
Contacts:
Tel.: (607) 255 4801
Email: ciser@cornell.edu
12. Why Manage Research Data?
Current research data management initiatives are based
on three trends:
The data deluge – exponential growth in volume of digital
research artifacts created within academia (often
created by publicly funded research)
Data management is required by multiple disciplines
Increasing perception of the value of data (data as
commodity)
13. What is Research Data Management?
• RDM is an umbrella term to describe all aspects
of planning, organising, documenting, storing and
sharing research data.
• It also takes into account issues such as
documentation, data protection and
confidentiality.
• It provides a framework that supports researchers
and their data throughout the course of their
research and beyond.
• It is one of the essential areas of responsible
conduct of research
15. Research Data Defined
US Office of Management and Budget in its grants management circular A-110
defines research data as “the recorded factual material commonly accepted in
the scientific community as necessary to validate research findings.”
The KRDS2 study (Beagrie et al., 2009) defines research data as ‘collections of
structured digital data from any disciplines or sources which can be used by
academic researchers to undertake their research or provides an evidential
record of their research.’
RIN Classification*
• Observational – real-time, unique, usually irreplaceable
• Experimental – from lab equipment, expensive, often reproducible
• Simulation – generated from models – model & metadata are as important as
output data
• Derived – resulting from processing or combining “raw” data; reproducible
but expensive
• Reference - a (static or organic) collection of smaller (peer-reviewed)
datasets, probably published and curated
* Stewardship of digital research data: a framework of principles and guidelines, Research Information Network, 2008. URL: http://tinyurl.com/l56gftx
16. Research Data Defined
• Research data, unlike other information types, is
collected, observed, or created, for purposes of
analysis to produce original research results.
• Research data can be generated for different
purposes and through different processes in a
multitude of digital formats.
17. Research data comes in many varied formats:
Text - flat text files, Word, Portable Document Format (PDF), Rich
Text Format (RTF), Extensible Markup Language (XML).
Numerical - SPSS, Stata, Excel.
Multimedia - jpeg, tiff, dicom, mpeg, quicktime.
Models - 3D, statistical.
Software - Java, C.
Discipline specific - Flexible Image Transport System (FITS) in
astronomy, Crystallographic Information File (CIF) in chemistry.
Instrument specific - Olympus Confocal Microscope Data Format, Carl
Zeiss Digital Microscopic Image Format (ZVI).
18. Research data may include the
following:
• Documents (text, MS Word), spreadsheets
• Lab books, field notes, diaries
• Questionnaires, transcripts, codebooks
• Audiotapes, videotapes, photographs, images
• Slides, artefacts, specimens, samples
• Collection of digital objects acquired & generated during the research
process
• Database contents (video, audio, text, images)
• Models, algorithms, scripts
• Contents of an application (input, output, logfiles for analysis software,
schemas)
• Methodologies, workflows
• SOPs, protocols
19. By managing your data you will:
• ensure scientific integrity of research and aid replication
• ensure research data and records are accurate, complete, authentic
and reliable
• increase your research efficiency
• save time, effort and resources in the long run
• enhance data security and minimise the risk of data loss
• prevent duplication of effort by enabling others to use your data
• meet funding grant requirements
Note:
It may also be important to manage research records (both digital &
hardcopy) during and beyond the life of the project such as:
correspondence (emails)
grant applications
technical reports
research reports
consent forms
ethics applications
20. What Do Funders Want?
• timely release of data
- once patents are filed or on (acceptance for)
publication.
• data shared openly
- minimal or no restrictions if possible.
• preservation of data
- typically 5-10+ years if of long-term value.
• data management plans
See :
NIH Data Sharing Policy: https://grants.nih.gov/grants/policy/data_sharing/
NSF Data Sharing Policy: http://www.nsf.gov/bfa/dias/policy/dmp.jsp
21. Data Management Plan. What is it?
Funding bodies require researchers to supply detailed, cost-
effective plans for managing research data. These are called Data
Management Plans
A DMP is a document which describes:
What research data will be created.
What policies (funding, institutional, legal) apply to the data.
What data management practices (backups, storage, access
control, archiving) will be used.
What facilities and equipment are required (hard-disk space,
backup server, repository).
Who will own the copyright and have access to the data.
How long-term preservation will be ensured after the original
research is completed.
The data management plan must be continuously maintained and
kept up-to-date throughout the course of research.
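One way to keep a plan maintained is to hold a draft of it in a structured form and list which parts are still unanswered. The sketch below is only an illustration: the section keys mirror the six points on this slide, but their names, the `open_questions` helper and the draft answers are all assumptions, not a funder or CDL template.

```python
# Hypothetical section keys, one per point a DMP is said to describe above.
DMP_SECTIONS = [
    "data_created",
    "applicable_policies",
    "management_practices",
    "facilities_and_equipment",
    "ownership_and_access",
    "long_term_preservation",
]

def open_questions(plan: dict[str, str]) -> list[str]:
    """Return the sections that are missing or still blank in a draft plan."""
    return [s for s in DMP_SECTIONS if not plan.get(s, "").strip()]

# An invented, partly completed draft used only to exercise the helper.
draft = {
    "data_created": "Survey responses (CSV) and interview transcripts (plain text).",
    "management_practices": "Regular backups to a departmental file server.",
    "long_term_preservation": "",
}

remaining = open_questions(draft)
print(remaining)
```

Re-running a check like this at each review point gives a short list of what still needs investigating.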
22. Why do we need one?
It improves your research both now and later...
•Data is often valuable for a long time!
•Results of your research may outlast your project.
•Will you use your data throughout your career?
•Prevents loss of digital data and records.
•Prevents loss of usefulness through media and software
obsolescence.
•Prevents you forgetting things!
Good practice → Better research
23. Why do we need one?
•Ensure research integrity (and repeatability) through
keeping better records.
•People can trace your outcomes from data collection,
through research methodology, through to results.
•Maximises usefulness of data to fellow researchers.
•Highlights how data was collected, quality controls,
how people can and should use it (access and
licensing).
•Facilitates data use within collaboration.
•Can help lead to subsequent research papers.
24. Getting started with a DMP
Gain an understanding of terminology & issues.
Gain understanding of your project/community
– Supervisor and colleagues
– People in your School, i.e. IT Officers, Research
Coordinator/Administrator
Talk to your supervisor about data authorship, IP, licensing,
policies.
Keep it practical and simple; don't spend too much time. Where you
don't know something, leave a gap, investigate, and fill it in later.
Remember it is never finished! Review it regularly through the
course of your research.
CDL’s DMP Tool: https://dmp.cdlib.org/
Cornell University RDM Services Group - Writing a DMP:
https://confluence.cornell.edu/display/rdmsgweb/data-management-planning-overview
26. Benefits of organising your data
Research data files and folders need to be labelled and
organised in a systematic way so that:
•Data files are not accidentally overwritten or deleted
•Data files are distinguishable from each other within their
containing folder
•Data file naming prevents confusion when multiple people are
working on shared files
•Data files are easier to locate and browse
•Data files can be retrieved by both creator and by other users
•Data files can be sorted in logical sequence
•Different versions of data files can be identified
•If data files are moved to other storage platforms their names
will retain useful context
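A systematic naming scheme of the kind described above can be sketched in a few lines of Python. The pattern used here (project, description, ISO date, zero-padded version) is an assumption chosen for illustration rather than a CISER convention, and `build_name` is a hypothetical helper.

```python
import re
from datetime import date

def build_name(project: str, description: str, when: date, version: int, ext: str) -> str:
    """Build a name such as 'crp7201_survey-responses_2014-03-19_v02.csv'."""
    def slug(text: str) -> str:
        # Lower-case the text and replace runs of other characters with a hyphen.
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")
    return (
        f"{slug(project)}_{slug(description)}_"
        f"{when.isoformat()}_v{version:02d}.{ext.lstrip('.')}"
    )

# Illustrative inputs only; the project and file descriptions are made up.
names = [
    build_name("CRP7201", "Survey responses", date(2014, 3, 19), 2, "csv"),
    build_name("CRP7201", "Survey responses", date(2014, 3, 5), 1, "csv"),
]

# ISO dates and zero-padded versions make plain alphabetical order chronological.
ordered = sorted(names)
print(ordered)
```

Because the date and version are part of the name itself, the context survives when a file is copied to another storage platform.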
27. File Formats & Transformation
• Files are based on either text or binary encoding. The
former is both machine- and human-readable and the latter
only readable by means of appropriate software.
• Thus text files are less likely to become obsolete. Examples
of file name extensions for these files are .txt, .csv
and .por.
• Be aware of the file formats your data exists in
– Does this format require a specific type of software?
– Can others access the data in this format?
– Can alternative formats be used?
• Using widely available or open formats maximises the
chances of your data being stable and usable
28. File Formats & Transformation
•When compressing your data files for storage or
transportation you encode the information using fewer bits than
the original representation. Commonly used tools are Zip and
Tar (Tar bundles files and is usually paired with a compressor such as gzip).
•You may use the process of data normalisation. This means to
convert data from one format (e.g. proprietary) into another for
use or preservation (e.g. ASCII).
•If you convert or migrate your data files from one format to
another, be aware of potential risk of data loss or corruption
and take appropriate steps to avoid/minimise it.
•Watch out for backwards compatibility if software is upgraded
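The points above about normalisation, compression and guarding against loss can be combined into one round-trip check. This is a minimal sketch using only the Python standard library; the two records, the file names and the choice of SHA-256 are assumptions made for illustration.

```python
import csv
import hashlib
import tempfile
import zipfile
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

# Invented records standing in for data exported from a proprietary package.
rows = [{"id": 1, "age": 34}, {"id": 2, "age": 51}]

with tempfile.TemporaryDirectory() as tmp:
    work = Path(tmp)
    csv_path = work / "responses.csv"

    # Normalise to a plain-text, widely readable format (CSV).
    with csv_path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=["id", "age"])
        writer.writeheader()
        writer.writerows(rows)
    before = sha256_of(csv_path)

    # Compress for storage or transfer using lossless DEFLATE.
    zip_path = work / "responses.zip"
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.write(csv_path, arcname=csv_path.name)

    # Unpack into a separate folder and confirm the bytes are unchanged.
    restored_dir = work / "restored"
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(restored_dir)
    after = sha256_of(restored_dir / "responses.csv")

    # Read the values back to catch loss introduced by the format conversion.
    with (restored_dir / "responses.csv").open(newline="", encoding="utf-8") as fh:
        restored_rows = [{k: int(v) for k, v in r.items()} for r in csv.DictReader(fh)]

round_trip_ok = before == after and restored_rows == rows
print(round_trip_ok)
```

The checksum comparison covers the compression step, while comparing the values read back covers the conversion step; a real dataset would need type handling beyond the integer casts used here.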
30. Documenting Data
There are many reasons why you need to document your
data:
•To help you remember the details later
•To help others understand your research
•To allow your findings to be verified
•To support review of your submitted publication
•To allow your results to be replicated
•To archive your data for access and re-use
Some examples of data documentation are:
•Laboratory notebooks
•Field notes
•Questionnaires
31. Documenting Data
Research data need to be documented at various levels:
•Project level
•File or database level
•Variable or item level
The term metadata (‘data about data’) is often used.
The importance of metadata lies in the potential for
machine-to-machine interoperability to assist location and
access to data through search interfaces.
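The three documentation levels can be illustrated with a small machine-readable record. This is only a sketch: the field names and values below are invented for illustration and do not follow any particular metadata standard, which a real project would normally adopt for its discipline.

```python
import json

# Hypothetical metadata record; all field names and values are illustrative.
record = {
    "project": {  # project level
        "title": "Example household survey",
        "methods": "Face-to-face interviews, stratified sample",
    },
    "files": [
        {  # file or database level
            "name": "survey_wave-1-raw_2014-11-23_v02.csv",
            "format": "csv",
            "variables": [
                {  # variable or item level
                    "name": "age",
                    "label": "Age of respondent in years",
                    "type": "integer",
                    "missing_code": -9,
                }
            ],
        }
    ],
}

# Serialising to JSON gives a plain-text form that other software can parse.
serialised = json.dumps(record, indent=2)
```

Storing such a record alongside the data files keeps the description readable by people and by search tools.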
32. Secure data storage:
For the purposes of integrity and efficiency it is important that research
data is stored securely & backed up regularly via:
• Networked drives
– Fileservers managed by department / school / IT Dept.
– Stored in single, secure, accessible place – regular back-ups.
• Personal computers / laptops
– Convenient, temporary storage - should not be used for storing
master copies.
– Local drives may fail & laptops may get lost/stolen.
33. • External storage devices
– Hard drives, USB sticks, CDs, DVDs – low cost & portable BUT not
recommended for long term storage.
– Longevity not guaranteed – degradation over time.
– Easily damaged or misplaced.
– Not big enough for all research data – might need to use multiple
discs/drives.
– May pose a security threat.
If USB sticks, DVDs, CDs are used for working data or extra back-up
then:
• Choose high quality products from reputable manufacturers.
• Conduct regular checks to ensure media is not failing.
• Periodically refresh data (i.e. copy to a new disc or drive).
• Ensure confidential data is password protected / encrypted
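The regular checks mentioned above could be done by recording a checksum for every file and re-computing it later. The following is a minimal sketch using SHA-256; the function names are hypothetical, and where the manifest itself is stored is left to the reader.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file under root (by relative path) to its digest."""
    return {p.relative_to(root).as_posix(): file_digest(p)
            for p in sorted(root.rglob("*")) if p.is_file()}


def find_problems(root: Path, manifest: dict[str, str]) -> list[str]:
    """List files that are missing or whose content no longer matches."""
    problems = []
    for rel, expected in manifest.items():
        path = root / rel
        if not path.is_file():
            problems.append(f"missing: {rel}")
        elif file_digest(path) != expected:
            problems.append(f"changed: {rel}")
    return problems
```

An empty result from find_problems suggests the copies are still intact; any entry is a signal to refresh that file from another copy.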
34. • Remote or online back-up services – services that
provide an online system for storing and backing up computer
files, e.g. Dropbox, Mozy, Humyo, A-Drive
– Allow users to store and sync data files online and between
computers.
– Employ cloud computing storage facilities (e.g. Amazon S3).
– Business model – first few GBs free, pay for more space.
35. Backing-up
Considerations for back-up policy:
• Will all data be backed up (full back-up), or only changed
data (incremental back-up)?
• How often will full and incremental back-ups be made?
• How much hard-drive space or how many DVDs will be required to
maintain this schedule?
• If working with sensitive data, how will it be secured (and
destroyed)?
• What back-up services are available that meet these needs?
• Who will be responsible for ensuring back-ups are available?
Recommendation:
Keep at least 3 copies of your data (e.g. original, external/local,
and external/remote) and put in place a regular back-up procedure.
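The difference between a full and an incremental back-up can be sketched in a few lines. The function below is an illustrative assumption rather than a recommended tool: it copies only files that are new or whose size or modification time differs from the back-up copy, so the first run behaves like a full back-up and later runs are incremental.

```python
import shutil
from pathlib import Path


def incremental_backup(source: Path, target: Path) -> list[str]:
    """Copy new or changed files from source to target.

    Returns the relative paths that were copied. A file counts as unchanged
    when the back-up copy has the same size and is not older than the source.
    """
    copied = []
    for src in sorted(source.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source)
        dst = target / rel
        if dst.exists():
            s, d = src.stat(), dst.stat()
            if s.st_size == d.st_size and s.st_mtime_ns <= d.st_mtime_ns:
                continue  # unchanged since the last back-up
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy2 also preserves the modification time
        copied.append(rel.as_posix())
    return copied
```

Running the same function against both a local external drive and a remote location would be one way to maintain the three copies recommended above. Note that this sketch never deletes files from the back-up and does not encrypt anything.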
36. Data Security
The means of ensuring that data is kept safe from corruption and
that access to it is suitably controlled. It is important to consider
data security to prevent:
• Accidental or malicious damage / modification to data.
• Theft of valuable or irreplaceable data.
• Breach of confidentiality agreements and privacy laws.
• Release of data before it has been checked for accuracy and
authenticity.
38. Data Protection (also called data privacy)
• In the US, there is no single, comprehensive federal (national) law
regulating the collection and use of personal data. Instead, the US has
a patchwork system of federal and state laws, and regulations that
overlap, dovetail and may contradict one another.
• The combination of an increase in cross-border data flow, together
with the increased enactment of data protection statutes heightens the
risk of privacy violations and creates a significant challenge for a data
owner/distributor.
Data protection is the relationship between:
•collection and dissemination of data
•technology
•the public expectation of privacy and the legal and political issues
surrounding them
39. Rights and access
• Intellectual property rights (IPR) can be defined as rights acquired
over any work created or invented with the intellectual effort of an
individual.
• Facts are not copyrightable but the structure of a database could be.
• As a researcher, you should clarify ownership of and rights relating to
research data before a project starts. This includes the right of access
and the right to make copies.
• Data licences determine the terms and conditions of use by another,
and may accompany a purchase or subscription.
• Open data licences attempt to “set data free” by minimising and
standardising the terms and conditions of re-use. Conditions may
include attribution, non-commercial use, no derivative works, or ‘share
alike’.
40. Open Data Commons (ODC) have prepared a set
of licences each with an accompanying statement
which can be placed with your data on a webpage
that points to your data.
Open Data Commons: http://opendatacommons.org/
41. Benefits of Sharing Data
• Scientific integrity – publishing & citing data in published
research papers can allow others to replicate, validate, or
correct results, thus improving the scientific record.
• Publicly funded research - there is a growing movement for
making publicly funded research available to the public.
• Funding mandates - US Funding Agencies are increasingly
mandating data sharing so as to avoid duplication of effort and
save costs.
• Preserve research data for researchers’ own future use.
43. Research Data MANTRA
Partnership between:
• EDINA & Data Library, University of Edinburgh
• Institute for Academic Development
Funded by JISC Managing Research Data Programme (Sept.
2010 – Aug. 2011)
Aim was to develop online interactive open learning resources
for PhD students and early career researchers that will:
• Raise awareness of the key issues related to research data
management & contribute to culture change.
• Provide guidelines for good practice.
44. Online Learning Module
Eight units with activities, scenarios and videos:
• Research data explained
• Data management plans
• Organising data
• File formats and transformation
• Documentation and metadata
• Storage and security
• Data protection, rights and access
• Preservation, sharing and licensing
Four data handling practicals: SPSS, NVivo, R, ArcGIS
Video stories from researchers in a variety of settings
45. Online Learning Module
• Delivered online – self-paced, available ‘anytime, anyplace’
• Emphasis on practical experience and active engagement via
online activities
• One hour per unit
• Read and work through scenarios & activities (incl. videos etc)
• CC licence to allow manipulation of content for re-use with
attribution
• Portable content in open standard formats (e.g. SCORM)
• Research data MANTRA course:
http://datalib.edina.ac.uk/mantra
Data, documentation and associated files (e.g. SAS, SPSS, Stata) are housed on the CISER file server. Files are downloaded from the catalog in ZIP compressed format.
Cross-National Time Series data
As CISER is an ICPSR member, researchers can gain access to data held in those CESSDA Archives that are themselves ICPSR members
CESSDA member organisations adhere to a Trans-border Data Access Agreement
European Community Household Panel Survey, European Union Labour Force Survey, Community Innovation Survey, European Health Interview Survey, Structure of Earnings Survey, European Union Statistics on Income and Living Conditions
What about preserving?
Observational – sensor data, survey or sample data, neuroimages – e.g. ocean temperature, voters' attitudes before an election, photographs of a supernova
Experimental – e.g. gene sequences, chromatograms, toroid magnetic field data, HPLC, gel electrophoresis, chemical reaction rates
Simulation – e.g. climate models, economic models, algorithms
Derived – e.g. text and data mining, compiled database, 3D models, maps
Reference - e.g. gene sequence databanks, chemical structures, spatial data portals
Funded by JISC as part of its UK programme, Managing Research Data, to develop online learning materials that help researchers manage their digital assets.
IAD – set up to deliver training and development for postgraduate students and staff – via online course, Virtual Learning Environments, transferable skills training
Shareable Content Object Reference Model – XML-based