This two-part course is a collaboration between CU Libraries/Information Services and the Office of Research Compliance & Training. The purpose of this course is to familiarize you with the various aspects of research data management (RDM).
Part 1:
Why RDM is both recommended and required
What research data are
Who is responsible for RDM
Part 2:
When RDM activities occur
How you can carry out RDM activities
1. Managing Research Data
Part 2
WHY – WHAT – WHO – WHEN & HOW
Data stages: Planning, Working, Finalizing, Sharing
This work is licensed under a Creative Commons
Attribution 4.0 International License.
2. WHY manage data
WHAT research data are
WHO manages research data
WHEN & HOW data management is done
3. This two-part course is a collaboration between CU Libraries/Information Services and the Office of Research Compliance & Training. The purpose of this course is to familiarize you with the various aspects of research data management (RDM) by taking
This course will guide you through these areas, offering in-depth details on each of them. Please refer to the top navigation to keep track of which area you are currently exploring.
Part 1:
• Why RDM is both recommended and required
• What research data are
• Who is responsible for RDM
Part 2:
• When RDM activities occur
• How you can carry out RDM activities
4. Learning objectives:
At the end of this training you will be able to:
• Identify at which research stages data management activities occur
• Understand practical details of research data management such as:
– File naming
– File formats
– Spreadsheet structure
– Data preservation
5. Links to many of the references and
policies referred to in this course can be
found on the final slides.
Have Fun!
6. When does Research Data
Management happen?
How is it done?
7. Data stages: Planning, Working, Finalizing, Sharing
9. Planning
When planning to manage data or writing a data management plan, consider:
• What data will be shared?
• Who will have access to the data?
• Where will the data to be shared be located?
• When will the data be shared?
• How will researchers locate and access the data?
10. Planning
Plan for the entire data life-cycle.
CONSIDER:
• File format
• File sizes
• Changing rates of data production
• Anticipated size of project data
• Storage & Back-up
• Privacy / security requirements
• Data description
• Retention period
• Sharing requirements
11. Planning
Storage file formats should be:
• Non-proprietary
• Open, documented standard
• Standard representation (e.g., ASCII, Unicode)
• Common, or commonly used by the research community (e.g., FITS, CIF)
• Unencrypted
• Uncompressed
✓ Some commonly recognized formats meeting these criteria: ASCII [e.g., .csv, .txt], PDF [.pdf], FLAC, TIFF, JPEG2000 [.jp2], MPEG-4 [.mp4], XML [.xml, .odf, .rdf], R [.r]
X Some commonly recognized formats to avoid for storage include: Word [.doc(x)], SPSS [.sav], Excel [.xls(x)], STATA [.dta], Access [.mdb, .accdb], JPEG [.jpg], .gif, QuickTime [.mov], SAS [.sas]
Not sure about the extension? Check https://www.nationalarchives.gov.uk/PRONOM/default.htm
http://www.data-archive.ac.uk/media/2894/managingsharing.pdf
http://www.digitalpreservation.gov/formats/index.shtml?PHPSESSID=c26c5e5101396d5f5ebacedb13cae6e3
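The two format lists on this slide can be turned into a quick check of the files in a project folder. The following is a minimal Python sketch, not an official tool: the name classify_format is hypothetical, and the .flac, .tif and .tiff extensions are assumptions, since the slide names FLAC and TIFF without giving extensions.

```python
from pathlib import Path

# Extensions copied from the slide's two lists.
# ".flac", ".tif" and ".tiff" are assumed; the slide lists no extension for them.
PREFERRED = {".csv", ".txt", ".pdf", ".flac", ".tif", ".tiff",
             ".jp2", ".mp4", ".xml", ".odf", ".rdf", ".r"}
AVOID = {".doc", ".docx", ".sav", ".xls", ".xlsx", ".dta",
         ".mdb", ".accdb", ".jpg", ".gif", ".mov", ".sas"}


def classify_format(filename):
    """Return 'preferred', 'avoid' or 'unknown' based on the file extension."""
    extension = Path(filename).suffix.lower()
    if extension in PREFERRED:
        return "preferred"
    if extension in AVOID:
        return "avoid"
    return "unknown"


if __name__ == "__main__":
    for name in ["survey.xlsx", "survey.csv", "notes.md"]:
        print(name, "->", classify_format(name))
```

An "unknown" result is a prompt to look the extension up, for example in PRONOM, rather than a verdict on the format.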
12. Planning: Storage / Back-ups
No storage medium lasts forever.
Consider the following media life-spans:
Lifespan of Storage Media: http://www.crashplan.com/medialifespan/
13. Planning: Storage / Back-ups
When choosing storage and back-up options you should:
• Reduce the risk of damage or loss
• Use multiple locations (here, near, far)
• Create a back-up schedule
• Use reliable back-up media
• Test your back-up system (i.e., test file recovery, checksums)
14. Planning: Storage / Back-ups
Remember to:
• Back up data frequently
• Make 3 copies
– Original (here)
– External/local (near)
– External/remote – different geographic area (far)
• Verify recovery is possible
– Confirm that file has not been corrupted, e.g., checksum validation
– Make sure you can reload the file, i.e., test file restore after initial set-up
– Check file recovery periodically & systematically thereafter
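Checksum validation, mentioned above, can be done with a few lines of Python. This is a minimal sketch under assumptions: SHA-256 is one reasonable choice of checksum (the slide does not name an algorithm), and the function names sha256_of and backup_matches are hypothetical.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_matches(original, backup):
    """Return True when the back-up copy has the same checksum as the original."""
    return sha256_of(original) == sha256_of(backup)
```

Recording the checksum of each file when the back-up is made lets you repeat the comparison later, even if the original has since been moved.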
15. Planning: Privacy & Security
Consider physical, network, computer system and file security for:
• Intellectual Property – Trade secrets, commercial information, confidential materials, restricted data
• Personally identifying information (PII)
• Personal health information (PHI)
• High-security data
16. Planning: Privacy & Security
CU Information Security Charter:
“Users are persons who use Information Resources. Users are responsible for ensuring that such Resources are used properly in compliance with the Columbia University Acceptable Usage of Information Resources Policy http://policylibrary.columbia.edu/acceptable-usage-information-resources-policy, information is not made available to unauthorized persons, and appropriate security controls are in place.”
http://policylibrary.columbia.edu/information-security-charter
17. Planning: Privacy & Security
(Some) Best practices for handling sensitive data:
• Restrict physical access to computers, offices and storage media
• Encrypt any device (mobile, laptop, desktop, tablet, removable media [e.g., USB flash drives, CDs, hard drives]) containing sensitive data
• Store lab notebooks and research records in locked cabinets
• Keep confidential and sensitive data on computers not connected to the Internet
• Don't send confidential data via e-mail or FTP (use encryption, if you must)
• Use strong passwords on files and computers
• Sanitize all systems before reusing, disposing, or donating
18. Planning: Data Description / Documentation
• Lab notebooks
• Data descriptions / code book
• File naming
– Consistency: Pick a system, write it down, & stick with it
– Identify necessary elements
– Create brief, understandable names
– Date: YYYY-MM-DD
– Version: v01, v02,…FINAL
In general, try to stay away from spaces in filenames as well as the following characters:
. / : * ? “ < > | [ ] & $
• File / directory structure
• Sometimes there is a community standard for data formatting & description for sharing/integration (aka metadata schema) – Find yours!
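A file naming convention is easier to follow when a small helper builds the names. The sketch below is one possible convention, not the course's own: the function name make_filename, the element order (project, description, date, version) and the underscore separator are all assumptions. It applies the YYYY-MM-DD date, the v01-style version and the characters to avoid from the slide.

```python
from datetime import date

# Space plus the characters the slide advises against (straight quote used here).
FORBIDDEN = set(' ./:*?"<>|[]&$')


def make_filename(project, description, day, version, extension):
    """Build a name such as project_description_YYYY-MM-DD_v01.ext."""
    for part in (project, description):
        bad = FORBIDDEN.intersection(part)
        if bad:
            raise ValueError(f"{part!r} contains characters to avoid: {sorted(bad)}")
    stem = "_".join([project, description, day.isoformat(), f"v{version:02d}"])
    # The single dot before the extension is the only dot in the name.
    return f"{stem}.{extension}"


if __name__ == "__main__":
    print(make_filename("amp", "soundlevels", date(2013, 12, 22), 1, "csv"))
```

Zero-padding the version (v01 rather than v1) keeps files in order when a folder is sorted by name.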
19. Planning: Retention Period
Plan to keep your data according to:
• CU Data Retention Policy: at least 3 years
• Funder requirements: It varies – check them!
• Regulations
• Contract terms, for industry sponsored research
• The importance of the data, regardless of external requirements
20. Planning: Planning to share
Do you plan to share your data? Prepare to follow the requirements of your:
• Funder
• Journal
• Discipline
• Data repository
22. Working: Data Collection
Review the data management plan:
• Are you following it?
• Did it survive first contact with the research? If not,
– Does it need to be revised?
– Take the opportunity to change it as necessary, documenting the changes
23. Working: Data Collection
Revisit your:
• File naming conventions
– Are they written down?
– Does everyone on the project know & follow them?
• File structure / organization / tagging
– Is it easy to understand / logical?
– Is everyone on the project familiar with the organizational practices so they can store and find files efficiently?
• Back-up processes
– Are they working?
– Are they being followed?
24. Working: Data Collection
Are you using someone else’s data as part of your research?
You should probably cite it…
Consider a citation management software to keep track of it:
http://library.columbia.edu/research/citation-management.html
25. Working: Data Collection
Using a spreadsheet for your data? Structure your data so that it’s easily sortable & usable by other software/machines. Be consistent with your:
• Labels
• Types
• Formats
• Layout
(Alternatively, consider using a database for easier data management)
26. Working: Data Collection
Spreadsheet labels:
Adopt a consistent style that indicates a cell contains a label rather than a value
☺
Date        Instrument  SoundLevel_R  SoundLevel_L  Amp_Setting
2013-12-22  BK-732A     84.6          86.0          3
2013-12-23  BK-732A     115.2         116.4         9
2013-12-24  BK-732A     128.7         130.0         11
☹
Date: 12/22/2013  Instrument BK732A  Sound lev Right <85  Amplifier 3 (27%)  Left 86.0
Date: Dec 23, 2013  Instrument  Amp_Setting  SoundLevel-R 115  BK_732-A  9  SoundLevel_L 116.4
27. Working: Data Collection
Spreadsheet types:
Don’t mix text & number types in the same column
(Example tables as on slide 26.)
28. Working: Data Collection
Spreadsheet formats:
Do all of your dates or other variable values look the same?
(Example tables as on slide 26.)
29. Working: Data Collection
Spreadsheet layout:
Tables of similar data should be structured similarly
(Example tables as on slide 26.)
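The advice not to mix text and number types in one column can be checked automatically once the spreadsheet is saved as CSV. The following is a minimal sketch using Python's standard csv module. The function names are hypothetical, and "number" here simply means anything Python's float() accepts, which is an assumption about what counts as numeric.

```python
import csv
import io


def is_number(text):
    """Return True when the cell text parses as a number."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def mixed_type_columns(csv_text):
    """List the columns that contain both numeric and non-numeric cells."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    mixed = []
    for column in reader.fieldnames or []:
        # Empty cells are skipped; they are a separate quality question.
        kinds = {is_number(row[column]) for row in rows if row[column]}
        if len(kinds) > 1:
            mixed.append(column)
    return mixed


if __name__ == "__main__":
    table = (
        "Date,Instrument,SoundLevel_R,SoundLevel_L,Amp_Setting\n"
        "2013-12-22,BK-732A,<85,86.0,3\n"
        "2013-12-23,BK-732A,115.2,116.4,9\n"
    )
    print(mixed_type_columns(table))
```

In the example, the entry "<85" makes SoundLevel_R a mixed column, which is the kind of cell the slide's unhappy example contains.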
30. Working: Data Collection
When it’s not a spreadsheet (it may be a database):
Be consistent!
• Consistent process
• Consistent organization
• Consistent descriptions
AND
• Consistently documenting everything that’s done
31. Working: Quality Assurance / Control
(Some) Best practices for assuring quality data entry:
• System-limited value entry, i.e., hard code controlled lists of values
• Check 5-10% of data records manually
• Check out-of-range values
• Check empty values / blank fields
• Consider using a data entry program or double entry keying
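Several of these checks are easy to script. The sketch below is illustrative only: the field names follow the course's sound-level example, the controlled list of instruments is hypothetical, the 0 to 11 range is the amplifier setting range given in the course's code book example, and the function names are assumptions.

```python
import random

ALLOWED_INSTRUMENTS = {"BK-732A"}  # hypothetical controlled list of values
AMP_RANGE = (0, 11)                # range from the course's code book example


def check_record(record):
    """Return a list of problems found in one data record (a dict)."""
    problems = []
    for field, value in record.items():
        if value is None or str(value).strip() == "":
            problems.append(f"{field}: empty value")
    instrument = record.get("Instrument")
    if instrument and instrument not in ALLOWED_INSTRUMENTS:
        problems.append(f"Instrument: {instrument!r} not in controlled list")
    amp = record.get("Amp_Setting")
    if isinstance(amp, int) and not AMP_RANGE[0] <= amp <= AMP_RANGE[1]:
        problems.append(f"Amp_Setting: {amp} is out of range")
    return problems


def sample_for_manual_check(records, fraction=0.1, seed=0):
    """Pick a reproducible random sample (default 10%) for manual review."""
    if not records:
        return []
    count = max(1, round(len(records) * fraction))
    return random.Random(seed).sample(records, count)
```

Fixing the random seed makes the sample repeatable, so a second person can review exactly the same records.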
32. Working: Analysis
• Keep an untouched, “raw” copy of the data file – Make it Read Only
• Save cleaned or analyzed data as new files (with good file names, as previously described)
– Take extensive notes of the actions taken or scripts used to “clean” the data
• Use a scripted language (e.g., R, SAS, SPSS) to consistently process data and create a record of data processing & analysis
• Document scripts / code with comments!
• Write a ReadMe.txt file as you go, rather than trying to remember what you did later
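Keeping a read-only raw copy can itself be scripted, which also leaves a record of when it was done. This is a minimal sketch using Python's standard library; the function name protect_raw_copy is hypothetical, and on Windows os.chmod only toggles the read-only flag.

```python
import os
import shutil
import stat


def protect_raw_copy(source, raw_copy):
    """Copy the data file and make the copy read-only for everyone."""
    shutil.copy2(source, raw_copy)  # copy2 also preserves timestamps
    os.chmod(raw_copy, stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)
    return raw_copy
```

Cleaning and analysis then work on the original or on further copies, while the protected file stays as the untouched reference.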
33. Working: Analysis - documenting
Create a file to document your project. Include:
• What data are being collected & why
• Names of project files (data & analysis)
• Project file naming and file organization conventions
• Data definitions (aka Code Book or Data Dictionary) – more on next slide
• Project standards
• Calibration, precision, accuracy & units of instruments or measurements
34. Working: Analysis - documenting
Code books / data dictionaries should include:
• Data codes or coding keys
• Missing value codes
• Field name / Column header / Data label
– Definition e.g., Amp_setting | Dial setting of guitar amplifier
– Values – Possible values e.g., from 0 to 11, whole numbers
– Units – may be included in either Definitions or Values
– Type e.g., string, float, char, date [YYYYMMDD]
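A code book can be kept as a small structured record and written out as plain text next to the data. The sketch below is one possible layout, not a standard: the Variable class, the render_codebook function, the pipe-separated line format and the "NA" missing value code are all assumptions. The Amp_setting entry reuses the example from the slide.

```python
from dataclasses import dataclass


@dataclass
class Variable:
    name: str        # field name / column header
    definition: str  # what the variable means
    values: str      # possible values
    units: str       # units of measurement, or "none"
    dtype: str       # e.g., string, float, char, date


def render_codebook(variables, missing_code="NA"):
    """Render the code book as plain text, one variable per line."""
    lines = [f"Missing value code: {missing_code}"]
    for var in variables:
        lines.append(
            f"{var.name} | {var.definition} | {var.values} | {var.units} | {var.dtype}"
        )
    return "\n".join(lines)


if __name__ == "__main__":
    amp = Variable("Amp_setting", "Dial setting of guitar amplifier",
                   "0 to 11, whole numbers", "none", "integer")
    print(render_codebook([amp]))
```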
36. Finalizing
Preparing data for publication, sharing, storage, preservation:
• Check requirements
• Are data useful/usable
• Select data for preservation
• Choose publication path
• Consider publishing negative data – Others may find it useful
• Repositories
37. Finalizing: Check requirements
Have you fulfilled the expectations of your:
• Funder
• Journal
• Discipline
• Repository
• Institution
38. Finalizing: Usability of data
Are data consistently:
• Formatted
• Named
• Organized
• Described / Documented
Are they in a file format that may be easily accessed and reused?
39. Finalizing: Data preservation: Selection
You might consider long-term preservation if the answer to any of these questions is “Yes”
• Do the data support published research?
• Are the data difficult or expensive to regenerate?
• Are the data required for your research but from another source (i.e. not your original research data)?
– If so, is the future availability of that data from the original source uncertain?
• Do you plan to share your data, or are you required to per funder agreement?
• Are the data historically significant?
• Are the data vulnerable to loss, corruption, endangerment, etc.?
https://lib.stanford.edu/data-services/preserve
41. Sharing: Who will you share with?
Honestly, the party most interested in the data you are producing today is probably:
Your future self
But there are others to consider, too, so…
42. Sharing: What to share
“By ‘final research data,’ we mean recorded factual material commonly accepted in the scientific community as necessary to validate research findings.”
NIH FAQ Data Sharing (3/03)
Guidelines will “be determined by the community of interest” and “may include…data, publications, samples, physical collections, software and models.”
Data Management and Sharing FAQ (11/10)
43. “Timely release and sharing’ is defined as no later than the
acceptance for publication of the main findings from the final
data set.”
Data Sharing Policy, Section II.8.2.3.1, NIH Grants Policy Statement (10/12)
“The expectation is that all data will be made available after a
reasonable length of time….[which] will be determined by the
community of interest…”
Data Management and Sharing FAQ (11/10)
Sharing
When to share
44. There are many paths to publishing data:
• Data paper / Data journal
• Supplementary material
• Data repositories
Wherever you publish, make sure people can find it, use it, and
give you credit for it! (This usually requires a permanent
identifier, e.g., a DOI.)
Do you have negative data? Others may find it useful – consider
making it available!
Sharing
Publishing data
45. Institutional repository
Columbia’s repository accepts materials from faculty, students,
and staff. It offers:
• Long-term preservation strategy
• Multiple back-ups (including off site)
• Quality content descriptions for increased discoverability
• Monthly usage reports
• Permanent URL and DOI
Sharing
Repositories
46. Disciplinary repository e.g.,
• GenBank
• RCSB Protein Data Bank
• ICPSR
Sharing
Repositories
47. Public access repository e.g.,
• Figshare.com
• DataDryad.org
• ResearchCompendia
• Academic Commons
Sharing
Repositories
48. • How would you like your data cited?
• Licensing
• Privacy/confidentiality/anonymization – Revisit IRB
commitments
• What to share
• When to share
Sharing
When you share your data, consider:
49. • Publish your data, and make sure to cite it in your journal
publication
• When publishing your data, provide a preferred citation
• Did you use someone else’s data?
– Check the license for restrictions
– Provide the following minimum in your work’s citations:
• Title
• Author/Creator name
• Publisher
• Publication year
• Unique identifier e.g., DOI
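The minimum citation elements listed above can be kept together as one record and rendered consistently. The sketch below is one way to do that. The `DataCitation` class, the element order, and the punctuation are assumptions for illustration, and the example values are placeholders. Check the style your journal or repository prefers.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class DataCitation:
    """Holds the five minimum elements named on this slide (hypothetical class)."""
    creator: str
    year: str
    title: str
    publisher: str
    identifier: str  # e.g., a DOI

    def __post_init__(self):
        # Refuse to build a citation that is missing any minimum element.
        missing = [f.name for f in fields(self)
                   if not str(getattr(self, f.name)).strip()]
        if missing:
            raise ValueError(f"Missing citation elements: {missing}")

    def format(self) -> str:
        # Assumed layout: Creator (Year). Title. Publisher. Identifier
        return (f"{self.creator} ({self.year}). {self.title}. "
                f"{self.publisher}. {self.identifier}")


if __name__ == "__main__":
    # Placeholder values only; this is not a real dataset.
    citation = DataCitation(
        creator="Doe, J.",
        year="2015",
        title="Example Dataset",
        publisher="Example Repository",
        identifier="https://doi.org/10.1234/example",
    )
    print(citation.format())
```

Validating at construction time means an incomplete citation fails early, before it reaches a reference list.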
Sharing
Data citation
50. In the USA, facts – which means most datasets – are outside the
scope of copyright protection. Some researchers have adopted
the practice of data licensing because of this.
There are many different license types, with varied provisions
for reuse and attribution. When thinking about licenses keep in
mind:
• Funder requirements
• Institutional requirements
• Scientific and scholarly ethos of extending knowledge
Sharing
Licensing
51. Revisit:
• IRB commitments
• Privacy Board requirements (HIPAA)
• Institutional requirements
• Ethical considerations
Consider:
• Have direct identifiers been removed?
• Have indirect identifiers that could reveal identity when
combined been managed?
• Does relational or spatial data have the possibility of
identifying participants?
Maintain the maximum amount of detail possible without
compromising participants’ confidentiality.
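The questions above suggest two separate steps: remove direct identifiers outright, and coarsen indirect identifiers so they are harder to combine. A minimal sketch follows, assuming hypothetical field names (`name`, `email`, `age`, `zip`) and illustrative generalization rules. It is not a substitute for IRB or HIPAA Privacy Board review.

```python
from typing import Any, Callable, Mapping


def age_band(age: int) -> str:
    """Generalize an exact age to a ten-year band, e.g. 37 -> '30-39'."""
    low = (age // 10) * 10
    return f"{low}-{low + 9}"


def deidentify(
    record: Mapping[str, Any],
    direct_identifiers: set,
    generalizers: Mapping[str, Callable[[Any], Any]],
) -> dict:
    """Drop direct identifiers and coarsen indirect ones; keep everything else."""
    cleaned = {}
    for field, value in record.items():
        if field in direct_identifiers:
            continue  # direct identifiers are removed entirely
        if field in generalizers:
            value = generalizers[field](value)  # indirect identifiers are coarsened
        cleaned[field] = value
    return cleaned


if __name__ == "__main__":
    # Made-up record for illustration only.
    record = {"name": "A. Participant", "email": "a@example.org",
              "age": 37, "zip": "10027", "score": 4.2}
    print(deidentify(
        record,
        direct_identifiers={"name", "email"},
        generalizers={"age": age_band, "zip": lambda z: z[:3]},
    ))
```

Fields that are neither dropped nor generalized pass through unchanged, which matches the aim of keeping as much detail as confidentiality allows.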
Sharing
Privacy / Anonymization
52. [Figure: data cycle with the stages Planning, Working, Finalizing, and Sharing]
• Managing research data
takes place at every stage of
the research and scholarly
process
– Planning
– Working, where you
follow the plan,
collecting and analyzing
data
– Finalizing, where you make sure you followed the plan
– Sharing, where you sigh with relief; it’s so simple, because you
followed your plan!
• Research data management can be complex, but there are resources
available
Take-aways
→ SEE NEXT PAGE!
53. Resources for Research Data Management
• Scholarly Communications Program, Data Management: http://scholcomm.columbia.edu/data-management/
• Research and Data Integrity Program (ReaDI): http://www.columbia.edu/cu/compliance/docs/ReaDI_Program/index.html
• Data Management Plan Templates: http://scholcomm.columbia.edu/data-management/data-management-plan-templates/
• CUIT Research Computing Services: http://rcs.columbia.edu
• Academic Commons Archival Storage: http://academiccommons.columbia.edu/about
• Citation Management: http://library.columbia.edu/research/citation-management.html
• Managing Secure Information - Training: http://columbia.sighttraining.com
• Data Security Policies: http://policylibrary.columbia.edu/category/computingtechnology
This work is licensed under a Creative Commons
Attribution 4.0 International License.
54. RESOURCES
• CU Data Policies & Procedures:
– Faculty Handbook
– Sponsored Projects Handbook
– Clinical Research Handbook
– Administrative Policy Library, Security Policies
e.g., Electronic Information Resources Security, Data
Classification Policy, Policy on Electronic Data Security Breach
Reporting and Response
• Scholarly Communications Program
• Office of Research Compliance and Training
55. RESOURCES
• Data Management Plans
• CUIT Active Storage options
• Academic Commons archival storage
• Citation management
• Executive Vice President’s Office of Research (EVPR)
• Training on managing Protected Health Information
(PHI)
• Research and Data Integrity Program (ReaDI)
56. REFERENCES
• Scott, Mark, Boardman, Richard P., Reed, Philippa A.S. and Cox, Simon J. (2012) Introducing research data. Southampton, GB, University of Southampton, 29pp. http://eprints.soton.ac.uk/338816/
• Responsible research data management and the prevention of scientific misconduct. www.knaw.nl/Content/Internet_KNAW/publicaties/pdf/2013569.pdf
• http://dmconsult.library.virginia.edu/
Created by: Amy Nurnberger, 2015-05-12
This work is licensed under a Creative Commons
Attribution 4.0 International License.