Data Management for Undergraduate Research

Data Management for Undergraduate Researchers
Office of Undergraduate Research Seminar and Workshop Series
Rebekah Cummings, Research Data Management Librarian
J. Willard Marriott Library, University of Utah
June 18, 2015

• Introductions
• What are data?
• Why manage data?
• Data Management Plans
• File Naming
• Metadata
• Storage and Archiving
• Questions

What are data?
“The recorded factual material commonly accepted in the research community as necessary to validate research findings.”
- U.S. OMB Circular A-110

Why manage data?
Your best collaborator is yourself six months from now, and your past self doesn’t answer emails.

Why else manage data?
• Save time and work more efficiently
• Meet grant requirements
• Promote reproducible research
• Enable new discoveries from your data
• Make the results of publicly funded research publicly available

We are trying to avoid this scenario…

Two bears: data management problems
1. Didn’t know where he stored the data
2. Saved one copy of the data on a USB drive
3. Data was in a format that could only be read by outdated, proprietary software
4. No codebook to explain the variable names
5. Variable names were not descriptive
6. No contact information for the co-author Sam Lee

Data Management Plan
[Figure: research data lifecycle diagram (Planning stage). Courtesy of the UK Data Archive, http://www.data-archive.ac.uk/create-manage/life-cycle]

Scenario
You develop a research project during your undergraduate experience. You write up the results, which are accepted by a reputable journal. People start citing your work! Three years later, someone accuses you of falsifying your work.
Scenario adapted from the MANTRA training module

• Would you be able to prove you did the work as you described in the article?
• What would you need to prove you hadn’t falsified the data?
• What should you have done throughout your research study to be able to prove you did the work as described?

Elements of a DMP
• Types of data, including file formats
• Data description
• Data storage
• Data sharing, including confidentiality or security restrictions
• Data archiving and responsibility
• Data management costs

File naming best practices
• Be descriptive
• Don’t be generic
• Appropriate length
• Be consistent

• PLPP_EvaluationData_Workshop2_2014.xlsx
• MyData.xlsx
• publiclibrarypartnershipsprojectevaluationdataworkshop22014CummingsHelenaMontana.xlsx
Who filed better?

File naming best practices
• Files should include only letters, numbers, and underscores.
• No special characters (%@#*?!)
• No spaces
• Lowercase or camel case (LikeThis)
• Not all systems are case sensitive. Assume this, THIS, and tHiS are the same.
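The naming rules above can be checked automatically before files pile up. The following Python sketch is only an illustration under stated assumptions: it treats a "safe" name as a base of ASCII letters, digits, and underscores, optionally followed by a single extension such as .xlsx, and it groups names that differ only by case. The function names `is_safe_filename` and `case_collisions` are hypothetical, not part of any existing tool.

```python
import re
from collections import defaultdict

# Assumed rule: a base name of ASCII letters, digits, and underscores,
# optionally followed by exactly one extension such as ".xlsx".
SAFE_NAME = re.compile(r"[A-Za-z0-9_]+(\.[A-Za-z0-9]+)?")


def is_safe_filename(name: str) -> bool:
    """True if `name` contains no spaces or special characters such as %@#*?!"""
    return SAFE_NAME.fullmatch(name) is not None


def case_collisions(names: list[str]) -> list[list[str]]:
    """Group names that differ only by letter case. On a case-insensitive
    file system, each group would refer to the same file."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name in names:
        groups[name.lower()].append(name)
    return [group for group in groups.values() if len(group) > 1]


if __name__ == "__main__":
    for candidate in ["20140724_NSF_SoilSamples_Cummings", "July 24 2014_SoilSamples%_v6"]:
        print(candidate, is_safe_filename(candidate))
    print(case_collisions(["this.txt", "THIS.txt", "tHiS.txt", "notes.txt"]))
    # -> [['this.txt', 'THIS.txt', 'tHiS.txt']]
```

The single-extension rule is a choice made for this sketch: a name such as data.tar.gz would be rejected, so loosen the pattern if your field commonly uses multi-part extensions.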

Dates and numbering…
1. Use leading zeros for scalability
001
002
009
019
999
2. If using dates, use YYYYMMDD
June2015 = BAD!
06-18-2015 = BAD!
20150618 = GREAT!
2015-06-18 = This is fine too 
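Both rules come down to how computers sort text: character by character, left to right. This Python sketch makes that visible; the helper names `padded` and `date_stamp` are hypothetical, and the default width of 3 assumes a project will never exceed 999 numbered files.

```python
from datetime import date


def padded(n: int, width: int = 3) -> str:
    """Zero-pad a sequence number (19 -> '019') so names sort correctly as text."""
    return f"{n:0{width}d}"


def date_stamp(d: date) -> str:
    """Format a date as YYYYMMDD (2015-06-18 -> '20150618')."""
    return d.strftime("%Y%m%d")


if __name__ == "__main__":
    # Unpadded numbers sort as text, so '19' lands before '2'.
    print(sorted(["1", "2", "9", "19", "999"]))
    # -> ['1', '19', '2', '9', '999']

    # Zero-padded numbers sort in numeric order.
    print(sorted(padded(n) for n in [19, 2, 999, 1, 9]))
    # -> ['001', '002', '009', '019', '999']

    # Month-first dates sort by month, so 2015 comes before 2014.
    print(sorted(["12-01-2014", "06-18-2015"]))
    # -> ['06-18-2015', '12-01-2014']

    # YYYYMMDD dates sort chronologically.
    print(sorted([date_stamp(date(2015, 6, 18)), date_stamp(date(2014, 12, 1))]))
    # -> ['20141201', '20150618']
```

Choose the padding width at the start of a project: `padded(1000)` simply returns '1000', which sorts before '999' and breaks the order that the leading zeros were meant to protect.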

Who filed better?
• July 24 2014_SoilSamples%_v6
• 20140724_NSF_SoilSamples_Cummings
• SoilSamples_FINAL

File organization best practices
• Top-level folder should include project title and date.
• Sub-structure should have a clear and consistent naming convention.
• Document your structure in a README text file.
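Creating the folders and documenting them can be done in one step, so the README never falls out of date on day one. The Python sketch below assumes a top-level folder named YYYYMMDD_ProjectTitle and four numbered sub-folders; those names and the function `create_project` are hypothetical examples of a consistent convention, not a required one.

```python
import tempfile
from datetime import date
from pathlib import Path

# Hypothetical sub-folder names; the numeric prefixes keep them in a fixed order.
SUBFOLDERS = ["01_raw_data", "02_processed_data", "03_scripts", "04_documentation"]


def create_project(root: Path, title: str, started: date) -> Path:
    """Create <YYYYMMDD>_<title>/ under `root`, add the sub-folders,
    and write a README.txt that documents the structure."""
    top = root / f"{started.strftime('%Y%m%d')}_{title}"
    for sub in SUBFOLDERS:
        (top / sub).mkdir(parents=True, exist_ok=True)
    readme = [
        f"Project: {title}",
        f"Started: {started.isoformat()}",
        "",
        "Folder structure:",
        *[f"  {sub}/" for sub in SUBFOLDERS],
    ]
    (top / "README.txt").write_text("\n".join(readme) + "\n", encoding="utf-8")
    return top


if __name__ == "__main__":
    # Demonstrate inside a throwaway temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        project = create_project(Path(tmp), "SoilSamples", date(2014, 7, 24))
        print(project.name)  # -> 20140724_SoilSamples
        print((project / "README.txt").read_text(encoding="utf-8"))
```

Because `exist_ok=True` is used, running the function again on an existing project leaves the folders in place and only rewrites README.txt.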

Metadata

Unstructured data:
There was a study put out by Dr. Gary Bradshaw from the University of Nebraska Medical Center in 1982 called “Growth of Rodent Kidney Cells in Serum Media and the Effect of Viral Transformation On Growth”. It concerns the cytology of kidney cells.

Structured data:
Title: Growth of rodent kidney cells in serum media and the effect of viral transformations on growth.
Author: Gary Bradshaw
Date: 1982
Publisher: University of Nebraska Medical Center
Subject: Kidney -- Cytology
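The advantage of structured metadata is that software can search it field by field. The Python sketch below stores the slide's example record as a dictionary, serializes it to JSON, and searches a single field. The lowercase field names follow the slide and are an assumption, not a formal schema; a standard such as Dublin Core uses similar elements (for example, `creator` rather than `author`). The helper `matches` is hypothetical.

```python
import json

# The slide's example record. Field names are illustrative only,
# not a formal metadata standard.
record = {
    "title": "Growth of rodent kidney cells in serum media and the effect of viral transformations on growth.",
    "author": "Gary Bradshaw",
    "date": "1982",
    "publisher": "University of Nebraska Medical Center",
    "subject": ["Kidney -- Cytology"],
}


def matches(rec: dict, field: str, term: str) -> bool:
    """Case-insensitive substring search within one metadata field.
    Works for single values and for lists of values."""
    value = rec.get(field, "")
    values = value if isinstance(value, list) else [value]
    return any(term.lower() in str(v).lower() for v in values)


if __name__ == "__main__":
    print(json.dumps(record, indent=2))
    print(matches(record, "subject", "cytology"))  # -> True
    print(matches(record, "author", "Nebraska"))   # -> False
```

The second search fails on purpose: "Nebraska" appears in the record, but not in the author field. In the unstructured paragraph, a program would have no reliable way to tell which words name the author and which name the institution.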

Data documentation includes…
• Questionnaires
• Interview protocols
• Lab notebooks
• Code or scripts
• Consent forms
• Samples, weights, methods
• README files
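One piece of documentation the bears lacked was a codebook explaining their variable names. A codebook (or data dictionary) can be as simple as a CSV file with one row per variable. The Python sketch below writes one and lists data columns that have no entry; the variables, the column layout (`variable`, `description`, `units`, `allowed_values`), and the helper names are all hypothetical.

```python
import csv
import io

FIELDS = ["variable", "description", "units", "allowed_values"]

# Hypothetical codebook entries for a soil-sampling dataset.
CODEBOOK = [
    {"variable": "site_id", "description": "Sampling site code", "units": "", "allowed_values": "S01-S20"},
    {"variable": "sample_date", "description": "Date the sample was collected", "units": "YYYYMMDD", "allowed_values": ""},
    {"variable": "soil_ph", "description": "Soil pH", "units": "none", "allowed_values": "0-14"},
]


def codebook_csv(entries: list[dict]) -> str:
    """Render codebook entries as CSV text, header row first."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(entries)
    return buffer.getvalue()


def undocumented(columns: list[str], entries: list[dict]) -> list[str]:
    """Return the data columns that have no codebook entry."""
    documented = {entry["variable"] for entry in entries}
    return [col for col in columns if col not in documented]


if __name__ == "__main__":
    print(codebook_csv(CODEBOOK))
    # A vague column name such as 'x1' shows up as undocumented.
    print(undocumented(["site_id", "soil_ph", "x1"], CODEBOOK))  # -> ['x1']
```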

LOCKSS (Lots of Copies Keeps Stuff Safe)

Options for data storage
• Personal computers or laptops
• Networked drives
• External storage devices
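Keeping several copies only helps if each copy is intact. One common way to check is to compare checksums. The Python sketch below copies a file into several folders and compares SHA-256 digests; the folder names, the sample file, and the functions `replicate` and `demo` are hypothetical, and the temporary folders merely stand in for real locations such as a networked drive and an external device.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Compute a file's SHA-256 checksum, reading it in 64 KiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def replicate(source: Path, destinations: list[Path]) -> dict[Path, bool]:
    """Copy `source` into each destination folder and report, per copy,
    whether its checksum matches the original."""
    expected = sha256_of(source)
    results = {}
    for folder in destinations:
        folder.mkdir(parents=True, exist_ok=True)
        copy = Path(shutil.copy2(source, folder / source.name))
        results[copy] = sha256_of(copy) == expected
    return results


def demo() -> dict[Path, bool]:
    """Replicate a small sample file into two folders under a temporary directory."""
    root = Path(tempfile.mkdtemp())
    source = root / "20150618_SampleData.csv"
    source.write_text("site_id,soil_ph\nS01,6.5\n", encoding="utf-8")
    return replicate(source, [root / "copy_networked_drive", root / "copy_external_drive"])


if __name__ == "__main__":
    for copy, ok in demo().items():
        print(copy, "OK" if ok else "MISMATCH")
```

Re-checking the copies now and then, not only right after copying, is one way to notice a copy that has been damaged later.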

Storing sensitive data
• If possible, collect the necessary data without using direct identifiers
• Otherwise, de-identify your data upon collection or immediately afterwards
• Do not store or share sensitive data on unencrypted devices
• Talk to your Institutional Review Board (IRB)
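De-identification requirements depend on your study and your IRB, so the following is only a narrow illustration of one step: replacing a direct identifier with a random participant code and dropping other direct-identifier columns. The Python sketch assumes records are dictionaries; the field names, sample rows, and the function `pseudonymize` are hypothetical. The returned key links codes back to people, so it is itself sensitive and should be stored separately from the data, on an encrypted device. Indirect identifiers (for example, combinations of age, location, and dates) are not handled here.

```python
import secrets

# Hypothetical raw records; names and emails are direct identifiers.
SAMPLE_ROWS = [
    {"participant": "Jane Doe", "email": "jane@example.com", "score": 7},
    {"participant": "John Roe", "email": "john@example.com", "score": 5},
    {"participant": "Jane Doe", "email": "jane@example.com", "score": 9},
]


def pseudonymize(rows, id_field, drop_fields=frozenset(), key=None):
    """Return (cleaned_rows, key).

    Each value in `id_field` is replaced by a random code such as 'P1a2b3c4d';
    the same person always receives the same code. Columns listed in
    `drop_fields` are removed. `key` maps original identifiers to codes and
    must be stored separately from the data. Input rows are not modified."""
    key = {} if key is None else dict(key)
    used = set(key.values())
    cleaned = []
    for row in rows:
        original = row[id_field]
        if original not in key:
            code = "P" + secrets.token_hex(4)
            while code in used:  # avoid the (unlikely) reuse of a code
                code = "P" + secrets.token_hex(4)
            key[original] = code
            used.add(code)
        new_row = {k: v for k, v in row.items() if k not in drop_fields}
        new_row[id_field] = key[original]
        cleaned.append(new_row)
    return cleaned, key


if __name__ == "__main__":
    cleaned, key = pseudonymize(SAMPLE_ROWS, "participant", drop_fields={"email"})
    for row in cleaned:
        print(row)
```

Random codes are used here rather than, say, a plain hash of the name, because a hash of a known name can be recomputed by anyone who guesses it.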

Archiving options
• Public repository – FigShare
• Domain-specific repository
• Institutional repository

Major takeaways
• Data management starts at the beginning of a project
• Document your data so that someone else could understand it
• Have more than one copy of your data
• Consider archiving options when you are done with your project

Questions?
rebekah.cummings@utah.edu
(801) 581-7701
Marriott Library, 1705Y
…or ask now!
