Data Management for Undergraduate Researchers (updated - 02/2016)

Data Management for Undergraduate Researchers
Office of Undergraduate Research Seminar and Workshop Series
Rebekah Cummings, Research Data Management Librarian
J. Willard Marriott Library, University of Utah
February 23, 2016

• Introductions
• What are data?
• Why manage data?
• Data Management Plans
• Data Organization
• Metadata
• Storage and Archiving
• Questions

What is data management?
The process of controlling the
information (read: data) generated
during a research project.
https://www.libraries.psu.edu/psul/pubcur/what_is_dm.html

What are data?
“The recorded factual material
commonly accepted in the research
community as necessary to validate
research findings.”
- U.S. OMB Circular A-110

Why manage data?
• Save time and increase efficiency
• Meet grant requirements
• Promote reproducible research
• Enable new discoveries from your data
• Make the results of publicly funded research
publicly available

We are trying to avoid
this scenario…

Two bears: data management problems
1. Didn’t know where he stored the data
2. Saved one copy of the data on a USB drive
3. Data was in a format that could only be read by
outdated, proprietary software
4. No codebook to explain the variable names
5. Variable names were not descriptive
6. No contact information for the co-author Sam Lee

Scenario
You develop a research project during your
undergraduate experience. You write up the
results, which are accepted by a reputable
journal. People start citing your work! Three
years later someone accuses you of falsifying
your work.
Scenario adapted from MANTRA training
module

• Would you be able to prove you did the
work as you described in the article?
• What would you need to prove you hadn’t
falsified the data?
• What should you have done throughout
your research study to be able to prove
you did the work as described?

Data Management Plans
• What data are generated by your research?
• What is your plan for managing the data?
• How will your data be shared?

Elements of a DMP
• Types of data, including file formats
• Data description
• Data storage
• Data sharing, including confidentiality or
security restrictions
• Data archiving and responsibility
• Data management costs

MyData.xls
MeetingNotes.doc
Presentation.ppt
Assignment1.pdf

File naming best practices
1. Be descriptive not
generic
2. Appropriate length
(about 25 chars or less)
3. Be consistent
4. Think critically about
your file names

File naming best practices
• Files should include only letters,
numbers, and underscores/dashes.
• No special characters
• No spaces; Use dashes, underscores, or
camel case (like-this or likeThis)
• Avoid case dependency. Assume this,
THIS, and tHiS are the same.
• Have a strategy for version control.
• Don’t overwrite file extensions
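
The rules above can be checked automatically. The following is a minimal sketch, assuming the 25-character guideline applies to the name without its extension; the function names and problem messages are hypothetical and not part of the workshop material. It cannot judge whether a name is descriptive, only whether it follows the mechanical rules.

```python
import re

# Assumption: the "about 25 chars or less" guideline is applied to the
# part of the name before the extension.
MAX_STEM_LENGTH = 25
ALLOWED_STEM = re.compile(r"[A-Za-z0-9_-]+")


def check_file_name(name):
    """Return a list of rule violations for one file name (empty if none)."""
    problems = []
    stem, dot, _extension = name.rpartition(".")
    if not dot:
        # No "." at all, so the whole name is the stem.
        stem = name
        problems.append("missing file extension")
    if " " in stem:
        problems.append("contains spaces")
    # Spaces are reported separately, so ignore them for this check.
    if not ALLOWED_STEM.fullmatch(stem.replace(" ", "")):
        problems.append("contains special characters")
    if len(stem) > MAX_STEM_LENGTH:
        problems.append(f"longer than {MAX_STEM_LENGTH} characters")
    return problems


def find_case_collisions(names):
    """Return pairs of names that differ only by letter case."""
    seen = {}
    collisions = []
    for name in names:
        key = name.lower()
        if key in seen and seen[key] != name:
            collisions.append((seen[key], name))
        else:
            seen.setdefault(key, name)
    return collisions


if __name__ == "__main__":
    for candidate in ["20140724_NSF_SoilSamples.csv", "July 24 2014_SoilSamples%_v6"]:
        print(candidate, "->", check_file_name(candidate) or "OK")
    print(find_case_collisions(["this.txt", "THIS.txt"]))
```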

Version Control - Numbering
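Use leading zeros for scalability.
With leading zeros (files sort in order): 001, 002, 003, 009, 010, 099
Without leading zeros (files sort out of order): 1, 10, 2, 3, 9, 99
Bonus Tip: Use whole numbers (v1, v2, v3) for major version
changes and decimals for minor changes (v1.1, v2.6)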
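
To see why leading zeros matter, here is a small sketch of how file names sort as text. The helper `versioned_name` and the base name `SoilSamples` are illustrative assumptions, not a prescribed convention.

```python
def versioned_name(base, version, extension, width=3):
    """Build a file name with a zero-padded version number, e.g. _v009."""
    return f"{base}_v{version:0{width}d}.{extension}"


versions = (1, 2, 10, 99)

# Zero-padded names keep their numeric order when sorted as text.
padded = [versioned_name("SoilSamples", n, "csv") for n in versions]

# Unpadded names do not: "v10" sorts before "v2".
unpadded = [f"SoilSamples_v{n}.csv" for n in versions]

if __name__ == "__main__":
    print(sorted(padded))
    print(sorted(unpadded))
```

A width of three digits leaves room for up to 999 versions; choose the width based on how many files you realistically expect.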

Version Control - Dates
If using dates, use YYYYMMDD
June2015 = BAD!
06-18-2015 = BAD!
20150618 = GREAT!
2015-06-18 = This is fine too 
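
The reason YYYYMMDD works well is that names sorted as text also end up in chronological order. A short sketch using Python's standard `datetime` module follows; the helper name `date_prefix` is hypothetical.

```python
from datetime import date


def date_prefix(day, dashes=False):
    """Format a date as YYYYMMDD, or YYYY-MM-DD when dashes=True."""
    return day.isoformat() if dashes else day.strftime("%Y%m%d")


earlier = date(2014, 12, 1)
later = date(2015, 6, 18)

# Year-first prefixes sort chronologically.
year_first = sorted([date_prefix(later), date_prefix(earlier)])

# Month-first prefixes do not: June 2015 sorts before December 2014.
month_first = sorted([later.strftime("%m-%d-%Y"), earlier.strftime("%m-%d-%Y")])

if __name__ == "__main__":
    print(year_first)
    print(month_first)
```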

From a DMP…
“Each file name, for all types of data, will
contain the project acronym PUCCUK; a
reference to the file content (survey,
interview, media) and the date of an event
(such as the date of an interview).”

Who filed better?
• PLPP_EvaluationData_Workshop2_2014.xlsx
• MyData.xlsx
• publiclibrarypartnershipsprojectevaluationdataworkshop22014CummingsHelenaMontana.xlsx

Who filed better?
• July 24 2014_SoilSamples%_v6
• 20140724_NSF_SoilSamples_Cummings
• SoilSamples_FINAL

Structuring folders and files
• Consider all the types of files you will handle during the course
of your project.
• Develop a nested folder structure that makes sense for your
project and your team’s retrieval needs.
• Name folders clearly, without special characters (avoid
redundancy)
• Use a standard folder structure for each project or subproject
(including making folders for files not yet created)
• Create a reference document (README file) that notes the
purpose of each folder.
University of Massachusetts Medical School Library http://libraryguides.umassmed.edu/file_management
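
One way to apply these points is to script the folder structure so that every project starts the same way. The sketch below is an illustration only: the folder names and their descriptions are assumptions, and you should adapt them to your own project.

```python
from pathlib import Path

# Hypothetical standard layout; the names and purposes are illustrative.
STANDARD_FOLDERS = {
    "data/raw": "Original, unmodified data files",
    "data/processed": "Cleaned or derived data",
    "docs": "Proposals, approvals, codebooks, consent forms",
    "code": "Scripts used to process or analyze the data",
    "outputs": "Figures, tables, and drafts",
}


def create_project(root):
    """Create the standard folders under root and write a README listing them."""
    root = Path(root)
    lines = [f"Project: {root.name}", "", "Folder purposes:"]
    for folder, purpose in STANDARD_FOLDERS.items():
        # Folders are created up front, even before any files exist.
        (root / folder).mkdir(parents=True, exist_ok=True)
        lines.append(f"- {folder}: {purpose}")
    readme = root / "README.txt"
    readme.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return readme
```

Calling `create_project("PLPP")` would create a `PLPP` folder in the current directory with the subfolders above and a `README.txt` describing each one.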

Research Documentation
• Grant proposals and related reports
• Applications and approvals (e.g. IRB)
• Codebooks, data dictionaries
• Consent forms
• Surveys, questionnaires, interview protocols
• Transcripts, hard copies of audio and video files
• Any software or code you used (no matter how
insignificant or buggy)

Three levels of
documentation
• Project level – what the study set out to do, research
questions, methods, sampling frames, instruments,
protocols, members of the research team
• File or database level – How all the files relate to
one another. A README file is a classic way of capturing
this information.
• Variable or item level – Full label explaining the
meaning of each variable.
http://datalib.edina.ac.uk/mantra/documentation_metadata_citation/

What goes in a codebook?
• Variable name
• Variable meaning
• Variable data types
• Precision of data
• Units
• Known issues with the data
• Relationships to other
variables
• Null values
• Anything else someone
needs to better understand
the data
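
A codebook can be as simple as a CSV file with one row per variable. The sketch below assumes a particular set of column names based on the list above; the column names, the helper functions, and the example variables are all hypothetical.

```python
import csv
import io

# Assumed column names, loosely following the codebook elements listed above.
CODEBOOK_FIELDS = [
    "variable", "meaning", "data_type", "precision",
    "units", "null_value", "known_issues", "related_variables",
]


def write_codebook(entries, stream):
    """Write codebook entries (dicts) as CSV; missing fields are left blank."""
    writer = csv.DictWriter(stream, fieldnames=CODEBOOK_FIELDS, restval="")
    writer.writeheader()
    for entry in entries:
        writer.writerow(entry)


def undocumented_variables(data_columns, entries):
    """Return data columns that have no codebook entry."""
    documented = {entry["variable"] for entry in entries}
    return [column for column in data_columns if column not in documented]


# Made-up example entries for illustration only.
example_entries = [
    {"variable": "site_id", "meaning": "Sampling site identifier", "data_type": "text"},
    {"variable": "soil_ph", "meaning": "Soil pH", "data_type": "decimal",
     "precision": "0.1", "null_value": "-999"},
]

if __name__ == "__main__":
    buffer = io.StringIO()
    write_codebook(example_entries, buffer)
    print(buffer.getvalue())
    print(undocumented_variables(["site_id", "soil_ph", "temp_c"], example_entries))
```

Checking the data columns against the codebook is a quick way to catch variables that were added later and never documented.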

Metadata
Unstructured data:
There was a study put out by Dr. Gary Bradshaw from
the University of Nebraska Medical Center in 1982
called “Growth of Rodent Kidney Cells in Serum
Media and the Effect of Viral Transformation On
Growth”. It concerns the cytology of kidney cells.
Structured data:
Title: Growth of rodent kidney cells in serum media and the effect of viral transformations on growth.
Author: Gary Bradshaw
Date: 1982
Publisher: University of Nebraska Medical Center
Subject: Kidney -- Cytology
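
Structured metadata is useful because software can read and check it. As a rough sketch, the same record can be stored as JSON; the lowercase field names and the helper `missing_fields` are assumptions for illustration and do not follow any particular metadata standard.

```python
import json

# The structured record from the slide, using assumed lowercase field names.
record = {
    "title": "Growth of rodent kidney cells in serum media and the effect "
             "of viral transformations on growth.",
    "author": "Gary Bradshaw",
    "date": "1982",
    "publisher": "University of Nebraska Medical Center",
    "subject": "Kidney -- Cytology",
}

REQUIRED_FIELDS = ("title", "author", "date", "publisher", "subject")


def missing_fields(metadata, required=REQUIRED_FIELDS):
    """Return the required fields that are absent or empty."""
    return [field for field in required if not metadata.get(field)]


def to_json(metadata):
    """Serialize the record as human-readable JSON text."""
    return json.dumps(metadata, indent=2, ensure_ascii=False)


if __name__ == "__main__":
    print(to_json(record))
    print(missing_fields({"title": "Untitled"}))
```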

LOCKSS (Lots of
Copies Keeps
Stuff Safe)

Options for data
storage
• Personal computers or laptops
• Networked drives
• External storage devices

3-2-1 Backup Rule
Have 3 copies of your data
On 2 different media
In more than 1 physical location
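
Making copies is only half of the job; it also helps to confirm that each copy matches the original. The sketch below copies a file to several folders and compares SHA-256 checksums. The function names are hypothetical, and the code cannot tell whether the destinations are really on different media or in different physical locations, so that part remains up to you.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 checksum of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source, destinations):
    """Copy source into each destination folder and verify every copy."""
    source = Path(source)
    expected = sha256_of(source)
    copies = []
    for folder in destinations:
        folder = Path(folder)
        folder.mkdir(parents=True, exist_ok=True)
        target = folder / source.name
        shutil.copy2(source, target)  # copy2 also preserves timestamps
        if sha256_of(target) != expected:
            raise IOError(f"Checksum mismatch for {target}")
        copies.append(target)
    return copies
```

For example, the destinations could be a folder on an external drive and a folder on a networked drive, which together with the original gives three copies.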

Language from a DMP
“All data files will be stored on the University server that is backed
up nightly. The University's computing network is protected from
viruses by a firewall and anti-virus software. Digital recordings will
be copied to the server each day after interviews.
Signed consent forms will be stored in a locked cabinet in the
office. Interview recordings and transcripts, which may contain
personal information, will be password protected at file-level and
stored on the server.
Original versions of the files will always be kept on the server. If
copies of files are held on a laptop and edits made, their file names
will be changed.”

Archiving options
• Domain-specific repository
• General Purpose Data Repository
• Institutional repository

When you archive…
• Save the data in both its proprietary and non-proprietary
format (e.g. Excel and CSV; Microsoft Word and ASCII)
• Consider any restrictions on your data (copyright, patent,
privacy, etc.)
• When possible/mandated/desired, share your data online
with a persistent identifier (DOI or ARK)
• Include a data citation and state how you want to get
credit for your data
• Link your data to your publications as often as possible

Major takeaways
• Data management starts at the beginning of
a project
• Document your data so that someone else
could understand it
• Have more than one copy of your data
• Consider archiving options when you are
done with your project

Questions?
Rebekah Cummings
rebekah.cummings@utah.edu
(801) 581-7701
Marriott Library, 1705Y
…or ask now!
