Data Management for Graduate Students

Marriott Library Graduate Student Workshop Series
Rebekah Cummings, Research Data Management Librarian
J. Willard Marriott Library, University of Utah
September 27, 2016

In the next hour…
• Introductions
• What are data?
• Why manage data?
• Data Management Plans
• Data Organization
• Metadata
• Storage and Archiving
• Questions

What is data management?
Activities and practices that support long-term
preservation, access, and use of data

What are data?
“The recorded factual material
commonly accepted in the research
community as necessary to validate
research findings.”
- U.S. OMB Circular A-110

We manage data first and
foremost for ourselves

Why else manage data?
• Meet grant and journal
requirements
• Promote reproducible research
• Enable new discoveries from
your data

We are trying to avoid
this scenario…

Two bears' data
management problems
1. Didn’t know where he stored the data
2. Saved one copy of the data on a USB drive
3. Data was in a format that could only be read by
outdated, proprietary software
4. No codebook to explain the variable names
5. Variable names were not descriptive
6. No contact information for the co-author Sam Lee

Data Management Plans
• What data are generated by your research?
• What is your plan for managing the data?
• How will your data be shared?

Elements of a DMP
• Types of data, including file formats
• Data description
• Data storage
• Data sharing, including confidentiality or
security restrictions
• Data archiving and responsibility
• Data management costs

MyData.xls
MeetingNotes.doc
Presentation.ppt
Assignment1.pdf

File naming best practices
1. Be descriptive, not
generic
2. Appropriate length
(about 25 chars or less)
3. Be consistent
4. Think critically about
your file names

File naming best practices
• Files should include only letters,
numbers, and underscores/dashes.
• No special characters.
• No spaces; Use dashes, underscores, or
camel case (likeThis).
• Avoid case dependency. Assume this,
THIS, and tHiS are the same.
• Have a strategy for version control.
• Don’t overwrite file extensions
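
The rules on the two file naming slides can be checked mechanically. The sketch below is one possible checker, not an official tool: the function name `check_file_name` and the hard limit of 25 characters (taken from the "about 25 chars or less" guideline) are assumptions made for illustration.

```python
import re

# Assumed hard limit, based on the "about 25 chars or less" guideline.
MAX_STEM_LENGTH = 25


def check_file_name(name: str) -> list[str]:
    """Return the guideline violations found in a file name (empty if none)."""
    problems = []
    stem, dot, _extension = name.rpartition(".")
    if not dot:
        # No "." found, so the whole name is the stem and the extension is missing.
        stem = name
        problems.append("missing file extension")
    if " " in stem:
        problems.append("contains spaces")
    # Anything other than letters, numbers, underscores, and dashes is "special".
    # Spaces are excluded here because they are reported separately above.
    if re.search(r"[^A-Za-z0-9_\- ]", stem):
        problems.append("contains special characters")
    if len(stem) > MAX_STEM_LENGTH:
        problems.append(f"longer than {MAX_STEM_LENGTH} characters")
    return problems


print(check_file_name("soil_samples_v01.csv"))
print(check_file_name("July 24 2014_SoilSamples%_v6.xlsx"))
```

A checker like this cannot judge whether a name is descriptive or consistent; those rules still need a human.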

Version Control - Numbering
Use leading zeros for scalability
With leading zeros (files sort in order): 001, 002, 003, 009, 010, 099
Without leading zeros (files sort out of order): 1, 10, 2, 3, 9, 99
Bonus Tip: Use ordinal numbers (v1, v2, v3) for major version
changes and decimals for minor changes (v1.1, v2.6)
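
The effect of leading zeros is easy to reproduce, because file browsers usually sort names as text. A minimal sketch follows; the helper `versioned_name` and the `_v` separator are illustrative assumptions, not a required convention.

```python
def versioned_name(stem: str, version: int, extension: str, width: int = 3) -> str:
    """Build a file name with a zero-padded version number."""
    return f"{stem}_v{version:0{width}d}.{extension}"


versions = (1, 2, 3, 9, 10, 99)

# Text sorting compares character by character, so padding keeps numeric order.
padded = sorted(f"{n:03d}" for n in versions)
unpadded = sorted(str(n) for n in versions)

print(padded)    # ['001', '002', '003', '009', '010', '099']
print(unpadded)  # ['1', '10', '2', '3', '9', '99']
print(versioned_name("SoilSamples", 6, "csv"))  # SoilSamples_v006.csv
```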

Version Control - Dates
If using dates, use YYYYMMDD
June2015 = BAD!
06-18-2015 = BAD!
20150618 = GREAT!
2015-06-18 = This is fine too 
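
The reason YYYYMMDD works is that sorting the names as text also sorts them by date. The sketch below shows this with Python's standard `datetime` module; the helper name `date_stamp` and the two extra sample dates are made up for the comparison.

```python
from datetime import date


def date_stamp(d: date, dashed: bool = False) -> str:
    """Format a date as YYYYMMDD, or as YYYY-MM-DD when dashed is True."""
    return d.isoformat() if dashed else d.strftime("%Y%m%d")


# 2015-06-18 is the date from the slide; the other two are illustrative.
days = [date(2015, 6, 18), date(2014, 12, 1), date(2015, 1, 5)]

year_first = sorted(date_stamp(d) for d in days)
month_first = sorted(d.strftime("%m-%d-%Y") for d in days)

print(year_first)   # ['20141201', '20150105', '20150618'] (chronological)
print(month_first)  # ['01-05-2015', '06-18-2015', '12-01-2014'] (not chronological)
```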

From a DMP…
“Each file name, for all types of data, will
contain the project acronym PUCCUK; a
reference to the file content (survey,
interview, media) and the date of an event
(such as the date of an interview).”
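
A convention like the one quoted above is easier to follow when a small helper builds the names. This is only a sketch of how such a DMP rule might be applied: the DMP excerpt does not say how the parts are ordered or separated, so the underscore separator, the YYYYMMDD date format, and the function name `build_file_name` are assumptions.

```python
from datetime import date

# Content types named in the DMP excerpt.
ALLOWED_CONTENT = ("survey", "interview", "media")


def build_file_name(acronym: str, content: str, event_date: date, extension: str) -> str:
    """Combine project acronym, content type, and event date into a file name."""
    if content not in ALLOWED_CONTENT:
        raise ValueError(f"unknown content type: {content!r}")
    # Assumed layout: ACRONYM_content_YYYYMMDD.ext
    return f"{acronym}_{content}_{event_date:%Y%m%d}.{extension}"


# The date and extension below are illustrative values only.
print(build_file_name("PUCCUK", "interview", date(2014, 7, 24), "txt"))
# PUCCUK_interview_20140724.txt
```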

Who filed better?
• PLPP_EvaluationData_Workshop2_2014.xlsx
• MyData.xlsx
• publiclibrarypartnershipsprojectevaluationdataworkshop22014CummingsHelenaMontana.xlsx

Who filed better?
• July 24 2014_SoilSamples%_v6
• 20140724_NSF_SoilSamples_Cummings
• SoilSamples_FINAL

Structuring folders and files
• Consider all the types of files you will handle during the course
of your project.
• Develop a nested folder structure that makes sense for your
project and your team’s retrieval needs.
• Name folders clearly, without special characters.
• Use a standard folder structure for each project or subproject
(including making folders for files not yet created)
• Create a reference document (README file) that notes the
purpose of each folder.
University of Massachusetts Medical School Library http://libraryguides.umassmed.edu/file_management
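
A standard folder structure can be created up front, README included, with a few lines of Python. The sketch below uses only the standard library; the folder names and their descriptions are hypothetical examples, not a recommended layout, and the demo runs in a temporary directory so nothing is left behind.

```python
import tempfile
from pathlib import Path

# Hypothetical folder layout: adapt names and purposes to your own project.
FOLDERS = {
    "data/raw": "Original, unmodified data files",
    "data/processed": "Cleaned or derived data files",
    "docs": "Codebooks, consent forms, protocols",
    "code": "Scripts used to process or analyze the data",
}


def create_project(root: Path) -> Path:
    """Create the folder structure under root and a README describing it."""
    lines = ["Folder purposes", ""]
    for folder, purpose in FOLDERS.items():
        # Folders are created even if no files exist for them yet.
        (root / folder).mkdir(parents=True, exist_ok=True)
        lines.append(f"{folder}: {purpose}")
    readme = root / "README.txt"
    readme.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return readme


# Demo in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    readme = create_project(Path(tmp) / "my_project")
    readme_text = readme.read_text(encoding="utf-8")
    created = sorted(
        p.relative_to(readme.parent).as_posix()
        for p in readme.parent.rglob("*")
        if p.is_dir()
    )
    print(created)
    print(readme_text)
```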

Research Documentation
• Grant proposals and related reports
• Applications and approvals (e.g. IRB)
• Codebooks, data dictionaries
• Consent forms
• Surveys, questionnaires, interview protocols
• Transcripts, hard copies of audio and video files
• Any software or code you used (no matter how
insignificant or buggy)

What goes in a codebook?
• Variable name
• Variable meaning
• Variable data types
• Precision of data
• Units
• Known issues with the data
• Relationships to other
variables
• Null values
• Anything else someone
needs to better understand
the data
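
A codebook can be kept as a plain CSV file with one row per variable. The sketch below mirrors the items listed above as fields of a small record type; the class name `CodebookEntry`, the field names, and the sample variable are all invented for illustration.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class CodebookEntry:
    """One codebook row; fields follow the list of codebook contents."""
    name: str
    meaning: str
    data_type: str
    precision: str = ""
    units: str = ""
    known_issues: str = ""
    related_variables: str = ""
    null_value: str = ""
    notes: str = ""


def codebook_to_csv(entries: list[CodebookEntry]) -> str:
    """Serialize codebook entries to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=[f.name for f in fields(CodebookEntry)],
        lineterminator="\n",
    )
    writer.writeheader()
    for entry in entries:
        writer.writerow(asdict(entry))
    return buffer.getvalue()


# Made-up example variable, not from any real dataset.
example = CodebookEntry(
    name="soil_ph",
    meaning="Acidity of the soil sample",
    data_type="float",
    precision="0.1",
    units="pH",
    null_value="-999",
)
codebook_csv = codebook_to_csv([example])
print(codebook_csv)
```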

Metadata
Unstructured Data
There was a study put out by Dr. Gary Bradshaw from
the University of Nebraska Medical Center in 1982
called “Growth of Rodent Kidney Cells in Serum
Media and the Effect of Viral Transformation On
Growth”. It concerns the cytology of kidney cells.
Structured Data
Title: Growth of rodent kidney cells in serum media and the effect of viral transformations on growth.
Author: Gary Bradshaw
Date: 1982
Publisher: University of Nebraska Medical Center
Subject: Kidney -- Cytology

At the very least…
• Title
• Creator
• Description
• Date
• Type
• Publisher
• Format
• Identifier (DOI)
• Rights
• Any other critical
information to understand
or cite the data.
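
A structured record makes it possible to check for gaps automatically. The sketch below stores the kidney cell study from the Metadata slide as a dictionary and reports which of the minimum fields are still missing. Mapping that slide's "Author" to "Creator" is an assumption, as are the names `REQUIRED_FIELDS` and `missing_fields`.

```python
import json

# Minimum fields from the "At the very least" list.
REQUIRED_FIELDS = (
    "Title", "Creator", "Description", "Date", "Type",
    "Publisher", "Format", "Identifier", "Rights",
)


def missing_fields(record: dict[str, str]) -> list[str]:
    """Return the required fields that are absent or empty in a record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


# Values come from the structured example on the Metadata slide.
record = {
    "Title": "Growth of rodent kidney cells in serum media and the "
             "effect of viral transformations on growth.",
    "Creator": "Gary Bradshaw",  # the slide labels this field "Author"
    "Date": "1982",
    "Publisher": "University of Nebraska Medical Center",
    "Subject": "Kidney -- Cytology",
}

print(json.dumps(record, indent=2))
print(missing_fields(record))
# ['Description', 'Type', 'Format', 'Identifier', 'Rights']
```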

LOCKSS (Lots of
Copies Keep
Stuff Safe)

Options for data
storage
• Personal computers or laptops
• Networked drives
• External storage devices

3-2-1 Backup Rule
Have 3 copies of your data
On 2 different media
In more than 1 physical location
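
Making extra copies only helps if the copies are intact. The sketch below copies a file to several destinations and confirms each copy with a SHA-256 checksum. It uses only the standard library; the function names and the folder names `external_drive` and `network_share` are hypothetical stand-ins. Two folders on the same disk do not satisfy the 3-2-1 rule, so in practice the destinations would point at different media and locations.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source: Path, destination_dirs: list[Path]) -> list[Path]:
    """Copy source into each destination folder and verify every copy."""
    expected = sha256_of(source)
    copies = []
    for directory in destination_dirs:
        directory.mkdir(parents=True, exist_ok=True)
        target = Path(shutil.copy2(source, directory / source.name))
        if sha256_of(target) != expected:
            raise IOError(f"checksum mismatch for {target}")
        copies.append(target)
    return copies


# Demo in a temporary directory; the original plus two copies makes three.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    source = base / "interview_20140724.txt"
    source.write_text("example contents\n", encoding="utf-8")
    copies = back_up(source, [base / "external_drive", base / "network_share"])
    copy_names = [copy.parent.name for copy in copies]
    all_match = all(sha256_of(copy) == sha256_of(source) for copy in copies)
    print(copy_names, all_match)
```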

Language from a DMP
“All data files will be stored on the University server that is backed
up nightly. The University's computing network is protected from
viruses by a firewall and anti-virus software. Digital recordings will
be copied to the server each day after interviews.
Signed consent forms will be stored in a locked cabinet in the
office. Interview recordings and transcripts, which may contain
personal information, will be password protected at file-level and
stored on the server.
Original versions of the files will always be kept on the server. If
copies of files are held on a laptop and edits made, their file names
will be changed.”

Archiving options
• Domain-specific repository
• General Purpose Data Repository
• Institutional repository

When you archive…
• Save the data in both its proprietary and non-proprietary
format (e.g. Excel and CSV; Microsoft Word and ASCII)
• Consider any restrictions on your data (copyright, patent,
privacy, etc.)
• When possible/mandated/desired, share your data online
with a persistent identifier (DOI or ARK)
• Include a data citation and state how you want to get
credit for your data
• Link your data to your publications as often as possible

Your data librarians
Daureen Nesdill, Research Data Management Librarian, Sciences
Darell Schmick, Research Librarian, Health Sciences
Rebekah Cummings, Research Data Management Librarian, Social Sciences & Humanities

Major takeaways
• Data management starts at the beginning of
a project
• Document your data so that someone else
could understand it
• Have more than one copy of your data
• Consider archiving options when you are
done with your project

Questions?
Rebekah Cummings
rebekah.cummings@utah.edu
(801) 581-7701
Marriott Library, 1705Y
…or ask now!
