This document provides an overview of data management basics for graduate students. It discusses why managing data is important, including requirements from funders and for responsible research. It then covers topics like organizing data through file naming, versioning, backup and storage strategies, and post-project activities. Resources for developing data management plans and tools are also listed. The overall message is that planning is key to prevent data loss and enable efficient and ethical research.
DataCite and Campus Data Services
Paul Bracke, Associate Dean for Digital Programs and Information Services, Purdue University
Research libraries are increasingly interested in developing data services for their campuses. There are many perspectives, however, on how to develop services that are responsive to the many needs of scientists; sensitive to the concerns of scientists who are not always accustomed to sharing their data; and that are attractive to campus administrators. This presentation will discuss the development of campus-based data services programs, the centrality of data citation to these efforts, and the ways in which engagement with DataCite can enhance local programs.
Presentation from a University of York Library workshop on research data management. The workshop provides an introduction to research data management, covering best practice for the successful organisation, storage, documentation, archiving, and sharing of research data.
University of Bath Research Data Management training for researchers
Jez Cope
Slides from a workshop on Research Data Management for research staff and students at the University of Bath.
Part of the Research360 project (http://blogs.bath.ac.uk/research360).
Authors: Cathy Pink and Jez Cope, University of Bath
This presentation was delivered at the Elsevier Library Connect Seminar on 6 October 2014 in Johannesburg, 7 October 2014 in Durban, and 9 October 2014 in Cape Town, and gives an overview of the potential role that librarians can play in research data management.
Data Equivalence
Mark Parsons, Lead Project Manager, Senior Associate Scientist, National Snow and Ice Data Center
Data citation, especially using persistent identifiers like Digital Object Identifiers (DOIs), is an increasingly accepted scientific practice. Recently, several respected organizations have developed guidelines for data citation. The different guidelines are largely congruent in that they agree on the basic practice and elements of data citation, especially for relatively static, whole data collections. There is less agreement on the more subtle nuances of data citation that are sometimes necessary to ensure precise reference and scientific reproducibility--the core purpose of data citation. We need to be sure that if you follow a data reference you get to the precise data that were used, or at least their scientific equivalent. Identifiers such as DOIs are necessary but not sufficient for the precise, detailed references required. This talk discusses issues around data set versioning, micro-citation, and scientific equivalence. I propose some interim solutions and suggest research strategies for the future.
The document provides information about MANTRA, a free online course for research data management created by the University of Edinburgh. MANTRA teaches best practices for managing research data through open educational modules aligned with the research data lifecycle. It is available for reuse and repurposing under an open license. The course covers topics like data planning, organization, documentation, storage, security, and sharing.
Scientific discovery and innovation in an era of data-intensive science
William (Bill) Michener, Professor and Director of e-Science Initiatives for University Libraries, University of New Mexico; DataONE Principal Investigator
The scope and nature of biological, environmental and earth sciences research are evolving rapidly in response to environmental challenges such as global climate change, invasive species and emergent diseases. Scientific studies are increasingly focusing on long-term, broad-scale, and complex questions that require massive amounts of diverse data collected by remote sensing platforms and embedded environmental sensor networks; collaborative, interdisciplinary science teams; and new tools that promote scientific data preservation, discovery, and innovation. This talk describes the challenges facing scientists as they transition into this new era of data intensive science, presents current solutions, and lays out a roadmap to the future where new information technologies significantly increase the pace of scientific discovery and innovation.
A basic course on Research data management: part 1 - part 4
Leon Osinski
Slides belonging to a basic course on research data management. The course consists of 4 parts:
Part 1: what and why
1.1 data management plans
Part 2: protecting and organizing your data
2.1 data safety and data security
2.2 file naming, organizing data (TIER documentation protocol)
Part 3: sharing your data
3.1 via collaboration platforms (during research)
3.2 via data archives (after your research)
Part 4: caring for your data, or making data usable
4.1 tidy data
4.2 documentation/metadata
4.3 licenses
4.4 open data formats
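The "tidy data" idea in Part 4 can be illustrated with a short sketch: each variable becomes a column and each observation a row, so a "wide" table with one column per year is reshaped into "long" form. A minimal, hypothetical example in plain Python (no external libraries; the site/year field names are invented for illustration):

```python
# Convert a "wide" table (one column per year) into tidy "long" form:
# one row per (site, year) observation. Field names are illustrative.
wide_rows = [
    {"site": "A", "2013": 4.1, "2014": 4.7},
    {"site": "B", "2013": 3.9, "2014": 4.2},
]

tidy_rows = [
    {"site": row["site"], "year": int(year), "value": value}
    for row in wide_rows
    for year, value in row.items()
    if year != "site"
]

for r in tidy_rows:
    print(r)
```

In the tidy form, every row states one observation ("site A measured 4.1 in 2013"), which makes the data far easier to filter, merge, and share than the wide layout.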
Going Full Circle: Research Data Management @ University of Pretoria
Johann van Wyk
Presentation delivered at the eResearch Africa Conference, held 23-27 November 2014, at the University of Cape Town, Cape Town, South Africa. Various approaches to Research Data Management at Higher Education Institutions focus on only one or two aspects of the research data cycle. At the University of Pretoria the approach has been to support researchers throughout the research process, covering the whole research data cycle. The idea is to facilitate and capture the research data throughout the research cycle, which gives context to the data and adds provenance. The University of Pretoria uses the UK Data Archive's research data cycle model to align its Research Data Management project development. This model identifies the stages of a research data cycle as: creating data, processing data, analysing data, preserving data, giving access to data, and reusing data. This paper will give a short overview of the chronological development of research data management at the University of Pretoria. The overview will also highlight findings of two surveys done at the University, one in 2009 and one in 2013. This will be followed by a discussion of a number of pilot projects at the University, and how the needs of researchers involved in these projects are being addressed in a number of the stages of the research data cycle. The discussion will also give a short overview of how the University plans to support those stages not currently being addressed. The second part of the presentation will focus on the projects and technology (software and hardware) used. The University of Pretoria has adopted an Enterprise Content Management (ECM) approach to manage its Research Data. ECM is not a singular platform or system but rather a set of strategies, tools and methodologies that interoperate with each other to create a comprehensive management tool. Together, these create an all-encompassing process addressing document, web, records and digital asset management.
At the University of Pretoria we address all these processes with different software suites and tools to create a complete management system. Each process presented its own technical challenges. These had to be addressed while keeping in mind the end objective of supporting researchers throughout the whole research process and data life cycle. Various platforms and standards have been adopted to meet the University of Pretoria's criteria. To date, three processes have been addressed, namely the capturing of data during the research process, the dissemination of data, and the preservation of data.
Managing data throughout the research lifecycle
Marieke Guy
This document summarizes a presentation about managing data throughout the research lifecycle. It discusses the stages of the research lifecycle, including planning, data creation, documentation, storage, sharing, and preservation. It provides examples of research lifecycle models and addresses key questions to consider at each stage, such as what formats to use, how to document data, where to store it, and how to share and preserve it. The presentation emphasizes making informed decisions about data management and talking to colleagues for support and advice.
Data Management for the Digital Humanities
Thea Atwood
This document provides an overview of key concepts and best practices for data management in the digital humanities. It defines data and discusses its generation. Guidelines for developing a data management plan from funders like NEH and NSF are examined. The importance of data management is explained in terms of meeting requirements, increasing visibility, saving time and money, and facilitating new discoveries. Elements of an effective data management plan, such as roles and responsibilities, expected data, data formats and dissemination, and long-term storage and access are also outlined.
This document provides guidance on creating a data management plan (DMP). It explains that DMPs are required by many funders to help researchers better organize, document, and preserve their data. The key parts of a DMP include describing the data, metadata standards, data security, archiving and preservation, and access. The presenter provides tips for addressing each part, such as using open formats and partnering with repositories. Resources for creating a DMP at the University of Wisconsin-Milwaukee are also listed.
Data Literacy: Creating and Managing Research Data
cunera
This document discusses best practices for creating and managing research data. It covers defining data, the importance of data management, developing a data management plan, file naming conventions, metadata, data sharing and preservation. Key points include making a data management plan addressing types of data, standards, access and sharing policies; using descriptive file names with dates; storing multiple versions of data; and including metadata to explain the data. Resources for data management support are provided.
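The naming advice above (descriptive file names with dates, keeping multiple versions) can be sketched in a few lines. A hypothetical helper, assuming an ISO 8601 date and a zero-padded version suffix; the naming scheme and field names are illustrative, not a standard:

```python
from datetime import date

def data_filename(project, description, version, ext="csv", when=None):
    """Build a descriptive, sortable file name, e.g.
    'soilsurvey_2024-05-01_phreadings_v02.csv'.
    The pattern (project_date_description_vNN.ext) is illustrative."""
    when = when or date.today()
    return f"{project}_{when.isoformat()}_{description}_v{version:02d}.{ext}"

name = data_filename("soilsurvey", "phreadings", 2, when=date(2024, 5, 1))
print(name)  # soilsurvey_2024-05-01_phreadings_v02.csv
```

Because the date is in ISO 8601 (YYYY-MM-DD) and versions are zero-padded, a plain alphabetical sort of the directory also sorts the files chronologically and by version.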
This two-part course is a collaboration between CU Libraries/Information Services and the Office of Research Compliance & Training. The purpose of this course is to familiarize you with the various aspects of research data management (RDM)
Part 1: Why RDM is both recommended and required
What research data are
Who is responsible for RDM
Part 2:
When RDM activities occur
How you can carry out RDM activities
This document provides an introduction to data management. It discusses why data management is important, covering key aspects like developing data management plans, file organization, documentation and metadata, storage and backup, legal and ethical considerations, sharing and reuse, and preservation. Effective data management is critical for research success as it supports reproducibility, sharing, and preventing data loss. The document outlines best practices and resources like the library that can help with developing strong data management strategies.
This document summarizes Rob Grim's presentation on e-Science, research data, and the role of libraries. It discusses the Open Data Foundation's work in promoting metadata standards like DDI and SDMX. It also outlines the research data lifecycle and how metadata management can help libraries support research through services like data registration, archiving, discovery and access. Finally, it provides examples of how Tilburg University library supports research data through services aligned with data availability, discovery, access and delivery.
Brad Houston presented information on data management plans (DMPs) required by the National Science Foundation (NSF) for grant proposals. He explained that DMPs must describe the data to be collected or generated, how it will be organized and formatted, and how it will be preserved and shared. He emphasized using open standards and preparing metadata to help others understand and find the data. Researchers were advised to consider long-term preservation and to partner with libraries or repositories to ensure access over time. Contact information was provided for those needing assistance developing their DMP.
Research Data Management: Part 1, Principles & Responsibilities
AmyLN
This two-part course is a collaboration between CU Libraries/Information Services and the Office of Research Compliance & Training. The purpose of this course is to familiarize you with the various aspects of research data management (RDM)
Part 1: Why RDM is both recommended and required
What research data are
Who is responsible for RDM
Part 2:
When RDM activities occur
How you can carry out RDM activities
This document provides an introduction to data management. It discusses the importance of data management and introduces best practices. These include making a data management plan, properly organizing and naming files, adding descriptive metadata, securely storing and backing up data, considering legal and ethical issues, enabling sharing and reuse, and ensuring long-term preservation. Effective data management is important across all disciplines and throughout the entire data lifecycle from creation to archiving.
This document discusses the importance of research data management. It covers the data lifecycle and components of a data management plan. The data lifecycle includes collecting, processing, analyzing, storing, preserving, and sharing data. A data management plan outlines how data will be managed and preserved during and after a research project. It includes information about the data, metadata, data sharing policies, long-term storage, and budget. Developing a data management plan helps keep data organized, track processes, control versions, prepare data for sharing and reuse, and ensure long-term access.
Research Data Management and Sharing for the Social Sciences and Humanities
Rebekah Cummings
This document summarizes a presentation on research data management for social and behavioral sciences and humanities. The presentation covered topics such as what data management is, why it is important to manage and share data, how to create data management plans, organize data files through naming conventions and folder structures, describe data through metadata and codebooks, issues around data ownership, and data storage, archiving and sharing options. The presentation was aimed at providing guidance to researchers at the University of Utah on best practices for managing and sharing their research data.
Presentation for Northwestern University's first Computational Research Day, April 22, 2014. http://www.it.northwestern.edu/research/about/campus-events/research-day/agenda.html . By Cunera Buys, e-Science Librarian, and Claire Stewart, Director, Center for Scholarly Communication and Digital Curation and Head, Digital Collections
This slideshow was used in an Introduction to Research Data Management course taught for the Mathematical, Physical and Life Sciences Division, University of Oxford, on 2015-02-09. It provides an overview of some key issues, looking at both day-to-day data management, and longer term issues, including sharing, and curation.
NIH Data Policy or: How I Learned to Stop Worrying and Love the Data Manageme...
Kristin Briney
This document summarizes the key points from a presentation about NIH data management and sharing plan requirements. It discusses why these plans are now required for grants over $500,000, how to write an effective plan including what data to share, when, where, who will access it, and how it will be prepared. It also provides tips for effective long-term data management practices like file organization, documentation, backup plans, and security. Resources for creating data management plans and getting help from librarians and tools are also mentioned.
Opening Keynote: The Many and the One: BCE themes in 21st century data curation
Allen Renear, Professor and Interim Dean, Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign
Two scientists can be using "the same data" even though the computer files involved appear to be quite different. This is familiar enough, and for the most part, in small communities with shared practices and familiar datasets, raises few problems. But these informal understandings do not scale to 21st century data curation. To get full value from cyberinfrastructure we must support huge quantities of heterogeneous data developed by diverse communities and used by diverse communities -- often with widely varying methods, tools, and purposes. To accomplish this our informal practices and understandings must be replaced, or at least supplemented, by a shared framework of standard terminology for describing complex cascades of representational levels and relationships. Fundamental problems in data curation -- and in particular problems involving provenance, identifiers, and data citation -- cannot be fully resolved without such a framework. Although the deepest problems here have ancient origins, useful practical measures are now within reach. Some recent work toward this end that is being carried out at the Center for Informatics Research in Science and Scholarship (CIRSS) at the Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign will be described.
Responsible Conduct of Research: Data Management
Kristin Briney
This presentation was given by myself and Brad Houston (http://www.slideshare.net/herodotusjr), for UWM's Responsible Conduct of Research (RCR) series in Fall of 2013. It covers data management plans and practical data management tips. The corresponding handout is also available on Slideshare: http://www.slideshare.net/kbriney/rcr-data-management-handout
- The document summarizes a workshop on research data management given by Stephanie Simms from the California Digital Library.
- It discusses an overview of research data management and the "Support Your Data" program, which aims to help researchers better organize, save, document, and share the outputs of their work.
- The workshop covered assessing current data management practices, accessing tools and resources, and data-related services available at Kyoto University.
This talk was given by Brianna Marshall, Digital Curation Coordinator, at the UW-Madison Digital Humanities Research Network meeting on December 2, 2014.
A basic course on Research data management: part 1 - part 4Leon Osinski
Slides belonging to a basic course on research data management. The course consists of 4 parts:
Part 1: what and why
1.1 data management plans
Part 2: protecting and organizing your data
2.1 data safety and data security
2.2 file naming, organizing data (TIER documentation protocol)
Part 3: sharing your data
3.1 via collaboration platforms (during research)
3.2 via data archives (after your research)
Part 4: caring for your data, or making data usable
4.1 tidy data
4.2 documentation/metadata
4.3 licenses
4.4 open data formats
Going Full Circle: Research Data Management @ University of PretoriaJohann van Wyk
Presentation delivered at the eResearch Africa Conference, held 23-27 November 2014, at the University of Cape Town, Cape Town, South Africa. Various approaches to Research Data Management at Higher Education Institutions focus on an aspect or two of the research data cycle. At the University of Pretoria the approach has been to support researchers throughout the research process covering the whole research data cycle. The idea is to facilitate/capture the research data throughout the research cycle. This will give context to the data and will add provenance to the data. The University of Pretoria uses the UK Data Archive’s research data cycle model, to align its Research Data Management project-development. This model identifies the stages of a research data cycle as: creating data, processing data, analysing data, preserving data, giving access to data, and reusing data. This paper will give a short overview of the chronological development of research data management at the University of Pretoria. The overview will also highlight findings of two surveys done at the University, one in 2009 and one in 2013. This will be followed by a discussion of a number of pilot projects at the University, and how the needs of researchers involved in these projects are being addressed in a number of the stages of the research data cycle. The discussion will also give a short overview of how the University plans to support those stages not currently being addressed. The second part of the presentation will focus on the projects and technology (software and hardware) used. The University of Pretoria has adopted an Enterprise Content Management (ECM) approach to manage its Research Data. ECM is not a singular platform or system but rather a set of strategies, tools and methodologies that interoperate with each other to create a comprehensive management tool. These sets create an all-encompassing process addressing document, web, records and digital asset management. 
At the University of Pretoria we address all these processes with different software suites and tools to create a complete management system. Each process presented its own technical challenges. These had to be addressed, while keeping in mind the end objective of supporting researchers throughout the whole research process and data life cycle. Various platforms and standards have been adopted to meet the University of Pretoria’s criteria. To date three processes have been addressed namely, the capturing of data during the research process, the dissemination of data and the preservation of data.
Managing data throughout the research lifecycleMarieke Guy
This document summarizes a presentation about managing data throughout the research lifecycle. It discusses the stages of the research lifecycle, including planning, data creation, documentation, storage, sharing, and preservation. It provides examples of research lifecycle models and addresses key questions to consider at each stage, such as what formats to use, how to document data, where to store it, and how to share and preserve it. The presentation emphasizes making informed decisions about data management and talking to colleagues for support and advice.
Data Management for the Digital HumanitiesThea Atwood
This document provides an overview of key concepts and best practices for data management in the digital humanities. It defines data and discusses its generation. Guidelines for developing a data management plan from funders like NEH and NSF are examined. The importance of data management is explained in terms of meeting requirements, increasing visibility, saving time and money, and facilitating new discoveries. Elements of an effective data management plan, such as roles and responsibilities, expected data, data formats and dissemination, and long-term storage and access are also outlined.
This document provides guidance on creating a data management plan (DMP). It explains that DMPs are required by many funders to help researchers better organize, document, and preserve their data. The key parts of a DMP include describing the data, metadata standards, data security, archiving and preservation, and access. The presenter provides tips for addressing each part, such as using open formats and partnering with repositories. Resources for creating a DMP at the University of Wisconsin-Milwaukee are also listed.
Data Literacy: Creating and Managing Reserach Datacunera
This document discusses best practices for creating and managing research data. It covers defining data, the importance of data management, developing a data management plan, file naming conventions, metadata, data sharing and preservation. Key points include making a data management plan addressing types of data, standards, access and sharing policies; using descriptive file names with dates; storing multiple versions of data; and including metadata to explain the data. Resources for data management support are provided.
This two-part course is a collaboration between CU Libraries/Information Services and the Office of Research Compliance & Training. The purpose of this course is to familiarize you with the various aspects of research data management (RDM)
Part 1: Why RDM is both recommended and required
What research data are
Who is responsible for RDM
Part 2:
When RDM activities occur
How you can carry out RDM activities
This document provides an introduction to data management. It discusses why data management is important, covering key aspects like developing data management plans, file organization, documentation and metadata, storage and backup, legal and ethical considerations, sharing and reuse, and preservation. Effective data management is critical for research success as it supports reproducibility, sharing, and preventing data loss. The document outlines best practices and resources like the library that can help with developing strong data management strategies.
This document summarizes Rob Grim's presentation on e-Science, research data, and the role of libraries. It discusses the Open Data Foundation's work in promoting metadata standards like DDI and SDMX. It also outlines the research data lifecycle and how metadata management can help libraries support research through services like data registration, archiving, discovery and access. Finally, it provides examples of how Tilburg University library supports research data through services aligned with data availability, discovery, access and delivery.
Brad Houston presented information on data management plans (DMPs) required by the National Science Foundation (NSF) for grant proposals. He explained that DMPs must describe the data to be collected or generated, how it will be organized and formatted, and how it will be preserved and shared. He emphasized using open standards and preparing metadata to help others understand and find the data. Researchers were advised to consider long-term preservation and to partner with libraries or repositories to ensure access over time. Contact information was provided for those needing assistance developing their DMP.
Research Data Management: Part 1, Principles & ResponsibilitiesAmyLN
This two-part course is a collaboration between CU Libraries/Information Services and the Office of Research Compliance & Training. The purpose of this course is to familiarize you with the various aspects of research data management (RDM)
Part 1: Why RDM is both recommended and required
What research data are
Who is responsible for RDM
Part 2:
When RDM activities occur
How you can carry out RDM activities
This document provides an introduction to data management. It discusses the importance of data management and introduces best practices. These include making a data management plan, properly organizing and naming files, adding descriptive metadata, securely storing and backing up data, considering legal and ethical issues, enabling sharing and reuse, and ensuring long-term preservation. Effective data management is important across all disciplines and throughout the entire data lifecycle from creation to archiving.
This document discusses the importance of research data management. It covers the data lifecycle and components of a data management plan. The data lifecycle includes collecting, processing, analyzing, storing, preserving, and sharing data. A data management plan outlines how data will be managed and preserved during and after a research project. It includes information about the data, metadata, data sharing policies, long-term storage, and budget. Developing a data management plan helps keep data organized, track processes, control versions, prepare data for sharing and reuse, and ensure long-term access.
Research Data Management and Sharing for the Social Sciences and Humanities (Rebekah Cummings)
This document summarizes a presentation on research data management for social and behavioral sciences and humanities. The presentation covered topics such as what data management is, why it is important to manage and share data, how to create data management plans, organize data files through naming conventions and folder structures, describe data through metadata and codebooks, issues around data ownership, and data storage, archiving and sharing options. The presentation was aimed at providing guidance to researchers at the University of Utah on best practices for managing and sharing their research data.
Presentation for Northwestern University's first Computational Research Day, April 22, 2014. http://www.it.northwestern.edu/research/about/campus-events/research-day/agenda.html . By Cunera Buys, e-Science Librarian, and Claire Stewart, Director, Center for Scholarly Communication and Digital Curation and Head, Digital Collections
This slideshow was used in an Introduction to Research Data Management course taught for the Mathematical, Physical and Life Sciences Division, University of Oxford, on 2015-02-09. It provides an overview of some key issues, looking at both day-to-day data management, and longer term issues, including sharing, and curation.
NIH Data Policy or: How I Learned to Stop Worrying and Love the Data Manageme... (Kristin Briney)
This document summarizes the key points from a presentation about NIH data management and sharing plan requirements. It discusses why these plans are now required for grants over $500,000, how to write an effective plan including what data to share, when, where, who will access it, and how it will be prepared. It also provides tips for effective long-term data management practices like file organization, documentation, backup plans, and security. Resources for creating data management plans and getting help from librarians and tools are also mentioned.
Opening Keynote: The Many and the One: BCE themes in 21st century data curation
Allen Renear, Professor and Interim Dean, Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign
Two scientists can be using "the same data" even though the computer files involved appear to be quite different. This is familiar enough, and for the most part, in small communities with shared practices and familiar datasets, raises few problems. But these informal understandings do not scale to 21st century data curation. To get full value from cyberinfrastructure we must support huge quantities of heterogeneous data developed by diverse communities and used by diverse communities -- often with widely varying methods, tools, and purposes. To accomplish this our informal practices and understandings much be replaced, or at least supplemented, by a shared framework of standard terminology for describing complex cascades of representational levels and relationships. Fundamental problems in data curation -- and in particular problems involving provenance, identifiers, and data citation — cannot be fully resolved without such a framework. Although the deepest problems here have ancient origins, useful practical measures are now within reach. Some recent work toward this end that is being carried out at the Center for Informatics Research in Science and Scholarship (CIRSS) at the Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign will be described.
Responsible Conduct of Research: Data Management (Kristin Briney)
This presentation was given by myself and Brad Houston (http://www.slideshare.net/herodotusjr), for UWM's Responsible Conduct of Research (RCR) series in Fall of 2013. It covers data management plans and practical data management tips. The corresponding handout is also available on Slideshare: http://www.slideshare.net/kbriney/rcr-data-management-handout
- The document summarizes a workshop on research data management given by Stephanie Simms from the California Digital Library.
- It discusses an overview of research data management and the "Support Your Data" program, which aims to help researchers better organize, save, document, and share the outputs of their work.
- The workshop covered assessing current data management practices, accessing tools and resources, and data-related services available at Kyoto University.
This talk was given by Brianna Marshall, Digital Curation Coordinator, at the UW-Madison Digital Humanities Research Network meeting on December 2, 2014.
Getting to grips with research data management (Wendy Mears)
This document provides an overview of research data management. It defines research data management and discusses its importance. It also outlines the data lifecycle model and provides guidance on sharing data, working with data, planning for data management, and useful resources for research data management. The document aims to help researchers effectively manage the data created throughout the research process.
http://kulibrarians.g.hatena.ne.jp/kulibrarians/20170222
Presentation by Cuna Ekmekcioglu (The University of Edinburgh)
- Creating and Managing Digital Research Data in Creative Arts: An overview (2016)
CC BY-NC-SA 4.0
Presentation given at the Indiana University School of Medicine's Ruth Lilly Medical Library. Contains information and resources specific to Indiana University Purdue University Indianapolis (IUPUI). For full class materials, see LYD17_IUPUIWorkshop folder here: https://osf.io/r8tht/.
The document provides guidance on writing a data management plan (DMP). It explains that DMPs are now required by many funders to accompany grant applications. A DMP outlines how research data will be managed and shared during and after a project. It should address issues like the type of data being collected, documentation, storage and backup plans, data sharing and reuse, legal and ethical concerns, and long-term preservation. Writing a DMP helps ensure good data management practices and that a project is compliant with funder policies supporting open access to research data.
The state of global research data initiatives: observations from a life on th... (Projeto RCAAP)
The document discusses research data management and provides guidance on best practices. It defines research data management as the active management of data over its lifecycle. It recommends writing a data management plan to document how data will be created, stored, shared, and preserved. It also provides tips for making data accessible and reusable through use of metadata standards, documentation, open licensing, and depositing data in repositories with persistent identifiers. The goal is to help researchers manage and share their data effectively to increase access and reuse.
This document provides an overview of research data management and outlines the steps for creating a data management plan. It discusses why research data management is important, including enabling data reuse and sharing and meeting funder requirements. The document then walks through creating a data management plan, covering topics like the types and formats of data that will be generated, ethical and intellectual property issues, how data will be stored and backed up, and long-term preservation and deposition of data. It emphasizes that planning early helps ensure accurate, complete and secure data, and avoids problems down the line.
Prerequisites of DBMS
Course Objectives of DBMS
Syllabus
What is the meaning of data and database
DBMS
History of DBMS
Different Databases available in Market
Storage areas
Why to Learn DBMS?
People who work with Databases
Applications of DBMS
This is the PowerPoint for my "Data Management for Undergraduate Researchers" workshop for the Office of Undergraduate Research Seminar and Workshop Series. Major topics include motivations behind good data management, file naming, version control, metadata, storage, and archiving.
This document provides guidance on research data management and developing data management plans. It discusses why managing research data is important, including making research easier to conduct, avoiding accusations of fraud or bad science, and getting credit for data produced. The document outlines what is involved in research data management and considerations for sharing and preserving data, such as file formats, documentation, and standards. It emphasizes the importance of data management planning and provides tips on developing plans to meet funder requirements.
Presentation given at the VADS4R training event in Glasgow on 16th June. VADS4R is a project training PhD students and early career researchers in the visual and performing arts about research data management.
Introduction to Research Data Management for postgraduate students (Marieke Guy)
The document provides an introduction to research data management for postgraduate students, outlining what research data is, the research process, what research data management involves and why it is important, and how students can start thinking about good research data management practices. It discusses defining and organizing data, storage and security, and maintaining findable and understandable data throughout the research lifecycle. The goal is to explain the importance of research data management and the roles students play in effective data management.
This document summarizes a seminar on data management for undergraduate researchers. It discusses what data is, why it needs to be managed, and key aspects of the data management process such as data organization, metadata, storage, and archiving. Topics covered include file naming best practices, version control, documentation, metadata standards, storage options, and long-term archiving. The goal is to help researchers organize and document their data so it can be understood, preserved, and reused.
The document discusses the importance of digital data preservation and ethics. It notes that data loss is worse than commonly known, with estimates that 1 in 1,500 files become corrupt on average and 3-500 files corrupt on each hard drive. Proper data preservation is important for responsibilities to colleagues, research subjects, and the public. Key aspects of preservation include having backup copies stored in different locations and formats, as well as periodically checking the integrity of stored data. University libraries can assist with data management planning, curation, archiving, and publishing to help researchers properly preserve their important digital data.
The document provides an introduction and overview of databases and database management systems. It outlines the course curriculum which includes an introduction to databases and database concepts, Oracle relational databases and tools, SQL and PL/SQL implementation, data modeling using ER diagrams, normalization, and transaction management. It also includes knowledge sharing sessions and a project. The document further defines data, information, and data management approaches like files, XML, and databases. It describes the key aspects and advantages/disadvantages of each approach.
This document discusses creating a data management plan. It explains that a data management plan is a comprehensive plan for managing research data throughout a project's lifecycle and briefly describing how data will be shared per a funder's policy. It provides an overview of key elements to include in a plan such as file formats, organization, sharing, and preservation. The document also reviews funder requirements and available tools to create plans, noting they can be tailored to different funders' guidelines.
Similar to Data managementbasics issr_20130301 (20)
3. 1. Funders Require It
Data Management Basics, 3/01/13
• National Institutes of Health: Data Sharing Policy (2003)
  • All grants funded at $500K or above must include a Data Sharing Plan
• National Science Foundation: Data Management Plan Requirement (2011)
  • All proposals must submit a 2-page supplementary “Data Management Plan” describing how the project will comply with NSF data sharing policy
• National Endowment for the Humanities: Sustainability and Data Management Plans Requirement (2012)
  • Digital Humanities Implementation Grants must include a plan discussing how data will be managed, disseminated, and preserved
• OSTP Directive to Funding Agencies (2013)
  • Federal agencies with more than $100M in R&D expenditures must ensure that published results of federally funded research are freely available to the public within one year of publication, including data
4. National Science Foundation
• Data Management Plan Requirement
  • How projects will conform to NSF data sharing policy
  • Flexible: “The plan should reflect best practices in your area of research, and should be appropriate to the data you generate.”
• Directorate for Social, Behavioral and Economic Sciences
  • Discipline-specific guidelines
    • Archeology (Digital Archeological Record)
    • Economics (American Economic Association)
    • Universals (for the NSF Universe)
• What data are generated by your research?
• What is your plan for managing the data?
5. 2. It Makes Life Easier
• For you…
  • Increases efficiency
  • Easier to understand the data collected throughout the life cycle of the project
  • Easier to find the data that you need throughout the life cycle of the project
  • Satisfies applicable legal obligations
  • Addresses preservation, documentation, and verification issues
  • Helps reviewers understand the characteristics of your data
  • Increases citation rates for articles
• For others…
  • Provides continuity – other researchers can build on your data
  • Enhances longevity and usability
  • Facilitates new discoveries
  • Supports open access
6. 3. It’s the Right Thing To Do
Responsible Conduct of Research/Research Ethics
• Data Acquisition, Management, Sharing and Ownership
  • Using the appropriate research method
  • Providing attention to detail
  • Obtaining appropriate permissions
  • Recording data accurately and securely
  • Maintaining data to allow it to confirm research findings, establish priority, and be reanalyzed by other researchers
  • Storing data to protect confidentiality, be secure from physical and electronic damage, destruction or theft, and be maintained for the appropriate time frame dictated by sponsor and University policies
Compliance
• Research using Human Subjects (Institutional Review Board)
7. Smart Data Practices
• Naming Your Files
• Organizing Your Data
• Backup and Storage
• Post-Project Considerations
8. Organizing Your Data
• Getting Started
  • Consider your goals
    • What do you want to get out of managing your data?
    • What is the most efficient way to organize your data?
  • Figure out your criteria for keeping data
  • Think about where you want your data to end up
10. File Naming and Labeling
Three things to consider: Organization, Consistency, Context
11. Some potential components for your file naming strategy
• Version number
• Date of creation
• Name of creator
• Description of content
• Name of individual/research team/department
• Publication date
• Project number
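Components like these can be combined mechanically once you fix an order. A minimal sketch in Python (the `make_filename` helper and its field order are hypothetical; pick your own convention and then apply it consistently):

```python
def make_filename(stamp: str, description: str, creator: str,
                  version: int, ext: str) -> str:
    """Build a filename from common naming components.

    The YYYYMMDD date goes first so that files sort chronologically;
    underscores separate fields because spaces and special characters
    are parsed differently across systems.
    """
    return f"{stamp}_{description}_{creator}_v{version}.{ext}"

print(make_filename("20120925", "credo_du_bois", "rrz", 1, "jpg"))
# 20120925_credo_du_bois_rrz_v1.jpg
```

Which fields you include matters less than generating them the same way every time.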
12. Organizing Your Data
W. E. B. Du Bois, Niagara delegate meeting, Boston, 1907. W. E. B. Du Bois Papers (MS 312). Special Collections and University Archives, University Libraries, University of Massachusetts Amherst
13. Organizing Your Data
• Let’s Clean Up Those File Names
  • abcdefghijklmnopqrstuvwxyz.jpg
    • doesn’t make much sense, does it?
  • How about:
    • 20120925_credo_du_bois_rrz_001.jpg
  • And I put it in a directory called:
    • credo_du_bois
14. Organizing Your Data
• Why this structure?
  • Oh, I just made it up! But I’m going to be consistent
  • 20120925 = date I found the image
  • credo = database/collection where I found the image
  • du_bois = image subject
  • rrz = my initials (I am working in a group!)
  • 001 = an accession number (I made that up, too, but I’ll continue to use that schema)
15. BAD naming practices
• Using generic data file names that may conflict when moved from one location to another
• Failing to think about scale
• Using special characters in a filename such as: &*%$£]{!@
16. Versioning
• Use ordinal numbers (1, 2, 3) for major version changes and decimals for minor changes: v1, v1.1, v2.6
• Beware of using confusing labels: revision, final, final2, definitive_copy
• Discard or delete obsolete versions
• Use an auto-backup facility (if available) rather than saving or archiving multiple versions
• Turn on versioning or tracking in collaborative documents or storage utilities such as wikis, Google Docs, etc.
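The v1 / v1.1 / v2.6 labels above can be maintained mechanically rather than by hand. A sketch (the `bump_version` helper is hypothetical, not part of any named tool):

```python
import re

def bump_version(label: str, major: bool = False) -> str:
    """Increment a version label of the form v<major> or v<major>.<minor>.

    A minor bump turns v1.1 into v1.2; a major bump turns v2.6 into v3.
    Raises ValueError for confusing labels like 'final2' or 'revision'.
    """
    m = re.fullmatch(r"v(\d+)(?:\.(\d+))?", label)
    if m is None:
        raise ValueError(f"unrecognised version label: {label!r}")
    maj, minor = int(m.group(1)), int(m.group(2) or 0)
    return f"v{maj + 1}" if major else f"v{maj}.{minor + 1}"

print(bump_version("v1.1"))              # v1.2
print(bump_version("v2.6", major=True))  # v3
```

Rejecting anything that is not a vN or vN.N label is the point: it forces the consistent scheme the slide recommends.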
17. Quiz! File naming by date
What is the best filename?
A. 2012-09-25_Attachment
B. 25 September 2012 Attachment
C. 25092012attch
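The reason the YYYY-MM-DD form works so well: with year first and zero-padded fields, plain alphabetical sorting is also chronological sorting. A quick check:

```python
names = [
    "2012-09-25_Attachment",
    "2011-12-31_Attachment",
    "2012-01-03_Attachment",
]

# Lexicographic sort of ISO-style dated filenames gives chronological order
print(sorted(names))
# ['2011-12-31_Attachment', '2012-01-03_Attachment', '2012-09-25_Attachment']
```

Day-first or month-name formats break this property, which is why any file browser's default sort scrambles them.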
18. Quiz! File naming by description
What is the best filename?
A. dubois_great_barrington_recent_20120925_old version.docx
B. 2012-09-25_dubois_great_barrington_V1.docx
C. FFTX_2365498_old.docx
19. Organizing Your Data
• Organizational methods
  • Hierarchical
  • Tag-based
• Retrieval
  • Location-based
  • Search-based
“Very little skill is needed to actually be organized and efficient… just the consciousness to put this file or folder in the right place.”
20. Organizing Your Data
Use folders!
DuBois
DuBois_Images
DuBois_Images/1868-1898/
DuBois_Images/1898-1928/
DuBois_Letters
DuBois_Letters/1868-1898/
DuBois_Letters/1898-1928/
DuBois_Newspapers/
etc.
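A folder tree like this can be scaffolded in a few lines rather than clicked together, which also makes the structure reproducible. A sketch using Python's standard `pathlib` (the DuBois names are the example above; substitute your own):

```python
from pathlib import Path

subfolders = [
    "DuBois_Images/1868-1898", "DuBois_Images/1898-1928",
    "DuBois_Letters/1868-1898", "DuBois_Letters/1898-1928",
    "DuBois_Newspapers",
]

root = Path("DuBois")
for sub in subfolders:
    # parents=True creates intermediate folders; exist_ok=True makes
    # the script safe to re-run without errors
    (root / sub).mkdir(parents=True, exist_ok=True)
```

Keeping the list in a script doubles as documentation of the organizational scheme.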
21. Archive what you don’t or won’t need
• Decide what your final data sets are
• Once your project is over, weed out obsolete data and decide what you want to keep for the long term
• Move files and folders to an ‘Archive’ or ‘Old files’ folder
  • z_archive
22. Backup and Storage
January 2011: “Stolen laptop contains cancer cure data”
23. Backup and Storage
• Backup is an essential component of data management
  • Protects against accidental or malicious data loss
  • Lets you restore the original data
• Keep 3 copies: the original, an external local copy, and an external remote copy
• Consider
  • How much?
  • How frequently?
  • Which media?
  • Synchronization
• Test your system
24. Backup and Storage
• Accessibility of data depends on storage media and file format
  • Vulnerable to deterioration
  • Become obsolete over time
• Plan for disruption
• Consider
  • Non-proprietary file formats
  • Different media types in your storage strategy
  • Migrating data
  • Unencrypted, uncompressed copies
25. Backup and Storage
• Security
  • Encryption can be used for safely moving or storing files
  • Encrypting files on storage devices (flash drives)
  • Encryption during file transfer (e.g., WinSCP)
  • Encrypted storage services
• Deleting Data
  • Weed out obsolete data and decide what you want to keep for the long term
  • Deleting files does not actually erase the data; it can often still be recovered
• Other things to consider
  • How will the data be used?
  • Who pays for storage?
27. Data Management is About Planning
Data management will:
• Prevent bad things from happening to your data;
• Make you a more efficient researcher; and
• Prepare you for grant management.
(Slide diagram: a data lifecycle of Collection, Description, Storage, Backup, and Access)
28. Data Management Plans
NSF plans should address:
• The types of data;
• The standards to be used for data and metadata format and content;
• The policies for access and sharing;
• The policies and provisions for re-use, re-distribution, and the production of derivatives; and
• The plans for archiving and for preservation of access.
30. Planning
• Data Working Group (email datamanagement@library.umass.edu)
  • Digital projects
  • Long-term preservation
  • Assessment
  • Web resources
• UMass Amherst Libraries: General Resources (http://guides.library.umass.edu/datamanagement)
• Discipline-specific resources
  • Your faculty
  • Your mentors
  • Your professional associations
  • Industry partners
  • Public engagement
31. Backup and Storage
• Storage
  • Udrive (http://www.oit.umass.edu/udrive)
  • Departmental servers
  • CDs/DVDs/external hard drives
  • Filesharing (see http://chronicle.com/blogs/profhacker/protecting-your-data/37350)
    • Dropbox
    • Google Docs
  • Cloud storage
    • Amazon Web Services
    • Rackspace
    • Microsoft Azure
    • SugarSync
• Additional Information
  • MIT on Backups and Security: http://libraries.mit.edu/guides/subjects/data-management/backups.html
  • UK Data Archive on Data Storage: http://www.data-archive.ac.uk/create-manage/storage
  • UK Preservation Office, “Caring for CDs and DVDs”: http://www.bl.uk/blpac/pdf/cd.pdf
33. Sources
• MIT Data Management (http://libraries.mit.edu/guides/subjects/data-management/)
• UK Data Archive (http://www.data-archive.ac.uk/)
• MANTRA (http://datalib.edina.ac.uk/mantra/organisingdata.html)
• Creating Order from Chaos: 9 Great Ideas for Managing Your Computer Files (http://www.makeuseof.com/tag/creating-order-chaos-9-great-ideas-managing-computer-files/)
• Research Information Management: Tools for the Humanities (http://sudamih.oucs.ox.ac.uk/docs/Generic%20Courses/Tools%20for%20the%20Humanities%20course%20book.docx)
34. Questions/contact
datamanagement@library.umass.edu
Editor's Notes
Starting in January 2011, NSF is requiring that grant proposals have a Data Management Plan. The DMP is described as no more than two pages, specifying the types of data, the standards to be used for data and metadata format and content, and policies for accessing and sharing the data. They do state that a valid plan may include only the statement that no detailed plan is needed, but you have to justify that statement. The DMP will be reviewed as an integral part of the proposal, coming under the Intellectual Merit or Broader Impacts sections or both. Grant Proposal Guide (GPG), Chapter II.C.2.j. NSF Directorates and Programs have additional requirements: http://www.nsf.gov/bfa/dias/policy/dmp.jsp. The Biological Sciences, Engineering, Geosciences, and Social, Behavioral and Economic Sciences Directorates are examples having additional requirements for their DMPs. The National Institutes of Health expect researchers to include data sharing plans in their proposals as well; this appears to be a trend for other funding agencies. NIH data sharing policy: data should be made as widely and freely available as possible while safeguarding the privacy of participants and protecting confidential and proprietary data. NSF data sharing policy: investigators are expected to share with other researchers, at no more than incremental cost and within a reasonable time, the primary data, samples, physical collections and other supporting materials created or gathered in the course of work under NSF grants. NEH: all proposals will be required to include both a sustainability plan that discusses long-term support for the project and a data management plan that discusses how research data will be preserved.
The National Science Foundation recognizes the need for flexibility. Different types of data require different plans. The NSF documentation points researchers to some specific sites for more specific protocols; there are numerous others for the rest of the social sciences, and you can contact your scholarly associations for more details. You must demonstrate to funding agencies that you know what your data are and how you will manage it.
For you: since a number of grants are multi-year, renewable propositions, building good data management practices into proposals is crucial. By doing so, problems associated with lab turnover can be addressed (one professor noted that 3-4 more papers would have come out of his lab if not for this type of issue). It also assists you in remembering relevant details and procedures relating to your data and data collection over the long haul. Developing a good data archiving plan safeguards your investment of time and money and makes recovery from disaster possible and, hopefully, faster and more complete. It addresses your documentation and verification issues: the type of data to be produced, a description of the methodology, and the standards that will be applied. It satisfies many legal obligations, such as security measures to protect confidentiality or IP considerations. A good DMP helps reviewers understand your work and increases its visibility: easily accessible and clearly understood, it preserves your unique contribution to your field. For others: although provisions are made for restrictions or embargos on data, particularly those having commercial implications, there is an underlying assumption that data should be shared, distributed, and built upon. A data management plan gets you to think about and plan for how that will happen, promoting new discoveries and minimizing duplication of effort. The open access movement (Science Commons, PubChem, et al.) fosters the development of knowledge. Science Commons is an organization that promotes legal and technical mechanisms to remove barriers to sharing scientific information; one way they are looking at that is through the Open Knowledge Definition, which sets out to define openness in relation to content and data.
RCR covers a range of topics that speak to the conduct of investigators and the integrity of the research university (where an investigator is defined in UMass COI policy as the principal investigator and any other person who is responsible for the design, conduct, or reporting of funded research). It is a philosophy of creating an environment for research that encourages quality and ethical principles. Topics include Mentor/Trainee Responsibilities; Publication Practices and Responsible Authorship; Peer Review; Collaborative Science; Communication and Difficult Conversations; and Data Acquisition, Management, Sharing and Ownership. Many of the practices and constraints will be dictated by the discipline, by the lab, and by the funding conditions, but there are generally accepted standards that investigators should be aware of and adhere to relative to data ownership, data collection, data protection, and data sharing. By following good data practices (or RCR), an investigator can avoid the risk of misconduct and comply with policies and regulations regarding intellectual property and animal or human research subjects. Examples of compliance include protocols for doing research with animals, for biological and environmental safety, and for export control. Research using human subjects involves having the project reviewed by the University's IRB (a federally mandated body which reviews all sponsored research involving human subjects), obtaining consent, and maintaining the confidence of data collected. Research with human subjects is the domain where privacy (for sensitive data), confidentiality, and security will be major concerns when managing data. Examples of ethical concerns include Conflicts of Interest (where financial or intellectual property rights concerns influence the design, conduct, or reporting of research), Faculty Consulting, and Whistleblowing. Research Misconduct is also in this category.
Misconduct means fabrication, falsification, or plagiarism in proposing, performing, reporting, or reviewing research (not including honest error or difference of opinion), or misrepresentation of the procedures and outcomes of research to gain some advantage. Policies to investigate and determine misconduct include fact finding, which means examination of data.
Data = research
Organizing your data is about keeping good records, namely planning file naming conventions and organizing file directories to your advantage. What are your goals? Based on those goals: how should you organize your data? Are there key themes, categories, people, dates, formats, etc.? You might document/store/organize data differently for different outcomes like sharing, preserving, or sharing a small subset. What is important to save? If you plan well, you can put your research anywhere.
The most basic part of organizing your data is to consider your filenames. Most computers use filenames to index content (e.g. Windows Search). Clear names will help in retrieving files and should fit with your overall organizational approach for your project.
There are three things to consider when naming files: organization, context, and consistency. Organization is important for future access and retrieval. Context could include content-specific or descriptive information. Consistency: choose a naming convention and ensure that the rules are followed systematically by always including the same information (such as date and time) in the same order (YYYYMMDD).
How would we name this image file – found in the University Archives?
File naming conventions.
Consistency is key. Use underscores instead of full stops or spaces because, like special characters, these are parsed differently on different systems. The filename should include as much descriptive information as will assist identification independent of where it is stored. If including dates, format them consistently.
Scale: if you want to include a project number, don't limit your project number to 2 digits, or you can only have ninety-nine projects. Special characters: these are often used for specific tasks in a digital environment.
It is important to identify and distinguish versions of research data files consistently. This ensures that a clear audit trail exists for tracking the development of a data file and identifying earlier versions when needed. Thus you will need to establish a method that makes sense to you that will indicate the version of your data files.http://datalib.edina.ac.uk/mantra/organisingdata.html
A – correct. Files using this naming convention are easy to distinguish from one another, and easier to browse and locate chronologically. B – incorrect: the file is not easy to browse and locate chronologically. C – incorrect: the file is not easy to browse and locate chronologically, and the filename is not immediately intuitive. Tip! If using a date, use the format year-month-day: YYYY-MM-DD, or YYYY-MM, or YYYY-YYYY. This will maintain the chronological order of your files.
A – incorrect: the date is ambiguous, and there could be several 'old' versions. B – correct: the date is in a uniform format and easy to distinguish/sort from files using the same date convention; the filename represents the content more accurately; and using a version number convention also makes it easier to distinguish from other versions of the same file. C – incorrect: this is an application-generated filename lacking descriptive or context-specific information.
Hierarchical – most common operating systems default to this way of organizing files. An item can only go into one place or folder (unless there are duplicates), so you must choose a system for categorizing files; it is well-adapted to location-based finding. Tag-based – electronic labels or keywords applied to files in a flat system. An item can have many tags, giving more flexibility in how a file is categorized, but tags must be applied consistently. Plan and then follow the plan. Implement. File things immediately; put things in the right place according to your plan as they are created.
Example of a well-organized file structure with consistent naming conventions: a major heading with logical subheadings, and individual files under subheadings distinguished by date of analysis, or collection, etc. – but be consistent. Organize by category – for example, if you are studying multiple individuals and are collecting many types of documents about them, you could organize first by the individual, then by type of coverage (image, letter, newspaper), then by date. One place for everything – you need a place where you know that you can access your files and folders. The My Documents folder is the logical and perfect place for this: it is a home for your folders, which contain your files. Think of it in the sense that you wouldn't put your folders in the yard, nor would you put your filing cabinet in the yard; you put both of them in the house. Your My Documents folder is your "house" of sorts. Plan and implement. File things immediately; put things in the right place according to your plan as they are created.
Personally, I recommend still keeping the archive in the My Documents folder so things stay easy to remember and consistent. With a name like "Archive" it will likely sit near the top of whatever folder you put it in. To change this, you can add a "z" and a period to the beginning of the name, so the folder looks something like "z.Archive". This puts it at the bottom of the list so you won't have to worry about it being in the way all the time. http://www.makeuseof.com/tag/creating-order-chaos-9-great-ideas-managing-computer-files/
A very important component of data management: backup. A University of Oklahoma researcher lost years of research due to theft. A PC Advisor poll from November 2010 indicates that roughly 1 in 13 respondents never back up important data (30% back up important data daily; 25% weekly; 21% monthly; 16% rarely back up data; 8% never). http://www.pcadvisor.co.uk/news/security/3248400/poll-30-percent-back-up-data-every-day/
Backup ensures that the most recent data will always be accessible, and concerns the procedures for saving and synchronizing data. Accidental or malicious data loss can be due to: hardware faults or failure; software or media faults; virus infection or malicious hacking; power failure; human error in changing or deleting files. Recommended practice is to keep 3 copies of your data.
How much: What will you need to restore in the event of data loss? Are there backup policies already established for the institutional/network computers you are using, and will they be sufficient for your project?
How frequently: How critical are the changes being made or the new data being generated? Back up after every change, or at regular intervals, and use automated backup processes.
Which media: Depends on quantity, file type, and project needs. Options include removable media (hard or flash drives), recordable CD/DVD, or a network drive.
Synchronization: Ensures consistency between backup copies. Use the same or compatible naming conventions as the original project files, and label removable media!
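As a concrete illustration of an automated backup whose name ties each copy back to the original, here is a minimal Python sketch. The source and backup paths are placeholders; in practice a project would more likely rely on an OS backup tool, a network service, or rsync:

```python
import shutil
from datetime import datetime
from pathlib import Path

def backup(source: Path, backup_root: Path) -> Path:
    """Copy a project folder into a timestamped backup folder,
    preserving file metadata, and return the new copy's path."""
    stamp = datetime.now().strftime("%Y-%m-%d_%H%M%S")
    dest = backup_root / f"{source.name}_backup_{stamp}"
    shutil.copytree(source, dest)  # recursive copy, creates dest
    return dest

# Scheduled via cron or Task Scheduler at whatever interval the
# project needs, with backup_root pointing at e.g. a network drive.
```

Because the folder name embeds both the project name and a sortable timestamp, the copies follow the same naming conventions as the originals, as recommended above.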
Storage concerns the location and media for housing data, and is important because digital media are inherently unstable and change rapidly. Media currently available for storing data files are optical media (CDs and DVDs) and magnetic media (hard drives and tapes); both are vulnerable to physical degradation. A storage strategy, even for short-term projects, should include two different forms of media.
Prefer non-proprietary file types (those that follow an open, documented standard; use ASCII or Unicode; are community-supported; unencrypted; uncompressed):
PDF/A, not Word
ASCII, not Excel
MPEG-4, not QuickTime
TIFF or JPEG 2000, not GIF or JPG
XML or RDF, not RDBMS
Which media: Portable hard drive? Cloud? Department server? Subject data repository? The UK Data Archive recommends using at least two different media types (optical/magnetic) in your storage strategy, in addition to local and remote backup copies. Unencrypted is ideal for storing your data because it will be most easily read by you and others in the future. (MIT) Uncompressed is also ideal for storage, but if you need to compress to conserve space, limit compression to your 3rd backup copy. (MIT)
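The open-format recommendations above can be turned into a quick audit before archiving. A sketch, assuming a small extension map based on the slide's examples (the mapping and function names are mine, and a real audit would cover many more formats):

```python
from pathlib import Path

# Proprietary-to-open suggestions drawn from the list above.
SUGGESTED = {
    ".doc": "PDF/A", ".docx": "PDF/A",
    ".xls": "ASCII/CSV", ".xlsx": "ASCII/CSV",
    ".mov": "MPEG-4",
    ".gif": "TIFF or JPEG 2000", ".jpg": "TIFF or JPEG 2000",
}

def audit(folder: Path) -> list:
    """List (file, suggested open format) pairs for every file
    in the folder tree that is in a closed/proprietary format."""
    return [(p.name, SUGGESTED[p.suffix.lower()])
            for p in sorted(folder.rglob("*"))
            if p.suffix.lower() in SUGGESTED]
```

Running something like this over a project folder at the end of a project gives a checklist of files worth converting before deposit.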
Secure data storage will prevent unauthorized access, changes, disclosure, or destruction of data, and includes physical security as well as network security (passwords, firewalls, anti-virus and anti-malware software), plus security when sharing or moving files. Encryption is the easiest and most practical method of protecting data stored or transmitted electronically, and is particularly essential with sensitive data (ECU), for example when moving files, or when storing back-ups on mobile devices. Individual files can be encrypted, as can entire storage devices or spaces. http://www.ecu.edu/cs-itcs/itsecurity/DataEncryption.cfm
Weeding: Determined by project requirements. How will the data be used? In-house? Outside users? Restricted? Is it live or "archived"?
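Alongside encryption, a simple way to detect unauthorized or accidental changes to stored files is to record checksums at backup time and re-check them later (a standard fixity-checking practice; the function names below are mine). A minimal sketch using only Python's standard library:

```python
import hashlib
from pathlib import Path

def checksum(path: Path) -> str:
    """SHA-256 digest of a file, read in chunks so that
    large data files do not have to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def verify(path: Path, recorded: str) -> bool:
    """True if the file still matches the digest recorded earlier."""
    return checksum(path) == recorded
```

Storing the recorded digests alongside (or separately from) the backups lets you confirm later that an "archived" copy has not silently changed.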
These may be things that you will get to toward the end of a project, but they are good to think about now. The traditional outcomes of research are published papers (much of what tenure and promotion are based on), and it is a growing practice to submit supplemental data files along with manuscripts at the point of publication. Know what your intellectual property is, what your copyrights are, and how they apply to data and databases. Much of what is created is considered an "exempt scholarly work": the university automatically waives ownership of this class of IP. Under the UMass policy, the creator owns IP that is created or discovered here.
Copyright provides legal protection for "original works of authorship". Facts and ideas cannot be copyrighted, but their expression can. Data sets and databases can be protected under copyright as literary works, which include "tables" and "compilations".
Expectations of sharing have also created an environment where datasets are shared within communities. It has been recognized, by Creative Commons specifically, that sharing data sets is fundamentally different from sharing textual documents, and that the benefits of data sharing outweigh the constraints of applying copyright. Creative Commons has endorsed a Database Protocol which encourages the unfettered sharing of data through the use of a CC0 license; this essentially puts data into the public domain. Venues for data sharing include institutional and disciplinary repositories.
Data citation means providing a reference to data in the same way that researchers routinely provide a bibliographic reference to printed resources. It is an important part of validating datasets as a primary research output rather than a by-product of research.
Who owns copyright of data? The creator of the data. Under the UMass IP Policy, the creator owns IP that is made, discovered, or created here unless it involves: significant use of University resources; University-commissioned work; IP subject to contractual obligations (i.e., sponsored research); or student work (except "exempt scholarly work").
University resources: university funds, time, and facilities; not use of the library, facilities available to the public, or occasional use of office equipment.
"Exempt scholarly work" includes: instructional materials, including textbooks and class notes; research articles, monographs, and proposals; theses and dissertations; dramatic works and performances; drawings and sculpture; musical compositions and performances; poetry, fiction, and non-fiction. Students sign a participation agreement (prior to hire as research assistants, for example).
http://www.umass.edu/research/system/files/Intellectual_Propery_Policy_UMA.pdf
Stop for questions.
These are the elements of data management – things that you should think about. Data management will have positive benefits.
You will need somewhere to store your data as you are working.
UDrive: you get 1 GB and can share files with anyone through the UDrive.
Third party: there are many cloud storage providers. Amazon gives you 5 GB, Dropbox gives you 2 GB, and Google Docs gives you 1 GB, but you can purchase more space (400 GB for $100/year; 1 TB for $256/year). Cloud options provide a nearly infinitely scalable tier of storage for archiving very large datasets; prices can range from $0.14/GB to $0.55/GB.
The OIT security pages have links and instructions for downloading anti-virus and anti-malware software, and tips for protecting your personal computer from unauthorized access.