Trailblazing in the Wilderness of Data Management

Where are we going and how do we get
there from here?
Stephanie Wright
Data Services Coordinator
University of Washington Libraries

AGENDA
• Definitions
• Why venture out
• Paths already taken
–Assessments of needs
–Existing programs
–Tools & resources
• Blazing your own trail
Montana State University – 21 June 2013

Definitions
• Data
• Data Management
• Big Data
• Long Tail of Data
• Acronyms
www.lib.washington.edu

Definitions
DATA
By data, we do not mean a synonym for information. We
mean research data, that which is collected, observed,
or created, for purposes of analyzing to produce
original research results.
Research data may be created in tabular, textual,
statistical, numeric, geospatial, image, multimedia
or other formats.
(Adapted from DISC-UK DataShare Project, p. 16)

Definitions
DATA
Data can be produced from a variety of processes
(e.g., observation, experimentation, simulation,
derivation, compilation), represented in numerous
forms and stored in many digital formats (e.g.,
ASCII, PDF, SPSS, Excel, TIFF, Java, FITS, CIF, ZVI).
The scope of this definition includes data from
disciplines in the sciences, social sciences, and
humanities.
(Adapted from MIT Libraries, “What is Data?”, 2009)

Definitions
DATA MANAGEMENT
Pertains to the collection, cleaning, storage, sharing,
access, disposal, preservation and/or archiving of
research data.
(Adapted from University of North Carolina, Research Data Stewardship
Report, 2012)

Definitions
BIG DATA
• Volume
• Velocity
• Variety
25 Definitions of Big Data:
http://www.opentracker.net/article/25-definitions-big-data
– Now over 30 definitions

Definitions
LONG TAIL OF DATA
Image credit: disruptormonkey.typepad.com

Acronyms
• RDM – Research Data Management
• IR – Institutional Repository
• DR – Data Repository
• DMP – Data Management Plan

Why Venture Out
• Funding agencies
• Universities
• Researchers
• Libraries
Image credit: National Park Service, Yellowstone photo collection
(http://www.nps.gov/features/yell/slidefile/mammals/wolf/Images/15314.jpg)

Funding Agencies
• 1998: NSF
• 2003: NIH
• 2011: NSF
• 2013: NSF, OSTP, OMB, NIH

Universities
• Competitiveness
• Reduce duplication of effort
• Preserve the research record of the
institution
• Encourage innovation & discovery

Researchers
• Verifiability & reproducibility
• Increased citation rates for publications
– (Piwowar et al., 2007)
• Preservation of individual scholarly record
• Save time by planning early

Libraries
• Digital Preservation Network (DPN)
“The Digital Preservation Network is being
created by research-intensive universities to
ensure long-term preservation of the complete
digital scholarly record.”
http://d-p-n.org/

Libraries
NSF Proposal & Award Policies &
Procedures Guide (Oct 2012)
“Instructions for preparation of the
Biographical Sketch have been revised to
rename the "Publications" section to
"Products" ....
(P)roducts may include, but are not limited
to, publications, data sets, software,
patents, and copyrights.”

Paths Already Taken
• Assessments
• Existing programs
• Tools & Resources
Image credit: John W. Ridge
(http://commons.wikimedia.org/wiki/File:Yellowstone_Trail_Map.jpg)

Assessments
• UNC (2012) “Research Data Stewardship
Report”
• University of Colorado Boulder (2012)
“Research Data Management @ UCB”
• Purdue “Data Curation Profiles Directory”
(http://docs.lib.purdue.edu/dcp/)
• More: Georgia Tech, Cornell, Houston,
Oregon….

Findings
• Researchers use a wide variety of data
types – across disciplines
• Most researchers rely on themselves for
data management
• Researchers want to maintain control of
their data
• Many are unaware of existing services
• They want tools that work in existing
workflows

What’s Needed
• Creating & maintaining DMPs
• Best practices guidance all along lifecycle
• Storage
– Short-term access
– Long-term access
– Backup
– Versioning
– Security
• Metadata creation

Existing Programs
• Cornell
– Research Data Management Service Group
• Sr VP for Research and University Librarian
• Faculty Advisory Board
– 9 faculty across disciplines
– OSP & Office of Research Integrity & Assurance
• Management Council
– 2 librarians, 2 faculty, 2 IT, 1 research institute

Existing Programs
• Purdue
– D2C2: Distributed Data Curation Center
• Executive Committee
– Dean of Libraries, VP of Research & VP of IT
• Library: consulting & metadata support
• IT: storage & research computing support

Existing Programs
• University of Washington
– Data Services Program (1.5 FTE)
• Data Services Coordinator
• Data Services Communications & Curriculum Librarian
– Data Services Team (10 members)
– Partnerships
• Research Centers (eSci, CSDE, IHME)
• Office of Research (OSP)
• Campus IT
• iSchool

Tools & Resources
• Data Mgmt Planning: DMPTool
• Metadata & Sharing: DataUp
• Sharing & Storage: Databib
• Citation: EZID
• Best Practices: DMVitals

Blazing Your Own Trail
Image credit: Michigan State University Department of History,
HST 321: History of the American West
(http://history.msu.edu/hst321/files/2010/07/colter.jpg)

Where do you want to go?
• Identify needs
• Consider potential partners
• Scope
– Disciplines
– Specific areas of the data lifecycle
• Determine priorities
– New services? Enhance existing? Market
existing?

MSU Strategic Plan
• Objective L1
– Assess, and improve where needed, student
learning of critical knowledge & skills
• Objective D1
– Elevate the research excellence and
recognition of MSU faculty
• D1.2
• Objective D2
– Enhance infrastructure in support of research,
discovery and creative activities

Campus IT
• Support for active data storage
• Data security guidance
• Backup services
• Development of tools that can be
inserted into existing workflows

Office of Research
• Guidance on legal / ethical
considerations
• Incorporate DM planning into
grant submission process
• New faculty data management
orientations

Libraries
• Market and provide access to
existing RDM resources
• Provide learning opportunities on
RDM best practices
• DMP consultation
• Storage (final)
• Metadata consultation

University
• University policy on data
management
• Integrate RDM activities into T&P
process
• Consider campus policy on open
data

Stephanie Wright
Data Services Coordinator
swright@uw.edu
@shefw
http://guides.lib.washington.edu/swright
Data Management Guide
http://guides.lib.washington.edu/dmg
ResearchWorks Data Services
http://researchworks.lib.washington.edu/rw-data.html
