Libraries and Research Data Curation: Barriers and Incentives for Preservation, Sharing, and Reuse

Future of Scientific Publishing: Open Access to Manuscripts and Big Data
Stanford University, June 27, 2013
Stephen Abrams
University of California Curation Center
California Digital Library
www.cdlib.org/uc3

Why is data curation important?
 Accelerating scientific progress
 Enabling appropriate scrutiny and verification of results
 Promoting integrity and debate
 Facilitating new collaborations
 Avoiding needless duplication of effort
 Increasingly, complying with institutional policies, publication
requirements, and funder mandates
Cf. Whyte and Tedds (2011), “Making the case for research data management,” DCC briefing
paper, www.dcc.ac.uk/resources/briefing-papers/making-case-rdm

The library’s role
 A continuation of its long-standing mission and practice to
connect patrons with content of interest in meaningful ways
across barriers of space and time
Cf. Tenopir et al. (2012), “Academic librarians and research data services: Preparation and attitudes,” 78th
IFLA General Conference and Assembly, Helsinki, conference.ifla.org/past/ifla78/116-tenopir-en.pdf
 Offering solutions that enhance the natural points of
alignment between the scholarly research and information
lifecycles
[Figure: the scholarly lifecycle and the information lifecycle shown side by side, grouped under Research and Curation, with stages labeled Create, Publish, Share, Reuse, Collect, Preserve, Access, and Discover]

Addressing barriers to adoption
  Critical issues on both the demand side …
 Poor discovery
  … and the supply side
 Unfamiliar processes
 Loss of control
 Inadequate guidance
Cf. Schäfer et al. (2011), Baseline Report on Drivers and Barriers in Data Sharing, hdl:10013/epic.39262
 Better access to tools and resources
 Embedded best practices
 Data use agreements
 Data management planning
 Data publication and citation
n2t.net/ezid datashare.ucsf.edu merritt.cdlib.org dmptool.org dataup.org

 Provide the same infrastructural support for data that exists
for traditional publications
 Unique, actionable identifiers
 Stable citation
 Bi-directional references between publications and the data that
underlie their analysis, synthesis, and summarization
 Discovery via disciplinary portals, catalogs, and web searches
 Use and impact metrics
www.flickr.com/photos/fotobib/5555065521 www.flickr.com/photos/minhmeoinfo/4597866532

 Provide the same infrastructural support for data that exists
for traditional publications
http://n2t.net/ezid
 ARK and DOI identifiers
 Descriptive metadata
 Resolution targets
 Aggregation by DataCite and (soon) Primo and Web of Knowledge
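
As a rough illustration of what "actionable" means for an identifier, the sketch below turns an ARK or DOI string into a URL on the n2t.net resolver that appears in these slides. The function names are hypothetical, and the sketch assumes the resolver accepts both the "ark:" and "doi:" prefixes under the same base URL; it is not EZID's own code.

```python
def identifier_scheme(identifier: str) -> str:
    """Classify an identifier by its prefix (hypothetical helper)."""
    lowered = identifier.lower()
    if lowered.startswith("ark:"):
        return "ark"
    if lowered.startswith("doi:"):
        return "doi"
    raise ValueError(f"unsupported identifier: {identifier!r}")


def actionable_url(identifier: str, resolver: str = "http://n2t.net/") -> str:
    """Prepend the resolver base so the identifier can be followed as a link.

    Assumption: the resolver accepts "ark:" and "doi:" identifiers alike.
    """
    identifier_scheme(identifier)  # raises ValueError for unknown schemes
    return resolver + identifier


print(actionable_url("ark:/90135/q13j39xf"))
```

Because the resolver sits between the citation and the repository, the resolution target can change without breaking links that are already published.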

 Data curation is an unfamiliar set of concepts, practices, and
jargon to most researchers
www.flickr.com/photos/vixon/116447718

 Data curation is an unfamiliar set of concepts, practices, and
jargon to most researchers
 It’s easier to augment systems than change behaviors
 Embed curation best practices into tools and workflows already
used by researchers
www.flickr.com/photos/34067077@N00/4576265327 www.flickr.com/photos/wealthofhealth4/6919840647

 Excel is often the database of choice for many researchers
 DataUp: an Excel add-in and Azure web service
 Automates …
 Best practices check
 Data description
 Persistent identifier and
citation generation
 Repository submission
http://dataup.cdlib.org/
2013 Innovation Award winner
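
To give a feel for what an automated best practices check might involve, here is a minimal sketch that scans tabular data exported as CSV. The three checks (blank or duplicate column names, blank rows, ragged rows) are illustrative assumptions chosen for this example and are not a description of DataUp's actual rules; the function name is hypothetical.

```python
import csv
import io


def check_table(csv_text: str) -> list[str]:
    """Return human-readable warnings for a few common spreadsheet problems."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ["file is empty"]

    warnings = []
    header = [cell.strip() for cell in rows[0]]
    if any(not name for name in header):
        warnings.append("header has blank column names")
    if len(set(header)) != len(header):
        warnings.append("header has duplicate column names")

    # Data rows are numbered from 2 so that they match spreadsheet row numbers.
    for number, row in enumerate(rows[1:], start=2):
        if not any(cell.strip() for cell in row):
            warnings.append(f"row {number} is blank")
        elif len(row) != len(header):
            warnings.append(
                f"row {number} has {len(row)} cells, expected {len(header)}"
            )
    return warnings


for warning in check_table("site,temp,temp\nA,1,2\n\nB,3\n"):
    print(warning)
```

Running checks like these inside the tool a researcher already uses means problems are reported before submission rather than after deposit.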

 ONEShare repository
http://merritt.cdlib.org/m/oneshare_dataup
http://n2t.net/ark:/90135/q13j39xf

 DataONE federation
http://dataone.org/
http://cn.dataone.org/onemercury

 So you don’t need to know …
 Metadata schema
 XML syntax
 Identifier registration
 Packaging standards
 Submission protocol
 Aggregation/harvesting
mechanism

Data use agreements
 Maintain control over the dissemination of research results
through click-through DUAs
 Assert explicit license requirements and terms of use
 Notification of consumer acceptance
Cf. Brazhnik and Jones (2007), “Anatomy of data integration,” Journal of Biomedical Informatics 40(3): 252-69,
doi:10.1016/j.jbi.2006.09.001
http://datashare.ucsf.edu/

Data use agreements
From: no-reply-merritt@ucop.edu
Subject: Merritt DUA acceptance
Name: Stephen Abrams
Affiliation: California Digital Library
Collection: UCSF DataShare
Object: Frontotemporal Lobar Degeneration (FTLD)
Date: 2013-05-31 09:50:34 PDT
Terms of use: As part of this agreement, Consumer submits to the following
statements:
(1) I will receive access to de-identified data and will not attempt to establish the
identity of any of the study subjects.
(2) I will share these data only with my immediate co-workers, and I will not transfer
these data to other research groups. I understand that these data are available to
other research groups through the process by which I obtain them.
(3) I will require anyone in my group who utilizes these data, or anyone with whom I
share these data to comply with this data use agreement
...
http://datashare.ucsf.edu/
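
A data provider who receives many such notifications might want to log them. The sketch below pulls the labeled fields out of a message shaped like the sample above. It assumes the "Label: value" layout and field names shown on this slide; the function and constant names are hypothetical, and nothing here reflects how Merritt itself produces or processes these messages.

```python
# Field labels taken from the sample notification; treated as an assumption.
KNOWN_FIELDS = {
    "From", "Subject", "Name", "Affiliation", "Collection", "Object", "Date",
}


def parse_notification(body: str) -> dict[str, str]:
    """Extract the known "Label: value" fields from a notification body."""
    fields = {}
    for line in body.splitlines():
        # Split on the first colon only, so times such as 09:50:34 stay intact.
        label, separator, value = line.partition(":")
        if separator and label.strip() in KNOWN_FIELDS:
            fields[label.strip()] = value.strip()
    return fields


sample = """From: no-reply-merritt@ucop.edu
Subject: Merritt DUA acceptance
Name: Stephen Abrams
Affiliation: California Digital Library
Collection: UCSF DataShare
Date: 2013-05-31 09:50:34 PDT
Terms of use: As part of this agreement, Consumer submits to the following
statements:
"""
record = parse_notification(sample)
print(record["Name"], "accepted the DUA for", record["Collection"])
```

The terms of use are deliberately left out of the parsed record, since they are free text rather than a single labeled value.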

Data use agreements
 Next steps …
 Disciplinary survey of current DUA practice
 Collaborate with Creative Commons to establish “model” DUAs

 Researchers are being asked to plan for data curation by
institutional policy and as a pre-condition for publication and
grant funding
Cf. Office of Science and Technology Policy (2013), Increasing Access to the Results of Federally Funded
Scientific Research,
www.whitehouse.gov/sites/default/files/microsites/ostp/ostp_public_access_memo_2013.pdf

 DMPTool provides guidance and resources for data management plans (DMPs)
 Edit, publish, and share DMPs
 Customizable for funding agency requirements
 Customizable for general, disciplinary, and institutional resources
 19 requirement templates
 43 resource sets
 Next steps …
 DMPTool2: follow-on development (Sloan Foundation)
 Outreach and training (IMLS)
http://dmptool.org/
http://blog.dmptool.org/

Removing barriers, providing incentives
 “Access to and sharing of data are essential for the conduct
and advancement of science”
Arzberger et al. (2004), “Promoting access to public research data for
scientific, economic, and social development,” Data Science Journal 3: 135-52,
doi:10.2481/dsj.3.135
 Libraries are a natural partner for the research community
 Deep and broad experience in the curation, preservation, and
dissemination of digital assets
 Subject area specialization in
science, technology, engineering, and mathematics
 Collaborations with campus IT groups and data centers

Removing barriers, providing incentives
 Libraries are a natural partner for the research community
 Effective discovery through … Data publication and citation
 Maintain control through … Data use agreements
 Familiar processes through … Embedded best practices
 Guidance and resources through … Data management planning
www.slideshare.net/UC3/uc3-librariesandcurationbarriersandincentives
www.cdlib.org/uc3
uc3@ucop.edu
n2t.net/ezid datashare.ucsf.edu merritt.cdlib.org dmptool.org dataup.org
