Future of Scientific Publishing: Open Access to Manuscripts and Big Data
Stanford University, June 27, 2013
Libraries and Research Data Curation
Barriers and Incentives for Preservation, Sharing, and Reuse
Stephen Abrams
University of California Curation Center
California Digital Library
www.cdlib.org/uc3
Why is data curation important?
 Accelerating scientific progress
 Enabling appropriate scrutiny and verification of results
 Promoting integrity and debate
 Facilitating new collaborations
 Avoiding needless duplication of effort
 Increasingly, complying with institutional policies, publication
requirements, and funder mandates
Cf. Whyte and Tedds (2011), “Making the case for research data management,” DCC briefing
paper, www.dcc.ac.uk/resources/briefing-papers/making-case-rdm
The library’s role
 A continuation of its long-standing mission and practice to
connect patrons with content of interest in meaningful ways
across barriers of space and time
Cf. Tenopir et al. (2012), “Academic librarians and research data services: Preparation and attitudes,” 78th
IFLA General Conference and Assembly, Helsinki, conference.ifla.org/past/ifla78/116-tenopir-en.pdf
 Offering solutions that enhance the natural points of
alignment between the scholarly research and information
lifecycles
[Figure: the scholarly lifecycle and the information lifecycle shown side by side. Labels recovered from the diagram: Research, Curation, Create, Publish, Share, Reuse, Discover, Collect, Preserve, Access]
Addressing barriers to adoption
 Critical issues on both the demand side …
   Poor discovery
 … and the supply side
   Unfamiliar processes
   Loss of control
   Inadequate guidance
Cf. Schäfer et al. (2011), Baseline Report on Drivers and Barriers in Data Sharing, hdl:10013/epic.39262
 Better access to tools and resources
Embedded best practices
Data use agreements
Data management planning
Data publication and citation
n2t.net/ezid datashare.ucsf.edu merritt.cdlib.org dmptool.org dataup.org
Data publication and citation
 Provide the same infrastructural support for data that exists
for traditional publications
 Unique, actionable identifiers
 Stable citation
 Bi-directional references between publications and the data that
underlie their analysis, synthesis, and summarization
 Discovery via disciplinary portals, catalogs, and web searches
 Use and impact metrics
www.flickr.com/photos/fotobib/5555065521 www.flickr.com/photos/minhmeoinfo/4597866532
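A minimal sketch of how "unique, actionable identifiers" and "stable citation" could fit together, assuming a DataCite-style citation pattern (Creator (Year): Title. Publisher. Identifier). The helper names, the resolver table, and the sample record are illustrative assumptions and are not taken from the talk.

```python
# Illustrative only: helper names and the sample record are hypothetical.
RESOLVERS = {
    "ark:": "http://n2t.net/",   # N2T resolver, the host used for ARKs in this deck
    "doi:": "https://doi.org/",  # public DOI resolver
}


def actionable_url(identifier):
    """Turn a scheme-prefixed identifier into a URL a browser can follow."""
    lowered = identifier.lower()
    if lowered.startswith("ark:"):
        # ARKs keep their "ark:" label in the resolver path.
        return RESOLVERS["ark:"] + identifier
    if lowered.startswith("doi:"):
        # DOIs drop the "doi:" label in the resolver path.
        return RESOLVERS["doi:"] + identifier[len("doi:"):]
    raise ValueError(f"unsupported identifier scheme: {identifier!r}")


def format_citation(creators, year, title, publisher, identifier):
    """Build a one-line data citation that ends in a resolvable link."""
    names = "; ".join(creators)
    return f"{names} ({year}): {title}. {publisher}. {actionable_url(identifier)}"


# Placeholder record; the identifier is not a real dataset.
print(format_citation(["Example, A."], 2013, "Sample dataset",
                      "Example Repository", "ark:/99999/fk4example"))
```

Because the citation ends in a resolver URL rather than a repository-specific address, the citation text can stay fixed even if the data moves and the identifier's target is updated.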
http://n2t.net/ezid
 ARK and DOI identifiers
 Descriptive metadata
 Resolution targets
 Aggregation by DataCite
(and soon) Primo and Web of Knowledge
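As a rough illustration of what pairing an identifier with descriptive metadata and a resolution target involves, the sketch below serializes a record as ANVL-style "name: value" lines, the plain-text format the EZID API accepts. The element names (`_target`, `datacite.*`) follow the EZID documentation as best recalled and should be checked against it; the sample values are placeholders, and no request is sent.

```python
import re

# Characters that would break a "name: value" line are percent-encoded.
NAME_UNSAFE = r"[%:\r\n]"   # a colon would be mistaken for the separator
VALUE_UNSAFE = r"[%\r\n]"   # a line break would end the value early


def _escape(text, pattern):
    """Percent-encode every character matched by pattern."""
    return re.sub(pattern, lambda m: "%%%02X" % ord(m.group(0)), text)


def to_anvl(metadata):
    """Serialize a flat dict of strings as ANVL lines, one element per line."""
    return "\n".join(
        f"{_escape(name, NAME_UNSAFE)}: {_escape(value, VALUE_UNSAFE)}"
        for name, value in metadata.items()
    )


# Placeholder record (assumed element names, hypothetical values).
record = {
    "_target": "http://example.org/datasets/1",  # where the identifier resolves
    "datacite.creator": "Example, A.",
    "datacite.title": "Sample dataset",
    "datacite.publisher": "Example Repository",
    "datacite.publicationyear": "2013",
}
print(to_anvl(record))
```

Keeping the resolution target as one more metadata element is what lets the target change later without changing the identifier itself.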
Embedded best practices
 Data curation is an unfamiliar set of concepts, practices, and
jargon to most researchers
www.flickr.com/photos/vixon/116447718
 It’s easier to augment systems than change behaviors
 Embed curation best practices into tools and workflows already
used by researchers
www.flickr.com/photos/34067077@N00/4576265327 www.flickr.com/photos/wealthofhealth4/6919840647
Embedded best practices
 Excel is the database of choice for many researchers
 DataUp: Excel add-in and Azure web service
 Automates …
 Best practices check
 Data description
 Persistent identifier and
citation generation
 Repository submission
http://dataup.cdlib.org/
2013 Innovation Award winner
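To make the idea of an automated "best practices check" concrete, here is a small sketch that inspects a table exported as CSV. The three rules (missing or duplicate headers, ragged rows, columns that mix numbers and text) are assumptions chosen for illustration; they are not DataUp's actual rule set.

```python
import csv
import io


def _is_number(cell):
    """True if the cell parses as a float (note: 'nan' and 'inf' also parse)."""
    try:
        float(cell)
        return True
    except ValueError:
        return False


def check_table(csv_text):
    """Return a list of human-readable problems found in a CSV table."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ["table is empty"]
    header, body = rows[0], rows[1:]
    problems = []

    # Rule 1: every column needs a unique, non-blank header.
    seen = set()
    for i, name in enumerate(header, start=1):
        label = name.strip()
        if not label:
            problems.append(f"column {i} has no header")
        elif label in seen:
            problems.append(f"duplicate header {label!r}")
        seen.add(label)

    # Rule 2: every row should have as many cells as the header.
    for n, row in enumerate(body, start=2):
        if len(row) != len(header):
            problems.append(f"row {n} has {len(row)} cells, expected {len(header)}")

    # Rule 3: a column should not mix numeric and non-numeric values.
    for i, name in enumerate(header):
        label = name.strip() or f"#{i + 1}"
        cells = [r[i].strip() for r in body if i < len(r) and r[i].strip()]
        numeric = sum(_is_number(c) for c in cells)
        if 0 < numeric < len(cells):
            problems.append(f"column {label!r} mixes numbers and text")
    return problems


sample = "site,depth,depth\nA,1.5,2\nB,n/a,3\nC,4\n"
for problem in check_table(sample):
    print(problem)
```

Running checks like these inside the spreadsheet tool, before submission, is the point of the embedded approach: the researcher sees the problems without having to learn a separate curation workflow.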
 ONEShare repository
http://merritt.cdlib.org/m/oneshare_dataup
http://n2t.net/ark:/90135/q13j39xf
 DataONE federation
http://dataone.org/
http://cn.dataone.org/onemercury
 So you don’t need to know …
 Metadata schema
 XML syntax
 Identifier registration
 Packaging standards
 Submission protocol
 Aggregation/harvesting
mechanism
Data use agreements
 Maintain control over the dissemination of research results
through click-through DUAs
 Assert explicit license requirements and terms of use
 Notification of consumer acceptance
Cf. Brazhnik and Jones (2007), “Anatomy of data integration,” Journal of Biomedical Informatics 40(3):
252-69, doi:10.1016/j.jbi.2006.09.001
http://datashare.ucsf.edu/
From: no-reply-merritt@ucop.edu
Subject: Merritt DUA acceptance
Name: Stephen Abrams
Affiliation: California Digital Library
Collection: UCSF DataShare
Object: Frontotemporal Lobar Degeneration (FTLD)
Date: 2013-05-31 09:50:34 PDT
Terms of use: As part of this agreement, Consumer submits to the following
statements:
(1) I will receive access to de-identified data and will not attempt to establish the
identity of any of the study subjects.
(2) I will share these data only with my immediate co-workers, and I will not transfer
these data to other research groups. I understand that these data are available to
other research groups through the process by which I obtain them.
(3) I will require anyone in my group who utilizes these data, or anyone with whom I
share these data to comply with this data use agreement
...
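The click-through flow behind a notification like the one above can be sketched as a gate plus a record: access is released only after explicit agreement, and the acceptance is captured in a form that can be sent to the data owner. The class, function, and field names are hypothetical and mirror the sample message only; this is not Merritt's implementation, and the time zone is omitted for brevity.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class DuaAcceptance:
    """One consumer's acceptance of a data use agreement (hypothetical model)."""
    name: str
    affiliation: str
    collection: str
    object_title: str
    accepted_at: datetime
    terms: tuple

    def render(self):
        """Format the acceptance as a plain-text notification body."""
        lines = [
            f"Name: {self.name}",
            f"Affiliation: {self.affiliation}",
            f"Collection: {self.collection}",
            f"Object: {self.object_title}",
            f"Date: {self.accepted_at:%Y-%m-%d %H:%M:%S}",
            "Terms of use:",
        ]
        lines += [f"({n}) {term}" for n, term in enumerate(self.terms, start=1)]
        return "\n".join(lines)


def click_through(agreed, **fields):
    """Release access only after explicit agreement; return the record to send."""
    if not agreed:
        raise PermissionError("data use agreement was not accepted")
    return DuaAcceptance(**fields)


# Placeholder consumer and terms.
record = click_through(
    True,
    name="A. Researcher",
    affiliation="Example University",
    collection="Example Collection",
    object_title="Example dataset",
    accepted_at=datetime(2013, 5, 31, 9, 50, 34),
    terms=("I will not attempt to identify study subjects.",
           "I will share these data only with my immediate co-workers."),
)
print(record.render())
```

Storing the accepted terms with the record, rather than a pointer to the current agreement text, preserves exactly what the consumer agreed to if the terms are revised later.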
 Next steps …
 Disciplinary survey of current DUA practice
 Collaborate with Creative Commons to establish “model” DUAs
Data management planning
 Researchers are being asked to plan for data curation by
institutional policy and as a pre-condition for publication and
grant funding
Cf. Office of Science and Technology Policy (2013), Increasing Access to the Results of Federally Funded
Scientific Research, www.whitehouse.gov/sites/default/files/microsites/ostp/ostp_public_access_memo_2013.pdf
Data management planning
 DMPTool provides guidance and resources for data management plans
 Edit, publish, and share DMPs
 Customizable for funding agency requirements
 Customizable for general, disciplinary, and institutional resources
 19 requirement templates
 43 resource sets
 Next steps …
 DMPTool2: Follow-on
development – Sloan Foundation
 Outreach and
training – IMLS
http://dmptool.org/
http://blog.dmptool.org/
Removing barriers, providing incentives
 “Access to and sharing of data are essential for the conduct
and advancement of science”
— Arzberger et al. (2004), “Promoting access to public research data for
scientific, economic, and social development,” Data Science Journal 3: 135-52,
doi:10.2481/dsj.3.135
 Libraries are a natural partner for the research community
 Deep and broad experience in the curation, preservation, and
dissemination of digital assets
 Subject area specialization in
science, technology, engineering, and mathematics
 Collaborations with campus IT groups and data centers
Removing barriers, providing incentives
 Libraries are a natural partner for the research community
 Effective discovery through … Data publication and citation
 Maintain control through … Data use agreements
 Familiar processes through … Embedded best practices
 Guidance and resources through … Data management planning
www.slideshare.net/UC3/uc3-librariesandcurationbarriersandincentives
www.cdlib.org/uc3
uc3@ucop.edu
n2t.net/ezid datashare.ucsf.edu merritt.cdlib.org dmptool.org dataup.org


Editor's Notes

  • #2 Copyright © 2013 by The Regents of the University of California. This work is made available under the terms of the Creative Commons Attribution-ShareAlike 3.0 license.
  • #5 http://www.flickr.com/photos/93623724@N08/8677103901
  • #6 FotoBIB, Barcode, http://www.flickr.com/photos/fotobib/5555065521; Minh Meo, Xây dựng liên kết backlink Edu Links, http://www.flickr.com/photos/minhmeoinfo/4597866532
  • #8 Barry Egan, File rio 2006, http://www.flickr.com/photos/vixon/116447718
  • #9 Wealth of Health, Nanomedicine scientifist working at the laboratory, http://www.flickr.com/photos/wealthofhealth4/6919840647; Martin Caltrane, Work desk, http://www.flickr.com/photos/34067077@N00/4576265327
  • #21 http://sd.keepcalm-o-matic.co.uk/i/keep-calm-and-ask-a-librarian-33.png