NATIONAL FEDERATION OF ADVANCED INFORMATION SERVICES
Mastering the Curation, Integrity and
Citation of Quality Research Data:
Research Data Publication, Part II
Richard Huffine, Independent Consultant
NFAIS Hybrid One-Day Workshop, April 20, 2015
Overview
• Publishing: The Change in
Expectations
• Where it Starts: Planning for
Data
• Role of Data in Publishing
• Metadata and Citation
• Emerging U.S. Federal Policies
• Case Study: USGS
• Capabilities of Data
Repositories
• Access to Data
– Role of Librarians
– Role of Publishers
– Other Commercial Interests
• *A Note About Copyrights
• Questions and Feedback
Publishing: the Change in Expectations
• Just a generation ago publishing was a very prospective
industry
– Most publications were allowed to go out of print
– Publishers were always looking for their next releases, not
focusing on what they had published in the past
• With the dawn of electronic publishing, the industry
changed
• Publishers digitized their entire backlists and are making
the most of everything they publish
• This opens the door for publishers to take an interest in
the data associated with publications
Publishing: the Change in Expectations (continued)
• Data publishing will continue to grow in both
volume and diversity:
– Data associated with publications
– Data as the publication, largely to replace reference
books
– Data sets and sub-sets for specific applications
– Data systems which aggregate and collect data from
multiple sources
• The processes for licensing both content and data
are converging
Where it Starts: Planning for Data
• A number of funders are now requiring data
management plans as part of the funding process
• Those plans include the strategies authors will use to
archive, distribute, and preserve the data collected
using those funds
• That data – the entirety of what is collected in support
of a research project – is typically archived by the
institutional sponsors
• Smaller subsets of the data – those that support
specific publications – are being published either with or
at the same time as the published research.
Role of Data in Publishing
• Publishers have adopted a variety of strategies for
making data available
• Some support ancillary file delivery; others point to
author-provided locations for data
• Some publishers are developing policies for data
availability
– PLoS, Sharing of Data, Materials, and Software
• https://www.plos.org/policies/#sharing
• Beyond publication, institutions are investing in data
management infrastructures that can assist researchers
in managing their data in perpetuity.
– Purdue University Research Repository (PURR)
• https://purr.purdue.edu/
Metadata and Citation
• Metadata Standards
– Repositories should collect everything they can about
the data they hold and limit what they share only
according to the standards that requestors specify
• Citation
– Repositories should provide persistent identifiers to
data sets and to their descriptive metadata records
– The elements needed for citation should be among the
required elements during the ingest of a data
collection.
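The two points above can be sketched in code. This is a minimal illustration and does not describe any particular repository's implementation. The field names in `CitationRecord`, the `EXPORT_PROFILES` table, and the sample DOI are all hypothetical; the slides do not name specific citation elements or metadata standards.

```python
from dataclasses import dataclass, fields
from typing import Dict, List


@dataclass
class CitationRecord:
    """Hypothetical set of citation elements required at ingest."""
    creators: str
    title: str
    publication_year: int
    publisher: str
    identifier: str  # persistent identifier, for example a DOI


# Hypothetical export profiles: the repository stores everything it collects
# and shares only the fields a requestor's standard asks for.
EXPORT_PROFILES: Dict[str, List[str]] = {
    "minimal": ["title", "identifier"],
    "citation": [f.name for f in fields(CitationRecord)],
}


def missing_citation_elements(metadata: dict) -> List[str]:
    """Return the required citation elements that are absent or empty."""
    return [f.name for f in fields(CitationRecord) if not metadata.get(f.name)]


def ingest(metadata: dict) -> CitationRecord:
    """Reject a submission unless every citation element is present."""
    missing = missing_citation_elements(metadata)
    if missing:
        raise ValueError("missing citation elements: " + ", ".join(missing))
    return CitationRecord(**{f.name: metadata[f.name] for f in fields(CitationRecord)})


def format_citation(rec: CitationRecord) -> str:
    """Build a simple citation string from the required elements."""
    return (f"{rec.creators} ({rec.publication_year}). {rec.title}. "
            f"{rec.publisher}. {rec.identifier}")


def export_metadata(metadata: dict, profile: str) -> dict:
    """Share only the fields named by the requested profile."""
    return {k: metadata[k] for k in EXPORT_PROFILES[profile] if k in metadata}
```

Checking citation elements at ingest means a data set can always be cited later, while the export profiles keep the full record intact and vary only what is shared.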
Emerging U.S. Federal Policies
• The U.S. federal government does not currently have a
legislative mandate to share data produced using federal
funding
• But the current administration has made commitments to
increase the availability of data – both government
produced and funded
• All agencies are developing strategies for managing their
data and making it more accessible
• Some agencies have developed policies in response to a
Presidential Memorandum from 2013 – to enhance public
access to publicly funded research
• Only the Departments of Health & Human Services and
Education have mandates defined in law
USGS: A Case Study
• Longstanding Data Series publication process
• Community for Data Integration (www.usgs.gov/cdi/)
• Data Management Web site (www.usgs.gov/datamanagement/share.php)
• USGS Policies (www.usgs.gov/usgs-manual/95imlist.html)
– Scientific Data Management Foundation
– Metadata for Scientific Data, Software, and Other Information Products
– Review and Approval of Scientific Data for Release
– Preservation Requirements for Digital Scientific Data
• Moving towards repository services to manage data and publications
for discovery and access.
Capabilities of Data Repositories
• The technical infrastructure for managing digital content is evolving.
• Publishers and research institutions currently have very different
strategies
• Publishers are using a variety of expensive commercial products that
can scale to their needs.
• Research institutions are developing repository solutions using Open
Source products that require significant investment and
development
• Neither path is sustainable and the two rarely interface with one
another
• The two also place different priorities on persistence, relationships
to other objects, and end-user capabilities
• Capabilities like Application Programming Interfaces (APIs) and
Representational State Transfer (REST) services are being sought by
users of both of these solutions
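As a rough illustration of the kind of REST service users are asking for, the sketch below serves data set metadata as JSON from a tiny local server and fetches it with a client call. It uses only the Python standard library. The `/datasets/<id>` route, the `DATASETS` holdings, and the sample identifier are assumptions made for this example; they do not describe any existing repository API.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import ProxyHandler, build_opener

# Hypothetical in-memory holdings standing in for a repository's catalog.
DATASETS = {
    "ds-001": {
        "title": "Example Stream Gauge Data",
        "identifier": "doi:10.0000/example",
    },
}


class DatasetHandler(BaseHTTPRequestHandler):
    """Answers GET /datasets/<id> with the matching metadata record as JSON."""

    def do_GET(self):
        parts = self.path.strip("/").split("/")
        record = None
        if len(parts) == 2 and parts[0] == "datasets":
            record = DATASETS.get(parts[1])
        status = 200 if record is not None else 404
        payload = record if record is not None else {"error": "not found"}
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # suppress per-request logging to keep the example quiet


def start_server() -> HTTPServer:
    """Start the service on a free local port in a background thread."""
    server = HTTPServer(("127.0.0.1", 0), DatasetHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch_dataset(base_url: str, dataset_id: str) -> dict:
    """Client side: retrieve one metadata record over HTTP."""
    opener = build_opener(ProxyHandler({}))  # bypass any configured proxy
    with opener.open(f"{base_url}/datasets/{dataset_id}") as resp:
        return json.loads(resp.read().decode("utf-8"))
```

A client would call `start_server()`, build a base URL from `server.server_port`, and pass it to `fetch_dataset`. The same request pattern would work against either a commercial or an open source repository if both exposed a comparable interface.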
Access to Data: Role of Librarians
• Librarians have a unique skill set to help address the
requirements for data access in this changing environment
• Librarians sit at the intersection between:
– Publishers and Users
– Researchers and Institutional Repositories
– Publishers and Institutions
• Librarians can, and should, be working to improve the path
for data access through improving the interchange that
occurs at these intersections
• Librarians – and not necessarily libraries – are needed to
support a culture of continuous improvement in access and
usability of both publications and data
• Boston Public Library to Tackle Boston’s Data
– http://www.bostonherald.com/business/business_markets/2015/04/library_to_tackle_bostons_data
Access to Data: Role of Publishers
• Publishers, like funders, need to establish standards for data
sharing and facilitate access to data, regardless of where it is
housed
• Publishers need to build on the current trends and make the
data they provide more useful to the users of their content,
including:
– Interactivity and visualization with data associated with
publications
– Data services for accessing data through direct interchange
• Improved integration with institutional repositories to
facilitate access to ancillary material regardless of where it
resides
Access to Data: Other Commercial Interests
• The responsibility for data publication and access does not
reside solely with publishers and institutional repositories
• A number of other commercial interests can step up to
support improved access to data and enhanced services for
its re-use
• Research Data Management is a growing opportunity for
both commercial and non-commercial development
• Commercial services that support scientists, such as Mendeley,
Flow, and EndNote, could develop tools for discovery,
visualization, and integration with other sources of data and
analysis
• As publishers and repositories improve, the ability to
support researchers with enhanced tools grows.
* A Note About Copyrights
• In the United States, copyright law does not apply to
facts, data, or ideas. However, copyright may protect a
collection of data as contained in a database or
compilation, but only if it meets certain requirements.
• In Europe, however, the law provides much greater
protection of databases. It prohibits the extraction or
reutilization of any database in which there has been a
substantial investment in obtaining, verifying, or
presenting the data contents.
– Database legal protection
• http://www.bitlaw.com/copyright/database.html
Questions and Feedback
• What questions did today’s presentation raise
for you?
Feedback:
Richard Huffine
202-253-3511
richardhuffine@gmail.com
