Thank you for this opportunity to speak with you today about Dataset Citation and Identification. I’d like to give special thanks to Sarah Shreeves for asking me.
Image credits:
By MDB 28: http://www.flickr.com/photos/mdb28/3787828482/
By davecurlee: http://www.flickr.com/photos/davecurlee/4689603488/
By sabarishr: http://www.flickr.com/photos/sabarishr/5422105775/
By rkrichardson: http://www.flickr.com/photos/45126397@N06/4506403367/
By awsheffield: http://www.flickr.com/photos/awsheffield/5932294950/
By Scutter: http://www.flickr.com/photos/scutter/109698478/
By Amy the Nurse: http://www.flickr.com/photos/amyashcraft/4522601466/
By Anita & Greg: http://www.flickr.com/photos/anita__greg/2849453715/
This is a quick look at the topics I’ll try to cover this morning and I’m certain we’ll have plenty of time for discussion afterward.
My library:
• Serving the 10 UC campuses: 226,000 students; 134,000 faculty and staff
• Working collaboratively: libraries, data centers, museums, archives, faculty and researchers
CDL has historically provided strategic, integrated technical and program services in a broad portfolio, including:
• Groundbreaking licensing agreements
• Union bibliographic services
• Open access publishing services
• Data curation & preservation tools
CDL: http://www.cdlib.org/
Adapted from ESIP: http://wiki.esipfed.org/index.php/Interagency_Data_Stewardship/Citations/provider_guidelines
Earth Science Information Partners have identified 6 important reasons for data citation.
• REPRODUCIBILITY
• GETTING CREDIT FOR WORK
• TRANSPARENCY AND ACCOUNTABILITY FOR THE SCIENTIST AND THE DATA CENTER
• TRACKING IMPACT
• RELATED: AUTHOR SEEING HOW DATA IS USED AND SUBSEQUENT USERS SEEING FURTHER USE.
From ESIP (Earth Science Information Partners; same link):
• Author(s)--the people or organizations responsible for the intellectual work to develop the data set. The data creators.
• Release Date--when the particular version of the data set was first made available for use (and potential citation) by others.
• Title--the formal title of the data set.
• Version--the precise version of the data used. Careful version tracking is critical to accurate citation.
• Archive and/or Distributor--the organization distributing or caring for the data, ideally over the long term.
• Locator/Identifier--this could be a URL but ideally it should be a persistent service, such as a DOI, Handle or ARK, that resolves to the current location of the data in question.
• Access Date and Time--because data can be dynamic and changeable in ways that are not always reflected in release dates and versions, it is important to indicate when on-line data were accessed.
From ICPSR (Inter-University Consortium for Political and Social Research): http://www.icpsr.umich.edu/icpsrweb/ICPSR/curation/citations.jsp
• Title
• Author
• Date
• Version
• Persistent identifier (such as the Digital Object Identifier, Uniform Resource Name URN, or Handle System)
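To make the element list concrete, here is a minimal sketch of assembling a citation string from the ESIP-style elements. The class name, field names, element order, punctuation, and the sample values (including the identifier) are all assumptions made for illustration; neither ESIP nor ICPSR prescribes this exact layout.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DatasetCitation:
    """Hypothetical container for the ESIP-style citation elements."""
    authors: List[str]               # Author(s): the data creators
    release_year: int                # Release Date (year only, for brevity)
    title: str                       # Title of the data set
    version: str                     # Version, e.g. "2.1"
    archive: str                     # Archive and/or Distributor
    identifier: str                  # Locator/Identifier (DOI, Handle, ARK, or URL)
    accessed: Optional[str] = None   # Access Date and Time, if the data are dynamic

    def format(self) -> str:
        """Join the elements in one illustrative order (an assumption)."""
        parts = [
            f"{'; '.join(self.authors)} ({self.release_year}).",
            f"{self.title}.",
            f"Version {self.version}.",
            f"{self.archive}.",
            f"{self.identifier}.",
        ]
        if self.accessed:
            parts.append(f"Accessed {self.accessed}.")
        return " ".join(parts)


# Made-up sample values, not a real dataset.
example = DatasetCitation(
    authors=["Doe, J.", "Roe, R."],
    release_year=2011,
    title="Example Sensor Readings",
    version="2.1",
    archive="Example Data Center",
    identifier="doi:10.5555/example",
    accessed="2012-01-15",
)
print(example.format())
```

Making the access date optional reflects the ESIP point that it matters mainly for data that can change without a new release date or version.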
On Mon, Nov 7, 2011 at 12:19 PM, <Rebecca.Lawrence@f1000.com> wrote:
Dear all,
We (Faculty of 1000 and GigaScience/BGI) are currently writing an open letter to Nature/Science about the fact that data DOIs need to be included in the proper reference list of a paper so that for one, they can be picked up by Thomson Reuters and counted properly as data citations, as there have been some instances recently of publishers refusing to include these formally in the ref list. In fact there has been quite a lot of discussion recently in various venues about how important this is and we would like to obviously get as many publishers as possible to agree to sign up to this.
DataCite was formed in 2009 by 10 libraries and research centers with a mission: “Helping you find, access, and reuse data.” The number has now grown to 15. In addition, there are 3 associate members, including the Korea Institute of Science and Technology Information, so there is a presence in Asia. California Digital Library was one of the founding members. DataCite’s primary methodology for achieving this mission: issuing DOIs (Digital Object Identifiers) for datasets.
EZID is CDL’s application for offering DataCite DOIs as well as other identifiers.
So here is what this means. Here is an example of a data set deposited with one of our clients, Dryad. Dryad is an international repository of data underlying peer-reviewed articles in the basic and applied biosciences.
Where are these discussions happening?
• Harvard Data Citation Principles workshop in May 2011
• Research
• Domain conferences
• DC-SAM meetings
• DataONE meetings
• DataCite Summer meeting
• Listservs
VERSION: Major / Minor (taken from ESIP):
“The key to making registered locators, such as DOIs, ARKs, or Handles, work unambiguously to identify and locate data sets is through careful tracking and documentation of versions. Individual stewards and data centers will need to develop and follow their own practices.”
THEREFORE, the metadata standard has to support a variety of practices.
Suggestions:
A. Track major_version.minor_version. Individual stewards need to determine which are major vs. minor versions and describe the nature and file/record range of every version. Typically, something that affects the whole data set, like a reprocessing, would be considered a major version.
B. Assign unique locators to major versions.
C. Include Version in the citation.
The DataCite Metadata Schema supports description of relationships between registered datasets, that is, earlier or later versions of the same dataset.
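Suggestions A and B can be sketched in a few lines: track major.minor, and hand out a new locator only when the major version changes. This is only an illustration under assumptions; the class, the function, and especially the ".v<major>" suffix scheme are hypothetical and are not an ESIP or DataCite rule. Each steward would substitute their own practice.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


def parse_version(version: str) -> Tuple[int, int]:
    """Split 'major.minor' into integers; a missing minor part counts as 0."""
    major, _, minor = version.partition(".")
    return int(major), int(minor or 0)


@dataclass
class VersionRegistry:
    """Hypothetical registry that assigns one locator per major version."""
    base_identifier: str
    locators: Dict[int, str] = field(default_factory=dict)

    def locator_for(self, version: str) -> str:
        major, _minor = parse_version(version)
        if major not in self.locators:
            # Assumed naming scheme, for illustration only.
            self.locators[major] = f"{self.base_identifier}.v{major}"
        return self.locators[major]


registry = VersionRegistry("doi:10.5555/example")
first = registry.locator_for("1.0")    # new major version: new locator
patched = registry.locator_for("1.3")  # minor revision: same locator as 1.0
reprocessed = registry.locator_for("2.0")  # reprocessing: new locator
```

The full version string still belongs in the citation itself (suggestion C), since the locator alone does not distinguish 1.0 from 1.3 in this sketch.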
EITHER the dataset is frequently revised, that is, data points are continually improved or updated, OR it is frequently expanded, such as sensor data maintained as a time series. DCC in the UK recommends 2 possible approaches: time slices and snapshots. For dynamic datasets, assign identifiers when new snapshots or time slices are created, whether this is on a regular basis or on demand. ESIP recommends including an access date and time in the citation “Because data can be dynamic and changeable in ways that are not always reflected in release dates and versions.”
Not uncommon to have 1000 or more contributors. Possible approach: microattribution, a new technique.
Again, from Alex Ball at DCC:
Where a dataset is assembled from very many contributions, crediting each contributor individually becomes unfeasible using traditional techniques. Microattribution is a way of crediting contributors in a more compact fashion, to keep the operation manageable. It can also be used to credit people or organisations whose contributions don’t fit the roles of creator or compiler: for example, those who implement or carry out intermediate data processing steps.
Instead of providing a traditional citation to the data collection paper associated with each contribution, a table is produced that lists each contribution and the agent responsible. Where possible, standard identifiers (for both contributions and contributors) are used to abbreviate the entries, and the table is included in the paper’s supplementary data.
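As a rough illustration of the microattribution table described above, the sketch below writes one row per contribution, pairing a contribution identifier with the agent responsible. The column names, the "role" column, the tab-separated layout, and the sample identifiers are assumptions for illustration, not a format defined by DCC.

```python
import csv
import io
from typing import List, NamedTuple


class Contribution(NamedTuple):
    """One row of a hypothetical microattribution table."""
    contribution_id: str  # identifier for the contributed record or subset
    contributor_id: str   # identifier for the person or organisation
    role: str             # e.g. "creator" or "processing"


def microattribution_table(rows: List[Contribution]) -> str:
    """Render the contributions as tab-separated text, sorted by contribution."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["contribution", "contributor", "role"])
    for row in sorted(rows):
        writer.writerow(row)
    return buffer.getvalue()


# Made-up identifiers; real tables would use standard identifiers where possible.
table = microattribution_table([
    Contribution("record-0002", "contributor-B", "processing"),
    Contribution("record-0001", "contributor-A", "creator"),
])
print(table)
```

A plain delimited file like this is one easy way to ship the table as supplementary data; with 1000 or more contributors, the compact identifiers are what keep it manageable.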
DataCite says: In the case of datasets, "publish" is understood to mean making the data available to the community of researchers.
AFFECTS PublicationDate or Publisher FOR CITATION: “The name of the entity that holds, archives, publishes, prints, distributes, releases, issues, or produces the resource. This property will be used to formulate the citation, so consider the prominence of the role.”
ESIP: WANTS TO LIST ONLY ARCHIVE OR DISTRIBUTOR (although they do have a PublicationPlace element).
Why does this matter? There is only so much room in the citation, so do you give this spot to the data center? To the funder? To the …
DataCite provides the CONTRIBUTOR field.
AT WHAT LEVEL DO YOU CITE THE DATA? YOUR RESEARCHERS MAY ASK YOU THIS QUESTION.
Big science, little science, tracking versus citation.
Alex Ball at DCC, Cite Datasets and Link to Publications: “Cite datasets at the finest-grained level available that meets your need. If that is not fine enough, provide details of the subset of data you are using at the point in the text where you make the citation.”
http://www.dcc.ac.uk/resources/how-guides/cite-datasets
Transcript of "Dataset Citation and Identification"
Dataset Citation and Identification DataCite and EZID Joan Starr California Digital Library January, 2012 @joan_starr
Dataset Identification & Citation
• Introduction
• Data Citation: Why, what, how
• DataCite: “Helping you find, access and reuse data.”
• EZID: Easy creation and management of DataCite DOIs and other identifiers.
• Current discussions in data citation
Why?
• To aid scientific reproducibility
• To provide fair credit
• To ensure scientific transparency and reasonable accountability
• To aid in tracking the impact, including
  – helping data authors verify use of their data and
  – helping future data users identify how others have used the data.
What?
• Key identifying elements
• Emerging recommendations
• Variation among the domains
How?
• Key identifying elements
• Emerging recommendations
• Variation among the domains
• In common: Persistent identifier
DataCite
• German National Library of Economics (ZBW)
• German National Library of Science and Technology (TIB)
• German National Library of Medicine (ZB MED)
• GESIS - Leibniz Institute for the Social Sciences, Germany
• Australian National Data Service (ANDS)
• ETH Zurich, Switzerland
• The Swedish National Data Service (SNDS)
• Canada Institute for Scientific and Technical Information (CISTI)
• Technical Information Center of Denmark
• Institute for Scientific & Technical Information (INIST-CNRS), France
• TU Delft Library, The Netherlands
• The British Library, UK
• California Digital Library (CDL), USA
• Office of Scientific & Technical Information (OSTI), USA
• Purdue University Library
EZID: long-term identifiers made easy
Take control of the management and distribution of your research, share and get credit for it, and build your reputation through its collection and documentation.
Primary Functions
1. Create persistent identifiers
2. Manage identifiers over time
3. Manage associated metadata over time