Published work is held by libraries; research data is held by data archiving centers. There is no effective way to link between datasets and articles. … no widely used method to cite or identify datasets … no easy way to share or get credit for data creation.
This matters because researchers and data both move. Various globally unique identifiers *are* available, and they become persistent when managed over time. This means that an item can be reliably referenced for future access by humans and software.
To address this challenge, DataCite was formed in 2009 by 10 libraries and research centers. Many of the European partners are national libraries of technology, whereas two of the three North American partners are university-based.
Here’s the basic DataCite structure, simplified.
This slide illustrates how a researcher might work with DataCite. A researcher may nurture a dataset for months or years before she or he decides to archive it. During this time, the researcher may want to obtain a “preservation-ready” identifier for the dataset. The researcher can go directly to CDL for this identifier, and then later embed it inside the DOI issued upon deposit of the dataset. When it comes time to deposit, the researcher uploads the dataset and descriptive metadata to a data center, library, or publisher’s data storage center. The data storage center then contacts a DataCite member, here CDL, to request registration of the dataset. CDL registers the dataset with the DOI infrastructure at the TIB (German National Library of Science and Technology). CDL can then return the DOI to the data center and the researcher, forming the basis of a citation.
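The registration flow in these notes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names, the prefix `10.1234`, the early identifier `fk2demo01`, and the example URL are all hypothetical, and the in-memory dictionary merely stands in for the real DOI infrastructure.

```python
from dataclasses import dataclass, field


@dataclass
class RegistrationAgency:
    """Hypothetical stand-in for the DOI infrastructure (the TIB role)."""
    records: dict = field(default_factory=dict)

    def register(self, doi: str, target_url: str, metadata: dict) -> str:
        # Refuse to register the same DOI twice.
        if doi in self.records:
            raise ValueError(f"{doi} is already registered")
        self.records[doi] = {"target": target_url, "metadata": metadata}
        return doi


@dataclass
class Member:
    """Hypothetical stand-in for a DataCite member (the CDL role)."""
    prefix: str
    agency: RegistrationAgency

    def request_doi(self, early_id: str, target_url: str, metadata: dict) -> str:
        # Embed the identifier obtained before deposit in the DOI suffix.
        doi = f"{self.prefix}/{early_id}"
        return self.agency.register(doi, target_url, metadata)


# The data center asks the member to register the deposited dataset.
agency = RegistrationAgency()
member = Member(prefix="10.1234", agency=agency)  # made-up prefix
doi = member.request_doi(
    early_id="fk2demo01",  # made-up early identifier
    target_url="https://datacenter.example.org/datasets/demo01",
    metadata={"title": "Hypothetical dataset"},
)
print(doi)
```

The returned string is what would go back to the data center and the researcher as the basis of a citation.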
Here’s a look at how an embedded reference to a dataset could work. This is ScienceDirect, looking at a link to a Supplementary Dataset.
If you click on the link, then the pop-up window shows you the metadata associated with the dataset, linking back to the article using the article’s DOI, and it provides you with the opportunity to download the dataset.
Required properties: DOI (or maybe other identifiers?), Creator, Title, Publisher, PublicationDate. The four goals are: recommending a standard citation format for datasets, based on a small number of properties required for DOI registration; providing the basis for interoperability with other data management schemas; promoting dataset discovery with optional properties allowing for flexible description of the resource, including its relationship to other resources; and laying the groundwork for future services, including discovery tool implementations, by the use of controlled lists of terms as well as standard vocabularies where available.
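To make "metadata makes the citation" concrete, here is a minimal Python sketch that assembles a citation string from the five required properties. The layout used (Creator (Year): Title. Publisher. Identifier) is an assumption for illustration, since the citation format was still under discussion at this point. The field names and all example values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DatasetRecord:
    """The five required properties; field names are illustrative."""
    doi: str
    creator: str
    title: str
    publisher: str
    publication_date: str  # ISO-style string, e.g. "2010-09-20"


def format_citation(record: DatasetRecord) -> str:
    """Build one possible citation layout (an assumption, not a standard)."""
    year = record.publication_date[:4]  # keep only the year
    return (
        f"{record.creator} ({year}): {record.title}. "
        f"{record.publisher}. doi:{record.doi}"
    )


# All values below are made up for the example.
example = DatasetRecord(
    doi="10.1234/example",
    creator="Doe, J.",
    title="Hypothetical Survey Data",
    publisher="Example Data Center",
    publication_date="2010-09-20",
)
print(format_citation(example))
```

Because every part of the string comes from a required property, any registered dataset could be cited this way without relying on the optional properties.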
Big issues:
Dublin Core issues: how to ensure we can easily exchange data.
Citation: the format, and what the set of required fields is.
Discipline: what vocabulary should we use? (DDC elicited a big negative response.)
Relation type: our list is flawed. Can we make it good? Do we want to maintain it? Can we find one we like?
Resource type: another list that needs work.
Format: yet another list that needs work or replacement.
Version: here we may need a policy decision as well as some field-level work.
CDL has built a tool to provide DataCite DOIs and other persistent identifiers, and we call it EZID. It is a kind of one-stop shop for these long-term identifiers. We’re starting with DOIs and ARKs, and we’ll add others over time. The initial functionality automates identifier registration and resolution as well as metadata entry and maintenance. Over time, we’ll add object deposit as an option as well, by hooking up with UC3’s microservices. EZID is available to the entire UC community. We’ll be opening it up more broadly in 2011. It is discipline agnostic.
Here’s a quick picture of our home page. I’m sharing the URL for a couple of reasons. I’m really proud of our service. Although we aren’t able to offer accounts to a broad customer base at the present time, we can let you play around with the service and see how it works. Simply click on the HELP tab and you will see that you can create test DOIs and test ARKs. This gives you a feel for how EZID works.
At many points in the research life cycle, the ability to easily create and manage unique, persistent identifiers can be advantageous. Let’s say you do data-intensive research and writing. You want to refer to the dataset right now even though you haven't yet found a permanent "home" for the data. Register the dataset now with EZID, and you get a clickable reference you can use in your paper. When your papers are published and you move your data, you can update the metadata, and the clickable reference will still work. If you work with a distributed team where the datastore may need to move, or you get a grant from the NSF and you need to demonstrate good data management practices, these are situations where it may make sense to get and maintain persistent identifiers using a tool like EZID.
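The reason the clickable reference survives a move can be shown with a toy resolver. This sketch is not EZID's implementation or API. The class, its method names, the identifier, and the URLs are hypothetical, and an in-memory dictionary stands in for a managed resolver service.

```python
class IdentifierRegistry:
    """Toy resolver: maps a fixed identifier to its current target URL."""

    def __init__(self):
        self._targets = {}

    def register(self, identifier, target_url):
        # An identifier is registered once and never reassigned.
        if identifier in self._targets:
            raise ValueError(f"{identifier} already exists")
        self._targets[identifier] = target_url

    def update_target(self, identifier, new_url):
        # Only the location changes; the identifier stays the same.
        if identifier not in self._targets:
            raise KeyError(identifier)
        self._targets[identifier] = new_url

    def resolve(self, identifier):
        # Raises KeyError for identifiers that were never registered.
        return self._targets[identifier]


registry = IdentifierRegistry()
dataset_id = "ark:/99999/fk4demo"  # made-up identifier for illustration

# Register while the data still lives on a lab server.
registry.register(dataset_id, "https://lab.example.org/data/demo")
before_move = registry.resolve(dataset_id)

# Later the data moves to an archive; the citation in the paper is unchanged.
registry.update_target(dataset_id, "https://archive.example.org/datasets/demo")
after_move = registry.resolve(dataset_id)
print(before_move, "->", after_move)
```

The paper only ever contains `dataset_id`, so readers who follow it always reach the current location as long as someone keeps the target up to date.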
-> by enabling them to find, cite, and get credit for research datasets with confidence
-> by providing workflows and standards for data publication
-> as they add datasets to their historic collection-building activities, allowing them to preserve their institution’s research investments
-> to enrich their publications with the full story
DataCite and DataCite Metadata
DataCite and the DataCite Metadata Kernel Joan Starr, EZID Service Manager, & Strategic & Project Planning Manager, CDL
Today’s story <ul><li>Datasets have a problem. </li></ul><ul><li>DataCite addresses it. </li></ul><ul><li>Metadata plays a role. </li></ul><ul><li>CDL builds a tool: EZID </li></ul>
Problem statement: second-class citizens in the scholarly record <ul><li>Research data </li></ul><ul><li>Journal article </li></ul>Data is difficult to manage after project funding ceases. Who has it? How do I get it? What is its impact? Where is it? Libraries keep it safe. Many libraries and archives have it. Many libraries and archives have it and will share it. I can monitor its impact. I know how to find it.
Boiling the problem down & finding a solution <ul><li>Character String </li></ul><ul><ul><li>serial numbers </li></ul></ul><ul><ul><li>names </li></ul></ul><ul><ul><li>addresses </li></ul></ul><ul><li>Object </li></ul><ul><ul><li>images </li></ul></ul><ul><ul><li>texts </li></ul></ul><ul><ul><li>datasets </li></ul></ul>Long-term identifiers Permanent Association
DataCite members <ul><li>Technische Informationsbibliothek (TIB), Germany </li></ul><ul><li>Australian National Data Service (ANDS) </li></ul><ul><li>The British Library </li></ul><ul><li>California Digital Library, USA </li></ul><ul><li>Canada Institute for Scientific and Technical Information (CISTI) </li></ul><ul><li>L’Institut de l’Information Scientifique et Technique (INIST), France </li></ul><ul><li>Library of the ETH Zürich </li></ul><ul><li>Library of TU Delft, The Netherlands </li></ul><ul><li>Purdue University, USA </li></ul><ul><li>Technical Information Center of Denmark </li></ul>
DataCite structure International DOI Foundation DataCite Member Institution Managing Agent (TIB) Data Researcher or Producer Data Centre Data Centre Data Center, Library, Publisher
DataCite example International DOI Foundation DataCite/TIB CDL (opt) CDL-hosted EZID id service Researcher data + metadata registration request DOI resolver registration service URL plus id Data Centre Data Centre Data Center, Library or Data Publisher
Metadata makes the citation <ul><li>A few required properties </li></ul><ul><li>A few more optional properties </li></ul><ul><li>4 goals: </li></ul><ul><li>recommend a citation format for datasets </li></ul><ul><li>provide the basis for interoperability </li></ul><ul><li>promote dataset discovery </li></ul><ul><li>lay the groundwork for future services </li></ul>
Where are we now? <ul><li>Sept. 20, 2010: Community comment period ended </li></ul><ul><li>Now in discussion over the results… </li></ul><ul><li>Projected completion: ~November </li></ul><ul><li>Metadata Supervisor, schema versioning </li></ul>
EZID (easy ID) <ul><li>One stop shop for DOIs & more </li></ul><ul><li>California Digital Library is a trusted service provider </li></ul><ul><li>EZID creates ids, stores metadata and resolver target URLs. </li></ul><ul><li>EZID supports DataCite DOIs and lower-cost ids (e.g. ARKs) </li></ul>
The EZID Service: a key tool for research <ul><li>Assisting data intensive research </li></ul><ul><li>Helping a research team </li></ul><ul><li>Facilitating data publication </li></ul><ul><li>Managing the output of a grant </li></ul><ul><li>For more information: </li></ul><ul><li>http://www.cdlib.org/services/uc3/ezid/ </li></ul>
Bridging the data gap <ul><li>DataCite & EZID empower researchers </li></ul><ul><li>DataCite & EZID support data centers </li></ul><ul><li>DataCite & EZID extend libraries </li></ul><ul><li>DataCite & EZID enable publishers </li></ul>