This document discusses persistent identifiers and the EZID service for assigning and managing them. It begins with an overview of why data and identifiers are important for scholarly communication. It then covers what identifiers are, including their basic components and functions. The bulk of the document introduces the EZID service and shows how it can be used to create and manage persistent identifiers and their associated metadata over time. It then compares DOIs and ARKs, the identifier schemes that EZID offers. Finally, it discusses how identifiers can help researchers share, distribute and get credit for their work.
2. EZID: Easy dataset
identification & management
• Why data?
• Identifiers 101
• EZID: identifiers made easy!
• Choosing an identifier
• What does this mean for
you?
4. Data!
By barryegan (Vitor Leite) http://www.flickr.com/photos/vixon/116447718/
12. What is an identifier?
What you see: alphanumeric string (never changes)
Associated with: location of object (such as a URL)
Optional: who, what, when, etc (i.e. metadata)
By Joelk75: http://www.flickr.com/photos/75001512@N00/2728233597/
13. Identifier example
string: doi:10.9999/FK40K2GTV
html version: http://dx.doi.org/10.9999/FK40K2GTV
location: http://www.bologna.edu/biology/xfg/123.xls
metadata
Creator: Dr. Felix Kottor
Title: Data for chromosomal study of catfish (Ictalurus
punctatus)
Publisher: University of Bologna
Publication Year: 2011
14. Identifier example
string: doi:10.9999/FK40K2GTV
html version: http://dx.doi.org/10.9999/FK40K2GTV
location: http://www.state.edu/ecology/783sdr/123.xls
metadata
Creator: Dr. Felix Kottor
Title: Data for chromosomal study of catfish (Ictalurus
punctatus)
Publisher: Dryad Data Repository
Publication Year: 2012
15. Identifiers 201
By Christi Nielsen http://www.flickr.com/photos/christinielsen/476326980/
17. EZID: long-term identifiers made easy
take control of the
management and
distribution of your research,
share and get credit for it,
and build your reputation
through its collection and
documentation
Primary Functions
1. Create long-term identifiers
2. Manage identifiers over time
3. Manage associated metadata over time
24. EZID: Easy dataset
identification & management
Why data?
Identifiers 101
EZID: identifiers made easy!
• Choosing an identifier
• What does this mean for
you?
25. DOIs and ARKs
• both can work like regular hyperlinks.
• both can refer to a
subset or portion of
a resource.
• both become persistent
when the target URL
is maintained.
http://content.cdlib.org/ark:/13030/tf0v19n605/,
courtesy of UC Davis Special Collections
26. DOIs vs ARKs
• Case sensitive
• Flexible metadata
• Special feature supports granularity
27. DOIs vs ARKs: suffix pass-through
The identifier WILD CARD!
28. DOIs vs ARKs: suffix pass-through
ark:/13030/xt54321 ---> http://example.org/hearts
http://n2t.net/ark:/13030/xt54321 ---> http://example.org/hearts
http://n2t.net/ark:/13030/xt54321/king ---> http://example.org/hearts/king
http://n2t.net/ark:/13030/xt54321/queen ---> http://example.org/hearts/queen
29. DOIs vs ARKs
• Gold standard for citation
• Established brand in publishing
• Indexed by major A&I citation databases
Image credit: http://www.flickr.com/photos/mr_t_in_dc/6083561702/ by Mr. T in DC
Why all the fuss about Data? Big DATA, Big MONEY. But for us and our clients…
Two aspects are key: DATA MANAGEMENT
This is also a question of an almost perfect fit with our historic mission to preserve and protect our institution’s scholarly output. 472,000 in September; 455,000 in May!
Image credit: http://www.flickr.com/photos/60in3/2338247189/ by 60 in 3
So, with the emergence of these funder mandates, the NSF’s being the most prominent of course, you might call it the greatest thing since sliced bread for libraries. Scientists have been asked for data management plans, and they don’t have the first idea what to do about this. We’re going to hear more about the scientists’ view of things from Carly this morning. Lucky for us, these same scientists who report never stepping into libraries are now turning to librarians and asking for help!
What does DATA MANAGEMENT look like? The players here are domain-specific data repositories like Dryad, in the environmental sciences fields, or institutional repositories and data centers, including those run by libraries and their campus IT partners.
We’re going to look at what identifiers are—what makes them work.
DOIs are one kind of persistent identifier. But what is an identifier? An identifier is an alphanumeric string assigned to an object, and if that assignment is managed with some metadata and the object is made available over time, the identifier becomes a VERY reliable way of keeping track of that object.
Let’s take a look at one. So you can see that with just the identifier and a simple set of metadata, you get: a location for VERIFICATION & RE-USE, plus EXPOSURE & CITATION TRACKING. (This is not an actual DOI, nor an actual study.)
And here’s that same DOI some time later. THE STRING NEVER CHANGES. This means it can be cited, tracked and associated with all kinds of metadata.
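To make the "string never changes" point concrete, here is a minimal sketch using the made-up DOI from the two example slides. The `IdentifierRecord` class and its `update` method are hypothetical names invented for illustration; this is not EZID's data model or API, just one way to picture a fixed string with a changeable location and metadata.

```python
from dataclasses import dataclass, field


@dataclass
class IdentifierRecord:
    """Hypothetical in-memory record; not EZID's actual data model."""
    identifier: str   # the string people cite; never reassigned below
    target: str       # current location of the object
    metadata: dict = field(default_factory=dict)

    def update(self, target=None, **metadata):
        """Change the location and/or metadata, leaving the string alone."""
        if target is not None:
            self.target = target
        self.metadata.update(metadata)


# The example identifier as first registered (values from the slide).
record = IdentifierRecord(
    identifier="doi:10.9999/FK40K2GTV",
    target="http://www.bologna.edu/biology/xfg/123.xls",
    metadata={
        "creator": "Dr. Felix Kottor",
        "publisher": "University of Bologna",
        "publication_year": "2011",
    },
)
cited_as = record.identifier

# Some time later the data has a new home and a new publisher.
record.update(
    target="http://www.state.edu/ecology/783sdr/123.xls",
    publisher="Dryad Data Repository",
    publication_year="2012",
)
```

Anyone who cited `cited_as` still holds a valid identifier after the update; only the location and metadata behind it have moved.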
Is everyone with me? If so, I’m going to ask you to be brave for a few minutes while I introduce you to one more piece of information.
Let’s look at that same DOI so we can talk about its structure. Remember: this is a STRING associated with a TARGET URL. DOI structure is based on the Handle system of identifiers; you can think of DOIs as a special implementation of the Handle system. So, here is the segment called the PREFIX. All DOI prefixes begin with “10”, followed by a dot and more numbers. The prefix is a unique number assigned to the specific registrant of DOIs. CDL has its own prefix, for example. Most EZID clients have one too. The prefix is the common element in every DOI the registrant makes. The second part is the SUFFIX: the part after the slash. This part has to be unique for every DOI created with the prefix.
How can EZID be in the business of issuing DataCite DOIs? California Digital Library was one of the founding members. DataCite was formed in 2009 by 10 libraries and research centers with a mission: “Helping you find, access, and reuse data.” The number has now grown to 16. In addition there are 3 associate members, including the Korea Institute of Science and Technology Information and BGI, so there is a presence in Asia. DataCite’s primary methodology for achieving this mission: issuing DOIs (Digital Object Identifiers) for datasets.
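The prefix/suffix split described above can be sketched in a few lines. The function name `split_doi` is an assumption made for this example, and the checks are deliberately loose: they only confirm that the prefix starts with "10." and that something follows the first slash, which is less than a real DOI validator would do.

```python
def split_doi(doi):
    """Split a DOI string into (prefix, suffix) at the first slash.

    Hypothetical helper for illustration only; it does not check that
    the prefix is actually assigned to any registrant.
    """
    # Drop an optional leading "doi:" label.
    name = doi[4:] if doi.lower().startswith("doi:") else doi
    prefix, slash, suffix = name.partition("/")
    if not prefix.startswith("10.") or len(prefix) <= len("10."):
        raise ValueError("DOI prefix must be '10.' followed by more characters")
    if not slash or not suffix:
        raise ValueError("DOI needs a suffix after the slash")
    return prefix, suffix


# The made-up DOI from the example slide.
prefix, suffix = split_doi("doi:10.9999/FK40K2GTV")
```

Here `prefix` is the registrant's part (`10.9999`) and `suffix` is the part that must be unique under that prefix (`FK40K2GTV`).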
If you click on this link, you’ll be able to try EZID without an account.
By default, we take you to a SIMPLE create screen.
There are other features available on the ADVANCED CREATE screen and MANAGE tabs that I invite you to explore on your own.
The tool we’ve been looking at, EZID, lets you create and manage both DOIs and ARKs. But how do you choose?
ARKs come from the library and museum world and have been adopted by some large cultural organizations around the world. Managed by the CDL. CASE SENSITIVE: more options (CD, Cd, cD, cd are all distinct). FLEXIBLE: using the API, you can supply metadata pairs as desired, and you can upload existing domain-specific metadata if desired. ARKs have a feature called suffix pass-through. It means you can register the root of a file structure and get pointers to the rest of the file structure for free. I’ll show you an example in a minute.
An identifier wild card. The ability for one identifier location URL to stand in for unlimited sub-identifier locations. A sub-identifier is just an identifier extended by a suffix. Image credit: http://www.flickr.com/photos/11356857@N08/5120543262/ by OnFoot4Now (Didi)
An identifier wild card. The ability for one identifier location URL to stand in for unlimited sub-identifier locations. A sub-identifier is just an identifier extended by a suffix. Let’s assume that the identifier and location (or target URL) that you registered were those shown in red above the table. I’ve also listed them in the table as #1. SUFFIX PASS-THROUGH means that you can submit requests to the ARK server for any sub-identifiers, and the suffixes will be passed through to the target server. So, in example #2, the suffix “king” is passed through as a request to the target server, even though it was never registered as an identifier. And so on. Why is suffix pass-through so important? Imagine you have a dataset with 10,000 nameable components, such as packages, files, or tables. You'd like to be able to reference these components for tracking purposes. With suffix pass-through, while you still have to manage the components, you only take on management for one overall dataset identifier.
The gold standard. DOIs are for keeps. DOIs are identifiers originating from the publishing world and are in widespread use for journal articles. Managed by the International DOI Foundation. DOIs should be assigned to objects that are under good long-term management, and where there is an intention to make the object persistently available. DOIs must be registered exclusively with metadata that is available to public view.
Image credit: http://www.flickr.com/photos/mzn37/562770075 by michael.newman
Can DOIs and ARKs work together? These two identifier schemes can work well together, and EZID offers them both, along with policy support consistent across both schemes. Use ARKs early in the life cycle for good data management and before it’s clear what will be cited. When you are ready to cite, get the DOI, and if desired, incorporate the ARK string into the suffix of the DOI for continuity.
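As a rough illustration of suffix pass-through, the sketch below resolves the "hearts" identifiers from the slide against a registry holding only the one registered ARK. The `REGISTERED` table and the `resolve` function are hypothetical and written for this example; an actual resolver such as the one at n2t.net may work differently in its details.

```python
# Only the root identifier is registered (identifier -> target URL).
REGISTERED = {"ark:/13030/xt54321": "http://example.org/hearts"}


def resolve(identifier, registry=REGISTERED):
    """Return the target URL, passing any unregistered suffix through.

    Hypothetical sketch: trims trailing path segments until a registered
    identifier is found, then appends the trimmed part to its target.
    """
    candidate = identifier
    while candidate:
        if candidate in registry:
            passed_through = identifier[len(candidate):]
            return registry[candidate] + passed_through
        head, slash, _ = candidate.rpartition("/")
        if not slash:
            break  # nothing left to trim
        candidate = head
    return None  # no registered identifier matches


root = resolve("ark:/13030/xt54321")
king = resolve("ark:/13030/xt54321/king")
queen = resolve("ark:/13030/xt54321/queen")
```

Neither `king` nor `queen` was ever registered, yet both resolve, because the registry holds the single root identifier and the suffix is handed on to the target server.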
Image credit: http://www.flickr.com/photos/andy_bernay-roman/380095041/ by allspice1
http://www.flickr.com/photos/sekihan/6100774057/ by sekihan
http://www.flickr.com/photos/expressmonorail/7032291971/ by Express Monorail
Information depot: distribute information, pass along questions, etc.
A full data management service center
Library as data management service center. Data services that libraries may provide:
+ data management planning
+ institutional repository
+ metadata creation and linking
+ data archiving & curation
+ consultation on above topics
+ data management & data literacy training for grad students & faculty
+ persistent identifier services
Anything in between
One size does not fit all! But EZID and long-term identifier services are a nice, discrete service that fits into any workflow.
You can be part of a growing network. 38 institutions and counting. Government data centers, university-hosted research institutes, research libraries offering data management services, publishers beginning to support the data behind scholarly work. There’s a longer list at the URL I’m showing here…