Dataset Citation and Identifiers: DOIs, ARKs, and EZID


Joan Starr's presentation at NCAR's workshop on Bridging Data Lifecycles: Tracking Data Use via Data Citations, April 2012

  • Image credits: by trickofthelight, letideascompete, JohnGiez, AtmosNews - NCAR & UCAR, Laertes, secorlew, tab2space, TheNose
  • Librarians THINK ABOUT THE METADATA THAT THE CITATION REPRESENTS. TO US, IT LOOKS LIKE… DESCRIPTION. And we want it to support DISCOVERY AND PRESERVATION. Our interests coincide with the researchers’. RE-USE, just like ACCESS, demands: to know that the data exist, to know where to get the data, and to be able to get the data in a form that is easily integrated into local workflows. And both PRESERVATION and DATA MANAGEMENT demand that the object be easy to maintain. The funders’ requirements are for data management, and the library’s charge is to preserve our institutions’ scholarly assets.
  • From ICPSR (Inter-University Consortium for Political and Social Research): identifier (such as the Digital Object Identifier, Uniform Resource Name (URN), or Handle System). From ESIP (Earth Science Information Partners)
  • On Mon, Nov 7, 2011 at 12:19 PM, <> wrote:Dear all, We (Faculty of 1000 and GigaScience/BGI) are currently writing an open letter to Nature/Science about the fact that data DOIs need to be included in the proper reference list of a paper so that for one, they can be picked up by Thomson Reuters and counted properly as data citations, as there have been some instances recently of publishers refusing to include these formally in the ref list.  In fact there has been quite a lot of discussion recently in various venues about how important this is and we would like to obviously get as many publishers as possible to agree to sign up to this.
  • So here is what this means. Here is an example of a data set deposited with one of our clients, Dryad. Dryad is an international repository of data underlying peer-reviewed articles in the basic and applied biosciences.
  • So just what are these Identifiers?
  • DOIs are one kind of persistent identifier. But what is an identifier? An identifier is an alphanumeric string assigned to an object, and if that assignment is managed with some metadata and the object is made available over time, the identifier becomes a VERY reliable way of keeping track of that object.
  • (this is not an actual DOI, nor an actual study)
  • And here’s that same DOI some time later. THE STRING NEVER CHANGES. This means it can be cited, tracked and associated with all kinds of metadata.
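The idea in the two notes above can be sketched in a few lines of Python. This is only an illustrative in-memory model, not EZID's data model or API: the `IdentifierRecord` class, its `update` method, and the `example.org` URLs are all assumptions made for the sketch. The metadata values are the ones shown on the two "Identifier example" slides.

```python
from dataclasses import dataclass, field


@dataclass
class IdentifierRecord:
    """Hypothetical model of a managed identifier (not EZID's actual data model)."""
    string: str                      # the identifier itself; never changed after creation
    target: str                      # current location of the object
    metadata: dict = field(default_factory=dict)

    def update(self, target=None, **metadata):
        """Change the target URL and/or metadata; the identifier string is left alone."""
        if target is not None:
            self.target = target
        self.metadata.update(metadata)


# First state of the (fictional) DOI from the slides. The URL is a placeholder.
record = IdentifierRecord(
    string="doi:10.9999/FK40K2GTV",
    target="https://example.org/bologna/catfish",
    metadata={
        "creator": "Dr. Felix Kottor",
        "publisher": "University of Bologna",
        "date": "8/31/2011",
    },
)

# Some time later the data moves: target and metadata change, the string does not.
record.update(
    target="https://example.org/dryad/catfish",
    publisher="Dryad Data Repository",
    date="10/01/2011",
)
```

A citation that recorded only `record.string` would still be valid after the update, which is the point of the slide pair.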
  • We’re going to look at that same DOI so we can talk about its structure. Remember: this is a STRING associated with a TARGET URL. DOI structure is based on the Handle system of identifiers, because you can think of DOIs as a special implementation of the Handle system. So, here is the segment called the PREFIX. All DOI prefixes begin with ’10’ and this is followed by a “dot” and more numbers. The prefix is a unique number assigned to the specific registrant of DOIs. CDL has its own prefix, for example. NCAR has one too. The prefix is the common element in every DOI the registrant makes. The second part is the suffix, the part after the slash. This part has to be unique for every DOI created with the prefix.
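As a rough illustration of the prefix/suffix structure just described, the following Python sketch splits a DOI string at the first slash. The function name `split_doi` is made up for this example, and the validation is deliberately minimal (it only checks the leading "10." and that a suffix exists); it is not a full DOI syntax checker.

```python
from typing import Tuple


def split_doi(doi: str) -> Tuple[str, str]:
    """Split a DOI string into (prefix, suffix) at the first slash.

    Hypothetical helper for illustration only.
    """
    # Drop an optional leading "doi:" scheme label.
    name = doi[4:] if doi.lower().startswith("doi:") else doi
    prefix, sep, suffix = name.partition("/")
    if not sep or not suffix:
        raise ValueError(f"no suffix found in {doi!r}")
    # Every DOI prefix begins with "10." followed by the registrant's number.
    if not prefix.startswith("10.") or len(prefix) <= 3:
        raise ValueError(f"prefix must start with '10.' plus a registrant number: {prefix!r}")
    return prefix, suffix


prefix, suffix = split_doi("doi:10.9999/FK40K2GTV")
```

For the fictional DOI on the slide, `prefix` is the registrant's part (`10.9999`) and `suffix` is the part that must be unique within that prefix (`FK40K2GTV`).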
  • EZID is CDL’s application for offering DataCite DOIs, ARKs as well as other identifiers. Soon we’ll support URNs, for example.
  • How can we be in the business of issuing DataCite DOIs? California Digital Library was one of the founding members. DataCite was indeed formed in 2009 by 10 Libraries and Research Centers with a Mission: “Helping you find, access, and reuse data.” The number has now grown to 15. In addition there are 3 associate members, including the Korea Institute of Science and Technology Information and BGI, so there is a presence in Asia. DATACITE’s primary methodology for achieving this mission: issuing DOIs (Digital Object Identifiers) for datasets.
  • If you go to the Home Page, you can use the UI to test EZID. CLICK for HELP TAB.
  • On the Help screen, you have the choice of creating a test ARK or DOI.
  • EZID creates the identifier and sends you to the MANAGE tab where you have the opportunity to enter a target URL and other metadata.
  • When you hover over a field, it opens up for editing as you can see here. This is where you would go if you wanted to maintain the metadata or the target URL.
  • ARKs come from the Library and Museum world and have been adopted by some large cultural organizations around the world. FLEXIBLE: can identify objects of any type: digital, physical, living and intangible. CASE SENSITIVE: MORE OPTIONS (CD, Cd, cD, cd are all distinct). ARKs have a feature called suffix pass-through (remember suffixes?). It means you can register the root of a file structure and get pointers to the rest of the file structure for free. I’ll show you an example in a minute. ARKs CAN GIVE YOU EXTRA INFORMATION with something called “inflections” or different endings, ? and ?? (in test now). If the registrant has supplied the information, an ARK with ? should return metadata and an ARK with ?? should return a commitment to persistence.
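Two of the ARK properties in this note, case sensitivity and inflections, can be shown with a small Python sketch. The helper names `same_identifier` and `with_inflection` are invented for illustration. The DOI comparison relies on DOI names being case-insensitive, in contrast to ARKs, and the sketch only builds the inflected strings; it does not contact any resolver.

```python
# Inflection endings named in the note: "?" asks for metadata,
# "??" asks for the commitment-to-persistence statement.
INFLECTIONS = {"metadata": "?", "commitment": "??"}


def same_identifier(a: str, b: str) -> bool:
    """Compare two identifier strings using each scheme's case rules.

    Hypothetical helper: DOIs are compared case-insensitively,
    everything else (including ARKs) must match exactly.
    """
    if a.lower().startswith("doi:") and b.lower().startswith("doi:"):
        return a.lower() == b.lower()
    return a == b


def with_inflection(ark: str, kind: str) -> str:
    """Append the inflection ending for `kind` ("metadata" or "commitment")."""
    return ark + INFLECTIONS[kind]


arks_differ = not same_identifier("ark:/99999/CD", "ark:/99999/cd")
dois_match = same_identifier("doi:10.9999/FK40K2GTV", "doi:10.9999/fk40k2gtv")
metadata_request = with_inflection("ark:/99999/Big4", "metadata")
```

Because case matters, an ARK suffix of a given length has more distinct values available than a DOI suffix of the same length, which is what "MORE OPTIONS" refers to.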
  • Register a single ARK mapped to the top level of your database organization. And you get a kind of wild card. If you have 10,000 nameable parts of your database, you only need to register that one top level item. The ARK server will PASS THROUGH any suffix you later add (but don’t register) to your location server.
  • Here is an example. It’s a powerful way to handle a very large number of items. We have this in testing mode right now and hope to bring it into production later this year.
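Suffix pass-through, as described in the notes above, can be sketched as a lookup that matches the registered root and appends whatever follows it to the registered location. This is a simplified Python model under stated assumptions: the `resolve` function and the one-entry `REGISTERED` table are hypothetical, only "/" is treated as a boundary after the root, and the ARK and location are the ones from the two pass-through slides.

```python
# One registered ARK, mapped to the top level of the database (from the slides).
REGISTERED = {"ark:/99999/Big4": "http://x.y.z/foo/Big4/db"}


def resolve(ark: str, registry: dict = REGISTERED) -> str:
    """Return the location for an ARK, passing any unregistered suffix through.

    Hypothetical resolver for illustration; a real ARK resolver does more.
    """
    # Try longer registered roots first so the most specific one wins.
    for root in sorted(registry, key=len, reverse=True):
        if ark == root:
            return registry[root]
        if ark.startswith(root + "/"):
            # Everything after the registered root is appended unchanged.
            return registry[root] + ark[len(root):]
    raise KeyError(f"no registered root matches {ark!r}")


location = resolve("ark:/99999/Big4/table/cell/45-8.txt")
```

Only the root `ark:/99999/Big4` was registered, yet the longer ARK resolves to `http://x.y.z/foo/Big4/db/table/cell/45-8.txt`, matching the second slide.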
  • The gold standard. DOIs are for keeps. DOIs should be assigned to objects that are under good long-term management, and where there is an intention to make the object persistently available. DOIs must be registered exclusively with metadata that is available to public view. Can DOIs and ARKs work together? These two identifier schemes can work well together, and EZID offers them both, along with policy support consistent across both schemes.
  • Researchers know all too well about the life of a dataset. You start out in a laptop (or a tablet) travelling around, or under a desk. Maybe then you get emailed across the country or around the world. Years can go by as you get updated and altered. Eventually, maybe you have a day in the sun: your researcher decides to write up the results and cite you. Then, perhaps, it’s back to a server in the dark. Or, you move from server to server. Will you be forgotten?
  • That’s why we at California Digital Library have taken a life cycle approach. CDL has developed an array of tools and services ranging from the first stage of developing a data management plan, through to formal publication. We encourage researchers to assign an ID early in the process: to provide a credible data management plan for funders; to make the later stages easier; and to manage situations where changes might occur during the course of the research (a researcher changes institutions or a research team changes the location of their data, for example).
  • + USE ARKs to keep track of datasets early in the life cycle when you’re not sure where you’re keeping things. + ASSIGN ARKs TO EVERYTHING YOU NEED TO TRACK. Get common & stable references for distributed research teams. + PUT IDENTIFIERS INTO YOUR DATA MANAGEMENT PLAN. Identifiers can be a key part of data organization plans mandated by funders because they demonstrate commitment to long-term tracking. Use both ARKs and DOIs in your plan: ARKs for tracking and organization and… + ONCE YOU ARE READY TO CITE, GET A DOI. Citations in published papers keep working even if the data moves. Photo credits: in field: by Dave Rogers; black board: ©All rights reserved by University of California; stars: ©All rights reserved by University of California; table: by David Mellis
  • One unified destination for all EZID INFORMATION. (Screenshot of home page.) NOTE: This screen is subject to change between now and release date!
  • b) Another big change you’ll see immediately is the new Manage IDs screen, where you can view all the identifiers you have created, as well as the last 10 you’ve been working on. In the next few months, we will also be introducing enhanced support for the DataCite metadata scheme, as well as other features that make reporting on institution accounts easier.
  • For data citation to work, it is necessary for data providers and publishers to adopt the practice. Government data centers, university-hosted research institutes, research libraries offering data management services, publishers beginning to support the data behind scholarly work.
  • And, we are banding together with the other two US DataCite full members (Purdue University and the Office of Scientific and Technical Information, OSTI) to create the DataCite US Alliance. We’ll be growing the number of US Affiliate members and, with them, we’ll have a larger voice to support researchers and organizations here in the US. We’ve noticed that there are patterns of research and data management practice here distinct from those in Europe, where DataCite is headquartered. Let me know if you are interested.
  • This is the key to tracking identifiers, building those statistics: the second piece of making data citation work. The final key is community usage: scholars must cite data and the community must use the metrics. DataCite metadata in harvestable form (OAI-PMH). Ex Libris is harvesting now. Discussions underway with Thomson Reuters and Elsevier.

    1. Dataset Citation and Identifiers: DOIs, ARKs, and EZID. Joan Starr, California Digital Library, April 2012, @joan_starr
    2. Dataset Citation & Identifiers: Data Citation | Identifiers 101 | Dataset Identification with EZID | Choosing an Identifier | Life Cycle Data Management | Looking ahead. By Brain farts (Joschua)
    3. Partnership between CDL | 10 UC campuses | Peer institutions. Provide solutions, services, resources for digital assets. Pool & distribute diverse experience, expertise, & resources
    4. Data Citation. By barryegan (Vitor Leite)
    5. What? • Key identifying elements • Emerging recommendations • Variation among the domains
    6. How? • Key identifying elements • Emerging recommendations • Variation among the domains • In common: Persistent identifier
    7. What this means…
    8. What this means…
    9. Dataset Citation & Identifiers: Data Citation | Identifiers 101 | Dataset Identification with EZID | Choosing an Identifier | Life Cycle Data Management | Looking ahead. By Brain farts (Joschua)
    10. Identifiers 101
    11. What is an identifier? What you see: alphanumeric string (never changes). Associated with: location of object (such as a URL). Optional: who, what, when, etc. (i.e. metadata). By Joelk75:
    12. Identifier example. string: doi:10.9999/FK40K2GTV; html version: ; creator: Dr. Felix Kottor; title: Data for chromosomal study of catfish (Ictalurus punctatus); publisher: University of Bologna; date: 8/31/2011
    13. Identifier example. string: doi:10.9999/FK40K2GTV; html version: ; creator: Dr. Felix Kottor; title: Data for chromosomal study of catfish (Ictalurus punctatus); publisher: Dryad Data Repository; date: 10/01/2011
    14. Identifiers 201. By Christi Nielsen
    15. Identifiers 201 • string: doi:10.9999/FK40K2GTV (“prefix”: 10.9999, “suffix”: FK40K2GTV)
    16. Dataset Citation & Identifiers: Data Citation | Identifiers 101 | Dataset Identification with EZID | Choosing an Identifier | Life Cycle Data Management | Looking ahead. By Brain farts (Joschua)
    17. EZID: long-term identifiers made easy. Take control of the management and distribution of your research, share and get credit for it, and build your reputation through its collection and documentation. Primary Functions: 1. Create persistent identifiers 2. Manage identifiers over time 3. Manage associated metadata over time
    18.
    19.
    20.
    21.
    22. Dataset Citation & Identifiers: Data Citation | Identifiers 101 | Dataset Identification with EZID | Choosing an Identifier | Life Cycle Data Management | Looking ahead. By Brain farts (Joschua)
    23. DOIs and ARKs • both can work like regular hyperlinks. • both can refer to a subset or portion of a resource. • both become persistent when the target URL is maintained. Image courtesy of UC Davis Special Collections
    24. DOIs vs ARKs • Case sensitive • Special feature supports granularity • Informative • Less costly
    25. DOIs vs ARKs: suffix pass-through • string: ark:/99999/Big4/* • location: http://x.y.z/foo/Big4/db/*
    26. DOIs vs ARKs: suffix pass-through • string: ark:/99999/Big4/table/cell/45-8.txt • location: http://x.y.z/foo/Big4/db/table/cell/45-8.txt
    27. DOIs vs ARKs • Established brand in publishing • Indexed by major A&I citation databases • Cannot be deleted • More costly
    28. Dataset Citation & Identifiers: Data Citation | Identifiers 101 | Dataset Identification with EZID | Choosing an Identifier | Life Cycle Data Management | Looking ahead. By Brain farts (Joschua)
    29. The Life of Data. By jfcherry
    30. A life cycle approach: CDL Curation and Publishing Services. Create, edit, share, and save data management plans. Open source add-in for Microsoft Excel as a data collection tool. Create and manage persistent identifiers. Curation repository: store, manage, and share research data. Open access scholarly publishing services: papers, journals, books, seminars & more. An infrastructure to publish and get credit. Data Publication for sharing research data
    31. Identifiers and the data life cycle: Track your results | Organize your data | Get more citations | Meet funder requirements
    32. Dataset Citation & Identifiers: Data Citation | Identifiers 101 | Dataset Identification with EZID | Choosing an Identifier | Life Cycle Data Management | Looking ahead. By Brain farts (Joschua)
    33. 1. New User Interface. By Leonard John Matthews
    34. 2. Growing user community. Thanks to Scott Edmunds, GigaScience Journal for input
    35. 2. Growing user community
    36. 3. A&I Indexing
    37. For more information: EZID. EZID application: ; website: ; on Twitter: @ezidCDL. Joan Starr: @joan_starr