Unpacking persistent identifiers for research


Persistent Identifiers (PiDs) for research: why we have them, why there are so many PiD systems, how they work (looking at a few examples: Handles, DOIs and ORCIDs), how to choose one, whether PiD systems can fail, and what's happening in the international PiD community.

  1. VALA Tech Camp, 13-14 July 2017. Unpacking persistent identifiers for research. Natasha Simons, Senior Research Data Specialist
  2. What's the problem?
  3. What are Persistent Identifiers (PiDs)? A persistent identifier is a long-lasting reference to a digital resource. Photo attribution: Jan Hettenhausen (reproduced with permission)
  4. Use PiDs to connect… • Researchers • Publications • Data • Software • Methods • Equipment • ??? Why use PiDs? PiDs play a key role in the discoverability, accessibility and reproducibility of research.
  5. Why are there so many PiDs? PiD systems are marked by differences in: • Purpose • Scope • Underlying technology • Governance and social infrastructure • Metadata collected • Cost • Extent of use. Examples: ARK, PURL, NLA party ID
  6. Example: The Handle System • Run by CNRI (Corporation for National Research Initiatives) • Robust system • Widely used in publication repositories • Used to identify research datasets
  7. How do Handles work? A Handle URL has three parts: • the resolver service • 11343, the prefix identifying the assigning body (the University of Melbourne) • 130078, the suffix identifying the resource (a University of Melbourne report)
  8. Example: Digital Object Identifiers (DOIs) • Run by the International DOI Foundation • Robust: built on the Handle System • Origins in the publishing industry • Used to identify and cite publications and research datasets • The most widely used PiD for research data
  9. How do DOIs work? This example DOI from Griffith University has four parts: • the resolver service • 10.4225, the prefix identifying the assigning body (ANDS) • 01, suffix part 1: the institution identifier (Griffith University) • 4F3DB08617645, suffix part 2: the identifier of the resource item or collection (a dataset held in the Griffith data repository)
  10. More about DOIs • Metadata is required! Example: DataCite Metadata Schema • DOI search services, e.g. DataCite • There is a cost involved, but some agencies such as ANDS offer a free service • To get a DOI through the ANDS service: machine-to-machine (M2M) or manual minting
  11. Example: ORCIDs • Run by ORCID, a non-profit organisation • Identifier for people (researchers) • Links people with their research 'works' • Widely used internationally • Endorsed across the Australian research sector • Embedded in scholarly workflows
  12. How do ORCIDs work? • 16-character identifier (the last character is a check character) drawn from a block of the ISNI (International Standard Name Identifier) number space • Prototype: Thomson Reuters ResearcherID • Most metadata fields are optional • Free for researchers; fee for member organisations • Public API (free) and premium API (members) • Transparent governance and development process
  13. The power of linking PiDs • International efforts to link ORCIDs (researchers) with DOIs (publications and data) • The Scholix initiative: a global framework to improve the links between publications and data. It benefits everyone, especially publishers (which can display these links in journals) and repositories (which gain links back to the data they hold)
  14. Which PiD to choose? Evaluate the PiD service: • Purpose • Scope • Underlying technology • Governance and social infrastructure • Metadata collected • Cost • Extent of use • Trustworthiness? Choose the PiD that best fits the type of resource and its point in the research lifecycle. Better to choose one than none!
  15. PiDs sound great, but hang on… Have PiD systems ever failed? What's the guarantee they will stay "long lasting"? Erm… • Recent PiD crises: PURL, LSID • "Zombie PiDs"? Remember: • PiDs are both social and technical systems • Governance and organisations can be the Achilles heel of PiD systems. See: Klump, J. & Huber, R. (2017). 20 Years of Persistent Identifiers – Which Systems are Here to Stay? Data Science Journal, 16, p. 9. DOI:
  16. Cool and groovy international PiD community
  17. Summary • PiDs play a key role in the discovery, accessibility and reproducibility of research. • There are many PiD systems, which vary in purpose, scope, underlying technology, governance and social infrastructure, metadata collected, cost and extent of use. • When evaluating which PiD to assign to a resource, consider: • the differences above and, importantly, trustworthiness • it is better to assign one or more PiDs than no PiD at all • Remember that PiDs are about social as well as technical infrastructure. It is the responsibility of the PiD owner (e.g. a university) to update the location the PiD resolves to if the resource moves. • PiDs are evolving, so get your geek on and join in the discussions!
  18. Further resources • ANDS website for PiD Guides, DOI service, Handle service: • ANDS PiDs short bites webinar series: (persistent identifiers playlist), with more to come in this series! • THOR Project: and webinar series: pids-what-why-how/ • ICSU/CODATA Data Science Journal special issue, 20 Years of Persistent Identifiers: years-of-persistent-identifiers-applications-and-future-directions/
  19. With the exception of logos, third-party images or where otherwise indicated, this work is licensed under the Creative Commons Attribution 3.0 Australia Licence. ANDS is supported by the Australian Government through the National Collaborative Research Infrastructure Strategy Program. Monash University leads the partnership with the Australian National University and CSIRO. Natasha Simons Tw: @n_simons ORCID:
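Slides 7 and 9 describe Handles and DOIs as a prefix (the assigning body) and a suffix (the resource) appended to a resolver service. The Python sketch below splits an identifier at its first slash and builds an actionable URL from it. It assumes the public resolvers https://hdl.handle.net/ and https://doi.org/, the function names are hypothetical, and it performs no network lookup; it is an illustration, not the API of any Handle or DOI library.

```python
from urllib.parse import quote

# Assumption: the public resolver services for Handles and DOIs.
HANDLE_RESOLVER = "https://hdl.handle.net/"
DOI_RESOLVER = "https://doi.org/"


def split_identifier(identifier: str) -> tuple[str, str]:
    """Split a Handle or DOI into (prefix, suffix) at the FIRST slash.

    Later slashes belong to the suffix, e.g. 10.4225/01/4F3DB08617645.
    """
    prefix, sep, suffix = identifier.partition("/")
    if not sep or not prefix or not suffix:
        raise ValueError(f"expected prefix/suffix, got {identifier!r}")
    return prefix, suffix


def is_doi(identifier: str) -> bool:
    """DOI prefixes begin with the directory indicator '10.'."""
    prefix, _ = split_identifier(identifier)
    return prefix.startswith("10.")


def resolver_url(identifier: str) -> str:
    """Prepend the matching resolver service, percent-encoding unsafe characters."""
    base = DOI_RESOLVER if is_doi(identifier) else HANDLE_RESOLVER
    return base + quote(identifier, safe="/")


if __name__ == "__main__":
    # Identifiers assembled from the prefixes and suffixes on slides 7 and 9.
    for pid in ("11343/130078", "10.4225/01/4F3DB08617645"):
        print(split_identifier(pid), resolver_url(pid))
```

Splitting only at the first slash matters: for the Griffith DOI on slide 9 the suffix 01/4F3DB08617645 keeps its internal slash, because the two-part suffix is an ANDS convention rather than a general DOI rule.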
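Slide 12 describes the ORCID iD as a 16-character identifier. Its final character is a check character computed with the ISO 7064 MOD 11-2 algorithm, which is why it can be a digit or the letter X. A minimal Python sketch, assuming the hyphenated form without the https://orcid.org/ prefix; the function names are hypothetical.

```python
import re

# Hyphenated ORCID iD: four groups of four; the last character may be X.
ORCID_PATTERN = re.compile(r"[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]")


def orcid_check_character(base_digits: str) -> str:
    """ISO 7064 MOD 11-2 check character for the first 15 digits of an iD."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """True if the iD is well formed and its 16th character matches the checksum."""
    if ORCID_PATTERN.fullmatch(orcid) is None:
        return False
    chars = orcid.replace("-", "")
    return orcid_check_character(chars[:15]) == chars[15]


if __name__ == "__main__":
    # 0000-0002-1825-0097 is ORCID's widely used sample iD (Josiah Carberry).
    print(is_valid_orcid("0000-0002-1825-0097"))  # True
    print(is_valid_orcid("0000-0002-1825-0098"))  # False: wrong check digit
```

The checksum catches most typing errors, such as a single wrong digit, but it cannot confirm that an iD has actually been registered with ORCID.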
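Slide 10 stresses that DOIs require metadata. In the DataCite Metadata Schema (version 4.x), six properties are mandatory: Identifier, Creator, Title, Publisher, PublicationYear and ResourceType. The sketch below checks a draft record for them before minting. It assumes a hypothetical record format (a plain Python dict keyed by property name), not DataCite's actual XML or REST API JSON representation, and the example title is made up.

```python
# Mandatory properties in the DataCite Metadata Schema 4.x.
# Assumption: a record is a plain dict keyed by these property names;
# this is NOT DataCite's XML or REST API JSON format.
MANDATORY_PROPERTIES = (
    "Identifier",
    "Creator",
    "Title",
    "Publisher",
    "PublicationYear",
    "ResourceType",
)


def _is_empty(value: object) -> bool:
    """Treat None, blank strings and empty collections as missing."""
    if value is None:
        return True
    if isinstance(value, str):
        return not value.strip()
    if isinstance(value, (list, tuple, dict)):
        return len(value) == 0
    return False


def missing_mandatory(record: dict) -> list[str]:
    """Return the mandatory properties that are absent or empty, in schema order."""
    return [name for name in MANDATORY_PROPERTIES if _is_empty(record.get(name))]


if __name__ == "__main__":
    draft = {
        "Identifier": "10.4225/01/4F3DB08617645",
        "Title": ["Example dataset"],  # hypothetical title
        "PublicationYear": 2017,
    }
    print(missing_mandatory(draft))  # ['Creator', 'Publisher', 'ResourceType']
```

Checking before minting keeps incomplete records out of the DOI registry; the schema also defines many recommended and optional properties that a fuller check could report on.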