EZID: Easy Persistent Identifiers
       and Data Citation
          31 October 2011

      John Kunze and Joan Starr
       California Digital Library
EZID:
      Easy Persistent Identifiers & Data Citation

Introduction
Citation, DataCite and EZID
     Who? Why? What?
EZID’s next steps: tech talk
     New stuff, use cases, feedback
Feedback
California Digital Library (CDL)
The research data problem
an article about data, but no data
What citation offers
• To aid scientific reproducibility
• To provide fair credit
• To ensure scientific transparency and
  reasonable accountability
• To aid in tracking the impact, including
  – helping data authors verify use of their data and
  – helping future data users identify how others have
    used the data
DataCite
• German National Library of Economics (ZBW)
• German National Library of Science and Technology (TIB)
• German National Library of Medicine (ZB MED)
• GESIS - Leibniz Institute for the Social Sciences, Germany
• Australian National Data Service (ANDS)
• ETH Zurich, Switzerland
• Canada Institute for Scientific and Technical Information (CISTI)
• Technical Information Center of Denmark
• Institute for Scientific & Technical Information (INIST-CNRS), France
• TU Delft Library, The Netherlands
• The Swedish National Data Service (SNDS)
• The British Library, UK
• California Digital Library (CDL), USA
• Office of Scientific & Technical Information (OSTI), USA
• Purdue University Library
EZID: long-term identifiers made easy
take control of the management and distribution of your research,
share and get credit for it, and build your reputation through its
collection and documentation

Primary Functions
1. Create persistent identifiers
2. Manage identifiers over time
3. Manage associated metadata over time
http://n2t.net/ezid
Current EZID Clients
A partial list
• UC Berkeley Library (on behalf of the UC Berkeley campus)
  Sponsored accounts:
  – Open Context
  – CRCNS.org
• UC San Diego Library (on behalf of the UC San Diego campus)
• American Astronomical Society (AAS)
• Centre national de documentation pédagogique (CNDP)
• Cornell Institute for Social & Economic Research
• The Digital Archaeological Record (tDAR)
• Dryad Digital Repository
• Fred Hutchinson Cancer Research Center
• LabArchives
• National Center for Atmospheric Research (NCAR)
• USGS/Earth Sciences Data Clearinghouse (formerly National Biological Info. Infrastructure)
New features in trial or active development

• Service replicas: manager and resolver
• URN (Uniform Resource Name) support (urn:uuid:)
• Suffix pass-thru: do NT and get N/ST/S for free
• Tombstone/incubation/... surrogate pages, id
  status (reserved or public), and multiple targets
• Identifier status: reserved or public
• Content negotiation and inflections: ? ?? / .
• ARK community and governance, eg, registries
Service replicas
• EZID is an id manager that populates N2T
   – It tolerates down time
   – Other id manager services might one day populate N2T
• N2T (Name-to-Thing) is an id resolver that ...
   – It is very intolerant of down time, since it services all
     access requests for locations and metadata
   – N2T was designed with global replication in mind
URN support
• N2T and EZID are agnostic about kinds of things,
  names, and metadata
   – Digital, physical, abstract, living, fictional, groups, etc.
   – Any metadata & known profiles (DataCite, Dublin Kernel)
   – ARK, DOI, URN, Handle, IVOA, LSID, PMID, etc., requiring
     namespace “write” permission, eg, via DataCite
• In test: Uniform Resource Names (URNs)
   – urn:uuid namespace
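
For a concrete sense of what a name in the urn:uuid namespace looks like, here is a minimal sketch using Python's standard uuid module. It only generates and checks a name locally. How EZID mints or registers such names is not covered by this slide, and the function name mint_urn_uuid is made up for the illustration.

```python
import uuid


def mint_urn_uuid():
    """Return a fresh random (version 4) UUID written as a urn:uuid: name."""
    # Hypothetical helper for illustration; not part of any EZID interface.
    return uuid.uuid4().urn


name = mint_urn_uuid()
print(name)

# uuid.UUID accepts the "urn:uuid:" prefix, so the name can be parsed back.
parsed = uuid.UUID(name)
print(parsed.version)
```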
Under the hood keysmithing terms:
bows, shoulders, blades, tips, covers
Suffix pass-thru: NT gets N/ST/S for free

Idea: if name N points to target T, then requests for N
  extended by any suffix N/S can take you to T/S
• For dataset doi:10.5072/Big4 with 10,000
  nameable components,
   – Register and manage 10,001 names or 1 name?
   – Eg, http://x.y.z/foo/Big4/db/table/cell/45-8.txt could be
     reached with doi:10.5072/Big4/table/cell/45-8.txt
• In test with ARKs. Conflict with other resolvers?
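
The idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the N2T implementation: the registry is a plain dictionary, the function name resolve is hypothetical, and the target for doi:10.5072/Big4 is assumed to be http://x.y.z/foo/Big4/db so that the slide's example works out.

```python
def resolve(name, registry):
    """Resolve a name, passing any unregistered suffix through to the target.

    Looks for the longest registered prefix of `name` (cutting at "/"
    boundaries) and appends the leftover suffix to that prefix's target.
    Returns None when no prefix of the name is registered.
    """
    candidate = name
    while candidate:
        if candidate in registry:
            suffix = name[len(candidate):]
            return registry[candidate] + suffix
        if "/" not in candidate:
            break
        # Drop the last path segment and try the shorter name.
        candidate = candidate.rsplit("/", 1)[0]
    return None


# One registered name stands in for the dataset and all its components.
# The target URL is an assumption made for this example.
registry = {"doi:10.5072/Big4": "http://x.y.z/foo/Big4/db"}

print(resolve("doi:10.5072/Big4", registry))
print(resolve("doi:10.5072/Big4/table/cell/45-8.txt", registry))
print(resolve("doi:10.5072/Other", registry))
```

With this approach only one name is registered and managed, at the cost that the resolver cannot tell whether a given suffix actually exists at the target.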
Tombstone and other surrogate pages

Tombstone, incubation, and other surrogate pages
  (probation?) auto-generated from metadata, eg,
  http://n2t.net/ezid/tombstone/id/ark:/20775/bb3243444z
Reserved identifiers and multiple targets

• Some ids must be created and managed (reserved)
  before going public, eg, for manuscript preparation
• In test: infrastructure for multiple targets and
  multiple instances of any metadata element
• What should user experience be for multiple targets?
   – Present a menu of targets (burden of choice)?
   – One target chosen for them (burden of inflexibility)?
Identifier (ARK) inflections: ? ?? / .

• Inflect: change endings without creating new words
  – Terminal ? means “I want metadata”, which is similar to
    linked data content negotiation (also in EZID test)
  – Terminal ?? means “I also want support metadata”
  – Drawing board: / could mean “I want a landing page”
    and . could mean “I want the usual computable thing”
• Allow inflections beyond ARKs to DOIs/URNs?
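
A resolver has to separate the inflection from the identifier before lookup. The sketch below shows one way to do that for the two inflections described above as existing ("?" and "??"). The proposed "/" and "." endings are left out because they are still on the drawing board. The function name and the returned labels are assumptions made for this example.

```python
def split_inflection(identifier):
    """Split a terminal ? or ?? inflection off an identifier string.

    Returns (bare_identifier, request_kind). The request_kind labels are
    illustrative only.
    """
    # Check the longer ending first, since "??" also ends with "?".
    if identifier.endswith("??"):
        return identifier[:-2], "metadata plus support metadata"
    if identifier.endswith("?"):
        return identifier[:-1], "metadata"
    return identifier, "object"


for requested in ("ark:/13030/qt0349g1rh",
                  "ark:/13030/qt0349g1rh?",
                  "ark:/13030/qt0349g1rh??"):
    print(split_inflection(requested))
```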
Example: http://n2t.net/ark:/13030/qt0349g1rh?
        Renninger, Heidi; Phillips, Nathan; Hodel, Donald. “Comparative hydraulic and
            anatomic properties in palm trees (Washingtonia robusta) of varying
            heights”. 2009-04-29. ark:/13030/qt0349g1rh



     HTML content with
     embedded comments in
     ANVL/ERC and RDF



erc:
who: Renninger, Heidi,; Phillips,
   Nathan,; Hodel, Donald,
what: Comparative hydraulic and
   anatomic properties in palm
   trees (Washingtonia robusta)
   of varying heights
when: 2009-04-29
where: ark:/13030/qt0349g1rh
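
The ERC record above is plain "label: value" text in which indented lines continue the previous value, so it is easy to read by machine. Below is a minimal sketch of such a reader. It handles only what this record needs (labels, values, continuation lines, and "#" comment lines), and the function name parse_anvl is made up for the example.

```python
def parse_anvl(text):
    """Parse simple ANVL text into a list of (label, value) pairs.

    A list is returned rather than a dict because a label may repeat.
    Lines starting with whitespace are folded into the previous value.
    """
    record = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        if line[0].isspace() and record:
            label, value = record[-1]
            record[-1] = (label, (value + " " + line.strip()).strip())
        else:
            # Split on the first colon only, so values may contain colons.
            label, _, value = line.partition(":")
            record.append((label.strip(), value.strip()))
    return record


ERC_TEXT = """\
erc:
who: Renninger, Heidi,; Phillips,
   Nathan,; Hodel, Donald,
what: Comparative hydraulic and
   anatomic properties in palm
   trees (Washingtonia robusta)
   of varying heights
when: 2009-04-29
where: ark:/13030/qt0349g1rh
"""

record = parse_anvl(ERC_TEXT)
fields = dict(record)
print(fields["what"])
print(fields["where"])
```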
ARK community and governance

•   ARKs soon to have a mailing list
•   Topics: governance, community, standardization
•   Registry maintenance: shoulders and NAANs
•   N2T consortium with alternative EZID-like services
For information
• http://www.cdlib.org/services/uc3/ezid
  •   Understanding ids and conventions (shoulders, etc)
  •   Choosing the right identifier (ARK vs DOI? ARK and DOI?)
  •   EZID FAQs and N2T vision
  •   EZID Service Guidelines
  •   EZID Handout/brochure
  •   EZID webinars & slides


Contact Joan Starr at uc3@ucop.edu
For (even) more information
EZID
http://n2t.net/ezid/
http://www.cdlib.org/services/uc3/ezid/


UC Curation Center
http://www.cdlib.org/uc3
uc3@ucop.edu


UC3 webinar series
http://www.cdlib.org/uc3/uc3webinars.html


UC3/CDL
Stephen Abrams        David Loy
Lisa Colvin           Mark Reyes
Patricia Cruse        Abhishek Salve
Scott Fisher          Tracy Seneca
Erik Hetzner          Carly Strasser
Greg Janée            Joan Starr
John Kunze            Marisa Strong
Margaret Low          Perry Willett
Questions?




 by Horia Varlan
 http://www.flickr.com/photos/horiavarlan/4273168957/in/photostream/

Editor's Notes

  • #4 CDL: Serving the 10 UC campuses: 226,000 students; 134,000 faculty and staff. Working collaboratively: libraries; data centers; museums, archives; faculty and researchers. CDL has historically provided strategic, integrated technical and program services in a broad portfolio, including: groundbreaking licensing agreements; union bibliographic services; data curation & preservation tools; open access publishing services. CDL: http://www.cdlib.org/
  • #5 UC3: The UC Curation Center is a creative partnership between the CDL, the ten UC campuses, and peer institutions in the community. An evolving community of shared concern and practice; bringing together diverse experience, expertise, and resources; providing robust curation solutions.
  • #7 Helping data authors verify use of their data and helping future data users identify how others have used the data. Adapted from ESIP (http://wiki.esipfed.org/index.php/Interagency_Data_Stewardship/Citations/provider_guidelines). Earth Science Information Partners have identified 6 important reasons for data citation.
  • #8 In recognition of this, DataCite was formed in 2009 by 10 Libraries and Research Centers.
  • #9 The number has now grown to 15. In addition there are 3 associate members, including the Korea Institute of Science and Technology Information, so there is a presence in Asia. Mission: "Helping you find, access, and reuse data". Advocacy, citation.
  • #10 CDL is a founding member of DataCite, and our application for offering DataCite DOIs as well as other identifiers is EZID.
  • #11 So, this is our User Interface. EZID also has a machine-to-machine interface, an API, and a link to the documentation is here. If you’d like to try EZID, simply click on the help tab [CLICK] here.
  • #12 Academic, non-profit, government, and commercial
  • #23 Reiterate the testing model. UC3. EZID Website.