Archival Resource Key (ARK) – the other PID
Did you know…
… that ARK is a PID (Persistent IDentifier), along with DOI, ORCID, and ROR?
… that ARK may be the fastest growing PID scheme you’ve never heard of?
… that an average of 5 new ARK organizations are registered every week?
… that the ARK scheme is more than 22 years old?
… that there are more than 8.2 billion ARKs in the world?
… that, as of early 2024, ARK organizations include
10 national libraries, 145 universities, 184 archives,
90 museums, and 75 journals?
Getting Started with
ARK (Archival Resource Key)
Persistent Identifiers
John Kunze, Donny Winston
ARK Alliance, 2024
Why care about ARK identifiers?
● Because robust web links are rare – the average URL lifetime is 100 days
● ARKs can be “persistent identifiers” (PIDs), which serve as permalinks
● “Ten persistent myths about persistent identifiers”
https://n2t.net/ark:/13030/c7gb1xh09
The ARK (Archival Resource Key) identifier scheme
was introduced in 2001.
A labelled URL with a globally unique identity inside it
https://n2t.net/ark:/12345/fk1234
● https://n2t.net/ – makes the ARK actionable (the resolver)
● ark:/12345/fk1234 – core globally unique identity (independent of web and hostname)
ARK anatomy
N2T.net is a global
“name” to “thing”
resolver
Why not “ARKresolver.net” like
most other PID schemes?
Because ARKs are inclusive
and resolvers generalize
easily.
ARK organizations
8.2 billion ARKs created by 1400+ institutions –
libraries, archives, museums, publishers, data
centers, educators, etc. For example,
University of California Berkeley
Smithsonian National Museum
National Library of France
University of Chicago
Musée du Louvre
FamilySearch
British Library
Google
Internet Archive
Bodleian Libraries
Berkeley Law Library
Bibliothèque Mazarine
New York Public Library
French National Archives
National Library of Austria
Library and Archives Canada
https://n2t.net/ark:/53355/cl010066723
What are ARKs used for?
● genealogical records (8 billion FamilySearch)
● publisher content (100 million Portico)
● scientific datasets and records (22 million INIST)
● scanned books and texts (30 million Internet Archive)
● bibliographic records (15 million BnF main catalog)
● museum specimens (15 million Smithsonian Institution)
● public health documents (15 million UCSF IDL)
● historical documents (21 million CDL, 5 million BnF Gallica)
● historical authors and scholars (4 million SNAC)
● fine art museum collections (490,000 Louvre)
● vocabulary terms (30,000 Periodo, YAMZ)
Inist.fr ARKs: 67375
Institute of scientific and technical information, French National Centre for Scientific Research
LMEC ARKs: 76611
Leventhal Map & Education Center
archive.org ARKs: 13960
Protecting content and Linked Data
Controlling costs and broken links
● ARKs as non-paywalled, durable web addresses
ARKs don’t depend on fixed domain names
● decentralized – unlike doi.org, handle.net, purl.org, and w3id.org
● expressible as either HTTP URIs or as compact URIs
○ even if non-resolving, can still learn which org assigned the ARK
Persistent identifier (PID) basics
● PIDs are just permalinks that may be recognizable (e.g., they contain “ark:” or
doi.org/10….)
● No guarantees – PIDs and permalinks break by the thousands
○ With effort and luck, some of them are repaired
● One part of keeping PIDs persistent is indirection, with HTTP redirection
○ Publish your indirect identifiers at a server (resolver) that can redirect
Web access – direct
[Diagram: user, web browser, and content server]
1. click (user → web browser)
2. URL (web browser → content server)
3. page (content server → web browser)
4. render page (web browser → user)
Web access – indirect
[Diagram: user, web browser, server1 (resolver), and content server]
1. click (user → web browser)
2. URL1 (web browser → server1)
3. URL2 HTTP redirect (server1 → web browser)
4. URL2 (web browser → content server)
5. page (content server → web browser)
6. render page (web browser → user)
Example URL 1: archive.example.org/photo123
→ URL 2: photos.example.org/vault/123
A redirect is like forwarding a (request) message to a new address
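The indirection step can be sketched as a small lookup, assuming a hypothetical redirect table and a function name (`resolve`) that are not part of the slides. The "https://" scheme on the target is also an assumption. It only illustrates what "server1" returns; it is not how any particular resolver is implemented.

```python
# Hypothetical redirect table for the resolver ("server1" in the diagram).
# Keys are the published, stable paths; values are the current content URLs.
REDIRECTS = {
    "/photo123": "https://photos.example.org/vault/123",
}


def resolve(path):
    """Return (status, headers) the way a redirecting server might answer."""
    target = REDIRECTS.get(path)
    if target is None:
        return 404, {}
    # 302 Found: the browser re-requests the URL given in the Location header.
    return 302, {"Location": target}


status, headers = resolve("/photo123")
print(status, headers["Location"])
```

When the content moves, only the table entry changes; the published URL 1 stays the same.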
PIDs and resolvers – all similar
PURL (Persistent URL) purl.org/dc/terms/creator
Handle hdl.handle.net/4263537/4000
URN (Uniform Resource Name) urn.fi/urn:isbn:9514005058
DOI (Digital Object Identifier) doi.org/10.5334/dsj-2017-039
ARK (Archival Resource Key) n2t.net/ark:/53355/cl010066723
⬆ Resolver   ⬆ Name Assigning Authority   ⬆ Name
What PID schemes can’t do
No PID helps you against the major causes of broken links
● can’t prevent fire, war, flood, attack, bankruptcy, ...
● can’t prevent human or service provider error
● can’t guarantee your links
● can’t repair broken links for you
☹
What PID schemes can do
1. PIDs and permalinks are all “aspirational”, but PIDs are recognizable in the wild
2. PIDs have aspirationally persistent resolvers (n2t.net, purl.org, doi.org,
hdl.handle.net) in case your own domain name is at risk
You can base your links on the scheme’s aspirationally persistent resolver so they
might still be redirected if your own server goes away and …
if the thing still exists, is legal, and someone is willing to host it and fix links
Brief history of PID schemes
● PURL – “URLs are fine if you redirect from purl.org”
● URN, DOI, and Handle – “all URLs and domain names are bad – except
for ours – and we redirect”
● Tim Berners-Lee – “Cool URIs don't change” [cool URLs don’t break]
● ARK – “URLs are fine if managed well, but please tell us which of your
URLs are meant for what kind of persistence”
PID schemes – pessimist view
Helps with major causes of broken links? PURL Handle URN DOI ARK
Prevents fire, war, flood, attack, bankruptcy, ... No No No No No
Prevents human or service provider error No No No No No
Guarantees your links, or fixes them for you No No No No No
Best practices guard against copy/paste errors No No No No Yes
Global resolver downtime less than 1 day per year No No No No Yes
Identity independence from lost domain/server name No No Yes No Yes
PID schemes – optimist view
Features and costs PURL Handle URN DOI ARK
Decentralized resolution No No No No Yes
Inferenceable syntax (variants, containment) No No No No Yes
Flexible metadata by design, including none No No No No Yes
Inflections (...?info) and content negotiation No No No No Yes
Nuanced persistence statements by design No No No No Yes
Path extensions during resolution (suffix passthrough) Yes No Yes? No Yes
Free, non-paywalled, in unlimited numbers Yes No Yes No Yes
PID schemes – ecosystem view
Identifiers in an Internet context PURL Handle URN DOI ARK
Appear in Data Citation Index, HathiTrust, Wikipedia, Wikidata, Internet Archive, ORCID profiles Yes Yes Yes Yes Yes
Major adoption by most academic publishers outside the global South No No No Yes No
Free (subsidized) account and admin interface for one-off use, e.g., purl.org, zenodo.org, archive.org Yes? No? No? Yes Yes?
IETF standard URI, validated by web browsers No No Yes No No
Replicated global resolver architecture No Yes No No No
Summary: ARK benefits
ARKs can serve as persistent identifiers with metadata
● found in the Data Citation Index, HathiTrust, Wikipedia,
Wikidata, Internet Archive, ORCID profiles, etc.
In contrast to other id schemes, ARKs have
● no fees, no limits, no walled gardens (decentralized)
● very flexible metadata, including none
● can be assigned to anything digital, physical, or conceptual
There is no conflict using ARKs and other identifiers at the same time
Scientific specimens from the National Museum of Natural History
http://n2t.net/ark:/65665/381440f27-3f74-4eb9-ac11-b4d633a7da3d
Cultural artifacts from the National Museum of American History
http://n2t.net/ark:/65665/ng49ca746b2-42dc-704b-e053-15f76fa0b4fa
Sculpture from the Freer Gallery of Art & Arthur M. Sackler Gallery
http://n2t.net/ark:/65665/ye3080ce305-a705-49cc-a70d-99aff8cb65da
Photographs from the National Museum of African American History and Culture
http://n2t.net/ark:/65665/fd5ad97cb86-caaf-4209-8fde-98d70f52f072
Paintings from the Smithsonian American Art Museum
http://n2t.net/ark:/65665/vk7a466371d-0413-451f-bd76-ca0becc46f94
Example Smithsonian ARKs: 65665
Slide credit: Bess Missell
https://example.org/ark:/12345/x54xz321/s3/f8.05v.tiff
\_________________/ \__/ \___/ \______/\____/\_______/
         |            |      |     |      |      |
         |        ARK Label  |     |  Sub-parts  Variants
         |                   |     |
Name Mapping Authority (NMA) | Assigned Name
                             |
          Name Assigning Authority Number (NAAN)
ARK anatomy: the NAAN
(Name Assigning Authority Number)
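The anatomy can be sketched in code by splitting the example URL into its labelled parts. This is a naive illustration: the function name `parse_ark` and the rule "everything after the first period is a variant" are assumptions, and real assigned names may themselves contain periods.

```python
import re
from urllib.parse import urlsplit


def parse_ark(url):
    """Split an ARK-bearing URL into the parts named in the anatomy diagram."""
    parts = urlsplit(url)
    # Accept both "ark:/12345/..." and the slash-less "ark:12345/..." form.
    match = re.search(r"(ark:)/?([^/]+)/(.+)$", parts.path)
    if match is None:
        raise ValueError(f"no ARK found in {url!r}")
    label, naan, rest = match.groups()
    # Assumption: "/" separates sub-parts and the first "." starts the variants.
    hierarchy, _, variant_str = rest.partition(".")
    name, *subparts = hierarchy.split("/")
    return {
        "nma": parts.netloc,
        "label": label,
        "naan": naan,
        "name": name,
        "subparts": subparts,
        "variants": variant_str.split(".") if variant_str else [],
    }


print(parse_ark("https://example.org/ark:/12345/x54xz321/s3/f8.05v.tiff"))
```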
What’s in a NAAN record?
A NAAN (Name Assigning Authority Number) is a 5-digit number
● Numbers are opaque, which is good for longevity. But what if you
want to know what’s behind a NAAN? In a browser, try
n2t.net/ark:67375
n2t.net/ark:76611
n2t.net/ark:13960
n2t.net/ark:
n2t.net/pdb:1YOD
Example NAAN record
Record for the National Autonomous University of Mexico
n2t.net/ark:46171 →
ark:/46171:
when: 2017.10.27
name: Universidad Nacional Autónoma de México
target: http://www.morelia.unam.mx/campus
The NAAN registry
The registry is a plain text file: n2t.net/e/pub/naan_registry.txt
Another example record: n2t.net/ark:12148 →
ark:/12148:
when: 2005-07-17
name: Bibliothèque nationale de France
target: http://ark.bnf.fr
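A minimal sketch of reading records like the two shown above, assuming the "key: value" layout on these slides. The function name `parse_naan_record` is hypothetical and this is not the registry's official parser. The two examples write `when` with different separators ("." and "-"), so the sketch normalizes both.

```python
from datetime import date


def parse_naan_record(text):
    """Parse a NAAN record as displayed on the slides into a dict."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    # The first line is the record id, e.g. "ark:/46171:".
    record = {"id": lines[0].rstrip(":")}
    for ln in lines[1:]:
        # Split at the first colon only, so URLs in values stay intact.
        key, _, value = ln.partition(":")
        record[key.strip()] = value.strip()
    # Normalize "2017.10.27" and "2005-07-17" to a date object.
    year, month, day = record["when"].replace(".", "-").split("-")
    record["when"] = date(int(year), int(month), int(day))
    return record


unam = parse_naan_record("""
ark:/46171:
when: 2017.10.27
name: Universidad Nacional Autónoma de México
target: http://www.morelia.unam.mx/campus
""")
print(unam["id"], unam["when"].isoformat(), unam["target"])
```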
Purpose of the NAAN (Name Assigning Authority Number):
● Resolution reference point
● Isolating assignment (autonomy, uniqueness, re-use)
Obtain a NAAN for
your organization
Fill out this form (linked, in case
you forget, from the arks.org
homepage):
n2t.net/e/naan_request
Opacity pros and cons
Can be generated (“minted”) from any source:
● Counter, Noid, UUID, ULID, even content digest
● Anything unique – but best to keep it short
● With Noid (Nice Opaque Identifiers), you get check characters
Opaque ids are a pain for humans
● Difficult to enter correctly (no clues to correct spelling)
● No clues for humans to check for transcription errors
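To show how a check character addresses the transcription problem, here is a sketch of the check-character computation as described in the Noid documentation, to the best of my understanding. Each character's value in a 29-character alphabet is multiplied by its position, the products are summed, and the sum modulo 29 selects the check character. The function names and the example identifier are illustrative, not taken from these slides.

```python
# Noid's 29-character alphabet: digits plus consonants, with no vowels or "l".
XDIGITS = "0123456789bcdfghjkmnpqrstvwxz"


def check_char(identifier):
    """Compute a check character over a string such as 'NAAN/name'."""
    total = 0
    for position, ch in enumerate(identifier, start=1):
        value = XDIGITS.find(ch)
        if value < 0:
            value = 0  # characters outside the alphabet (e.g. "/") count as zero
        total += position * value
    return XDIGITS[total % len(XDIGITS)]


def is_valid(identifier_with_check):
    """True if the final character matches the check character of the rest."""
    return check_char(identifier_with_check[:-1]) == identifier_with_check[-1]


print(check_char("13030/xf93gt2"))   # → q
print(is_valid("13030/xf93gt2q"))    # → True
print(is_valid("13030/xf39gt2q"))    # → False (two adjacent characters swapped)
```

Because the alphabet size is prime, a single wrong character or a swap of two adjacent characters changes the check character.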
ARK anatomy: suffixes
https://example.org/ark:/12345/x54xz321/s3/f8.05v.tiff
\_________________/ \__/ \___/ \______/\____/\_______/
         |            |      |     |      |      |
         |        ARK Label  |     |  Sub-parts  Variants
         |                   |     |
Name Mapping Authority (NMA) | Assigned Name
                             |
          Name Assigning Authority Number (NAAN)
Linked Data implicit in ARK syntax
ARK suffix syntax implies related URIs
● Slashes “/” for hierarchy (ark:.../A/B/C implies existence of A/B and A)
● Periods “.” for variant forms (ark:.../foo.jpg is a variant of
ark:.../foo.pdf)
● An inflection such as “?info” modifies an ARK to request metadata
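The relationships implied by the suffix syntax can be derived mechanically. The sketch below uses hypothetical helper names (`containers`, `variant_base`) and treats the first period in the last segment as the start of the variant part, which is a simplifying assumption.

```python
def containers(ark):
    """List the ARKs implied by "/" hierarchy, nearest container first."""
    label, naan, *parts = ark.split("/")
    return [
        "/".join([label, naan, *parts[:i]])
        for i in range(len(parts) - 1, 0, -1)
    ]


def variant_base(ark):
    """Strip the "." variant part from the last segment of an ARK."""
    head, sep, last = ark.rpartition("/")
    return head + sep + last.split(".", 1)[0]


print(containers("ark:/12345/x54xz321/s3/f8"))
print(variant_base("ark:/12345/x54xz321/s3/f8.05v.tiff"))
```

Two ARKs with the same `variant_base` can be treated as variant forms of the same thing.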
Suffixes to identify a million possible
image regions with just one ARK
Suffixes
Base Name
French National Library ARKs: 12148
https://gallica.bnf.fr/iiif/ark:/12148/btv1b8449691v/f29/2131,4016,1467,948/full/0/default.jpg
Labeled parts: page (f29), region coordinates (2131,4016,1467,948), full (size), quality (default), file format (jpg)
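A sketch of how one base ARK plus suffixes can address any image region, assuming the URL follows the IIIF Image API pattern (region/size/rotation/quality.format), which the Gallica example appears to use. The function name and parameter names are hypothetical.

```python
def iiif_image_url(base, ark, page, region,
                   size="full", rotation=0, quality="default", fmt="jpg"):
    """Build an image-region URL on top of a single base ARK."""
    x, y, width, height = region
    return (f"{base}/{ark}/{page}/{x},{y},{width},{height}"
            f"/{size}/{rotation}/{quality}.{fmt}")


url = iiif_image_url(
    "https://gallica.bnf.fr/iiif",
    "ark:/12148/btv1b8449691v",
    "f29",
    (2131, 4016, 1467, 948),
)
print(url)
```

Changing only the `region` tuple yields a different addressable region without minting a new ARK.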
Object life stages
ARKs are flexible – can throw away what’s not declared
ARK metadata is flexible – from none to anything you want
● Planning phase, moment of birth, first analysis,
● Creating first draft metadata, later normalized metadata,
● Pre-release feedback and insights based on limited sharing,
● Corrections, abandonment,
● … plus archiving, public release, revision, enhancement, etc.
Finding information about an ARK
An ARK should lead to 3 things:
1. the identified thing
2. metadata about it (very flexible, minimally who|what|when)
3. a nuanced persistence statement – setting expectations
To ask for metadata, append the query string “?info”:
https://example.org/ark:/12345/x54xz321?info
ARK metadata
● An ARK in a URL returns access to the thing it identifies
● To get access to its metadata, it should support ARK + ‘?info’, e.g.,
https://n2t.net/ark:/81431/p3s39k?info →
who: University of Pennsylvania Libraries
what: Walnut Street Theatre. Philadelphia, October 9, 1869.
when: 1869
where: ark:/81431/p3s39k (currently
https://ezid.cdlib.org/id/ark:/81431/p3s39k)
how: (:unav)
id created: 2017.12.06_08:42:02
id updated: 2017.12.21_11:16:02
persistence: (:unav)
who | what | when | where
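A sketch of working with such a response, assuming a "key: value" text layout in which wrapped lines are indented (the indentation is an assumption of this example). The helper names are hypothetical, the sample is abridged from the record above, and no network request is made.

```python
UNAVAILABLE = "(:unav)"  # placeholder value seen in the example response

SAMPLE = """\
who: University of Pennsylvania Libraries
what: Walnut Street Theatre. Philadelphia, October 9, 1869.
when: 1869
where: ark:/81431/p3s39k (currently
    https://ezid.cdlib.org/id/ark:/81431/p3s39k)
how: (:unav)
persistence: (:unav)
"""


def info_url(ark_url):
    """Form the metadata request by appending the '?info' inflection."""
    return ark_url + "?info"


def parse_info(text):
    """Parse 'key: value' lines; indented lines continue the previous value."""
    record, key = {}, None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[0].isspace() and key is not None:
            record[key] += " " + line.strip()
            continue
        key, _, value = line.partition(":")
        key = key.strip()
        record[key] = value.strip()
    return record


def missing_core(record):
    """Return which of the minimal who/what/when elements are unavailable."""
    return [k for k in ("who", "what", "when")
            if record.get(k, UNAVAILABLE) == UNAVAILABLE]


record = parse_info(SAMPLE)
print(info_url("https://n2t.net/ark:/81431/p3s39k"))
print(record["who"], "|", record["when"], "| missing:", missing_core(record))
```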
ARK metadata flexibility
Example: thousands of ARKs return DataCite DOI metadata
http://legacy-n2t.n2t.net/ark:/81986/caida.data.100004?info →
datacite: <?xml version="1.0"?>
<resource
xsi:schemaLocation="http://datacite.org/schema/kernel-4
http://schema.datacite.org/meta/kernel-4/metadata.xsd"><identifier
identifierType="ARK">81986/caida.data.100004</identifier><creators>
<creator><creatorName>University of California San Diego
Center for Applied Internet Data Analysis (UCSD CAIDA)
</creatorName></creator></creators><titles><title
xml:lang="eng">The IPv4 Routed /24 Topology Dataset</title></titles>
<publisher>University of California San Diego
Center for Applied Internet Data Analysis (UCSD CAIDA)
</publisher><publicationYear>2007</publicationYear><resourceType
resourceTypeGeneral="Dataset">Active measurements of Internet
ARKs + DOI metadata
ARK metadata / inflections
ARK “inflection”: alter the ending to alter the request
● Append ‘?info’ to request metadata (used to be ‘?’ and ‘??’)
● Earlier, append ‘?’ for metadata or ‘??’ for more (harder)
● No conflict with “content negotiation” (harder)
The inflection response format is not fixed, but it should be
human- and machine-readable (JSON, YAML)
do, does, done, doing, …
Permanence is not binary
Persistence is not “on” or “off”. It is nuanced.
● Preservation often demands change, such as
○ larger thumbnail image sizes, better OCR algorithms
○ 3-year-old “long term stable” Linux release gets security patches
○ Typos in cover pages get corrected
● And what about rapidly updated data (earth observation sensor files
that grow every 6 seconds, databases that are annotated regularly)?
Valuable objects tend to be complex, human-managed clusters
What do you mean by persistence?
Persistence statements: describing digital stickiness
John Kunze, Scout Calvert, Jeremy DeBarry, Matthew Hanlon, Greg Janée, Sandra Sweat
22 May 2017
Abstract
In this paper we present a draft vocabulary for making “persistence statements.” These are not arcane notions, but simple tools for
pragmatically addressing the concern that anyone feels upon experiencing a broken web link. Scholars increasingly use scientific and
cultural assets in digital form, but choosing which among many objects to cite for the long term can be difficult. There are few
well-defined terms to describe the various kinds and qualities of persistence that object repositories and identifier resolvers do or don’t
provide. Given an object’s identifier, one should be able to query a provider to retrieve human- and machine-readable information to
help judge the level of service to expect and help gauge whether the identifier is durable enough, as a sort of long-term bet, to include
in a citation. The vocabulary should enable providers to articulate persistence policies and set user expectations.
Setting user expectations, part 1
Terms for content variance
● frozen – unchanging bitstream
● keeping – unchanging content
● fixing – subject to correction
● rising – subject to active enhancement
● molting – unchanging essential mission
timo_w2s@flickr
sanmartin@flickr
Terms for object availability
● finite – ends at known date or event
● indefinite – no special commitment
● lifetime – as long as the provider exists
● subinfinite – beyond provider’s lifetime
Setting user expectations, part 2
. . .
Setting user expectations, part 3
A term for objects that grow in a certain way
● waxing – non-disruptive growth
Examples
● live sensor data feeds
● serial publications
stephenliveshere@flickr
Why should we believe you?
Terms specifying the nature of the provider
● name – of organization
● identifier – unique organizational identifier
● mission – is preservation in your mission?
● succession policy
Persistence in presence of versions
Terms for content referencing
● extraversioned – “10.2345/67, Version 4”
● intraversioned – “10.2345/67.V4”
● introversioned – “10.2345/6789”
The landing page debate
What if you could get either experience?
● plunging – for machine consumption
● landing – for human consumption
Naming policy
Forming identifier strings
NR – non-reassignment
OP – opaque identifiers
CC – check character added
YAMZ.net ARKs: 99152/h1
● ARKs for metadata terms
● Note: shared NAAN with reserved “shoulder”: /h1
(YAMZ = Yet Another Metadata Zoo)
● Vocabulary builder – term creation, sharing, and consensus
○ Big task: narrow down among many alternate terms/definitions
○ Rare ask: constant, immediate feedback from end metaloguers
○ Not a standard, but helps standards be better, faster, and cheaper
Reputation-based voting: example from Stack Overflow
Who needs a vocabulary builder?
Answer: everyone who needs controlled vocabulary terms
● There’s a flood of metadata standards and dialects
○ Per institution, per laboratory, per project
○ All-or-nothing buy-in to a big bundle of 144 terms and definitions, plus grammar rules
● Large metadata investment, poor interoperation
○ See Metadata's Bitter Harvest, Library Journal, 2004
YAMZ.net vocabulary builder – online dictionary of draft terms and definitions
● Foundation for sharing, discussing, voting, and reaching consensus
Tower of Babel, P. Brueghel
The unofficial story of institutional
metadata adoption
Theory
“Interoperation? Solved – thanks to Dublin Core, PREMIS, schema.org, ….”
Reality
“Yeah, no. We have our own modifications.”
“We have no clout with and couldn’t wait for standards bodies.”
“If you promise not to share, maybe I could get you a PDF of our changes.”
Metadata? Yes!
✅ Dublin Core
✅ Darwin Core
✅ DataCite
✅ schema.org
…
The unofficial story of drafting
metadata standards
Theory: Senior experts share their wisdom with the world
Reality: Metadata design-by-committee
● Non-practitioners – workflow expertise may have peaked 10-20 years ago
● Little testing or evidence – lots of opinion, conjecture, ego
● Huge time sink – years to get to Version 1; more years to Version 2; …
● Out of date as soon as published – requirements and models have moved on
● Hard to reach consensus – committee agrees to agree when it’s exhausted
The Metadata Universe
Jenn Riley, IU
Domain dialects – similar but different
Example: Earth Science > Cryospheric (frozen water) Science
● 28 different definitions of “glacier”
● 8 different definitions of “puddle”
● 13 different definitions of “firn” (old snow)
● 10 different definitions of “frazil ice” (fine spicules of floating ice)
● 7 different definitions of “ogive” (bands of light and dark ice in a glacier)
● … and so on
Sound familiar? What about your domain?
YAMZ.net (Yet Another Metadata Zoo)
Pronunciation: “yams”
Not a standard, not an ontology
● YAMZ is a living dictionary of metadata terms and jargon
● Each term gets an ARK permalink (PID), a proposed nano-standard
○ some are upvoted and rise in search results, others are downvoted or ignored
● Reputation-based voting (like Stack Overflow) helps you choose
● All parts of metadata “speech”, all domains
SimonRobertson@flickr
Crowdsourced, but with voting and fences
3 classes of term
● vernacular ← all terms are born here
● canonical ← these don’t evolve …
● deprecated ← and they never go away
Each term gets a unique persistent id (ARK). Example:
term: iba
definition: other (origin language: Tagalog)
identifier: https://n2t.net/ark:/99152/h1193
YAMZ patterns for working groups
Import your group’s 300 draft words and definitions
● bulk upload CSV file – terms get ARK PIDs
● watch and edit your terms – reviewers comment and vote
● cherry pick final group terms – group decides, looking at comments, votes, etc.
● publish terms linking back to yamz.net – each term is like a nano-standard
YAMZ patterns for individual practitioners
Search for terms (words and definitions)
● find a term you love – great – use and link to it
● find a term you kind of love – test it, comment, ask author for changes
● no workable term found – instantly add own term and watch for comments
● find a word you love but an unworkable definition – “I want that word!”, so enter a competing term
Some discipline-specific subsets in YAMZ
Global Cryosphere Watch (GCW)
Citizen Science (Sloan)
DesignSafe (UTA)
Persistence statements (CDL, UCLA, TACC)
Space Science – Heliophysics (AGU, NASA, JPL)
Is it data? an identifier? a PID (persistent id)?
https://example.org/FFE4-2C6E-434C-345B-C5B0-T
Well, it’s for kids: The Super Mario Bros. Movie, 2023
https://doi.org/10.5072/FFE4-2C6E-434C-345B-C5B0-T
FFE4-2C6E-434C-345B-C5B0-T
https://doi.org/10.5240/FFE4-2C6E-434C-345B-C5B0-T
A valid DOI, so… trusted scholarly content?
Lesson: don’t judge an identifier by its looks
Tools
Documentation and Software:
arks.org/resources
● Minters and resolvers: Noid, arknoid, arklet, and arklet-Frick
● Other minters: counters, UUID, ULID, …
● Journal minter/resolver: OJS Plugin
● In-house library ARK system: ARKs Service UTScarborough
● Consider Suffix Passthrough in the style of N2T, EZID
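Suffix passthrough, as I understand the N2T behavior, lets a resolver register only a base ARK and forward any unregistered extension to the target. A minimal sketch follows, with a hypothetical lookup table and function name. It is an illustration of the idea, not N2T's or EZID's actual code.

```python
# Hypothetical table: only the base ARK is registered with the resolver.
TARGETS = {
    "ark:/12345/x54xz321": "https://repo.example.org/objects/x54xz321",
}


def resolve_with_passthrough(ark, targets=TARGETS):
    """Find the longest registered prefix and append the leftover suffix."""
    candidate = ark
    while True:
        if candidate in targets:
            # Pass the unregistered remainder through to the target URL.
            return targets[candidate] + ark[len(candidate):]
        # Shorten the candidate at the last "/" or "." boundary.
        cut = max(candidate.rfind("/"), candidate.rfind("."))
        if cut <= 0:
            return None  # no registered prefix found
        candidate = candidate[:cut]


print(resolve_with_passthrough("ark:/12345/x54xz321/s3/f8.05v.tiff"))
```

With this approach one registration can serve every sub-part and variant of an object, provided the target server understands the same suffixes.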
Slide credit:
Dave Vieglais
The ARK community snapshot
All are registered to use ARKs – open, mainstream,
non-paywalled, decentralized persistent identifiers that
you can start creating in under 48 hours.
10 national libraries
145 universities
184 archives
90 museums
75 journals
1400+ organizations such as
● UNESCO
● The Frick Collection
● The National Gallery, London
● California Academy of Sciences
https://arks.org
● We welcome contributions and volunteers for our technical,
outreach, advisory, and NAAN record curation working groups
Discussion forums in English, French, and Spanish/Portuguese
● arks-forum@googlegroups.com
● arks-forum-fr@framalistes.org
● arks-forum-ib@googlegroups.com
ARK Alliance: how to participate
Thank you. Questions?
ARK Alliance
info@arks.org
arks.org
John Kunze, jakkbl@gmail.com
Donny Winston, donny@polyneme.xyz
Form to
request a
NAAN

DCMI ARK Tutorial 2024.10.20, slides and notes, 120 mins.pdf

  • 1.
    Archival Resource Key(ARK) – the other PID Did you know… … that ARK is a PID (Persistent IDentifier), along with DOI, ORCID, and ROR? … that ARK may be the fastest growing PID scheme you’ve never heard of? … that an average of 5 new ARK organizations are registered every week? … that the ARK scheme is more than 22 years old? … that there are more 8.2 billion ARKs in the world? … that, as of early 2024, ARK organizations include 10 national libraries, 145 universities, 184 archives, 90 museums, and 75 journals?
  • 2.
    Getting Started with ARK(Archival Resource Key) Persistent Identifiers John Kunze, Donny Winston ARK Alliance, 2024
  • 3.
    Why care aboutARK identifiers? ● Because robust web links are rare – the average URL lifetime is 100 days ● ARKs can be “persistent identifiers” (PIDs), which serve as permalinks ● “Ten persistent myths about persistent identifiers” https://n2t.net/ark:/13030/c7gb1xh09 The ARK (Archival Resource Key) identifier scheme was introduced in 2001.
  • 4.
    A labelled URLwith a globally unique identity inside it https://n2t.net/ark:/12345/fk1234 makes ARK actionable (the resolver) core globally unique identity (independent of web and hostname) ARK anatomy
  • 5.
    N2T.net is aglobal “name” to “thing” resolver Why not “ARKresolver.net” like most other PID schemes? Because ARKs are inclusive and resolvers generalize easily.
  • 6.
    ARK organizations 8.2 billionARKs created by 1400+ institutions – libraries, archives, museums, publishers, data centers, educators, etc. For example, University of California Berkeley Smithsonian National Museum National Library of France University of Chicago Musée du Louvre Family Search British Library Google Internet Archive Bodleian Libraries Berkeley Law Library Bibliothèque Mazarine New York Public Library French National Archives National Library of Austria Library and Archives Canada https://n2t.net/ark:/53355/cl010066723
  • 7.
    What are ARKsused for? ● genealogical records (8 billion FamilySearch) ● publisher content (100 million Portico) ● scientific datasets and records (22 million INIST) ● scanned books and texts (30 million Internet Archive) ● bibliographic records (15 million BnF main catalog) ● museum specimens (15 million Smithsonian Institution) ● public health documents (15 million UCSF IDL) ● historical documents (21 million CDL, 5 million BnF Gallica) ● historical authors and scholars (4 million SNAC) ● fine art museum collections (490,000 Louvre) ● vocabulary terms (30,000 Periodo, YAMZ)
  • 8.
    Inist.fr ARKs: 67375 Institute ofscientific and technical information French National Centre for Scientific Research
  • 9.
    LMEC ARKs: 76611 LeventhalMap & Education Center
  • 10.
  • 11.
    Protecting content andLinked Data Controlling costs and broken links ● ARKs as non-paywalled, durable web addresses ARKs don’t depend on fixed domain names ● decentralized – unlike doi.org, handle.net, purl.org, and w3id.org ● expressible as either HTTP URIs or as compact URIs ○ even if non-resolving, can still learn which org assigned the ARK
  • 12.
    Persistent identifier (PID)basics ● PIDs are just permalinks that may be recognizable (eg, has “ark:” in it, or doi.org/10….) ● No guarantees – PIDs and permalinks break by the thousands ○ With effort and luck, some of them are repaired ● One part of keeping PIDs persistent is indirection, with HTTP redirection ○ Publish your indirect identifiers at a server (resolver) that can redirect
  • 13.
    Web access –direct 2. URL 3. page 1. click 4. render page web browser content server user URL page USER CONTENT SERVER
  • 14.
    Web access –indirect 2. URL1 1. click 6. render page web browser server1 user 4. URL2 5. page 3. URL2 HTTP redirect Example URL 1: archive.example.org/photo123 → URL 2: photos.example.org/vault/123 A redirect is like forwarding a (request) message to a new address content server URL page USER CONTENT SERVER
  • 15.
    PIDs and resolvers– all similar PURL (Persistent URL) purl.org/dc/terms/creator Handle hdl.handle.net/4263537/4000 URN (Uniform Resource Name) urn.fi/urn:isbn:9514005058 DOI (Digital Object Identifier) doi.org/10.5334/dsj-2017-039 ARK (Archival Resource Key) n2t.net/ark:/53355/cl010066723 ⬆ ⬆ ⬆ Resolver Name Assigning Name Authority ≈
  • 16.
    What PID schemescan’t do No PID helps you against the major causes of broken links ● can’t prevent fire, war, flood, attack, bankruptcy, ... ● can’t prevent human or service provider error ● can’t guarantee your links ● can’t repair broken links for you ☹
  • 17.
    What PID schemescan do 1. PIDs & permalinks all “aspirational”, but PIDs recognizable in the wild 2. PIDs have aspirationally persistent resolvers (n2t.net, purl.org, doi.org, hdx.handl.net) in case your own domain name is at risk Can base your links on the scheme’s aspirationally persistent resolver so they might still be redirected if your own server goes away and … if the thing still exists, is legal, and someone willing to host it and fix links �
  • 18.
    Brief history ofPID schemes ● PURL – “URLs are fine if you redirect from purl.org” ● URN, DOI, and Handle – “all URLs and domain names are bad – except for ours – and we redirect” ● Tim Berners-Lee – “Cool URIs don't change” [cool URLs don’t break] ● ARK – “URLs are fine if managed well, but please tell us which of your URLs are meant for what kind of persistence” H
  • 19.
    PID schemes –pessimist view Helps with major causes of broken links? PURL Handle URN DOI ARK Prevents fire, war, flood, attack, bankruptcy, ... No No No No No Prevents human or service provider error No No No No No Guarantees your links, or fixes them for you No No No No No Best practices guard against copy/paste errors No No No No Yes Global resolver downtime less than 1 day per year No No No No Yes Identity independence from lost domain/server name No No Yes No Yes
  • 20.
    PID schemes –optimist view Features and costs PURL Handle URN DOI ARK Decentralized resolution No No No No Yes Inferenceable syntax (variants, containment) No No No No Yes Flexible metadata by design, including none No No No No Yes Inflections (...?info) and content negotiation No No No No Yes Nuanced persistence statements by design No No No No Yes Path extensions during resolution (suffix passthrough) Yes No Yes? No Yes Free, non-paywalled, in unlimited numbers Yes No Yes No Yes
  • 21.
    PID schemes –ecosystem view Identifiers in an Internet context PURL Handle URN DOI ARK Appear in Data Citation Index, HathiTrust, Wikipedia, Wikidata, Internet Archive, ORCID profiles Yes Yes Yes Yes Yes Major adoption by most academic publishers outside the global South No No No Yes No Free (subsidized) account and admin interface for one-off use, e.g., purl.org, zenodo.org, archive.org Yes? No? No? Yes Yes? IETF standard URI, validated by web browsers No No Yes No No Replicated global resolver architecture No Yes No No No
  • 22.
    Summary: ARK benefits ARKscan serve as persistent identifiers with metadata ● found in the Data Citation Index, HathiTrust, Wikipedia, Wikidata, Internet Archive, ORCID profiles, etc. In contrast to other id schemes, ARKs have ● no fees, no limits, no walled gardens (decentralized) ● very flexible metadata, including none ● can be assigned to anything digital, physical, or conceptual There is no conflict using ARKs and other identifiers at the same time
  • 23.
    Scientific specimens fromthe National Museum of Natural History http://n2t.net/ark:/65665/381440f27-3f74-4eb9-ac11-b4d633a7da3d Cultural artifacts from the National Museum of American History http://n2t.net/ark:/65665/ng49ca746b2-42dc-704b-e053-15f76fa0b4fa Sculpture from the Freer Gallery of Art & Arthur M. Sackler Gallery http://n2t.net/ark:/65665/ye3080ce305-a705-49cc-a70d-99aff8cb65da Photographs from the National Museum of African American History and Culture http://n2t.net/ark:/65665/fd5ad97cb86-caaf-4209-8fde-98d70f52f072 Paintings from the Smithsonian American Art Museum http://n2t.net/ark:/65665/vk7a466371d-0413-451f-bd76-ca0becc46f94 Example Smithsonian ARKs: 65665 Slide credit: Bess Missell
  • 24.
    https://example.org/ark:/12345/x54xz321/s3/f8.05v.tiff _________________/ __/ ___/______/____/_______/ | | | | | | | ARK Label | | Sub-parts Variants | | | Name Mapping Authority (NMA) | Assigned Name | Name Assigning Authority Number (NAAN) ARK anatomy: the NAAN (Name Assigning Authority Number)
  • 25.
    What’s in aNAAN record? A NAAN (Name Assigning Authority Number) is a 5-digit number ● Numbers are opaque, which is good for longevity. But what if you want to know what’s behind a NAAN? In a browser, try n2t.net/ark:67375 n2t.net/ark:76611 n2t.net/ark:13960 n2t.net/ark: n2t.net/pdb:1YOD
  • 26.
    Example NAAN record Recordfor the National Autonomous University of Mexico n2t.net/ark:46171 → ark:/46171: when: 2017.10.27 name: Universidad Nacional Autónoma de México target: http://www.morelia.unam.mx/campus
  • 27.
    The NAAN registry Theregistry is a plain text file: n2t.net/e/pub/naan_registry.txt Another example record: n2t.net/ark:12148 → ark:/12148: when: 2005-07-17 name: Bibliothèque nationale de France target: http://ark.bnf.fr Purpose of the NAAN (Name Assigning Authority Number): ● Resolution reference point ● Isolating assignment (autonomy, uniqueness, re-use)
  • 28.
    Obtain a NAANfor your organization Fill out this form (linked, in case you forget, from the arks.org homepage): n2t.net/e/naan_request
  • 29.
    Opacity pros andcons Can be generated (“minted”) from any source: ● Counter, Noid, UUID, ULID, even content digest ● Anything unique – but best to keep it short ● With Noid (Nice Opaque Identifiers), you get check characters Opaque ids are a pain for humans ● Difficult to enter correctly (no clues to correct spelling) ● No clues for humans to check for transcription errors
  • 30.
    ARK anatomy: suffixes https://example.org/ark:/12345/x54xz321/s3/f8.05v.tiff _________________/__/ ___/ ______/____/_______/ | | | | | | | ARK Label | | Sub-parts Variants | | | Name Mapping Authority (NMA) | Assigned Name | Name Assigning Authority Number (NAAN)
  • 31.
    Linked Data implicitin ARK syntax ARK suffix syntax implies related URIs ● Slashes “/” for hierarchy (ark:.../A/B/C implies existence of A/B and A) ● Periods “.” for variant forms (ark:.../foo.jpg is a variant of ark:.../foo.pdf) ● Inflections “?info” modifies an ARK to request metadata
  • 32.
    Suffixes to identifya million possible image regions with just one ARK Suffixes Base Name
  • 33.
    French National LibraryARKs: 12148 https://gallica.bnf.fr/iiif/ark:/12148/btv1b8449691v/f29/2131,4016,1467,948/full/0/default.jpg region coordinates page full quality file format
  • 34.
    Object life stages
    ARKs are flexible – can throw away what’s not declared
    ARK metadata is flexible – from none to anything you want
    ● Planning phase, moment of birth, first analysis,
    ● Creating first draft metadata, later normalized metadata,
    ● Pre-release feedback and insights based on limited sharing,
    ● Corrections, abandonment,
    ● … plus archiving, public release, revision, enhancement, etc.
  • 35.
    Finding information about an ARK
    An ARK should lead to 3 things:
    1. the identified thing
    2. metadata about it (very flexible, minimally who|what|when)
    3. a nuanced persistence statement – setting expectations
    To ask for metadata, append the query string “?info”:
    https://example.org/ark:/12345/x54xz321?info
  • 36.
    ARK metadata
    ● An ARK in a URL returns access to the thing it identifies
    ● To get access to its metadata, it should support ARK + ‘?info’, e.g.,
    https://n2t.net/ark:/81431/p3s39k?info →
      who: University of Pennsylvania Libraries
      what: Walnut Street Theatre. Philadelphia, October 9, 1869.
      when: 1869
      where: ark:/81431/p3s39k (currently https://ezid.cdlib.org/id/ark:/81431/p3s39k)
      how: (:unav)
      id created: 2017.12.06_08:42:02
      id updated: 2017.12.21_11:16:02
      persistence: (:unav)
    who | what | when | where
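The ‘?info’ convention can be sketched in a few lines: one helper builds the inflected URL, another reads a "key: value" response like the one shown above. Response formats vary between providers, so the parser handles only this layout, and both function names are hypothetical. No network request is made. The sample text is copied from the slide.

```python
def info_url(ark_url):
    """Replace any existing query string with the '?info' inflection."""
    return ark_url.split("?", 1)[0] + "?info"


def parse_info(text):
    """Read 'key: value' lines, splitting on the first colon only."""
    metadata = {}
    for line in text.splitlines():
        key, colon, value = line.partition(":")
        if colon and key.strip():
            metadata[key.strip()] = value.strip()
    return metadata


sample_response = """\
who: University of Pennsylvania Libraries
what: Walnut Street Theatre. Philadelphia, October 9, 1869.
when: 1869
where: ark:/81431/p3s39k (currently https://ezid.cdlib.org/id/ark:/81431/p3s39k)
how: (:unav)
id created: 2017.12.06_08:42:02
persistence: (:unav)
"""

print(info_url("https://n2t.net/ark:/81431/p3s39k"))
record = parse_info(sample_response)
print(record["who"], "|", record["what"], "|", record["when"])
```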
  • 37.
    ARK metadata flexibility
    Example: thousands of ARKs return DataCite DOI metadata
    http://legacy-n2t.n2t.net/ark:/81986/caida.data.100004?info →
    datacite: <?xml version="1.0"?>
    <resource xsi:schemaLocation="http://datacite.org/schema/kernel-4 http://schema.datacite.org/meta/kernel-4/metadata.xsd">
    <identifier identifierType="ARK">81986/caida.data.100004</identifier>
    <creators> <creator><creatorName>University of California San Diego Center for Applied Internet Data Analysis (UCSD CAIDA) </creatorName></creator></creators>
    <titles><title xml:lang="eng">The IPv4 Routed /24 Topology Dataset</title></titles>
    <publisher>University of California San Diego Center for Applied Internet Data Analysis (UCSD CAIDA) </publisher>
    <publicationYear>2007</publicationYear>
    <resourceType resourceTypeGeneral="Dataset">Active measurements of Internet
    ARKs + DOI metadata
  • 38.
    ARK metadata / inflections
    ARK “inflection”: alter the ending to alter the request
    ● Append ‘?info’ to request metadata (used to be ‘?’ and ‘??’)
    ● Earlier, append ‘?’ for metadata or ‘??’ for more (harder)
    ● No conflict with “content negotiation” (harder)
    The inflection response format is not fixed, but it should be human- and machine-readable (JSON, YAML)
    (Illustration: do, does, done, doing, …)
  • 39.
    Permanence is not binary
    Persistence is not “on” or “off”. It is nuanced.
    ● Preservation often demands change, such as
      ○ larger thumbnail image sizes, better OCR algorithms
      ○ 3-year-old “long term stable” Linux release gets security patches
      ○ Typos in cover pages get corrected
    ● And what about rapidly updated data (earth observation sensor files that grow every 6 seconds, databases that are annotated regularly)?
    Valuable objects tend to be complex, human-managed clusters
  • 40.
    What do you mean by persistence?
    Persistence statements: describing digital stickiness
    John Kunze, Scout Calvert, Jeremy DeBarry, Matthew Hanlon, Greg Janée, Sandra Sweat
    22 May 2017
    Abstract
    In this paper we present a draft vocabulary for making “persistence statements.” These are not arcane notions, but simple tools for pragmatically addressing the concern that anyone feels upon experiencing a broken web link. Scholars increasingly use scientific and cultural assets in digital form, but choosing which among many objects to cite for the long term can be difficult. There are few well-defined terms to describe the various kinds and qualities of persistence that object repositories and identifier resolvers do or don’t provide. Given an object’s identifier, one should be able to query a provider to retrieve human- and machine-readable information to help judge the level of service to expect and help gauge whether the identifier is durable enough, as a sort of long-term bet, to include in a citation. The vocabulary should enable providers to articulate persistence policies and set user expectations.
  • 41.
    Setting user expectations, part 1
    Terms for content variance
    ● frozen – unchanging bitstream
    ● keeping – unchanging content
    ● fixing – subject to correction
    ● rising – subject to active enhancement
    ● molting – unchanging essential mission
    (Image credits: timo_w2s@flickr, sanmartin@flickr)
  • 42.
    Setting user expectations, part 2
    Terms for object availability
    ● finite – ends at known date or event
    ● indefinite – no special commitment
    ● lifetime – as long as the provider exists
    ● subinfinite – beyond provider’s lifetime
  • 43.
    Setting user expectations, part 3
    A term for objects that grow in a certain way
    ● waxing – non-disruptive growth
    Examples
    ● live sensor data feeds
    ● serial publications
    (Image credit: stephenliveshere@flickr)
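One way to picture a machine-readable persistence statement is as a small record built from the vocabulary in parts 1 to 3. This is only a sketch: the class names, the choice to model “waxing” as a separate flag, and the rendered line format are all assumptions, not part of the draft vocabulary itself.

```python
from dataclasses import dataclass
from enum import Enum


class ContentVariance(Enum):
    FROZEN = "frozen"    # unchanging bitstream
    KEEPING = "keeping"  # unchanging content
    FIXING = "fixing"    # subject to correction
    RISING = "rising"    # subject to active enhancement
    MOLTING = "molting"  # unchanging essential mission


class Availability(Enum):
    FINITE = "finite"            # ends at known date or event
    INDEFINITE = "indefinite"    # no special commitment
    LIFETIME = "lifetime"        # as long as the provider exists
    SUBINFINITE = "subinfinite"  # beyond provider's lifetime


@dataclass(frozen=True)
class PersistenceStatement:
    variance: ContentVariance
    availability: Availability
    waxing: bool = False  # non-disruptive growth, e.g. a live sensor feed

    def render(self):
        """Hypothetical one-line form, in the style of a '?info' field."""
        terms = [self.variance.value, self.availability.value]
        if self.waxing:
            terms.append("waxing")
        return "persistence: " + " ".join(terms)


sensor_feed = PersistenceStatement(
    ContentVariance.KEEPING, Availability.LIFETIME, waxing=True
)
print(sensor_feed.render())
```

Restricting each field to a closed set of terms is what lets a citing author compare statements from different providers.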
  • 44.
    Why should we believe you?
    Terms specifying the nature of the provider
    ● name – of organization
    ● identifier – unique organizational identifier
    ● mission – is preservation in your mission?
    ● succession policy
  • 45.
    Persistence in the presence of versions
    Terms for content referencing
    ● extraversioned – “10.2345/67, Version 4”
    ● intraversioned – “10.2345/67.V4”
    ● introversioned – “10.2345/6789”
  • 46.
    The landing page debate
    What if you could get either experience?
    ● plunging – for machine consumption
    ● landing – for human consumption
  • 47.
    Naming policy
    Forming identifier strings
    ● NR – non-reassignment
    ● OP – opaque identifiers
    ● CC – check character added
  • 48.
    YAMZ.net ARKs: 99152/h1
    (YAMZ = Yet Another Metadata Zoo)
    ● ARKs for metadata terms
    ● Note: shared NAAN with reserved “shoulder”: /h1
    ● Vocabulary builder – term creation, sharing, and consensus
      ○ Big task: narrow down among many alternate terms/definitions
      ○ Rare ask: constant, immediate feedback from end metaloguers
      ○ Not a standard, but helps standards be better, faster, and cheaper
  • 50.
  • 51.
    Who needs a vocabulary builder?
    Answer: everyone who needs controlled vocabulary terms
    ● There’s a flood of metadata standards and dialects
      ○ Per institution, per laboratory, per project
      ○ All-or-nothing buy-in to a big bundle of 144 terms and definitions, plus grammar rules
    ● Large metadata investment, poor interoperation
      ○ See Metadata's Bitter Harvest, Library Journal, 2004
    YAMZ.net vocabulary builder – online dictionary of draft terms and definitions
    ● Foundation for sharing, discussing, voting, and reaching consensus
    (Image: Tower of Babel, P. Brueghel)
  • 52.
    The unofficial story of institutional metadata adoption
    Theory
    “Interoperation? Solved – thanks to Dublin Core, PREMIS, schema.org, ….”
    Reality
    “Yeah, no. We have our own modifications.”
    “We have no clout with and couldn’t wait for standards bodies.”
    “If you promise not to share, maybe I could get you a PDF of our changes.”
    Metadata? Yes!
    ✅ Dublin Core
    ✅ Darwin Core
    ✅ DataCite
    ✅ schema.org
    …
  • 53.
    The unofficial story of drafting metadata standards
    Theory: Senior experts share their wisdom with the world
    Reality: Metadata design-by-committee
    ● Non-practitioners – workflow expertise may have peaked 10-20 years ago
    ● Little testing or evidence – lots of opinion, conjecture, ego
    ● Huge time sink – years to get to Version 1; more years to Version 2; …
    ● Out of date as soon as published – requirements and models have moved on
    ● Hard to reach consensus – committee agrees to agree when it’s exhausted
  • 54.
  • 55.
    Domain dialects – similar but different
    Example: Earth Science > Cryospheric (frozen water) Science
    ● 28 different definitions of “glacier”
    ● 8 different definitions of “puddle”
    ● 13 different definitions of “firn” (old snow)
    ● 10 different definitions of “frazil ice” (fine spicules of floating ice)
    ● 7 different definitions of “ogive” (bands of light and dark ice in a glacier)
    ● … and so on
    Sound familiar? What about your domain?
  • 56.
    YAMZ.net (Yet Another Metadata Zoo)
    Pronunciation: “yams”
    Not a standard, not an ontology
    ● YAMZ is a living dictionary of metadata terms and jargon
    ● Each term gets an ARK permalink (PID), a proposed nano-standard
      ○ some are upvoted and rise in search results, others are downvoted or ignored
    ● Reputation-based voting (like Stack Overflow) helps you choose
    ● All parts of metadata “speech”, all domains
    (Image credit: SimonRobertson@flickr)
  • 57.
    Crowdsourced, but with voting and fences
    3 classes of term
    ● vernacular ← all terms are born here
    ● canonical ← these don’t evolve …
    ● deprecated ← and they never go away
    Each term gets a unique persistent id (ARK). Example:
      term: iba
      definition: other (origin language: Tagalog)
      identifier: https://n2t.net/ark:/99152/h1193
  • 58.
    YAMZ patterns for working groups
    ● Import your group’s 300 draft words and definitions – bulk upload CSV file; terms get ARK PIDs
    ● Watch and edit your terms – reviewers comment and vote
    ● Cherry pick final group terms – group decides, looking at comments, votes, etc.
    ● Publish terms linking back to yamz.net – each term is like a nano-standard
  • 59.
    YAMZ patterns for individual practitioners
    Search for terms (words and definitions)
    ● Find a term you love – great, use and link to it
    ● Find a term you kind of love – test it, comment, ask author for changes
    ● No workable term found – instantly add own term and watch for comments
    ● Find a word you love but an unworkable definition – “I want that word!”, so enter a competing term
  • 60.
    Some discipline-specific subsets in YAMZ
    ● Global Cryosphere Watch (GCW)
    ● Citizen Science (Sloan)
    ● DesignSafe (UTA)
    ● Persistence statements (CDL, UCLA, TACC)
    ● Space Science – Heliophysics (AGU, NASA, JPL)
  • 61.
    Is it data? an identifier? a PID (persistent id)?
    FFE4-2C6E-434C-345B-C5B0-T
    https://example.org/FFE4-2C6E-434C-345B-C5B0-T
    https://doi.org/10.5072/FFE4-2C6E-434C-345B-C5B0-T
    https://doi.org/10.5240/FFE4-2C6E-434C-345B-C5B0-T
    A valid DOI, so… trusted scholarly content?
    Well, it’s for kids: The Super Mario Bros. Movie, 2023
    Lesson: don’t judge an identifier by its looks
  • 62.
    Tools
    Documentation and Software: arks.org/resources
    ● Minters and resolvers: Noid, arknoid, arklet, and arklet-Frick
    ● Other minters: counters, UUID, ULID, …
    ● Journal minter/resolver: OJS Plugin
    ● In-house library ARK system: ARKs Service UTScarborough
    ● Consider Suffix Passthrough in the style of N2T, EZID
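Suffix passthrough can be sketched as follows: when a requested ARK is not registered, the resolver looks for the longest registered prefix and carries the unmatched suffix over to the target. This is an illustrative sketch rather than the N2T or EZID implementation. The registry contents, the target URL, and the choice to cut only at “/”, “.” or “?” boundaries are assumptions.

```python
BOUNDARY_CHARS = "/.?"  # assumed: only cut where a suffix component begins


def resolve(request, registry):
    """Return the target for the longest registered prefix plus the leftover suffix."""
    for end in range(len(request), 0, -1):
        at_boundary = end == len(request) or request[end] in BOUNDARY_CHARS
        prefix = request[:end]
        if at_boundary and prefix in registry:
            return registry[prefix] + request[end:]
    return None


# Hypothetical registry: a single base ARK registered with its target.
registry = {"ark:/12345/x54xz321": "https://repo.example.org/objects/x54xz321"}

print(resolve("ark:/12345/x54xz321", registry))
print(resolve("ark:/12345/x54xz321/s3/f8.05v.tiff", registry))
print(resolve("ark:/12345/unknown", registry))
```

With this pattern only the base name needs a registry entry. Sub-parts and variants resolve without being registered one by one.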
  • 64.
  • 65.
    The ARK community snapshot
    ● 10 national libraries
    ● 145 universities
    ● 184 archives
    ● 90 museums
    ● 75 journals
    1400+ organizations such as
    ● UNESCO
    ● The Frick Collection
    ● The National Gallery, London
    ● California Academy of Sciences
    All are registered to use ARKs – open, mainstream, non-paywalled, decentralized persistent identifiers that you can start creating in under 48 hours.
  • 66.
    ARK Alliance: how to participate
    https://arks.org
    ● We welcome contributions and volunteers for our technical, outreach, advisory, and NAAN record curation working groups
    Discussion forums in English, French, and Spanish/Portuguese
    ● arks-forum@googlegroups.com
    ● arks-forum-fr@framalistes.org
    ● arks-forum-ib@googlegroups.com
  • 67.
    Thank you. Questions?
    ARK Alliance
    info@arks.org
    arks.org
    John Kunze, jakkbl@gmail.com
    Donny Winston, donny@polyneme.xyz
    Form to request a NAAN