The Global Biodiversity Information Facility (GBIF) provides an open, global e-infrastructure for publishing species occurrence information. When specimen collection data from natural history museums are integrated, the catalog numbers used as locally unique specimen identifiers are no longer sufficiently unique on this global platform. A system of institution prefixes (institution code + collection code + catalog number) has been tested with varying success. The Norwegian participant node of GBIF, hosted by the Natural History Museum at the University of Oslo, has developed a set of new, globally unique and machine-readable (resolvable) identifiers. Standard tools are used to generate Universally Unique Identifiers (UUIDs). The UUIDs are prefixed by a PURL that provides a persistent redirection (HTTP 303 "See Other") to a resolver, established at http://purl.org/nhmuio/[UUID]. Using content negotiation, the user or machine can access descriptive information as HTML, comma-separated values (CSV), tab-delimited text files, N3/Turtle RDF data, and JSON. These formats can also be accessed directly (without content negotiation) by adding a file extension: "http://purl.org/nhmuio/[UUID].[html|csv|txt|n3|json]".
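To make the two access routes concrete (file extension versus content negotiation), here is a minimal sketch of how a resolver might pick an output format. It is not the NHM-UiO implementation: the names FORMATS, parse_accept and choose_format are hypothetical, and the media types chosen for each extension (for example text/tab-separated-values for .txt, and serving text/turtle with the N3 output) are assumptions.

```python
# Minimal sketch of format selection for an identifier resolver.
# Assumptions (not the NHM-UiO implementation): all names are hypothetical;
# an explicit file extension wins, otherwise the Accept header is matched
# by q-value.

# Hypothetical mapping from file extension to media type.
FORMATS = {
    "html": "text/html",
    "csv": "text/csv",
    "txt": "text/tab-separated-values",
    "n3": "text/n3",
    "json": "application/json",
}
# Reverse lookup; Turtle requests are assumed to be served by the n3 output.
MEDIA_TO_EXT = {media: ext for ext, media in FORMATS.items()}
MEDIA_TO_EXT["text/turtle"] = "n3"
DEFAULT_EXT = "html"


def parse_accept(header):
    """Return (media_type, q) pairs from an Accept header, highest q first."""
    ranges = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        if not fields[0]:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        ranges.append((fields[0].lower(), q))
    # sorted() is stable, so media types with equal q keep header order.
    return sorted(ranges, key=lambda r: r[1], reverse=True)


def choose_format(path, accept=""):
    """Return (identifier, extension) for a request path and Accept header."""
    last = path.rstrip("/").rsplit("/", 1)[-1]
    stem, dot, ext = last.rpartition(".")
    if dot and ext in FORMATS:
        return stem, ext  # explicit extension: no negotiation needed
    for media, q in parse_accept(accept):
        if q <= 0:
            continue  # q=0 means "not acceptable"
        if media in MEDIA_TO_EXT:
            return last, MEDIA_TO_EXT[media]
        if media in ("*/*", "text/*"):
            return last, DEFAULT_EXT
    return last, DEFAULT_EXT  # nothing matched: fall back to HTML
```

For example, choose_format("/nhmuio/41d9cbb4-4590-4265-8079-ca44d46d27c3.json") returns the UUID with "json" and never looks at the Accept header. Falling back to HTML is a design choice here; a stricter server might answer 406 Not Acceptable instead.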
--
See also: doi.org/10.13140/2.1.4516.9606
https://dagendresen.wordpress.com/2016/07/13/persistent-identifiers-at-gbif-no/
--
NeIC - Oslo, 26th August 2015
Workshop Collaboration on e-Infrastructures for Nordic Biodiversity Informatics
https://wiki.neic.no/wiki/Workshop_Collaboration_on_e-Infrastructures_for_Nordic_Biodiversity_Informatics
https://goo.gl/n02KJ6
4. When is the identifier “good enough”?
Unique and persistent, within a given context.
“The common experience is that an identifier is created within a system or within a context, and that at a later date it needs to be used in another or larger context” (Karen Coyle 2006).
Expanding context:
1. Within one museum collection (catalog number).
2. Within a network between museum collections (collection code + catalogue number).
3. Within a biodiversity information network (institution code + collection/dataset code + catalogue number).
4. On the Internet (e.g. HTTP URI, DOI, LSID, etc.)
5. … even larger contexts can be imagined in the future!
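The expanding-context list can be illustrated with a few records: a catalogue number that is unique inside one collection can collide once collections and institutions are pooled. This sketch uses hypothetical records and institution/collection codes (INST_A, INST_B, BIRDS, PLANTS); the three key functions mirror context levels 1 to 3 above.

```python
from collections import Counter

# Hypothetical specimen records from two institutions (not real data).
records = [
    {"institutionCode": "INST_A", "collectionCode": "BIRDS", "catalogNumber": "1001"},
    {"institutionCode": "INST_A", "collectionCode": "PLANTS", "catalogNumber": "1001"},
    {"institutionCode": "INST_B", "collectionCode": "BIRDS", "catalogNumber": "1001"},
]


def key_level1(r):
    """Context 1: within one museum collection."""
    return r["catalogNumber"]


def key_level2(r):
    """Context 2: within a network of museum collections."""
    return (r["collectionCode"], r["catalogNumber"])


def key_level3(r):
    """Context 3: within a biodiversity information network."""
    return (r["institutionCode"], r["collectionCode"], r["catalogNumber"])


def duplicates(key, rows):
    """Return the keys that occur more than once under the given key function."""
    counts = Counter(key(r) for r in rows)
    return {k for k, n in counts.items() if n > 1}


# Levels 1 and 2 report collisions for this data; level 3 reports none.
for level, key in enumerate((key_level1, key_level2, key_level3), start=1):
    print(level, sorted(duplicates(key, records)))
```

Each step only widens the scope in which the key is unique; none of these keys is unique in every possible context, which is the motivation for the level 4 identifiers.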
6. Identifiers for museum collections
The longevity of museums leads to: “The need to use identifiers from our past in the current highly-networked digital systems” (Karen Coyle 2006 [talking about libraries]).
Specify a namespace for the identifiers?
• URI – uniform resource identifier (unique in the context of the web).
• URN – uniform resource name (name not tied to location).
• URL – uniform resource locator (network location as identifier).
• PURL – persistent URL (commitment to service longevity).
Something else…?
• DOI – digital object identifier
• ARK – archival resource key
• UUID – universally unique identifier
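The URN/URL distinction can be seen by writing one UUID both ways. This sketch uses Python's standard uuid module: UUID.urn gives the RFC 4122 "urn:uuid:" form, a name with no location, and prefixing the PURL from slide 11 gives a dereferenceable HTTP URI. The variable names are illustrative only.

```python
import uuid

# PURL prefix as used on slide 11.
PURL_PREFIX = "http://purl.org/nhmuio/id/"

specimen_id = uuid.UUID("41d9cbb4-4590-4265-8079-ca44d46d27c3")

as_urn = specimen_id.urn                  # name only, not tied to a location
as_purl = PURL_PREFIX + str(specimen_id)  # HTTP URI that can be resolved on the web

print(as_urn)
print(as_purl)
```

Both strings carry the same 128-bit value; only the HTTP form can be looked up with an ordinary web client, which is why a PURL prefix is added.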
8. Photo: Smithsonian National Museum of Natural History, USNM-445024-Eutoxeres-aquila
PURL
Reuse existing identifiers
9.
• Globally unique
• Scalability, number of IDs
• Community acceptance
• Long-term life-cycle
• Resolvable, resolution service(s)
• Cost per identifier
• People-friendly or machine-friendly
• Solution for the generation of new IDs
– Central generation, PID issuer
– Distributed generation at source
10.
• A UUID is a 16-octet (128-bit) number, usually written as 36 characters (32 hexadecimal digits and 4 hyphens).
• Example: 41d9cbb4-4590-4265-8079-ca44d46d27c3
• The probability of one duplicate would be about 50% if every person on earth created 600 million UUIDs.
• UUIDs are easy to generate at source in a distributed network.
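The collision figure can be checked with the birthday-problem approximation p ≈ 1 − exp(−n²/2N). In this sketch, uuid.uuid4() from the Python standard library generates a random UUID at source. The helper names are hypothetical, and N = 2^122 assumes random (version 4) UUIDs, where 4 version bits and 2 variant bits are fixed. The 50% threshold comes out at about 2.7 × 10^18 UUIDs. For a world population of several billion, that is hundreds of millions of UUIDs per person, the same order of magnitude as the figure quoted above.

```python
import math
import uuid


def new_specimen_id():
    """Generate a random (version 4) UUID, e.g. when cataloguing a specimen."""
    return uuid.uuid4()


def collision_probability(n, random_bits=122):
    """Approximate probability of at least one duplicate among n random UUIDs."""
    space = 2.0 ** random_bits
    # -expm1(-x) == 1 - exp(-x), but stays accurate for very small x.
    return -math.expm1(-(n * n) / (2.0 * space))


def n_for_probability(p, random_bits=122):
    """Approximate number of random UUIDs needed to reach collision probability p."""
    space = 2.0 ** random_bits
    return math.sqrt(2.0 * space * math.log(1.0 / (1.0 - p)))


u = new_specimen_id()
print(u, len(str(u)), u.version)        # 36 characters, version 4
print(f"{n_for_probability(0.5):.3g}")  # about 2.71e+18 UUIDs for a 50% chance
```

Because each UUID is drawn independently, no coordination between museums or a central issuer is needed, which is what makes generation at source practical.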
11. HTTP – PURL – UUID
http://purl.org/nhmuio/id/41d9cbb4-4590-4265-8079-ca44d46d27c3
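Going the other way, a client receiving such an identifier may want to check it and recover the UUID. This is a minimal sketch using urllib.parse.urlsplit and uuid.UUID from the standard library. The function name parse_specimen_uri is hypothetical, and the validation rules (http or https, host purl.org, path prefix /nhmuio/id/, canonical lower-case UUID) are assumptions for illustration.

```python
import uuid
from urllib.parse import urlsplit

PURL_HOST = "purl.org"
PURL_PATH_PREFIX = "/nhmuio/id/"


def parse_specimen_uri(uri):
    """Return the UUID embedded in a specimen PURL, or raise ValueError."""
    parts = urlsplit(uri)
    if parts.scheme not in ("http", "https") or parts.netloc != PURL_HOST:
        raise ValueError(f"not a {PURL_HOST} URI: {uri}")
    if not parts.path.startswith(PURL_PATH_PREFIX):
        raise ValueError(f"unexpected path: {parts.path}")
    tail = parts.path[len(PURL_PATH_PREFIX):]
    value = uuid.UUID(tail)  # raises ValueError if tail is not a UUID
    if str(value) != tail:
        # uuid.UUID also accepts braces, missing hyphens or upper case;
        # this sketch insists on the canonical 36-character form.
        raise ValueError(f"UUID not in canonical form: {tail}")
    return value
```

Keeping the UUID as the last path segment means the PURL prefix could, in principle, be swapped for another resolver without changing the underlying identifier.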
12. Identifier → Resolver → Location → Specimen
The resolver is a system to resolve locations from identifiers, enabling retrieval even when the location changes.
http://purl.org/nhmuio/id/[UUID]: no-information object (HTTP redirect)
HTTP 303 redirect to http://gbif.no/resolver/[UUID]
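The PURL hop on this slide amounts to "answer the identifier with 303 See Other and a Location header pointing at the resolver". This is a minimal sketch as a WSGI application using only the standard library. The target pattern http://gbif.no/resolver/[UUID] is taken from the slide. The function name purl_app, the 404 fallback and the normalisation of the UUID are assumptions for illustration, not how the PURL service or the GBIF Norway resolver actually works.

```python
import uuid

RESOLVER_BASE = "http://gbif.no/resolver/"
ID_PREFIX = "/nhmuio/id/"


def purl_app(environ, start_response):
    """Answer GET /nhmuio/id/<UUID> with '303 See Other' to the resolver."""
    path = environ.get("PATH_INFO", "")
    if path.startswith(ID_PREFIX):
        try:
            target = uuid.UUID(path[len(ID_PREFIX):])
        except ValueError:
            target = None  # not a UUID: fall through to 404
        if target is not None:
            start_response("303 See Other",
                           [("Location", RESOLVER_BASE + str(target)),
                            ("Content-Type", "text/plain")])
            return [b""]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"unknown identifier"]
```

To try it locally, the function can be passed as the application argument of wsgiref.simple_server.make_server. A 303 rather than a 200 signals that the identifier names the specimen itself, while the redirect target is a document describing it.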
16. UUID QR codes for museum objects at NHM-UiO provide:
• Machine-readable identifiers (using a simple smartphone or a barcode reader).
• New and efficient workflows for collection management.
• Deployment of stable identifiers appropriate for databasing.
19. Some key challenges for the group work
• Many of the original source datasets indexed by GBIF are regularly updated and re-indexed by the GBIF portal. Without stable and persistent identifiers, information on the same herbarium specimen (or species observation) is sometimes included more than once. This leads to duplicated information: more than one (unlinked) data record for the same Real World entity.
• Without stable and persistent identifiers for herbarium specimens (and species observations), it is difficult to link the same data record across different re-indexing cycles of the GBIF portal. When a previously indexed data record is not re-identified in a new version of a given dataset, the record is deleted from the portal, and the link to previous versions of this data record is lost.
• A composite key identifier (such as the Darwin Core triplet), based on a combination of the metadata attributes for institution code (dwc:institutionCode), collection code (dwc:collectionCode), and the local specimen identifier (dwc:catalogNumber), is generally used as the specimen identifier in GBIF. However, all three metadata attributes can (and sometimes do) change.
• What could be a best-practice guideline for identifier resolution? Is it useful to define and agree on a (set of) common and well-defined response format(s)? Is it useful to provide recommendations for a set of metadata profiles with a clear set of defined metadata attributes? Or would more general principles and more open recommendations be more likely to stand the test of time and remain relevant as new information infrastructure technologies emerge?
• Challenges, pros and cons of reusing object identifiers and metadata attribute terms declared by others, without full control of how these objects and terms are maintained. Objects and concepts declared for a particular purpose will often not match exactly the needs of another purpose. How can we optimally reuse each other's OWL ontologies, metadata vocabularies and data object models?
• Should identifiers identify the Real World physical objects, the entities that collection curators and users of the information care about? Or should the identifier be assigned to database records? Real World entities will not have a signature byte-sequence, and will rely on interpretation of when an object is considered to be the same thing.
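The re-indexing problem in the first three points can be shown with two indexing cycles of one hypothetical dataset. Between cycles, the publisher changes the collection code of a specimen (BIRDS to AVES; all codes are made up). Matching on the Darwin Core triplet then makes the old record look deleted and the new one look added. Matching on a stable UUID keeps the link. The sketch assumes the UUID is published alongside the triplet in a hypothetical "id" field, and link() is an illustrative helper, not GBIF's indexing logic.

```python
# Hypothetical records from two indexing cycles of the same dataset.
cycle1 = [
    {"id": "41d9cbb4-4590-4265-8079-ca44d46d27c3",
     "institutionCode": "INST_A", "collectionCode": "BIRDS", "catalogNumber": "1001"},
]
cycle2 = [
    {"id": "41d9cbb4-4590-4265-8079-ca44d46d27c3",
     "institutionCode": "INST_A", "collectionCode": "AVES", "catalogNumber": "1001"},
]


def triplet(r):
    """Darwin Core triplet: institutionCode + collectionCode + catalogNumber."""
    return (r["institutionCode"], r["collectionCode"], r["catalogNumber"])


def stable_id(r):
    """Persistent identifier assumed to be carried in the 'id' field."""
    return r["id"]


def link(old, new, key):
    """Split record keys into (matched, deleted, added) sets under key()."""
    old_keys = {key(r) for r in old}
    new_keys = {key(r) for r in new}
    return old_keys & new_keys, old_keys - new_keys, new_keys - old_keys


matched, deleted, added = link(cycle1, cycle2, triplet)
print("triplet:", len(matched), len(deleted), len(added))  # 0 matched, 1 deleted, 1 added
matched, deleted, added = link(cycle1, cycle2, stable_id)
print("uuid:", len(matched), len(deleted), len(added))     # 1 matched, 0 deleted, 0 added
```

The same mechanism produces the duplicates described in the first point: if the old record is not removed, the portal ends up with two unlinked records for one specimen.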
20. gbif-drift@nhm.uio.no
Dag Endresen, dag.endresen@nhm.uio.no
Christian Svindseth, christian.svindseth@nhm.uio.no
Gary Larson, 1987
Workshop in Oslo, 26th Aug