2014-09 Persistent Identifiers, Herbarium workshop at Kongsvold, September 1 to 4, 2014

Implementation of persistent and globally unique identifiers for specimens held in natural history collections worldwide will open up new opportunities for referring to these physical resources in an interlinked digital context such as the Internet. Here, we describe the approach for persistent identification of collection specimens developed and implemented at the Natural History Museum in Oslo (NHM-UiO) by the Norwegian participant node to the Global Biodiversity Information Facility (GBIF-Norway). The Norwegian university museums are invited to use our resolver service at "http://purl.org/gbifnorway/id/<uuid>" when publishing biodiversity data to GBIF. All occurrence records published through GBIF-Norway with appropriate PURL-UUID identifiers mapped to the Darwin Core occurrenceID will automatically be added to our resolver service and kept updated.
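As an illustration of the PURL-UUID pattern described above, the following is a minimal Python sketch, not the GBIF-Norway implementation. The PURL namespace and the resolver address are taken from this description and from slide 20; the function names (`mint_occurrence_id`, `uuid_from_purl`, `resolver_target`, `fallback_occurrence_id`) are hypothetical, and the fallback form is the "urn:catalog:" pattern quoted from the Darwin Core occurrenceID comment on slide 10.

```python
import uuid

# Base addresses as given in the slides; all function names below are hypothetical.
PURL_BASE = "http://purl.org/gbifnorway/id/"
RESOLVER_BASE = "http://gbif.no/resolver/"


def mint_occurrence_id() -> str:
    """Generate a random (version 4) UUID and wrap it in the PURL namespace."""
    return PURL_BASE + str(uuid.uuid4())


def uuid_from_purl(purl: str) -> uuid.UUID:
    """Extract and validate the UUID part of a PURL-UUID identifier."""
    if not purl.startswith(PURL_BASE):
        raise ValueError(f"not in the expected PURL namespace: {purl}")
    # uuid.UUID raises ValueError if the remainder is not a well-formed UUID.
    return uuid.UUID(purl[len(PURL_BASE):])


def resolver_target(purl: str) -> str:
    """Return the resolver address the PURL is assumed to redirect to."""
    return RESOLVER_BASE + str(uuid_from_purl(purl))


def fallback_occurrence_id(institution_code: str, collection_code: str,
                           catalog_number: str) -> str:
    """Build the 'urn:catalog:' form for records without a global identifier."""
    return f"urn:catalog:{institution_code}:{collection_code}:{catalog_number}"
```

Because the UUID is generated locally, each collection could mint identifiers at the source without contacting a central issuer; only the redirect from the PURL to the resolver needs to be maintained centrally.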

Published in: Technology

  1. Persistent Identifiers, Herbarium workshop at Kongsvold Fjellstue, September 1-4, 2014. Dag Endresen, NHM-UiO, GBIF-Norway
  2. The purpose of identifiers is to name things, making it possible to refer to them.
  3. Name ambiguity: George. Many things are named George.
  4. What is an identifier: “Each identifier refers to one and only one thing” (Coyle 2006). “An association between a string and a thing” (Kunze 2003). “A stated association between a symbol and a thing; that the symbol may be used to unambiguously refer to the thing within a given context” (Campbell 2007).
  5. (Image slide, no text.)
  6. When is the identifier “good enough”? Unique and persistent, within a given context. “The common experience is that an identifier is created within a system or within a context, and that at a later date it needs to be used in another or larger context” (Coyle 2006). Expanding context:
     • Within one museum collection (catalog number).
     • Within a network of museum collections (collection code + catalog number).
     • Within a biodiversity information network (institution code + collection/dataset code + catalog number).
     • On the Internet (e.g. http URI, DOI, LSID, etc.).
     • … larger contexts are possible to imagine in the future!
  7. Expanding context
  8. Identify the thing that you care about:
     • The specimen itself (the physical entity)
     • Image of the specimen
     • Description of the specimen
     • Location where the specimen was captured
     • The occurrence event when the specimen was captured
     • …
  9. Darwin Core terms, grouped by class:
     • Record-level Terms: dcterms:type | dcterms:modified | dcterms:language | dcterms:rights | dcterms:rightsHolder | dcterms:accessRights | dcterms:bibliographicCitation | dcterms:references | institutionID | collectionID | datasetID | institutionCode | collectionCode | datasetName | ownerInstitutionCode | basisOfRecord | informationWithheld | dataGeneralizations | dynamicProperties
     • Occurrence: occurrenceID | catalogNumber | occurrenceRemarks | recordNumber | recordedBy | individualID | individualCount | sex | lifeStage | reproductiveCondition | behavior | establishmentMeans | occurrenceStatus | preparations | disposition | otherCatalogNumbers | previousIdentifications | associatedMedia | associatedReferences | associatedOccurrences | associatedSequences | associatedTaxa
     • MaterialSample: materialSampleID
     • Event: eventID | samplingProtocol | samplingEffort | eventDate | eventTime | startDayOfYear | endDayOfYear | year | month | day | verbatimEventDate | habitat | fieldNumber | fieldNotes | eventRemarks
     • dcterms:Location: locationID | higherGeographyID | higherGeography | continent | waterBody | islandGroup | island | country | countryCode | stateProvince | county | municipality | locality | verbatimLocality | verbatimElevation | minimumElevationInMeters | maximumElevationInMeters | verbatimDepth | minimumDepthInMeters | maximumDepthInMeters | minimumDistanceAboveSurfaceInMeters | maximumDistanceAboveSurfaceInMeters | locationAccordingTo | locationRemarks | verbatimCoordinates | verbatimLatitude | verbatimLongitude | verbatimCoordinateSystem | verbatimSRS | decimalLatitude | decimalLongitude | geodeticDatum | coordinateUncertaintyInMeters | coordinatePrecision | pointRadiusSpatialFit | footprintWKT | footprintSRS | footprintSpatialFit | georeferencedBy | georeferencedDate | georeferenceProtocol | georeferenceSources | georeferenceVerificationStatus | georeferenceRemarks
     • GeologicalContext: geologicalContextID | earliestEonOrLowestEonothem | latestEonOrHighestEonothem | earliestEraOrLowestErathem | latestEraOrHighestErathem | earliestPeriodOrLowestSystem | latestPeriodOrHighestSystem | earliestEpochOrLowestSeries | latestEpochOrHighestSeries | earliestAgeOrLowestStage | latestAgeOrHighestStage | lowestBiostratigraphicZone | highestBiostratigraphicZone | lithostratigraphicTerms | group | formation | member | bed
     • Identification: identificationID | identifiedBy | dateIdentified | identificationReferences | identificationVerificationStatus | identificationRemarks | identificationQualifier | typeStatus
     • Taxon: taxonID | scientificNameID | acceptedNameUsageID | parentNameUsageID | originalNameUsageID | nameAccordingToID | namePublishedInID | taxonConceptID | scientificName | acceptedNameUsage | parentNameUsage | originalNameUsage | nameAccordingTo | namePublishedIn | namePublishedInYear | higherClassification | kingdom | phylum | class | order | family | genus | subgenus | specificEpithet | infraspecificEpithet | taxonRank | verbatimTaxonRank | scientificNameAuthorship | vernacularName | nomenclaturalCode | taxonomicStatus | nomenclaturalStatus | taxonRemarks
     • ResourceRelationship (Auxiliary Terms): resourceRelationshipID | resourceID | relatedResourceID | relationshipOfResource | relationshipAccordingTo | relationshipEstablishedDate | relationshipRemarks
     • MeasurementOrFact (Auxiliary Terms): measurementID | measurementType | measurementValue | measurementAccuracy | measurementUnit | measurementDeterminedDate | measurementDeterminedBy | measurementMethod | measurementRemarks
  10. Term name: occurrenceID
     • Identifier: http://rs.tdwg.org/dwc/terms/occurrenceID
     • Class: http://rs.tdwg.org/dwc/terms/Occurrence
     • Definition: An identifier for the Occurrence (as opposed to a particular digital record of the occurrence). In the absence of a persistent global unique identifier, construct one from a combination of identifiers in the record that will most closely make the occurrenceID globally unique.
     • Comment: For a specimen in the absence of a bona fide global unique identifier, for example, use the form: "urn:catalog:[institutionCode]:[collectionCode]:[catalogNumber]".
     • Examples: "urn:lsid:nhm.ku.edu:Herps:32", "urn:catalog:FMNH:Mammal:145732".
     • For discussion see http://code.google.com/p/darwincore/wiki/Occurrence
  11. Identifiers for museum collections. The longevity of museums leads to: “The need to use identifiers from our past in the current highly-networked digital systems” (Coyle 2006 [talking about libraries]). Specify a namespace for the identifiers?
     • URI – uniform resource identifier (unique in the context of the web).
     • URN – uniform resource name (name not tied to location).
     • URL – uniform resource locator (network location as identifier).
     • PURL – persistent URL (commitment to service longevity).
     Something else…?
     • DOI – digital object identifier
     • ARK – archival resource key
     • UUID – universally unique identifier
  12. Identifier types and abbreviations:
     • Persistent Identifier (PID)
     • Globally Unique Identifier (GUID)
     • Uniform Resource Identifier (URI)
     • Persistent Uniform Resource Locator (PURL)
     • Life Science Identifier (LSID)
     • Digital Object Identifier (DOI)
     • Handle system (Handle)
     • Archival Resource Key (ARK, EZID)
     • Universally Unique Identifier (UUID)
     • …
  13. Reuse existing identifiers. PURL. Photo: Smithsonian National Museum of Natural History, USNM-445024-Eutoxeres-aquila
  14. Reuse identifiers. http://purl.org/nhmuio/id/41d9cbb4-4590-4265-8079-ca44d46d27c3 Illustration by Miroslav Šašek (1963)
  15. Criteria for choosing an identifier scheme:
     • Globally unique
     • Scalability, number of IDs
     • Community acceptance
     • Long-term life-cycle
     • Resolvable, resolution service(s)
     • Cost per identifier
     • People-friendly or machine-friendly
     • Solution for the generation of new IDs
       – Central generation, PID issuer
       – Distributed generation at source
  16. UUID:
     • A UUID is a 16-octet (128-bit) number, conventionally written as a 36-character string.
     • Example: C37E3F9B-BCAF-4479-8EB7-3346A2DB2373
     • The probability of one duplicate would be about 50% if every person on earth created 600 million UUIDs.
     • Allows for easy generation at source in a distributed network.
  17. Identifier, Resolver, Specimen, Location. The resolver is a system to resolve locations from identifiers, enabling retrieval even when the location changes.
  18. PURL technology provides a robust resolution service ready for the future - and a stable solution that is working well right now.
     • PURL for the NHM-resolver: http://purl.org/nhmuio/id/[PID]
     • The NHM-PURL redirects here: http://gbif.no/resolver/[PID]
     • Could with few modifications redirect e.g. here: http://gbif.org/resolver/[PID]
  19. http – PURL – UUID: http://purl.org/nhmuio/id/41d9cbb4-4590-4265-8079-ca44d46d27c3
  20. Redirects:
     • http://purl.org/nhmuio/id/UUID → http://gbif.no/resolver/UUID
     • http://purl.org/gbifnorway/id/UUID → http://gbif.no/resolver/UUID
  21. Including machine readable formats
  22. Catalog number: O-L-000014 http://purl.org/nhmuio/id/41d9cbb4-4590-4265-8079-ca44d46d27c3
  23. Machine readable labels
  24. QR codes:
     • Quick Response Code (QR code).
     • A type of matrix barcode (or two-dimensional code).
     • Popular due to its fast readability and large storage capacity.
     • The use of QR Codes is free of any license.
     • The QR Code is clearly defined and published as an ISO standard.
     • Invented in Japan by the Toyota subsidiary Denso Wave in 1994.
  25. http://purl.org/nhmuio/id/d91e8253-0ac1-4681-ac69-e50070af86a2
  26. UUID QR codes for museum objects at NHM-UiO provide:
     • Machine-readable identifiers (readable with a simple smartphone or a barcode reader).
     • New and efficient workflows for collection management.
     • Deployment of stable identifiers appropriate for databasing.
  27. Efficient workflow routines
  28. http://gbif.no/dugnad/
  29. Data papers:
     • Peer review option for biodiversity data sets.
     • Authors get scientific credit for data publication.
     • Meeting concerns over data quality.
     • Meeting concerns over data citation mechanism.
     • Towards → each data set published through GBIF accompanied by a data paper…?
  30. Why publish your data:
     • Citable publication
     • Establish scientific priority
     • Increase collaboration
     • Link data to bigger network
     • Re-use and multiply effect
     • Respond to funding requirements
     http://biodiversitydatajournal.com/ Smith V, Georgiev T, Stoev P, Biserkov J, Miller J, Livermore L, Baker E, Mietchen D, Couvreur T, Mueller G, Dikow T, Helgen K, Frank J, Agosti D, Roberts D, Penev L (2013) Beyond dead trees: integrating the scientific process in the Biodiversity Data Journal. Biodiversity Data Journal 1: e995. DOI: 10.3897/BDJ.1.e995
  31. Globally unique identifiers are one of the three core components in the TDWG technical architecture.
  32. Status as of 27 August 2014. GBIF enables free and open access to biodiversity data online. We are an international government-initiated and funded initiative focused on making biodiversity data available to everyone, for scientific research, conservation and sustainable development.
  33. GBIF provides a data discovery system that is dependent on resolvable stable identifiers for efficient functionality (global registry, data portal).
  34. Dag Endresen, dag.endresen@nhm.uio.no. Herbarium workshop at Kongsvold Fjellstue, September 1 to 4, 2014. Image: Gary Larson, 1987
  35. Image credits and fair use rationales:
     • Slide 1: Image source: TU GRAZ, Austria, http://campusonline.tugraz.at/organisation/campusonline. Fair use rationale: The image is used to illustrate the principle of stable and persistent identifiers forming the glue to connect data objects.
     • Slide 3: George: George Orwell, George Harrison, George Bush, George Bush jr, George Soros, George Washington, Boy George, George (Seinfeld), George Lucas, George Clooney, Prince George of Cambridge, King George III of England, George Armstrong Custer, Georges Enescu, Curious George, St George in New Brunswick, George Coleman, George Eliot. Fair use rationale: Images of people and places named George from an Internet search. These images are used here to illustrate the weakness of using a human-friendly identifier/name: in the global society context, many people and places are named George, leading to a name ambiguity problem. We cannot know which George is being referred to.
     • Slide 5: Photo: Sancya/AP. Published: 03/31/2009 3:58:00, http://www.nydailynews.com/news/money/pile-unsold-cars-graveyards-gallery-1.45144 Fair use rationale: The image is used to illustrate the principle of uniqueness of identifiers within a given context, here car license number plates. The car license number is unlikely to be globally unique in a larger context such as the Internet.
     • Slide 6: Illustration retrieved from http://www.hypnosisinmelbourne.com.au/index.php?p=49. Fair use rationale: The image is used to illustrate the principle of expanding context that stable identifiers can be subject to. An identifier used in a particular context, such as the Internet, could be exposed to a larger context at a later time.
     • Slide 7: Fair use rationale: The image is of unknown source, retrieved from an Internet search. The image is used to illustrate the principle of expanding context that stable identifiers can be subject to. An identifier used in a particular context, such as the Internet, could be exposed to a larger context at a later time.
     • Slide 14: Image: This is Cape Canaveral (M. Sasek, 1963), http://blog.miroslavsasek.com/wp-content/uploads/2009/05/moon-birdwatchers-400.jpg by Miroslav Šašek (1916-1980), http://www.miroslavsasek.com/, http://www.ilike.org.uk/2009/05/this_is_m_sasek.html. Fair use rationale: The image is used here to illustrate the principle of naming an observed organism by re-using common existing persistent identifiers.
     • Slide 23: Photo: J.Schulzki. Fair use rationale: The image is used to illustrate the principle of machine-readable labels. The handling of luggage in an airport context (or the handling of parcels and letters in a postal service context) could serve as an inspiration for developing robotized handling of museum specimens, if these specimens are given machine-readable labels.
     • Slide 34: Image: Gary Larson, The Far Side Observer, October 1987, http://i227.photobucket.com/albums/dd202/tomcat600/gary-larson-oct-1987.gif. Fair use rationale: This drawing is assumed to be copyrighted by Gary Larson and used here under a fair use claim. The image is used to illustrate the principle of naming all things using persistent identifiers.
     The images are used for an educational, not-for-profit, non-commercial purpose.
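The duplicate-probability statement on slide 16 can be checked with the standard birthday-bound approximation. The sketch below is an illustration, not part of the original slides; it assumes random (version 4) UUIDs, in which 122 of the 128 bits are random, and the function names are hypothetical.

```python
import math

# Assumption: version 4 UUIDs, where 6 of the 128 bits are fixed
# (version and variant), leaving 122 random bits.
RANDOM_BITS = 122


def collision_probability(n_ids: float, bits: int = RANDOM_BITS) -> float:
    """Birthday-bound approximation: p = 1 - exp(-n^2 / (2 * 2^bits))."""
    return -math.expm1(-(n_ids ** 2) / (2.0 * 2.0 ** bits))


def ids_for_probability(p: float, bits: int = RANDOM_BITS) -> float:
    """Number of random identifiers at which the duplicate probability reaches p."""
    return math.sqrt(2.0 * 2.0 ** bits * math.log(1.0 / (1.0 - p)))


if __name__ == "__main__":
    n50 = ids_for_probability(0.5)
    print(f"identifiers needed for a 50% duplicate chance: {n50:.3e}")
    print(f"duplicate chance among one billion UUIDs: {collision_probability(1e9):.3e}")
```

Under these assumptions a 50% chance of a single duplicate requires roughly 2.7 × 10^18 identifiers, the same order of magnitude as several billion people holding 600 million UUIDs each. A collection with millions or even billions of specimens stays far below that threshold, which is why generating identifiers independently at each source is considered safe.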
