Paul2 ecn 2012

Usage Rights

© All Rights Reserved

  • iDigBio Summit 2011: input from initial members
  • Careful: id1 = id2 means the same object; id1 != id2 does not mean different objects.
  • Aggregation and resolution are separate issues, comingled by HTTP URIs. A uniform resource identifier (URI) is a string of characters used to identify a name or a resource. URIs can be classified as locators (URLs), as names (URNs), or as both. A uniform resource name (URN) functions like a person's name, while a uniform resource locator (URL) resembles that person's street address. In other words, the URN defines an item's identity, while the URL provides a method for finding it (from Wikipedia). One can use a URN to talk about a resource without implying its location or how to access it. The resource does not necessarily need to be accessible over a network. For example, the URN urn:isbn:0-395-36341-1 is a URI that specifies the identifier system, i.e. international standard book number (ISBN), as well as the unique reference within that system, and allows one to talk about a book, but the URI doesn't suggest where and how to obtain an actual copy of it.
  • Not comingled: identifier and resolution (proxy) are separate. The consumer has to know somewhere to look for info. Requires an organization to manage allocation of the ID space and proxy resolution. Members pay for the service.
  • Last of the digression. Careful: id1 = id2 means the same object; id1 != id2 does not mean different objects.
  • Back to the primary purpose, managing identifiers as a provider/creator
  • The standard for identification advocated by W3C is to use Uniform Resource Identifiers (URIs). A URI is a string that begins with a scheme name (or protocol), e.g. http, https, mailto, doi, ftp, urn. UUID (sometimes GUID): definitely unique; e.g. 954c8760-e1a6-4b4b-ab82-6bf7311c25f3; hard to type in; not resolvable; not always DB friendly; opaque. LSID: urn:lsid:authority:namespace:identifier, or as an HTTP URI, http://lsid.tdwg.org/urn:lsid:authority:namespace:identifier
  • The standard for identification advocated by W3C is to use Uniform Resource Identifiers (URIs). A URI is a string that begins with a scheme name (or protocol), e.g. http, https, mailto, doi, ftp, urn. The second URI has a hex encoding of “0014097”. UUID (sometimes GUID): assured unique; e.g. d6610130-5248-11e1-b86c-0800200c9a66; hard to type in; not resolvable; not always DB friendly; opaque.
  • Emerging trends in data collection, data sharing, data integration for research, and data citation. Little science to Big Science. Imagine getting credit for all your digitization efforts!
  • Example – Jeremy Miller – link to collections instead of each identifier.
  • ‘target’ is a property of the annotation
  • For more on GUIDs for upload to iDigBio, see our suggestions / policy at: https://www.idigbio.org/sites/default/files/iDigBio-GUID-Statement20MAR2012.pdf Serve data for your objects. If you are serving data for other institutions, it needs to be clear in fields like the Darwin Core Owner Institution ID, Institution Code, and Collection Code fields. If that institution starts serving its own data, stop serving it for them.
  • 1. Go through the community process of extending the Darwin Core. 2. Extend uniquely in your community, as a set of terms needed by your community to share data concepts not currently in Darwin Core (an example might be a paleo extension). 3. GBIF extension process, to be able to extend the IPT. 3a. It is possible to create needed extensions: http://vocabularies.gbif.org/node/124372 http://vocabularies.gbif.org/extensions Note: base your database on your needs. It is easier to match (map) if you use standard terms where possible. So, if adding georef fields to your database, try to use the standard terms if they exist.

Paul2 ecn 2012 Presentation Transcript

  • IDs in and out of the database • Entomological Collection Network (ECN) 2012 • November 10 – 11, Knoxville, TN • Debbie Paul, Greg Riccardi
  • Overview • What good is identification? • How are identifiers used by consumers? • Providing IDs • Resolving IDs in a server – Strategies for storing IDs in databases • Linked Data • Annotations ~ all sorts • Feedback
  • What good is identification? • Aggregation – If you get info from 2 sources that are about the same object, you can combine the info • Resolution (finding information about object) – Types of resolution • Determine where to get information • Determine how to get information • Providing information – How to create IDs – How to publish IDs – How to fetch database information for IDs
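
A minimal sketch of the aggregation idea on this slide: when two sources describe an object under the same identifier, their information can be combined. The field names (`collector`, `country`) and their values are hypothetical placeholders, not data from the presentation, and the "first value wins" rule is just one possible merge policy.

```python
from collections import defaultdict


def aggregate(records):
    """Combine records from any number of sources that share an identifier."""
    merged = defaultdict(dict)
    for record in records:
        identifier = record["id"]
        for field, value in record.items():
            if field != "id":
                # Keep the first value seen for a field; later sources only fill gaps.
                merged[identifier].setdefault(field, value)
    return dict(merged)


# Two hypothetical sources describing the same specimen.
source_a = [{"id": "urn:catalog:flmnh:herb:bbbrc000123", "collector": "hypothetical collector"}]
source_b = [{"id": "urn:catalog:flmnh:herb:bbbrc000123", "country": "hypothetical country"}]

combined = aggregate(source_a + source_b)
```

Because the comparison is plain string equality, this only works when both sources publish exactly the same identifier string for the object.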
  • HTTP URIs • Biggest problem – Identification and 2 types of resolution are comingled • Resolution – Where to get information • Look somewhere – How to get information • Fetch information using some protocol
  • DOI example • The DOI is • 10.3897/zookeys.209.3135 • URI (for aggregating) is • doi:10.3897/zookeys.209.3135 • A URL for information retrieval (proxy resolution) is • http://dx.doi.org/10.3897/zookeys.209.3135 • Information fetched from – HTML: • http://www.pensoft.net/journals/zookeys/article/3135/abstract/five-task-clusters-that-enable-efficient-and-effective-digitization-of-biological-collections – RDF: • http://data.crossref.org/10.3897/zookeys.209.3135
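
The three forms of the DOI on this slide differ only by a prefix, which can be sketched as below. The function name `doi_forms` is an illustrative assumption; the code only builds strings and does not contact the proxy.

```python
def doi_forms(doi: str) -> dict:
    """Return the bare DOI, its URI for aggregating, and its proxy URL."""
    bare = doi.strip()
    # Accept any of the three forms as input and reduce it to the bare DOI.
    for prefix in ("doi:", "http://dx.doi.org/"):
        if bare.lower().startswith(prefix):
            bare = bare[len(prefix):]
    return {
        "doi": bare,
        "uri": "doi:" + bare,                       # identity, used for comparison
        "proxy_url": "http://dx.doi.org/" + bare,   # location, used for retrieval
    }


forms = doi_forms("doi:10.3897/zookeys.209.3135")
```

Keeping the identity form and the retrieval form as separate values mirrors the point that identifier and resolution are not comingled for DOIs.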
  • What’s in an ID? • For consumer: – NOTHING! No information – Might as well be UUID • Can’t type it, remember it, parse it, resolve it – Useful for comparison and aggregation • Equal strings (persistence) • Different strings about the same object – fetching information • Send the ID somewhere for info
  • What’s in an ID? • For Provider/resolver: – Use ID to find local storage of information – E.g. • parse out the DWC triple • Extract the database table and primary key • Look up the ID in a table of IDs • Look up ID in a URI field of a database table
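
Two of the provider strategies listed here, parsing out the DwC triple and looking up the ID in a table of IDs, might look roughly like this. The table contents reuse example identifiers from this presentation, but the `ID_TABLE` layout, the `("specimen", 19537)` pairing, and the function names are assumptions made for illustration.

```python
from typing import NamedTuple, Optional, Tuple


class DwcTriple(NamedTuple):
    institution_code: str
    collection_code: str
    catalog_number: str


def parse_dwc_triple(identifier: str) -> DwcTriple:
    """Split 'institution:collection:catalog' (optionally 'urn:catalog:'-prefixed)."""
    prefix = "urn:catalog:"
    if identifier.startswith(prefix):
        identifier = identifier[len(prefix):]
    parts = identifier.split(":", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError("not a Darwin Core triple: " + identifier)
    return DwcTriple(*parts)


# Hypothetical table of IDs: identifier string -> (database table, primary key).
ID_TABLE = {
    "urn:lsid:biocol.org:flmnh:bbbrc000123": ("specimen", 19537),
    "http://ids.flmnh.ufl.edu/herb/bbbrc000123": ("specimen", 19537),
}


def resolve(identifier: str) -> Optional[Tuple[str, int]]:
    """Look the identifier up as an opaque string; None means it is unknown here."""
    return ID_TABLE.get(identifier)


triple = parse_dwc_triple("urn:catalog:flmnh:herb:bbbrc000123")
location = resolve("http://ids.flmnh.ufl.edu/herb/bbbrc000123")
```

Parsing works only for identifiers whose structure the provider controls; the table lookup treats the identifier as opaque and so also covers UUIDs.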
  • What’s in an id for the provider? • record id 112234 • uuid 954c8760-e1a6-4b4b-ab82-6bf7311c25f3 • lsid urn:lsid:example.org:specimen:22545 • uri • ezid http://n2t.net/ark:/99999/fk42b9hdf • doi doi:10.1038/ng0609-637
  • What about Specimen identifiers? • identifier on the specimen? – readable text – encoded data – barcode is a contextual identifier • identifier in the database? – http://ids.usms.edu/herb/0014097 – http://ids.usms.edu/herb/0303134303937
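
As a sketch of what "encoded data" can mean for the second URI, the following assumes the encoding is the ASCII bytes of the catalog number written in hexadecimal. Under that assumption "0014097" encodes to 30303134303937, which matches the trailing part of the second URI apart from a leading 3.

```python
def hex_encode(catalog_number: str) -> str:
    """Write each ASCII character of the catalog number as two hex digits."""
    return catalog_number.encode("ascii").hex()


def hex_decode(encoded: str) -> str:
    """Reverse hex_encode: turn pairs of hex digits back into characters."""
    return bytes.fromhex(encoded).decode("ascii")


encoded = hex_encode("0014097")
decoded = hex_decode(encoded)
```

Either form can serve as an identifier; the encoded form is simply harder for a person to read or type.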
  • How do providers identify? • Look at online databases and your own database and find the identifiers of the various objects • Some identifiers are local (e.g. primary key) • Some identifiers are globally unique • Some identifiers are URIs
  • Identification in the field
  • Storing IDs in databases • your contextual ids?, your guids? • What to use for IDs? –record id –uuid –lsid –uri • what’s in your wallet database? • Morphbank Example
  • IDs in Morphbank • Morphbank Example • http://www.morphbank.net/818505
  • IDs in Morphbank • Morphbank Example • http://www.morphbank.net/643261
  • Sharing data with IDs • into a publication • uploaded to the web • data shared with a database integrator / aggregator – GBIF – iDigBio – VertNet – Morphbank • what is it exactly in the publication? – an id?, a guid? a link to more information? – what will be cited? searched for?
  • Feedback with IDs • Annotations – Target of annotation • http://www.morphbank.net/818505 – filtered PUSH • linked data ~ the semantic web – (benefits – in a minute) • updating the database – be(a)ware – Remember previous IDs
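
The "remember previous IDs" point can be sketched as follows: an annotation carries its target identifier, and the database keeps a map from retired identifiers to current ones so that older annotations still find their object. The `Annotation` class, the mapping, and the pairing of the previous catalog number 212345 with bbbrc000123 are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Annotation:
    target: str  # identifier of the annotated object
    body: str    # the comment or correction itself


def current_target(annotation: Annotation, previous_to_current: Dict[str, str]) -> str:
    """Follow remembered previous IDs until reaching an identifier still in use."""
    target = annotation.target
    seen = set()
    while target in previous_to_current and target not in seen:
        seen.add(target)  # guard against accidental cycles in the mapping
        target = previous_to_current[target]
    return target


# Hypothetical record of an identifier that was replaced during a database update.
PREVIOUS_TO_CURRENT = {"212345": "bbbrc000123"}

old_note = Annotation(target="212345", body="hypothetical determination comment")
resolved = current_target(old_note, PREVIOUS_TO_CURRENT)
```

Without the mapping, an update that changes identifiers would silently orphan every annotation made against the old ones.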
  • What’s coming up next? • expect guids for all sorts of objects –collection objects (example: specimen) –georeferences –taxon concepts –determinations –people
  • GUIDs are key • 1 to many IDs known for a given object • store and share the ones you know about • Specimen RecordID: 19537 • Specimen Previous Catalog Number: 212345 • Specimen Catalog Number / bar code: bbbrc000123 • Darwin Core Triplet (DwC): flmnh:herb:bbbrc000123 • DwC Occurrence URI: urn:catalog:flmnh:herb:bbbrc000123 • Specimen GUID of type lsid: urn:lsid:biocol.org:flmnh:bbbrc000123 • Specimen Opaque Identifier (UUID): 424854d7-baec-42cf-a142-805b64117b9f • URI for UUID: urn:uuid:424854d7-baec-42cf-a142-805b64117b9f • Specimen GUID of type HTTP-URI: http://ids.flmnh.ufl.edu/herb/bbbrc000123 • *Cannot enforce single identifier per object
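
One way to store the many identifiers known for an object is a simple registry keyed by identifier string, sketched below with the values from this slide. The `IdentifierRegistry` class and its method names are assumptions for illustration. `same_object` returns `True` or `None`, never `False`, because different identifiers do not prove different objects.

```python
import uuid
from typing import Dict, Iterable, Optional


class IdentifierRegistry:
    """Maps every identifier known for an object to one local record id."""

    def __init__(self) -> None:
        self._record_for: Dict[str, int] = {}

    def register(self, record_id: int, identifiers: Iterable[str]) -> None:
        for identifier in identifiers:
            self._record_for[identifier] = record_id

    def lookup(self, identifier: str) -> Optional[int]:
        return self._record_for.get(identifier)

    def same_object(self, id1: str, id2: str) -> Optional[bool]:
        """True if known to be the same object; None if that cannot be determined."""
        if id1 == id2:
            return True
        first, second = self.lookup(id1), self.lookup(id2)
        if first is not None and first == second:
            return True
        return None


# The standard library builds the URN form of a UUID directly.
specimen_uuid = uuid.UUID("424854d7-baec-42cf-a142-805b64117b9f")

registry = IdentifierRegistry()
registry.register(19537, [
    "flmnh:herb:bbbrc000123",
    "urn:catalog:flmnh:herb:bbbrc000123",
    "urn:lsid:biocol.org:flmnh:bbbrc000123",
    specimen_uuid.urn,
    "http://ids.flmnh.ufl.edu/herb/bbbrc000123",
])
```

For example, `registry.same_object("flmnh:herb:bbbrc000123", specimen_uuid.urn)` is `True`, while comparing a registered identifier with an unregistered one yields `None`.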
  • caring for guids • store them – database adjustments – tweaking current standard practices • share them – data standards – 3 ways to modify darwin core • reap the benefits
  • caring for guids – reap the benefits • Data quality feedback • Dialog based on annotation • Tracking objects through analysis and use • Maintaining attribution to provider • Find related objects • Find a way to take advantage of efforts of many smart dedicated people – BHL, biscicol, filtered PUSH, GNA, TNRS, SGR,…
  • Thanks from iDigBio