Gbrds Tech Issues Op


Published in: Technology, Education

  1. GLOBAL BIODIVERSITY INFORMATION FACILITY (WWW.GBIF.ORG)
     Technical Issues and Opportunities for Resource Discovery
     Tim Robertson, Systems Architect, September 2009
  2. Content
       • A look at the past, present and future of the GBIF registry and portals for biodiversity resource discovery.
           • Register existence
           • Associate metadata
           • Enable discovery through search
  3. Registry: The past…
     Universal Description, Discovery and Integration (UDDI): “… XML-based registry for businesses worldwide to list themselves on the Internet …”
       UDDI               GBIF
       Businesses         Institutions
       Services           Collections
       Service Bindings   Endpoints (DiGIR etc.)
       TModels            Application Schemas (DwC etc.)
  4. UDDI: Metadata
       • Limited by and large to:
           • Contact information (emails, addresses etc.)
           • Key-value pairs
               • ISO country code
               • Endorsing node
           • Allows for search by title, contact etc.
           • 2 levels of credit
               • Data provenance is lost: lack of recognition!
  5. Past: Search capabilities
       • Recognising that federated search was limited, GBIF built the Data Portal ( )
       • Harvesting of resources registered in the UDDI
       • TAPIR, DiGIR, BioCASe
       • Rich search for individual records and resources by Darwin Core type terms (the what, where, when etc.), made possible by building indexes
       • Limited metadata search capabilities
           • DiGIR, BioCASe, TAPIR etc. offer TECHNICAL metadata only
  6. GBIF Network: The real scenario
       • Challenge #1:
       • Model the true nature of the network makeup.
       • A graph and not a tree
       • Multiple entity types
           • Institutions, networks, collections, GBIF Nodes
       • Many relationship types
  7. Registry: A graph-based model
       • Benefits:
           • Accurate data provenance
           • Duplicate record detection
           • Ability to model sub-networks
       • Opportunity:
           • Re-use of registry for your own purposes
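The graph-based model on slides 6 and 7 can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the `Registry` class, the entity names, and the relationship names (`ownedBy`, `memberOf`, `endorsedBy`) are hypothetical and not taken from the GBIF registry itself. The point is only that one entity may have several parents of different types, and that provenance becomes a graph traversal.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Registry:
    """Toy registry: typed entities joined by typed, directed relationships."""
    entity_types: dict = field(default_factory=dict)               # name -> entity type
    edges: dict = field(default_factory=lambda: defaultdict(set))  # name -> {(relation, target)}

    def add_entity(self, name, entity_type):
        self.entity_types[name] = entity_type

    def relate(self, source, relation, target):
        self.edges[source].add((relation, target))

    def provenance(self, name):
        """Return every entity reachable from `name`, following any relationship."""
        seen, stack = set(), [name]
        while stack:
            current = stack.pop()
            for _relation, target in self.edges.get(current, ()):
                if target not in seen:
                    seen.add(target)
                    stack.append(target)
        return seen


# Hypothetical entities and relationships, for illustration only.
registry = Registry()
registry.add_entity("Herbarium collection", "collection")
registry.add_entity("Institution X", "institution")
registry.add_entity("Regional network", "network")
registry.add_entity("Thematic network", "network")
registry.add_entity("Node Y", "GBIF node")
registry.relate("Herbarium collection", "ownedBy", "Institution X")
registry.relate("Institution X", "memberOf", "Regional network")
registry.relate("Institution X", "memberOf", "Thematic network")  # two parents: a graph, not a tree
registry.relate("Institution X", "endorsedBy", "Node Y")

print(sorted(registry.provenance("Herbarium collection")))
```

A tree could record only one of the two networks as the parent of "Institution X"; the edge set keeps both, which is what allows credit to flow to every party in the chain.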
  8. Registry: A graph-based model
       • Challenge #2:
           • Scalable deployment supporting this reuse (99.9%, 24/7)
           • Authentication model
               • Identity management?
               • Cascading permissions?
               • Wiki style?
           • Or perhaps copy the model of ?
           • “Institution X requests to be associated with you. Would you like to accept this association?”
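One way to read the "requests to be associated with you" option on slide 8 is as a two-step handshake: a relationship only becomes part of the registry once the target entity accepts it. The sketch below is an assumption about how that could work, not a description of any GBIF implementation; the class and method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class AssociationRegistry:
    pending: set = field(default_factory=set)   # (requester, target) awaiting a decision
    accepted: set = field(default_factory=set)  # (requester, target) confirmed by the target

    def request(self, requester, target):
        """Record that `requester` asks to be associated with `target`."""
        self.pending.add((requester, target))

    def respond(self, actor, requester, target, accept):
        """Let the target accept or decline a pending request."""
        # Only the entity being asked may answer the request.
        if actor != target:
            raise PermissionError(f"{actor} cannot answer a request addressed to {target}")
        if (requester, target) not in self.pending:
            raise KeyError("no such pending request")
        self.pending.discard((requester, target))
        if accept:
            self.accepted.add((requester, target))


associations = AssociationRegistry()
associations.request("Institution X", "Regional network")
associations.respond("Regional network", "Institution X", "Regional network", accept=True)
print(associations.accepted)
```

This keeps curation with the party that is being claimed, which sits somewhere between a fully open wiki style and centrally managed cascading permissions.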
  9. Registry: A graph-based model
       • Challenge #2 (cont.):
           • Who should curate?
               • Private and community copies?
           • Single (scalable) instance or multiple masters?
       • Opportunity:
           • Offering tagging (machine and human) allows people to make use of the registry in ways we would not envision
       •  : containsTypesInTaxon = Leiopelmatidae
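The tag shown on slide 9 looks like a machine tag of the common `namespace:predicate=value` shape, with the namespace missing from this transcript. The sketch below assumes that shape and uses a placeholder namespace `example`; the function names and the sample collections are hypothetical.

```python
from typing import NamedTuple


class MachineTag(NamedTuple):
    namespace: str
    predicate: str
    value: str


def parse_machine_tag(text):
    """Parse 'namespace:predicate=value', tolerating spaces around ':' and '='."""
    left, sep, value = text.partition("=")
    if not sep:
        raise ValueError(f"missing '=' in {text!r}")
    namespace, sep, predicate = left.partition(":")
    if not sep:
        raise ValueError(f"missing ':' in {text!r}")
    return MachineTag(namespace.strip(), predicate.strip(), value.strip())


def find_tagged(tags_by_entity, predicate, value):
    """Return the entities carrying a tag with the given predicate and value."""
    return sorted(
        entity
        for entity, tags in tags_by_entity.items()
        if any(t.predicate == predicate and t.value == value for t in tags)
    )


# 'example' is a placeholder namespace; the real one is not given in the slides.
tags_by_entity = {
    "Collection A": [parse_machine_tag("example : containsTypesInTaxon = Leiopelmatidae")],
    "Collection B": [parse_machine_tag("example:containsTypesInTaxon=Ranidae")],
}
print(find_tagged(tags_by_entity, "containsTypesInTaxon", "Leiopelmatidae"))
```

Because the registry only stores and indexes the triple, a third party can invent its own namespace and predicates without any change to the registry schema.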
  10. Provider monitoring
       • Endpoint monitoring (Rod Page)
  11. Enabling discoverability
       • Combination of human-authored with machine-generated metadata?
       • “… artificial intelligence is just that; ‘ARTIFICIAL intelligence’. For a system to feel smart to humans, you need human crafted metadata…”
  12. Associating data and metadata
       • Challenge #3:
       • If there is agreement to improve discoverability by associating automatically generated metadata with a registered entity:
           • How to uniquely identify resources within the registry?
               • Preserve existing (multiple) identifiers
           • Where does one stop? (Inventory of Taxa for example?)
           • What services are required to enable this association?
               • E.g. Find resource for “DwC:collectionCode”
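The "preserve existing (multiple) identifiers" and "find resource for DwC:collectionCode" points on slide 12 suggest a lookup service that maps any known identifier to a single registry entry. A minimal sketch under that assumption follows; `IdentifierIndex`, the registry key `resource-1`, and the sample identifier values are all hypothetical. It also assumes each (scheme, value) pair is unique across the registry, which a bare collection code may not be in practice.

```python
class IdentifierIndex:
    """Maps every known identifier of a resource to one internal registry key."""

    def __init__(self):
        self._key_by_identifier = {}   # (scheme, value) -> registry key
        self._identifiers_by_key = {}  # registry key -> set of (scheme, value)

    def register(self, key, identifiers):
        """Attach identifiers to `key`; refuse one already owned by another key."""
        for scheme, value in identifiers:
            owner = self._key_by_identifier.setdefault((scheme, value), key)
            if owner != key:
                raise ValueError(f"{scheme}={value} already belongs to {owner}")
            self._identifiers_by_key.setdefault(key, set()).add((scheme, value))

    def find(self, scheme, value):
        """Return the registry key for an identifier, or None if unknown."""
        return self._key_by_identifier.get((scheme, value))

    def identifiers(self, key):
        """Return all identifiers preserved for a registry key."""
        return sorted(self._identifiers_by_key.get(key, ()))


index = IdentifierIndex()
index.register("resource-1", [
    ("DwC:collectionCode", "HERB"),            # hypothetical collection code
    ("url", "http://example.org/digir/herb"),  # hypothetical endpoint URL
])
print(index.find("DwC:collectionCode", "HERB"))
print(index.identifiers("resource-1"))
```

Automatically generated metadata can then be attached to whatever key `find` returns, without forcing providers to abandon the identifiers they already use.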
  13. Existing metadata stores
       • There are many existing resources …
           • Identification of the master copy is critical for success
               • Conflict resolution: how do we achieve this?
           • Complete copies or subset copies?
               • Wikipedia style, make copies available?
  14. Service registration
     To enable a service-oriented architecture (SOA) workflow definition
       • Requires the definition of:
           • Service endpoints
           • Input formats
           • Output formats
     Remember:
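The three things slide 14 says a service registration must define (endpoint, input formats, output formats) are enough to check mechanically whether services can be chained into a workflow. The sketch below is one possible reading; the `ServiceDescription` type, the service names, the URLs, and the format labels are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceDescription:
    name: str
    endpoint: str
    input_formats: frozenset
    output_formats: frozenset


def can_chain(upstream, downstream):
    """True when some output format of `upstream` is an accepted input of `downstream`."""
    return bool(upstream.output_formats & downstream.input_formats)


def validate_workflow(services):
    """Return the (upstream, downstream) name pairs that share no format."""
    return [
        (a.name, b.name)
        for a, b in zip(services, services[1:])
        if not can_chain(a, b)
    ]


# Hypothetical services and format labels, for illustration only.
harvester = ServiceDescription(
    "harvester", "http://example.org/harvest",
    frozenset({"registry-entry"}), frozenset({"darwin-core-xml"}),
)
name_checker = ServiceDescription(
    "name-checker", "http://example.org/names",
    frozenset({"darwin-core-xml"}), frozenset({"darwin-core-xml"}),
)
mapper = ServiceDescription(
    "mapper", "http://example.org/map",
    frozenset({"geojson"}), frozenset({"png"}),
)

print(validate_workflow([harvester, name_checker]))
print(validate_workflow([harvester, name_checker, mapper]))
```

With declared formats, a workflow tool can reject an incompatible chain before any service is called, which is only possible if the formats are registered alongside the endpoint.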
  15. GUID Resolution
       • Awaiting recommendation from the task group
       • Do we envisage GBIF running a generic resolver (multiple)?
       • Act as a cache?
       • Include endpoint monitoring and early warning system?
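The questions on slide 15 (a generic resolver, acting as a cache, with monitoring and early warning) can be combined in one small sketch. This is an illustration under assumptions, not a proposed GBIF design: `CachingResolver`, its parameters, and the failure-count threshold are hypothetical, and the actual network lookup is passed in as a `fetch` callable so the sketch stays independent of any GUID scheme.

```python
import time


class CachingResolver:
    """Resolve GUIDs through `fetch`, caching results and counting failures."""

    def __init__(self, fetch, ttl_seconds=3600, clock=time.monotonic):
        self._fetch = fetch      # callable: guid -> resolved record; raises on failure
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache = {}         # guid -> (expiry time, record)
        self.failures = {}       # guid -> consecutive failed fetches

    def resolve(self, guid):
        now = self._clock()
        cached = self._cache.get(guid)
        if cached and cached[0] > now:
            return cached[1]     # still fresh: no call to the endpoint
        try:
            record = self._fetch(guid)
        except Exception:
            self.failures[guid] = self.failures.get(guid, 0) + 1
            if cached:           # serve the stale copy while the endpoint is down
                return cached[1]
            raise
        self.failures.pop(guid, None)
        self._cache[guid] = (now + self._ttl, record)
        return record

    def warnings(self, threshold=3):
        """GUIDs whose endpoint failed at least `threshold` times in a row."""
        return sorted(g for g, n in self.failures.items() if n >= threshold)
```

A caller would supply its own `fetch`, for example a function that performs an HTTP request against the authoritative endpoint. Serving stale records during an outage is a design choice: it favours availability over freshness, and the failure counter is what feeds the early warning.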
  16. Vocabulary definitions
       • Requires consensus within the community that terms adequately describe the content.
       • Community site for authoring vocabularies?
       • The same applies to extensions to the Darwin Core
       • The GBIF Integrated Publishing Toolkit (IPT) uses the GBRDS as the source for extension and vocabulary definitions.
  17. Be smart with our limited resources
  18. Contact
       • Web site:
       • Data portal:
       • GBIF Secretariat
       • Universitetsparken 15
       • 2100 Copenhagen
       • Denmark
       • E-mail: [email_address]
       • Phone: +45 3532 1487