Global Biodiversity Information Facility - 2013


Presentation of the Global Biodiversity Information Facility (GBIF), GBIF Norway and the Norwegian Biodiversity Information Centre (NBIC, Artsdatabanken) at the Norwegian Forest and Landscape Institute (Skog og landskap) at Ås, outside Oslo, on 17 October 2013. The seminar was held together with NBIC.

  1. 1. Seminar at the Norwegian Forest and Landscape Institute, 17 October 2013. Global Biodiversity Information Facility (GBIF): a global infrastructure for publishing biodiversity data. Dag Endresen and Christian Svindseth, GBIF Norway, Natural History Museum, University of Oslo (NHM-UiO).
  2. 2. Topics: • What is GBIF? • International partners • Darwin Core terminology • GBIF data portal and services • Norwegian collection portals • Persistent identifiers (PID) • Data paper
  3. 3. GBIF enables free and open access to biodiversity data online. We are an international, government-initiated and government-funded initiative focused on making biodiversity data available to everyone for scientific research, conservation and sustainable development. Status of the GBIF data portal, October 2013.
  4. 4. GBIF’s unique role (slide by Donald Hobern, 2012): • Registry of biodiversity data resources. • Tools and support for biodiversity data publication. • Network development at national, regional and global levels. • Global virtual natural history collection. • Cross-domain linkage between data from collections, ecology and genomics. • Access to global biodiversity data for GIS analysis and environmental monitoring: – Aggregated presence data – Site-based survey data (samples, presence/absence)
  5. 5. Norway joined GBIF in February 2004. The low membership coverage in Africa and Asia is an important gap!
  6. 6. OECD Global Science Forum (1999): “establish and support a distributed system of interlinked and interoperable modules (databases, software and networking tools, search engines, analytical algorithms, etc.) that together will form a Global Biodiversity Information Facility (GBIF)”.
  7. 7. The Millennium Ecosystem Assessment showed that human actions often lead to irreversible losses in the diversity of life, and these losses have been more rapid in the past 50 years than ever before in human history. Biological diversity is key to resilience, the ability of natural and social systems to adapt to change, and is essential for nearly every aspect of human well-being. Because human threats to biodiversity occur across large spatial and temporal scales, biodiversity and ecosystem monitoring, forecasting, and risk assessments require data to be organised in a globally accessible, integrated infrastructure. GBIF’s Data Portal provides this infrastructure.
  8. 8. Organisational partnerships (based on slide by Donald Hobern, 2012). Some potential data collaborations: – Taxon names and nomenclature: • Catalogue of Life (CoL) • IPT to publish global and regional species databases • GBIF infrastructure to support construction of CoL – Biodiversity literature: • Biodiversity Heritage Library (BHL) • User annotations to extract occurrence records • Link original (and other) descriptions to taxonomy – Species information and traits: • Encyclopedia of Life (EOL) • Support EOL as global species information aggregator • Include EOL summary box on each GBIF species page
  9. 9. GBIF and GEO, the intergovernmental Group on Earth Observations. GEO BON: Biodiversity Observation Network. Data integration and interoperability: GBIF provides the infrastructure delivering species occurrence data.
  10. 10. GIASIP: Global Invasive Alien Species Information Partnership. GBIF provides the infrastructure delivering species occurrence data. Launched at CBD COP11 in Hyderabad, India, October 2012.
  11. 11. GBIF and IPBES (Naturpanelet). The Intergovernmental Science-Policy Platform on Biodiversity and Ecosystem Services (IPBES) provides information to support policy decisions and scientific research on biodiversity. GBIF operates within the data, information and knowledge domain of biodiversity informatics. GBIF provides the infrastructure delivering species occurrence data in IPBES. [Diagram labels: Science, Biodiversity, Policy, IPBES; Data, information and knowledge; GBIF.]
  12. 12. (1) Information infrastructure: an Internet-based index of a globally distributed network of interoperable databases that contain primary biodiversity data. (2) Community-developed tools, standards and protocols: the tools data providers need to format and share their data. (3) Capacity-building and training, and access to a global expert community.
  13. 13. Common discovery system (based on slide by David Remsen, GBIF, January 2012).
  14. 14. Architecture (slide by David Remsen, GBIF, November 2011): • Global Registry for resource discovery. • Common and documented data standards: – Metadata – Data – Vocabularies • Data sharing tools. • Common web service methods. • Resolvable identifiers.
  15. 15. Darwin Core: a vocabulary of terms. Wieczorek J, Bloom D, Guralnick R, Blum S, Döring M, De Giovanni R, Robertson T, and Vieglais D (2012) Darwin Core: An Evolving Community-Developed Biodiversity Data Standard. PLoS ONE 7(1): e29715. (doi:10.1371/journal.pone.0029715)
  16. 16.
  17. 17. Unifying species data (slide by Donald Hobern, 2012). [Diagram labels: Collections, Ecological Monitoring, Genomics, Darwin Core.] Integrated access for records of the occurrence of any species: • What? • When? • Where? • What evidence? • Data owner? • Link to full record. Presence only.
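The six questions on this slide can be read as a checklist against a single occurrence record. The sketch below is only an illustration: the term names are existing Darwin Core terms, but the mapping from each question to particular terms and all example values are assumptions made here, not taken from the slide.

```python
# Assumed mapping from the slide's questions to Darwin Core terms.
QUESTION_TO_TERMS = {
    "What?": ["scientificName"],
    "When?": ["eventDate"],
    "Where?": ["decimalLatitude", "decimalLongitude"],
    "What evidence?": ["basisOfRecord"],
    "Data owner?": ["institutionCode"],
    "Link to full record": ["occurrenceID"],
}

# Hypothetical example record; every value is made up for illustration.
EXAMPLE_RECORD = {
    "scientificName": "Amanita phalloides",
    "eventDate": "2013-10-17",
    "decimalLatitude": "59.66",
    "decimalLongitude": "10.78",
    "basisOfRecord": "PreservedSpecimen",
    "institutionCode": "EXAMPLE",
    "occurrenceID": "urn:uuid:00000000-0000-4000-8000-000000000000",
}


def unanswered_questions(record):
    """Return the questions for which at least one mapped term is missing or empty."""
    missing = []
    for question, terms in QUESTION_TO_TERMS.items():
        if any(record.get(term) in (None, "") for term in terms):
            missing.append(question)
    return missing


if __name__ == "__main__":
    print(unanswered_questions(EXAMPLE_RECORD))
```

A record that leaves a question unanswered (for example, one without `eventDate`) can still be valid Darwin Core; the check only shows which of the slide's questions the record cannot answer.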
  18. 18. Unifying species data (slide by Donald Hobern, 2012). [Diagram labels: Collections, Ecological Monitoring, Genomics.] Darwin Core gives integrated access for records of the occurrence of any species (presence only): • What? • When? • Where? • What evidence? • Data owner? • Link to full record. Darwin Core + Core Survey Fields (Sample Id, Method Id, Relative abundance, ...) is fully compatible with existing Darwin Core data (presence/absence), plus: • Which species were recorded together? • Which sets of data are directly comparable? • Which species were most abundant in each sample?
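The three additional questions on this slide become simple grouping operations once each record carries a sample identifier, a method identifier and a relative abundance. The following is a minimal sketch under stated assumptions: the field names `sampleID`, `methodID` and `relativeAbundance` are hypothetical spellings of the slide's "Sample Id", "Method Id" and "Relative abundance", and the records are invented.

```python
from collections import defaultdict

# Invented survey records; field names are hypothetical, not an agreed standard.
RECORDS = [
    {"scientificName": "Amanita phalloides", "sampleID": "plot-1",
     "methodID": "transect", "relativeAbundance": 0.7},
    {"scientificName": "Marasmius siccus", "sampleID": "plot-1",
     "methodID": "transect", "relativeAbundance": 0.3},
    {"scientificName": "Hygrocybe vitellina", "sampleID": "plot-2",
     "methodID": "transect", "relativeAbundance": 1.0},
    {"scientificName": "Catathelasma imperiale", "sampleID": "plot-3",
     "methodID": "quadrat", "relativeAbundance": 1.0},
]


def species_by_sample(records):
    """Which species were recorded together? Group names by sample."""
    grouped = defaultdict(set)
    for record in records:
        grouped[record["sampleID"]].add(record["scientificName"])
    return dict(grouped)


def samples_by_method(records):
    """Which sets of data are directly comparable? Group samples by method."""
    grouped = defaultdict(set)
    for record in records:
        grouped[record["methodID"]].add(record["sampleID"])
    return dict(grouped)


def most_abundant(records):
    """Which species was most abundant in each sample?"""
    best = {}
    for record in records:
        current = best.get(record["sampleID"])
        if current is None or record["relativeAbundance"] > current["relativeAbundance"]:
            best[record["sampleID"]] = record
    return {sample: record["scientificName"] for sample, record in best.items()}
```

None of these questions can be answered from presence-only records, because those records carry no shared sample or method key to group on.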
  19. 19. Darwin Core Archive (DwC-A): • A DwC-A publishes Darwin Core records, including terms from Darwin Core extensions. • Simple text-based format. • Zipped single-file archive. [Diagram label: Germplasm.txt, an example extension data file.]
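To make "simple text-based format, zipped single-file archive" concrete, the sketch below assembles a very small archive in memory: one tab-separated core data file plus a `meta.xml` descriptor in the style of the Darwin Core Text Guidelines. It is a simplified illustration, not the output of any GBIF tool; the file name `occurrence.txt`, the choice of columns and the helper names are assumptions, and the optional dataset metadata file (EML) and extension files are left out.

```python
import io
import zipfile

DWC_TERMS = "http://rs.tdwg.org/dwc/terms/"
# Assumed column selection; the first column doubles as the record id.
FIELDS = ["occurrenceID", "scientificName", "eventDate",
          "decimalLatitude", "decimalLongitude"]


def make_meta_xml(fields):
    """Describe the core data file so a reader knows what each column means."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<archive xmlns="http://rs.tdwg.org/dwc/text/">',
        '  <core encoding="UTF-8" fieldsTerminatedBy="\\t" linesTerminatedBy="\\n"'
        ' ignoreHeaderLines="1" rowType="' + DWC_TERMS + 'Occurrence">',
        '    <files><location>occurrence.txt</location></files>',
        '    <id index="0"/>',
    ]
    for index, name in enumerate(fields):
        lines.append('    <field index="%d" term="%s%s"/>' % (index, DWC_TERMS, name))
    lines += ['  </core>', '</archive>']
    return "\n".join(lines) + "\n"


def build_archive(rows):
    """Return the bytes of a zip holding occurrence.txt and meta.xml.

    Values must not contain tabs or newlines; this sketch does no escaping.
    """
    table = ["\t".join(FIELDS)]
    for row in rows:
        table.append("\t".join(str(row.get(name, "")) for name in FIELDS))
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("occurrence.txt", "\n".join(table) + "\n")
        archive.writestr("meta.xml", make_meta_xml(FIELDS))
    return buffer.getvalue()
```

Writing the returned bytes to a `.zip` file gives a single file that can be opened with any zip tool and read in a text editor, which is the main practical appeal of the format.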
  20. 20. Darwin Core Archive Assistant (GBIF, 2010). The Darwin Core Archive Assistant is a web application that presents a simple interface for describing the data elements a data publisher wishes to serve to the GBIF network as basic text files, and composes the appropriate XML descriptor file, as defined in the Darwin Core Text Guidelines, to accompany them. It communicates with the GBIF registry to provide an up-to-date listing of all relevant Darwin Core terms and available extensions and presents these in a simple checklist format.
  21. 21. http://tools.gbif.org/spreadsheet-processor/
  22. 22. Fitness for use (slide by Laura Russell, VertNet, September 2011). Definition: "The general intent of describing the quality of a particular dataset or record is to describe the fitness of that dataset or record for a particular use that one may have in mind for the data." (Chrisman, 1991)
  23. 23. Improving fitness-for-use (slide by Donald Hobern, 2012). Aggregate, then improve progressively: – Data indexes: • Centralised discovery • Standardisation of persistent identifiers • Consistent metadata – Data quality: • Inconsistencies within records • Validation against metadata • Outlier detection • Metrics per record and per data set – Expert curation: • Interface with taxon expert groups • Incorporate findings of data users • Need efficient researcher-friendly tools
  24. 24. Taxonomic data (slide by Laura Russell, VertNet, September 2011). Names are often the first point of entry to biodiversity databases => risk of error propagation. Possible errors: • Wrong identification • Wrong format • Spelling errors
  25. 25. The problem with scientific names (slide by David Shorthouse, Canadensys, January 2013): • No comprehensive catalogue of species • Names ≠ species • The species problem: species concepts • Competing classifications / phylogenies • Many names for one taxon • One name for many taxa • ‘Names’ are more than code-compliant scientific names
  26. 26. Proposed solution (slide by David Shorthouse, Canadensys, January 2013): • Inclusive – Accommodate alternate perspectives • Reconciliation – Map names among and between each other • Disambiguation – Context to assign homonymic names to rightful place
  27. 27. Improving data quality. The fish collection at NHM (dataset 8102) had some longitude and latitude columns swapped when indexed by GBIF on 14 January 2013. The error was noticed and corrected in April 2013, and the corrected data set was indexed by GBIF on 3 May 2013.
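Swapped coordinates of this kind can often be flagged automatically when the expected region is known. The sketch below is a hypothetical check, not the procedure used by NHM or GBIF: it assumes records are expected to fall inside a rough bounding box around mainland Norway (the box limits are approximate values chosen for illustration) and tests whether exchanging latitude and longitude would move an out-of-box point into the box.

```python
# Rough bounding box around mainland Norway; approximate, for illustration only.
LAT_RANGE = (57.0, 72.0)
LON_RANGE = (4.0, 32.0)


def in_box(lat, lon):
    """True if the point lies inside the assumed bounding box."""
    return (LAT_RANGE[0] <= lat <= LAT_RANGE[1]
            and LON_RANGE[0] <= lon <= LON_RANGE[1])


def classify(lat, lon):
    """Return 'ok', 'swapped' or 'outside' for one coordinate pair."""
    if in_box(lat, lon):
        return "ok"
    if in_box(lon, lat):
        # The point only makes sense with the two values exchanged.
        return "swapped"
    return "outside"


if __name__ == "__main__":
    for lat, lon in [(59.66, 10.78), (10.78, 59.66), (0.0, 0.0)]:
        print(lat, lon, classify(lat, lon))
```

Because the latitude and longitude ranges of this box do not overlap, a pair cannot be both "ok" and "swapped"; for regions where the ranges overlap, such as areas near the equator and prime meridian, the check is ambiguous and needs other evidence such as locality names.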
  28. 28. New portal launched 9 October 2013.
  29. 29. Data published through GBIF. [Chart: primary biodiversity records (millions), axis from 80 to 440.] A modest decline in the total number of data records in January 2013 resulted from deletion of duplicates and withdrawn data, identified through software and processing upgrades. Last updated: 2013-10-02.
  30. 30. GBIF data publishers. [Chart: number of institutions registered as GBIF data publishers, axis from 200 to 580.] A sharp rise in the number of data publishers in September 2013 results from institutions choosing to register as separate entities rather than sharing datasets through a single publisher at their national node institution. This helps to raise the visibility and branding of the institutions, and provides more accurate attribution, especially in the new GBIF portal coming online shortly. Last updated: 2013-10-02.
  31. 31. GBIF citation in research. [Bar chart: number of peer-reviewed publications per year, 2008 to 2013 (Jan-Sep), in three series: GBIF mentioned, GBIF discussed, GBIF-mediated data used.] Last updated: October 2013.
  32. 32. GBIF portal: 13.3 million occurrences are located in Norway, published from 30 countries worldwide.
  33. 33. GBIF portal: 12.5 million occurrences are published from Norwegian institutions, covering 180 countries worldwide.
  34. 34. Status of Nordic GBIF data sets (data hosted by…), October 2013:
        Country    Data sets    Occurrences
        Denmark    45            9 311 741
        Finland    57           14 666 474
        Iceland     4              458 705
        Norway     85           12 531 207
        Sweden     47           43 374 550
  35. 35. “Artskart” provides the national “GBIF” portal to species occurrences and specimens in Norway.
  36. 36. The site at provides an overview of the Norwegian data sets published to GBIF. 36  
  37. 37. • Custom data portals for Norwegian collections. • Upgrade to Darwin Core archives across Norway. • Persistent identifiers (UUID, QR code). • Data set metadata descriptions (data paper). • GIS data server for spatial environment data.
  38. 38. Custom collection portals
  39. 39. • Software from GBIF to implement online data portals for biodiversity data. – National, thematic or regional. – Based on data published using GBIF standards.
  40. 40. Different data portals will implement very different modules and functionality to meet their own needs (slide by David Remsen, 2011).
  41. 41. Opportunities with Darwin Core: a data portal for an institute, region or theme? Collections and data sets are published by the data owner as one single Darwin Core Archive (DwC-A). Different data types from the same DwC-A can be included in different data portals. [Diagram labels: UiB, UiT, S&L, Darwin Core Archive, Artskart, GBIF Portal.]
  42. 42. The purpose of identifiers is to name things, making it possible to refer to them. What is an identifier? “Each identifier refers to one and only one thing” (Coyle 2006). “An association between a string and a thing” (Kunze 2003). “A stated association between a symbol and a thing; that the symbol may be used to unambiguously refer to the thing within a given context” (Campbell 2007).
  43. 43. UUID QR codes for all museum objects at NHM-UiO would provide: • Machine-readable labels that can be scanned with an ordinary smartphone (or PDA). • New and efficient workflows for collection management. • Stable identifiers appropriate for databasing.
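Minting such identifiers needs nothing beyond a standard library. The sketch below is an assumption-laden illustration rather than the workflow used at NHM-UiO: it assigns a random (version 4) UUID to each catalogue number exactly once and returns the URN form as the text that would be encoded in the QR code. The registry dictionary and the function names are hypothetical, and rendering the actual QR image would need a separate library that is not shown.

```python
import uuid


def assign_identifier(catalog_number, registry):
    """Return the UUID for a catalogue number, minting one only on first use.

    `registry` is a plain dict standing in for a persistent database table.
    """
    if catalog_number not in registry:
        registry[catalog_number] = uuid.uuid4()
    return registry[catalog_number]


def qr_payload(identifier):
    """Text to encode in the QR code: the UUID in URN form (urn:uuid:...)."""
    return identifier.urn


if __name__ == "__main__":
    registry = {}
    pid = assign_identifier("O-L-000014", registry)
    print(qr_payload(pid))
```

The important property is that a second call for the same catalogue number returns the same UUID; an identifier is only persistent if it is stored and never regenerated.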
  44. 44. Catalog number: O-L-000014
  45. 45.
  46. 46.
  47. 47.
  48. 48. • Peer review option for biodiversity data. • Authors get scientific credit for data publication. • Meeting concerns over data quality. • Meeting concerns over data citation mechanism. • Metadata formats: Ecological Metadata Language (EML), Dublin Core, Darwin Core, Natural Collections Descriptions (NCD)… • Towards: each data set published through GBIF accompanied by a data paper…?
  49. 49.
  50. 50. Why publish your data: • Citable publication • Establish scientific priority • Increase collaboration • Link data to bigger network • Re-use and multiply effect • Respond to funding requirements. http:// Smith V, Georgiev T, Stoev P, Biserkov J, Miller J, Livermore L, Baker E, Mietchen D, Couvreur T, Mueller G, Dikow T, Helgen K, Frank J, Agosti D, Roberts D, Penev L (2013) Beyond dead trees: integrating the scientific process in the Biodiversity Data Journal. Biodiversity Data Journal 1: e995. DOI: 10.3897/BDJ.1.e995
  51. 51. Data rescue activity: many species occurrence data are “hidden” in reports and documents produced by universities, research institutes, public agencies and the university museums. Project with Artsdatabanken. Photo by Niklas Bildhauer.
  52. 52. Scientists from Norwegian institutes using GBIF-mediated data:
  53. 53. “PCA Norway”: PCA of 54 environmental variables across Norway versus the National Vegetation Atlas. [Figure panels: PCA component 1; PCA component 2; Norwegian Vegetation Atlas sections (Moen 1999); zones (Moen 1999).] Bakkestuen, V., Erikstad, L., and Økland, R.H. (2008). Step-less models for regional environmental variation in Norway. J. Biogeography 35: 1906-1922. Based on a slide by Vegar Bakkestuen.
  54. 54. Modelling Norwegian fungi: • 83 fungal species. • 10,500 occurrences from the GBIF portal. • Predictive modelling of species distribution. Species shown: Amanita phalloides, Catathelasma imperiale, Hygrocybe vitellina, Marasmius siccus. Wollan, A. K., Bakkestuen, V., Kauserud, H., Gulden, G. and Halvorsen, R. 2008. Modelling and predicting fungal distribution patterns using herbarium data. J. Biogeography 35: 2298-2310. Slide by Vegar Bakkestuen.
  55. 55. Node personnel: Dag Endresen, Node Manager; Christian Svindseth, Database Manager; Fridtjof Mehlum, Research Director; Einar Timdal, Associate Professor; Vegar Bakkestuen, Researcher; Geir Søli, Associate Professor; Nils Valland, Artsdatabanken; Wouter Koch, Artsdatabanken.
  56. 56. Thanks for listening! GBIF Norway: Dag Endresen, Christian Svindseth.