Global Biodiversity Information Facility (GBIF) - 2012


Presentation of the Global Biodiversity Information Facility (GBIF) and GBIF Norway for the Department of Technical and Scientific Conservation (CONSERV) at the Natural History Museum, University of Oslo. Tøyen, Oslo, 7 November 2012.

  1. 1. Section meeting: Department of Technical and Scientific Conservation (CONSERV). Global Biodiversity Information Facility, GBIF Norway. Dag Endresen and Christian Svindseth, GBIF Norway, NHM-UiO. Natural History Museum, University of Oslo (NHM-UiO). Global Biodiversity Information Facility (GBIF). 7 November 2012
  2. 2. Topics • What is GBIF? • GBIF data portal • Darwin Core (DwC), DwC archive • Persistent identifiers (UUID) • Data paper, citation of data sets
  3. 3. GBIF enables free and open access to biodiversity data online. We are an international government-initiated and funded initiative focused on making biodiversity data available to everyone, for scientific research, conservation and sustainable development. Status of the data portal, October 2012.
  4. 4. OECD Global Science Forum recommendation (1999): “[E]stablish and support a distributed system of interlinked and interoperable modules (databases, software and networking tools, search engines, analytical algorithms, etc.) that together will form a Global Biodiversity Information Facility (GBIF)”.
  5. 5. 1. Information infrastructure – an Internet-based index of a globally distributed network of interoperable databases that contain primary biodiversity data. 2. Community-developed tools, standards and protocols – the tools data providers need to format and share their data. 3. Capacity-building and training – and access to a global expert community.
  7. 7. GBIF portal: 16,064,074 records with coordinates from a total of 17,268,452 records. GBIF Norway: 11,777,738 records are provided FROM Norwegian data publishers.
  9. 9. GBIF contributes species occurrence data to “Artskart”.
  10. 10. GBIF’s unique role • Registry of biodiversity data resources • Tools and support for biodiversity data publication • Network development at national, regional and global levels • Global virtual natural history collection • Cross-domain linkage between data from collections, ecology and genomics • Access to biodiversity data for GIS analysis and environmental monitoring – Aggregated presence data – Site-based survey data (samples, presence/absence). Slide developed by Donald Hobern, 2012.
  11. 11. Improving fitness-for-use • Progressive improvement – Data indexes • Centralised discovery • Standardisation of persistent identifiers • Consistent metadata – Data quality • Inconsistencies within records • Validation against metadata • Outlier detection • Metrics per record and per data set – Expert curation • Interface with taxon expert groups • Incorporate findings of data users • Need efficient researcher-friendly tools. [Diagram labels: Aggregate, Data indexes, Data quality, Expert curation.] Slide developed by Donald Hobern, 2012.
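The data-quality bullets on this slide (inconsistencies within records, metrics per record and per data set) can be illustrated with a minimal sketch. It assumes records are plain dictionaries keyed by the Darwin Core terms decimalLatitude and decimalLongitude; the function names and issue labels are hypothetical and are not GBIF's own flags or tooling.

```python
from collections import Counter


def check_record(record):
    """Return a list of issue labels for one occurrence record (hypothetical labels)."""
    try:
        lat = float(record.get("decimalLatitude"))
        lon = float(record.get("decimalLongitude"))
    except (TypeError, ValueError):
        # Missing or non-numeric coordinates: nothing further can be checked.
        return ["COORDINATES_MISSING_OR_NOT_NUMERIC"]

    issues = []
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        issues.append("COORDINATES_OUT_OF_RANGE")
    if lat == 0 and lon == 0:
        # (0, 0) is a common placeholder value rather than a real locality.
        issues.append("ZERO_COORDINATES")
    return issues


def dataset_metrics(records):
    """Count how often each issue occurs across a data set."""
    counts = Counter()
    for record in records:
        counts.update(check_record(record))
    return counts


if __name__ == "__main__":
    # Invented sample records, for illustration only.
    sample = [
        {"decimalLatitude": "59.92", "decimalLongitude": "10.77"},
        {"decimalLatitude": "95", "decimalLongitude": "10.77"},
        {"decimalLatitude": "", "decimalLongitude": "10.77"},
    ]
    print(dataset_metrics(sample))
```

Per-record checks like these only catch internal inconsistencies; outlier detection and expert curation would need reference data and human review on top.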
  12. 12. Organisational partnerships • Some potential data collaborations – GBIF-mediated occurrence data • Maps, lists of countries recorded • Localise content in EOL, etc. – BHL literature • User annotations to extract occurrence records • Link original (and other) descriptions to taxonomy – EOL species information • Support EOL as global species information aggregator • Include EOL summary box on each GBIF species page – Catalogue of Life • IPT to publish global and regional species databases • GBIF infrastructure to support construction of CoL. Slide developed by Donald Hobern, 2012.
  13. 13. Unifying species data. [Diagram labels: Ecological monitoring, Genomics, Collections, Darwin Core.] Integrated access for records of the occurrence of any species: • What? • When? • Where? • What evidence? • Data owner? • Link to full record. Presence only. Slide developed by Donald Hobern, 2012.
  14. 14. Unifying species data. [Diagram labels: Ecological monitoring, Genomics, Collections, Darwin Core, Darwin Core + Core Survey Fields (Sample Id, Method Id, Relative abundance, ...).] Integrated access for records of the occurrence of any species: • What? • When? • Where? • What evidence? • Data owner? • Link to full record. Presence only. Fully compatible with existing Darwin Core data, plus: • Which species were recorded together? • Which sets of data are directly comparable? • Which species were most abundant in each sample? Presence/absence. Slide developed by Donald Hobern, 2012.
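The extra questions that the survey fields make answerable (which species were recorded together, which were most abundant in each sample) amount to grouping records by a sample identifier. The sketch below assumes hypothetical field names sampleID and relativeAbundance, modelled on the slide's "Sample Id" and "Relative abundance"; the observations are invented.

```python
from collections import defaultdict

# Invented survey observations, for illustration only.
OBSERVATIONS = [
    {"sampleID": "plot-1", "scientificName": "Vaccinium myrtillus", "relativeAbundance": 0.6},
    {"sampleID": "plot-1", "scientificName": "Picea abies", "relativeAbundance": 0.4},
    {"sampleID": "plot-2", "scientificName": "Betula pubescens", "relativeAbundance": 0.7},
    {"sampleID": "plot-2", "scientificName": "Vaccinium myrtillus", "relativeAbundance": 0.3},
]


def species_by_sample(rows):
    """Map each sample to {species: relative abundance}: what was recorded together."""
    groups = defaultdict(dict)
    for row in rows:
        groups[row["sampleID"]][row["scientificName"]] = row["relativeAbundance"]
    return dict(groups)


def most_abundant(groups):
    """Pick the species with the highest relative abundance in each sample."""
    return {sample: max(species, key=species.get) for sample, species in groups.items()}


if __name__ == "__main__":
    grouped = species_by_sample(OBSERVATIONS)
    print(sorted(grouped["plot-1"]))
    print(most_abundant(grouped))
```

With presence-only records there is no shared sample identifier, so neither grouping is possible; that is the difference the slide points at.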
  15. 15. Darwin Core – a vocabulary of terms. Wieczorek J, Bloom D, Guralnick R, Blum S, Döring M, De Giovanni R, Robertson T, and Vieglais D (2012) Darwin Core: An Evolving Community-Developed Biodiversity Data Standard. PLoS ONE 7(1): e29715. doi:10.1371/journal.pone.0029715
  17. 17. Semantic MediaWiki: a forum for discussion and development of terminology.
  18. 18. Darwin Core Archive (DwC-A) • A DwC-A publishes DwC records, including terms from DwC-A extensions. • Simple text-based format. • Zipped single-file archive. [Example file shown: Germplasm.txt]
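To make "simple text-based format, zipped single-file archive" concrete, here is a minimal sketch that builds a tiny archive in memory and reads its core data file back. It assumes the usual layout of a meta.xml descriptor plus a tab-delimited text file; the file name occurrence.txt, the helper names and the two records are illustrative, and extension files and the EML metadata document are left out.

```python
import csv
import io
import xml.etree.ElementTree as ET
import zipfile

DWC_TEXT_NS = "http://rs.tdwg.org/dwc/text/"
DWC_TERMS = "http://rs.tdwg.org/dwc/terms/"

# Descriptor mapping column positions in the core file to Darwin Core terms.
META_XML = f"""<?xml version="1.0" encoding="UTF-8"?>
<archive xmlns="{DWC_TEXT_NS}">
  <core encoding="UTF-8" fieldsTerminatedBy="\\t" linesTerminatedBy="\\n"
        ignoreHeaderLines="1" rowType="{DWC_TERMS}Occurrence">
    <files><location>occurrence.txt</location></files>
    <id index="0"/>
    <field index="1" term="{DWC_TERMS}scientificName"/>
    <field index="2" term="{DWC_TERMS}decimalLatitude"/>
    <field index="3" term="{DWC_TERMS}decimalLongitude"/>
  </core>
</archive>
"""

# Invented records, for illustration only.
OCCURRENCE_TXT = (
    "occurrenceID\tscientificName\tdecimalLatitude\tdecimalLongitude\n"
    "C37E3F9B-BCAF-4479-8EB7-3346A2DB2373\tBetula pubescens\t59.92\t10.77\n"
    "00000000-0000-4000-8000-000000000001\tPicea abies\t60.39\t5.32\n"
)


def build_example_archive():
    """Return the bytes of a zipped archive holding meta.xml and the core file."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("meta.xml", META_XML)
        zf.writestr("occurrence.txt", OCCURRENCE_TXT)
    return buffer.getvalue()


def read_core_rows(archive_bytes):
    """Read the core file named in meta.xml and return one dict per record."""
    def q(tag):
        return f"{{{DWC_TEXT_NS}}}{tag}"

    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        core = ET.fromstring(zf.read("meta.xml")).find(q("core"))
        location = core.find(f"{q('files')}/{q('location')}").text
        text = zf.read(location).decode(core.get("encoding", "UTF-8"))

    id_index = int(core.find(q("id")).get("index"))
    # Use the last path segment of each term URI as a short column name.
    columns = {int(f.get("index")): f.get("term").rsplit("/", 1)[-1]
               for f in core.findall(q("field"))}
    skip = int(core.get("ignoreHeaderLines", "0"))

    reader = csv.reader(text.splitlines()[skip:], delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = []
    for values in reader:
        row = {"id": values[id_index]}
        row.update({name: values[i] for i, name in columns.items()})
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in read_core_rows(build_example_archive()):
        print(row)
```

Because the column-to-term mapping lives in meta.xml rather than in the data file, an extension file can be added to the same archive in the same way, with its own descriptor entry.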
  19. 19. Darwin Core Archive extensions • Global Names Architecture (GNA) • Audubon Core (multimedia) • Invasive species (GISIN) • Genetic Resources (Germplasm) • EOL species profile • Taxonomic Concept Schema (TCS) • Genomics Standards Consortium (GSC) • Meta-genomics (?) • ABCD (?) • …
  20. 20. Controlled value vocabularies • Country codes • Language • Basis of record • Taxonomic rank • Nomenclatural status • Life form • Life stage • Geological time periods – chronostratigraphy – magnetostratigraphy • Species interactions – saproxylic interactions – pollinators • …
  21. 21. • Persistent identifiers (UUID, QR code) • Data set metadata descriptions (data paper) • Data rescue, scientific reports and student work • Continue digitization efforts • Biodiversity literature (BHL)
  22. 22. • Persistent Identifier (PID) • Globally Unique Identifier (GUID) • Uniform Resource Identifier (URI) • Persistent Uniform Resource Locator (PURL) • Digital Object Identifier (DOI) • Handle system (Handle) • Life Science Identifier (LSID) • Archival Resource Key (ARK) • Universally Unique Identifier (UUID)
  23. 23. • Scalability, number of IDs • Community acceptance • Long-term life-cycle • Resolvable, resolution service(s) • Cost per identifier • People-friendly or machine-friendly • Generation of IDs – Central generation, PID issuer – Distributed generation at source
  24. 24. • A UUID is a 16-octet (128-bit) number. • Example: C37E3F9B-BCAF-4479-8EB7-3346A2DB2373 • The probability of one duplicate would be about 50% if every person on earth owned 600 million UUIDs. • Allows for easy generation at source in a distributed network.
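Generation at source and the duplicate-probability claim can both be checked with Python's standard uuid module. The sketch below assumes random (version 4) UUIDs, which carry 122 random bits, and uses the standard birthday-bound approximation p ≈ 1 − exp(−n² / (2 · 2^122)); the function names are illustrative.

```python
import math
import uuid

# The example identifier from the slide, parsed to confirm it is a valid UUID.
EXAMPLE = uuid.UUID("C37E3F9B-BCAF-4479-8EB7-3346A2DB2373")

# A version 4 UUID has 128 bits, of which 6 are fixed version/variant bits.
RANDOM_BITS = 122


def new_object_id():
    """Generate an identifier locally, with no central issuer involved."""
    return str(uuid.uuid4()).upper()


def collision_probability(n, bits=RANDOM_BITS):
    """Approximate probability of at least one duplicate among n random IDs."""
    return -math.expm1(-(n * n) / (2 * 2 ** bits))


def ids_for_even_odds(bits=RANDOM_BITS):
    """Number of random IDs at which a duplicate becomes about 50% likely."""
    return math.sqrt(2 * math.log(2) * 2 ** bits)


if __name__ == "__main__":
    print(EXAMPLE.version, len(EXAMPLE.bytes))
    print(new_object_id())
    print(f"{ids_for_even_odds():.2e}")
```

Under these assumptions the even-odds point is roughly 2.7 × 10^18 identifiers, which works out to several hundred million per person for a population of several billion, the same order of magnitude as the figure on the slide.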
  25. 25. • Quick Response Code (QR code). • A type of matrix barcode (or two-dimensional code). • Popular due to its fast readability and large storage capacity. • The use of QR Codes is free of any license. • The QR Code is clearly defined and published as an ISO standard. • Invented in Japan by the Toyota subsidiary Denso Wave in 1994.
  26. 26. UUID: C37E3F9B-BCAF-4479-8EB7-3346A2DB2373. A QR code for all museum objects at NHM-UiO would provide: • Machine-readable identifiers, using an ordinary smartphone (or PDA). • New and efficient workflows for collection management. • Deployment of stable identifiers appropriate for databasing.
  27. 27. • Peer review option for biodiversity data. • Authors get credit for data publication. • Meeting concerns over data quality. • Meeting concerns over data citation mechanism. • Metadata formats: Ecological Metadata Language (EML), Dublin Core, Darwin Core, Natural Collections Descriptions (NCD)… • Towards → each data set published through GBIF accompanied by a data paper…?
  28. 28. Data rescue activity: Many species occurrence records are “hidden” in reports and documents produced by universities, research institutes, public agencies and the university museums. Collaboration project with Artsdatabanken. Photo by: Niklas Bildhauer
  29. 29. 270 years of literature, since Carl Linnaeus and his Systema Naturae (1735), and a potential source of biodiversity data. Biodiversity Heritage Library: a consortium of natural history and botanical libraries → BHL Norway…?
  30. 30. A book scanner at the Internet Archive headquarters in San Francisco, California. Photo by: Dvortygirl
  31. 31. The Millennium Ecosystem Assessment showed that human actions often lead to irreversible losses in the diversity of life, and these losses have been more rapid in the past 50 years than ever before in human history. Biological diversity is key to resilience – the ability of natural and social systems to adapt to change, and is essential for nearly every aspect of human well-being. Because human threats to biodiversity occur across large spatial and temporal scales, biodiversity and ecosystem monitoring, forecasting, and risk assessments require data to be organised in a globally accessible, integrated infrastructure. GBIF’s Data Portal provides this infrastructure.
  33. 33. Furthermore, I think that we need persistent identifiers! Cato the Elder ended all his speeches in the senate of Rome with: "Ceterum autem censeo Carthaginem esse delendam" (English: "Furthermore, I think Carthage must be destroyed").