Seminar at the Norwegian Forest and Landscape Institute

	
  
	
  

Global Biodiversity Information Facility (GBIF)

A global infrastructure for
publishing biodiversity data
Dag Endresen and Christian Svindseth
GBIF Norway, Natural History Museum, University of Oslo (NHM-UiO)
17 October 2013
Topics
• What is GBIF?
• International partners
• Darwin Core terminology
• GBIF data portal and services
• Norwegian collection portals
• Persistent identifiers (PID)
• Data paper
2
  
GBIF enables free and open access to
biodiversity data online.
We are an international government-initiated
and funded initiative focused on making
biodiversity data available to all and anyone,
for scientific research, conservation and
sustainable development.
Status of the GBIF data portal, October 2013
3
  
Slide by Donald Hobern, 2012

GBIF’s unique role
• Registry of biodiversity data resources.
• Tools and support for biodiversity data publication.
• Network development at national, regional and global levels.
• Global virtual natural history collection.
• Cross-domain linkage between data from collections, ecology and genomics.
• Access to global biodiversity data for GIS analysis and environmental monitoring.
  – Aggregated presence data
  – Site-based survey data (samples, presence/absence)
4
  
Norway joined GBIF in February 2004.
The low membership coverage in Africa and Asia is an important gap!
5
  
OECD Global Science Forum (1999):

“establish and support a distributed system of interlinked and interoperable modules (databases, software and networking tools, search engines, analytical algorithms, etc.) that together will form a Global Biodiversity Information Facility (GBIF)”.
6
  
The Millennium Ecosystem Assessment showed that human actions
often lead to irreversible losses in the diversity of life, and these losses
have been more rapid in the past 50 years than ever before in human
history.

	
  
	
  

Biological diversity is key to resilience – the ability of natural and social
systems to adapt to change, and is essential for nearly every aspect of
human well-being.
Because human threats to biodiversity occur across large spatial and
temporal scales, biodiversity and ecosystem monitoring, forecasting,
and risk assessments require data to be organised in a globally accessible, integrated infrastructure.

GBIF’s Data Portal provides this infrastructure.

7	
  
Based on slide by Donald Hobern, 2012

Organisational partnerships
• Some potential data collaborations
  – Taxon names and nomenclature
    • Catalog of Life (CoL)
    • IPT to publish global and regional species databases
    • GBIF infrastructure to support construction of CoL
  – Biodiversity literature
    • Biodiversity Heritage Library (BHL)
    • User annotations to extract occurrence records
    • Link original (and other) descriptions to taxonomy
  – Species information and traits
    • Encyclopedia of Life (EoL)
    • Support EOL as global species information aggregator
    • Include EOL summary box on each GBIF species page
8
  
GBIF and GEO
Intergovernmental Group on Earth Observations (GEO)

GEO BON: Biodiversity Observation Network

Data integration and interoperability
GBIF provides the infrastructure delivering species occurrence data.
9
  
GIASIP
	
  
Global Invasive Alien Species Information Partnership
GBIF provides the infrastructure delivering species occurrence data.
Launched at CBD COP11 October 2012 in Hyderabad, India.

10	
  
GBIF and IPBES (Naturpanelet)
Intergovernmental Science-Policy Platform on Biodiversity and Ecosystem Services (IPBES)

IPBES provides information to support policy decisions and scientific research on biodiversity.

GBIF operates within the data, information and knowledge domain of biodiversity informatics.

GBIF provides the infrastructure delivering species occurrence data in IPBES.

Diagram labels: Science, Biodiversity, Policy (IPBES); Data, information and knowledge (GBIF)
11
  
1.  Information infrastructure –
an Internet-based index of a
globally distributed network of
interoperable databases that
contain primary biodiversity
data.
2.  Community-developed tools,
standards and protocols – the
tools data providers need to
format and share their data.
3.  Capacity-building and training
– and access to a global expert
community.

12	
  
Based on slide by David Remsen, GBIF, January 2012

Common discovery system
http://gbrds.gbif.org

gbrds.gbif.org

www.gbif.org

13	
  
Slide by David Remsen, GBIF, November 2011

Architecture
• Global Registry for resource discovery.
• Common and documented data standards.
  – Metadata
  – Data
  – Vocabularies
• Data sharing tools.
• Common web service methods.
• Resolvable identifiers.
14
  
Darwin Core – a vocabulary of terms

Wieczorek J, Bloom D, Guralnick R, Blum S, Döring M, De Giovanni R, Robertson T, and
Vieglais D (2012) Darwin Core: An Evolving Community-Developed Biodiversity Data Standard.
PLoS ONE 7(1): e29715. (doi:10.1371/journal.pone.0029715)

15	
  
http://rs.tdwg.org/terms/
Slide by Donald Hobern, 2012

Unifying species data
Data sources: ecological monitoring, genomics, collections

Darwin Core (presence only)
Integrated access for records of the occurrence of any species:
• What?
• When?
• Where?
• What evidence?
• Data owner?
• Link to full record
17
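The six questions on the slide above can be read as fields of a single occurrence record. The sketch below assumes one plausible mapping onto Darwin Core terms; the slide itself does not name the terms, and the record values (including the placeholder identifier and the `EXAMPLE` institution code) are invented for illustration.

```python
from dataclasses import dataclass, asdict

# Namespace of the Darwin Core terms
DWC = "http://rs.tdwg.org/dwc/terms/"


@dataclass
class Occurrence:
    """One occurrence record; the question-to-term mapping is an assumption."""
    occurrenceID: str        # Link to full record
    scientificName: str      # What?
    eventDate: str           # When? (ISO 8601 date)
    decimalLatitude: float   # Where?
    decimalLongitude: float  # Where?
    basisOfRecord: str       # What evidence?
    institutionCode: str     # Data owner?


def to_dwc(record):
    """Return the record as a dict keyed by full Darwin Core term URIs."""
    return {DWC + name: value for name, value in asdict(record).items()}


# Hypothetical record, not taken from any real data set
example = Occurrence(
    occurrenceID="urn:uuid:00000000-0000-0000-0000-000000000000",
    scientificName="Amanita phalloides",
    eventDate="2013-10-17",
    decimalLatitude=59.92,
    decimalLongitude=10.77,
    basisOfRecord="PreservedSpecimen",
    institutionCode="EXAMPLE",
)
```

Calling `to_dwc(example)` gives a flat dictionary that could be written as one row of a text file, which is the shape a Darwin Core Archive expects.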
  
Slide by Donald Hobern, 2012

Unifying species data
Data sources: ecological monitoring, collections, genomics

Darwin Core (presence only)
Integrated access for records of the occurrence of any species:
• What?
• When?
• Where?
• What evidence?
• Data owner?
• Link to full record

Darwin Core + Core Survey Fields (presence/absence)
• Sample Id
• Method Id
• Relative abundance
• ...

Fully compatible with existing Darwin Core data, plus:
• Which species were recorded together?
• Which sets of data are directly comparable?
• Which species were most abundant in each sample?
18
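A small sketch of how the extra survey fields could answer the three questions on the slide above. The keys `sample_id`, `method_id` and `relative_abundance` are hypothetical stand-ins for the "Sample Id", "Method Id" and "Relative abundance" fields, and treating samples with the same method as "directly comparable" is an assumption; the records are invented.

```python
from collections import defaultdict

# Invented survey records for illustration only
records = [
    {"sample_id": "S1", "method_id": "M1", "species": "Species A", "relative_abundance": 0.7},
    {"sample_id": "S1", "method_id": "M1", "species": "Species B", "relative_abundance": 0.3},
    {"sample_id": "S2", "method_id": "M1", "species": "Species B", "relative_abundance": 1.0},
    {"sample_id": "S3", "method_id": "M2", "species": "Species A", "relative_abundance": 1.0},
]


def species_by_sample(rows):
    """Which species were recorded together? Group species by sample."""
    grouped = defaultdict(set)
    for row in rows:
        grouped[row["sample_id"]].add(row["species"])
    return dict(grouped)


def comparable_samples(rows):
    """Which sets of data are comparable? Group samples by method."""
    grouped = defaultdict(set)
    for row in rows:
        grouped[row["method_id"]].add(row["sample_id"])
    return dict(grouped)


def most_abundant(rows):
    """Which species was most abundant in each sample?"""
    best = {}
    for row in rows:
        current = best.get(row["sample_id"])
        if current is None or row["relative_abundance"] > current["relative_abundance"]:
            best[row["sample_id"]] = row
    return {sample: row["species"] for sample, row in best.items()}
```

None of these questions can be answered from presence-only records, because those carry no shared sample or method identifier to group on.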
  
Darwin Core Archive (DwC-A)
• DwC-A publishes DwC records, including terms from DwC-A extensions.
• Simple text-based format.
• Zipped single-file archive.

Germplasm.txt
19
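To make the three points above concrete, here is a minimal sketch of a Darwin Core Archive built in memory: one tab-separated data file plus a `meta.xml` descriptor, zipped into a single file. The file name `occurrence.txt`, the chosen terms and the single data row are illustrative assumptions; a real archive would usually also carry a metadata document and possibly extension files.

```python
import io
import zipfile

# Descriptor mapping the columns of the data file to Darwin Core terms.
# The doubled backslashes keep "\t" and "\n" as literal text in the XML.
META_XML = """<?xml version="1.0" encoding="UTF-8"?>
<archive xmlns="http://rs.tdwg.org/dwc/text/">
  <core encoding="UTF-8" fieldsTerminatedBy="\\t" linesTerminatedBy="\\n"
        ignoreHeaderLines="1" rowType="http://rs.tdwg.org/dwc/terms/Occurrence">
    <files><location>occurrence.txt</location></files>
    <id index="0"/>
    <field index="1" term="http://rs.tdwg.org/dwc/terms/scientificName"/>
    <field index="2" term="http://rs.tdwg.org/dwc/terms/eventDate"/>
  </core>
</archive>
"""

# Header row plus one invented record
ROWS = [
    ("id", "scientificName", "eventDate"),
    ("1", "Amanita phalloides", "2013-10-17"),
]


def build_archive(rows=ROWS, meta=META_XML):
    """Return the bytes of a zipped archive: data file plus descriptor."""
    data = "\n".join("\t".join(row) for row in rows) + "\n"
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("occurrence.txt", data)
        archive.writestr("meta.xml", meta)
    return buffer.getvalue()
```

An extension file such as the `Germplasm.txt` named on the slide would be added as a second text file and declared in `meta.xml` with an `<extension>` element that points back to the core record id.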
  
Darwin Core Archive Assistant (GBIF, 2010)
The Darwin Core Archive Assistant is a web application that presents a
simple interface for describing the data elements a data publisher wishes to
serve to the GBIF network as basic text files and composes the appropriate
XML descriptor file as defined in the Darwin Core Text Guidelines to
accompany them. It communicates with the GBIF registry to provide an up-to-date listing of all relevant Darwin Core terms and available extensions
and presents these in a simple checklist format.

http://tools.gbif.org/dwca-assistant/

20	
  
http://tools.gbif.org/spreadsheet-processor/
Slide by Laura Russell, VertNet, September 2011

Fitness for use
Definition

"The general intent of describing the quality of a particular dataset or record is to describe the fitness of that dataset or record for a particular use that one may have in mind for the data."
Chrisman, 1991
22
  
Slide by Donald Hobern, 2012

Improving fitness-for-use
Diagram stages: Aggregate, Data indexes, Data quality, Expert curation
• Progressive improvement
  – Data indexes
    • Centralised discovery
    • Standardisation of persistent identifiers
    • Consistent metadata
  – Data quality
    • Inconsistencies within records
    • Validation against metadata
    • Outlier detection
    • Metrics per record and per data set
  – Expert curation
    • Interface with taxon expert groups
    • Incorporate findings of data users
    • Need efficient researcher-friendly tools
23
  
Slide by Laura Russell, VertNet, September 2011

Taxonomic data
Names are often the first point of entry to biodiversity databases.
=> Risk of error propagation

Possible errors:
• Wrong identification
• Wrong format
• Spelling errors
24
  
Slide by David Shorthouse, Canadensys, January 2013

The problem with scientific names
• No comprehensive catalog of species
• Names ≠ species
• The species problem – species concepts
• Competing classifications / phylogenies
• Many names for one taxon
• One name for many taxa
• ‘Names’ are more than code-compliant scientific names
25
  
Slide by David Shorthouse, Canadensys, January 2013

Proposed solution
• Inclusive
  – Accommodate alternate perspectives
• Reconciliation
  – Map names among and between each other
• Disambiguation
  – Context to assign homonymic names to rightful place
26
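The reconciliation step, together with the format and spelling errors mentioned earlier, can be sketched with the Python standard library. This is a toy illustration and not the method behind any GBIF or Canadensys tool: the reference list is limited to the species named elsewhere in this deck, the synonym table holds a single illustrative entry, and the similarity cutoff of 0.9 is an arbitrary assumption.

```python
import difflib

# Tiny illustrative reference list (species named elsewhere in this deck)
ACCEPTED = [
    "Amanita phalloides",
    "Catathelasma imperiale",
    "Hygrocybe vitellina",
    "Marasmius siccus",
]

# Illustrative synonym table: synonym -> accepted name
SYNONYMS = {"Agaricus phalloides": "Amanita phalloides"}


def normalise(name):
    """Repair format problems: underscores, extra spaces, capitalisation."""
    parts = name.replace("_", " ").split()
    if not parts:
        return ""
    return " ".join([parts[0].capitalize()] + [p.lower() for p in parts[1:]])


def reconcile(name, cutoff=0.9):
    """Return (accepted_name, how_matched), or (None, 'unmatched')."""
    clean = normalise(name)
    if clean in ACCEPTED:
        return clean, "exact"
    if clean in SYNONYMS:
        return SYNONYMS[clean], "synonym"
    # Fuzzy match catches likely spelling errors
    close = difflib.get_close_matches(clean, ACCEPTED, n=1, cutoff=cutoff)
    if close:
        return close[0], "fuzzy"
    return None, "unmatched"
```

Disambiguating homonyms is deliberately left out: it needs context beyond the name string, such as the higher classification of the record, which is the point of the last bullet above.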
  
Improving data quality
The fish collection at NHM had some longitude and latitude columns swapped…

Indexed by GBIF 14 January 2013

Noticed and corrected in April 2013 (dataset 8102).

Indexed by GBIF 3 May 2013
27
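A swap like the one described above can often be caught automatically. The sketch below is one simple heuristic and not the procedure actually used for this data set: it assumes records are expected to fall inside a rough bounding box for mainland Norway (the box limits are approximate values chosen for illustration) and flags points that only fit once latitude and longitude are exchanged.

```python
# Rough bounding box for mainland Norway in decimal degrees.
# These limits are an assumption for illustration; a real check would
# use a proper country or sea-area polygon.
NORWAY_BOX = {"lat": (57.0, 72.0), "lon": (4.0, 32.0)}


def in_box(lat, lon, box=NORWAY_BOX):
    """True if the point lies inside the bounding box."""
    return (box["lat"][0] <= lat <= box["lat"][1]
            and box["lon"][0] <= lon <= box["lon"][1])


def check_coordinates(lat, lon, box=NORWAY_BOX):
    """Classify a coordinate pair as 'ok', 'swapped' or 'suspect'."""
    if in_box(lat, lon, box):
        return "ok"
    if in_box(lon, lat, box):
        # The point only fits the box when the two columns are exchanged
        return "swapped"
    return "suspect"
```

For example, `check_coordinates(10.7, 59.9)` is classified as `'swapped'`, while `check_coordinates(59.9, 10.7)` is `'ok'`. Records from outside the box, such as marine samples far offshore, would be reported as `'suspect'` and need manual review.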
  
http://www.gbif.org/

New portal launched 9 October 2013
28
  
Data published through GBIF
[Chart: primary biodiversity records (millions), y-axis from 80 to 440]

A modest decline in the total number of data records in January 2013 resulted from deletion of duplicates and withdrawn data,
identified through software and processing upgrades.
Last updated: 2013-10-02
29
  
GBIF data publishers
[Chart: number of institutions registered as GBIF data publishers, y-axis from 200 to 580]

A sharp rise in the number of data publishers in September 2013 results from institutions choosing to register as separate entities rather
than sharing datasets through a single publisher at their national node institution. This helps to raise the visibility and branding of the
institutions, and provides more accurate attribution, especially in the new GBIF portal coming online shortly.
Last updated: 2013-10-02
30
  
GBIF citation in research
[Chart: number of peer-reviewed publications per year, 2008 to 2013 (Jan-Sep), in three series: GBIF mentioned, GBIF discussed, GBIF-mediated data used]
Last updated: October 2013
31
  
GBIF portal:

13.3 million occurrences are located in Norway.
Published from 30 countries worldwide.
GBIF portal:

12.5 million occurrences published from Norwegian institutes.
Covering 180 countries worldwide.
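Counts like the two above correspond to two different filters: where the occurrence is located, and which country published it. The sketch below only builds query URLs and makes no network request. It assumes the present-day public GBIF occurrence search API at `api.gbif.org/v1` with its `country` and `publishingCountry` parameters; that API is not described in these slides, and counts returned today will differ from the October 2013 figures.

```python
from urllib.parse import urlencode

# Assumed endpoint of the current public GBIF API (not from the slides)
API = "https://api.gbif.org/v1/occurrence/search"


def count_query(**filters):
    """Build a search URL that asks for the match count only (limit=0)."""
    params = dict(filters, limit=0)
    return API + "?" + urlencode(sorted(params.items()))


# Occurrences located in Norway, whoever published them
located_in_norway = count_query(country="NO")

# Occurrences published by Norwegian institutions, wherever they are located
published_by_norway = count_query(publishingCountry="NO")
```

Fetching either URL returns a JSON document whose `count` field holds the number of matching records.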
Status of Nordic GBIF data sets, October 2013 (data hosted by…)

Country    Data sets    Occurrences
Denmark    45           9 311 741
Finland    57           14 666 474
Iceland    4            458 705
Norway     85           12 531 207
Sweden     47           43 374 550
34
  
“Artskart” provides
the national “GBIF”
portal to species
occurrences and
specimens in Norway.
35	
  
The site at
http://gbif.no
provides an
overview of the
Norwegian
data sets
published to
GBIF.

36	
  
• Custom data portals for Norwegian collections.
• Upgrade to Darwin Core archives across Norway.
• Persistent identifiers (UUID, QR code).
• Data set metadata descriptions (data paper).
• GIS data server for spatial environment data.
37
  
Custom collection portals
38

• Software from GBIF to implement online data portals for biodiversity data.
  – National, thematic or regional.
  – Based on data published using GBIF standards.
39
  
Slide by David Remsen (2011)

Different data portals will implement very different modules and functionality to meet their own needs.
40
  
Opportunities with Darwin Core:
UiB

Artskart

UiT

GBIF
Portal
Darwin Core
Archive

S&L

Data portal
for institute,
region, or
theme?

Collections and data sets are published from the data owner as one single Darwin Core archive (DwC-A). Different data types from the same DwC-A can be included in different data portals.
41
  
The purpose of identifiers
…is to name things,
making it possible to refer to them.

What is an identifier:
“Each identifier refers to one and only one thing” (Coyle 2006).
“An association between a string and a thing” (Kunze 2003).
“A stated association between a symbol and a thing; that the
symbol may be used to unambiguously refer to the thing
within a given context” (Campbell 2007).

43	
  
UUID QR codes for all museum objects at NHM-UiO would provide:
•  Machine-readable labels that can be read with an ordinary smart phone (or PDA).
•  New and efficient workflows for collection management.
•  Stable identifiers appropriate for data-basing.

44	
  
Catalog number: O-L-000014, http://purl.org/nhmuio/id/41d9cbb4-4590-4265-8079-ca44d46d27c3

45	
  
http://purl.org/nhmuio/id/d91e8253-0ac1-4681-ac69-e50070af86a2
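The two identifiers above combine a PURL prefix with a UUID. A minimal sketch of minting and checking identifiers of that shape follows; it assumes random (version 4) UUIDs, which is what the examples on the slides look like, and says nothing about how NHM-UiO actually generates or registers them. Rendering the URL as a QR code would need a third-party library and is left out.

```python
import uuid

# Prefix as it appears in the identifiers on the slides
PURL_BASE = "http://purl.org/nhmuio/id/"


def mint_identifier():
    """Return a new identifier URL built from a random (version 4) UUID."""
    return PURL_BASE + str(uuid.uuid4())


def parse_identifier(url):
    """Return the UUID inside an identifier URL, or raise ValueError."""
    if not url.startswith(PURL_BASE):
        raise ValueError("unexpected prefix: " + url)
    # uuid.UUID raises ValueError if the remainder is not a valid UUID
    return uuid.UUID(url[len(PURL_BASE):])
```

Because the UUID carries no meaning of its own, the identifier can stay stable even if the catalog number or the storage location of the object changes.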

  
• Peer review option for biodiversity data.
• Authors get scientific credit for data publication.
• Meeting concerns over data quality.
• Meeting concerns over data citation mechanism.
• Metadata formats: Ecological Metadata Language (EML), Dublin Core, Darwin Core, Natural Collections Descriptions (NCD)…
• Towards → each data set published through GBIF accompanied by a data paper…?
49
  
Why publish your data

• Citable publication
• Establish scientific priority
• Increase collaboration
• Link data to bigger network
• Re-use and multiply effect
• Respond to funding requirements

http://biodiversitydatajournal.com/

Smith V, Georgiev T, Stoev P, Biserkov J, Miller J, Livermore L,
Baker E, Mietchen D, Couvreur T, Mueller G, Dikow T, Helgen K,
Frank J, Agosti D, Roberts D, Penev L (2013) Beyond dead trees:
integrating the scientific process in the Biodiversity Data Journal.
Biodiversity Data Journal 1: e995. DOI: 10.3897/BDJ.1.e995
Data rescue activity:
Many species occurrence data are
“hidden” in reports and
documents produced by
universities, research institutes,
public agencies and the university
museums.
Project with Artsdatabanken
Photo by: Niklas Bildhauer
Scientists from Norwegian institutes using GBIF-mediated data:
PCA analysis of 54 environmental variables across
Norway versus the National Vegetation Atlas.

PCA
Component 1

PCA
component 2

Bakkestuen, V., Erikstad, L., and Økland, R.H. (2008). Step-less models for
regional environmental variation in Norway. J. Biogeography 35: 1906-1922.

Norwegian Vegetation
Atlas (Moen 1999)

Sections
(Moen 1999)

Zones
(Moen 1999)

Based on a slide
by Vegar Bakkestuen

“PCA Norway”
55
  
Modeling Norwegian fungi
• 83 fungi species.
• 10,500 occurrences from the GBIF portal.
• Predictive modeling of species distribution.

Species shown: Amanita phalloides, Catathelasma imperiale, Hygrocybe vitellina, Marasmius siccus

Wollan, A. K., Bakkestuen, V., Kauserud, H., Gulden, G. and Halvorsen, R. 2008. Modelling and predicting fungal distribution patterns using herbarium data. J. Biogeography 35: 2298-2310.

Slide by Vegar Bakkestuen
56
  
Node Personnel
Dag Endresen, Node Manager
Christian Svindseth, Database manager
Fridtjof Mehlum, Research Director
Einar Timdal, Associate Professor
Vegar Bakkestuen, Researcher
Geir Søli, Associate Professor
Nils Valland, Artsdatabanken
Wouter Koch, Artsdatabanken

57	
  
Thanks for listening!

GBIF Norway
Dag Endresen
dag.endresen@nhm.uio.no
Christian Svindseth
christian.svindseth@nhm.uio.no
58	
  
