Jan J. Kuiper, Margret Veltman & Wolf M. Mooij
The scientific use and non-use of the
Global Biodiversity Information Facility
www.nioo.knaw.nl
jankuiper87@gmail.com
Project aim
The Global Biodiversity Information Facility (GBIF) is an open data
infrastructure providing free online access to more than 700,000,000
occurrence records. GBIF allows researchers to share knowledge across
borders and to explore entirely new types of research. Although the
number of studies using GBIF-mediated data is rapidly increasing¹, the
prevailing perception is that GBIF has not lived up to its full
potential. For example, the Netherlands is a major supplier of data,
yet only a minor user in terms of downloads. NLBIF, the Dutch node
of GBIF, commissioned a project to better understand how scientists
perceive GBIF, to shed light on GBIF’s realized as well as its
potential ‘niche’, and to identify ways to become more ‘fit for purpose’.
Interviews
We interviewed 16 biodiversity scientists from 12 different
institutes. In-depth interviews (~1.5 hours) allowed us to uncover a
deeper layer of meaning, e.g. regarding attitudes and incentives²,
that is not easily captured by other forms of data gathering. Our
approach revealed that much of the debate centers on data quality
and trust. Data suppliers tend to be skeptical about the intentions
and skills of potential data users, while these users tend to be
skeptical about the quality of freely available data, and about the
amount of available data in general. This can easily lead to a
vicious circle, which is reinforced by competition over data and by
the business models of data-collecting organizations that keep much
data restricted and unavailable to the public. In general we found a
remarkable discrepancy between the positive attitude towards open
access science on the one hand and the perceived practical
limitations of open access data on the other; these perceived
limitations override an otherwise sympathetic attitude towards GBIF.
The time it takes to check and clean GBIF-mediated data easily
discourages researchers, drawing them towards research at scales
where high-quality data are readily available. Indeed, those who are
most skeptical of GBIF often work on smaller-scale topics.
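The cleaning burden mentioned above usually consists of many small, routine checks. A minimal Python sketch of a few common ones follows. It assumes the records have already been downloaded and parsed into dictionaries whose keys follow column names in GBIF's simple occurrence download (`species`, `decimalLatitude`, `decimalLongitude`). The helper `clean_occurrences` and the particular checks are illustrative assumptions, not the procedure used by the interviewed scientists or by GBIF.

```python
from typing import Iterable


def clean_occurrences(records: Iterable[dict]) -> list[dict]:
    """Apply a few generic quality checks to occurrence records.

    Hypothetical helper: assumes each record is a dict with the keys
    'species', 'decimalLatitude' and 'decimalLongitude'.
    """
    seen = set()
    cleaned = []
    for rec in records:
        species = rec.get("species")
        lat = rec.get("decimalLatitude")
        lon = rec.get("decimalLongitude")
        # Drop records without a species name or without coordinates.
        if not species or lat is None or lon is None:
            continue
        # Drop coordinates outside the valid latitude/longitude range.
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            continue
        # Drop the (0, 0) point, a frequent placeholder for missing data.
        if lat == 0 and lon == 0:
            continue
        # Drop exact duplicates of species name plus coordinates.
        key = (species, lat, lon)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(rec)
    return cleaned


# Hypothetical example records; only the first one passes all checks.
example = [
    {"species": "Anas platyrhynchos", "decimalLatitude": 52.1, "decimalLongitude": 5.2},
    {"species": "Anas platyrhynchos", "decimalLatitude": 52.1, "decimalLongitude": 5.2},   # duplicate
    {"species": "Anas platyrhynchos", "decimalLatitude": 0.0, "decimalLongitude": 0.0},    # (0, 0) placeholder
    {"species": "Anas platyrhynchos", "decimalLatitude": None, "decimalLongitude": 5.2},   # missing latitude
    {"species": "Anas platyrhynchos", "decimalLatitude": 95.0, "decimalLongitude": 5.2},   # out of range
]
kept = clean_occurrences(example)
```

In practice, taxon- and question-specific checks usually have to be added on top of generic filters like these.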
Recommendations
We identified several possible ways for NLBIF to increase its impact.
• It would be wise to narrow the gap between data suppliers and users,
to overcome the mistrust that currently has a paralysing effect. This
could be done by bringing researchers into contact with each other,
with NLBIF acting more as a broker than as a software framework.
• GBIF’s currently realized potential is shaped largely by Ecological
Niche Modelling. To expand this potential, NLBIF could organize
workshops and establish a national user community. NLBIF should also
ensure that GBIF-mediated data are properly referenced by the community.
• GBIF has been a seminal enterprise in biodiversity informatics, but
is currently being overtaken on several fronts by more specialized
alternatives that have learned from GBIF’s teething problems. To
remain a frontrunner, NLBIF should invest in innovation, e.g. in
data cleaning and reprocessing, or by facilitating the development
of novel methods to calculate trends³.
• A study led by a Dutch research group and published in Nature or
Science would be an eye-opener for the research community in the
Netherlands, helping to overcome the prevailing unfamiliarity. NLBIF
can promote this by supporting promising research in cash or in kind.
• NLBIF can strategically position itself as one of the key
ambassadors of Nature4Life.
• When it comes to global questions in an international context,
e.g. regarding the Aichi Biodiversity Targets and the Sustainable
Development Goals, there is no real alternative to GBIF. This
argument could be emphasized more strongly to attract non-users.
1 GBIF. Science Review 2016. Copenhagen, Denmark.
2 Veltman, M. 2016. Scientific use and non-use of GBIF: a qualitative study of the motivations of biodiversity researchers in the Netherlands. Report, NIOO-KNAW, Wageningen.
3 Van Strien, A.J., et al. 2013. Opportunistic citizen science data of animal species produce reliable estimates of distribution trends if analysed with occupancy models. J. Appl. Ecol. 50.
“Now it is time to expand laterally, to get on with the great Linnean enterprise and finish mapping the biosphere”
E.O. Wilson
Mini-survey NAEM 2016
During last year’s NAEM, 45 participants filled out a questionnaire.
It turned out that 42% of the respondents did not know what GBIF
is, even though 78% work with primary biodiversity data.
Figure: answers to “Which source of primary biodiversity information do you use?” (percentage of respondents, 0–100%):
a) My own measurements or monitoring data
b) A database owned by my research institute
c) The National Database Flora and Fauna (NDFF)
d) NLBIF or GBIF
e) Peer-reviewed papers
f) Other
When asked why they do not use GBIF, respondents most often
mentioned unfamiliarity. Interestingly, 36% of the researchers
working with primary biodiversity data are not willing to share
data via open data infrastructures like GBIF. Of the 10
researchers who had actually used GBIF in their research, only
half indicated that the attempt was successful.
Literature mapping
To gain insight into the current scientific use of GBIF we examined
all scientific papers published in 2015 that cite ‘GBIF’. These were
698 papers, which appeared in 318 different journals. The latter
figure already hints at the wide range of topics that can be
addressed using GBIF-mediated data. We counted 34 Dutch
affiliations. Naturalis is the biggest contributor with 12 papers,
followed by VU and UvA with 7 each.
Figure (left): Articles per journal, shown for journals with >5 articles: PLoS ONE, Molecular Ecology, Biodiversity Data Journal, Journal of Biogeography, Biological Journal of the Linnean Society, New Phytologist, ZooKeys, Zootaxa, Biological Conservation, Diversity and Distributions, Ecography, Ecology and Evolution, Global Ecology and Biogeography, Molecular Phylogenetics and Evolution, Check List, Phytotaxa, Biological Invasions, Ecological Modelling, Global Change Biology, PeerJ, Biodiversity, Conservation Biology, PLoS Biology, Climatic Change, Ecology Letters, PNAS.
Figure (right): Percentage of articles (%) per journal impact factor class (SJR): <1, 1–2, 2–3, 3–5, 5–8, 8–12, >12; N = 624 articles.
An in-depth analysis of 100 randomly selected papers showed that:
• Only half of the studies actually used GBIF-mediated data.
• The majority of these were fairly fundamental, aiming at
understanding the link between environment, occurrence and
species physiology, whether in the past, present or future.
• The questions and goals of the studies are highly diverse; scopes
range from the gene level to complete ecosystems.
• Many different taxa are represented, from all over the globe.
• Yet, most studies focus on one or only a few species and hence do
not address ‘biodiversity’ per se.
• Ecological Niche Modelling is by far the most applied method.
• In only a few cases is GBIF the sole source of biodiversity data.
• The process of data extraction and handling is hardly ever
described in detail; GBIF-mediated data are poorly referenced.
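How the 100 papers were drawn from the 698 is not detailed here. One reproducible way to draw such a subsample is simple random sampling without replacement with a fixed seed, sketched below in Python. The paper identifiers, the helper `sample_papers` and the seed value are hypothetical.

```python
import random


def sample_papers(paper_ids: list[str], k: int = 100, seed: int = 2015) -> list[str]:
    """Draw k papers uniformly at random, without replacement.

    Hypothetical helper: the seed is arbitrary and only serves to make
    the selection reproducible.
    """
    if k > len(paper_ids):
        raise ValueError("cannot sample more papers than are available")
    rng = random.Random(seed)  # private generator, leaves the global random state untouched
    return rng.sample(paper_ids, k)


# Hypothetical identifiers standing in for the 698 papers from 2015 citing 'GBIF'.
papers = [f"paper-{i:03d}" for i in range(698)]
subset = sample_papers(papers)
```

Recording the seed together with the list of candidate papers lets others regenerate exactly the same subset.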
Figure: biodiversity questions submitted to the NWA per category (percentage of questions, n = 72), with example questions:
• Methodological, e.g. “How can the absence of species be determined effectively?”
• Enhancing biodiversity, e.g. “How can we maximize biodiversity in urban areas?”
• Biodiversity-Ecosystem Functioning, e.g. “What is the consequence of biodiversity loss for resilience?”
• Biodiversity-Ecosystem Services, e.g. “Can biodiversity be used to enhance agricultural yield?”
• Biodiversity Conservation, e.g. “How can biodiversity be maintained?”
• What drives biodiversity, e.g. “What causes the tremendous diversity in species?”
What the public wants
In 2016 the Dutch government invited citizens to submit
scientific research questions. This resulted in 11,700 questions,
which form the basis of the National Science Agenda (NWA) and
contributed to the formation of the Research Agenda for
Biodiversity, Ecology & Evolution (Nature4Life). We analyzed and
clustered the questions about biodiversity (n = 72), revealing
what the public thinks scientists should be studying.

