
2021-01-27--biodiversity-informatics-gbif-(52slides)


Lecture for a course at NTNU, 27th January 2021
CC-BY 4.0 Dag Endresen https://orcid.org/0000-0002-2352-5497

See also http://bit.ly/biodiversityinformatics
https://www.gbif.no/events/2021/lecture-ntnu-gbif.html


2021-01-27--biodiversity-informatics-gbif-(52slides)

  1. 1. Biodiversity Informatics Dag Endresen | GBIF Node Manager for Norway Lecture | NTNU, Trondheim, Norway | 27th January 2021 Global Biodiversity Information Facility Illustration by Harry Potter Bygett (NTNU) Wiki Commons CC BY 2.0
  2. 2. WHY SHOULD STUDENTS LEARN OPEN SCIENCE?
  3. 3. WHY TEACH STUDENTS OPEN SCIENCE? ● We are in the middle of an ongoing paradigm shift in scientific practice (and impact metrics). ● The open science wave is moving fast! ● To succeed in academia, young scientists will need different skills than were needed previously. ● To remain relevant, researchers will need to develop different approaches than they needed in the past. ● Society is quickly gaining Big Data maturity and will expect new services from biodiversity information and research.
  4. 4. DATA CITATION AS A NEW CURRENCY OF SCIENCE ● Peer-reviewed scholarly papers in high impact journals maintain considerable weight for impact metrics. ● A movement is under way to build similar status for open data, open metadata, open material samples, and other open scientific research products…
  5. 5. DECLARATION ON RESEARCH ASSESSMENT ● The Declaration on Research Assessment (DORA), developed in 2012 in San Francisco, recognizes the need to improve the ways in which the outputs of scholarly research are evaluated. ● It has become a worldwide initiative covering all scholarly disciplines and all key stakeholders including funders, publishers, professional societies, institutions, and researchers. ● DORA’s vision is to advance practical and robust approaches to research assessment globally and across all scholarly disciplines. ● The Research Council of Norway (RCN) signed DORA in May 2018. To date (2021-01-27), 16 873 individuals and 2 139 organizations in 144 countries have signed DORA.
  6. 6. DATA CITATION PRINCIPLES 1. Data are legitimate, citable products of research. 2. Data citations give scholarly credit and attribution. 3. In scholarly literature, whenever claims are based on data, the data should always be cited. 4. A persistent method for identifying data that is machine-actionable, globally unique, and universal. 5. Data citations facilitate access to the data, or at least to the metadata. 6. Unique identifiers persist even beyond the lifespan of the data. 7. Data citations identify and give access to the specific data that support verification of the claim (provenance, time-slice, version). 8. Flexible, but with attention to interoperability of practices across communities. Data Citation Synthesis Group: Joint Declaration of Data Citation Principles. Martone M. (ed.) San Diego CA: FORCE11; 2014; https://doi.org/10.25490/a97f-egyk
  7. 7. FAIR DATA PRINCIPLES … researchers need to do more than simply post their data on the web for it to be re-usable.
  8. 8. WHAT IS FAIR DATA? ● FINDABLE → Data and supplementary materials have sufficiently rich metadata and a unique and persistent identifier. ● ACCESSIBLE → Metadata and data are understandable to humans and machines. Data are deposited in a trusted repository. ● INTEROPERABLE → Metadata use a formal, accessible, shared, and broadly applicable language for knowledge representation. ● REUSABLE → Data and collections have clear usage licenses and provide accurate information on provenance. ● https://www.go-fair.org/
  9. 9. FAIR data is about machine-readable data …
  10. 10. Novel possibilities… (for novel curiosity-driven research) Open science Traditional science Young Researcher
  11. 11. CAREER OPPORTUNITIES ● Skills for open research and open data are in increasing demand! ● Enables new research methodologies that were not possible before. ● Bring benefits for your career as a (young) researcher.
  12. 12. The dark side
  13. 13. REPRODUCIBILITY CRISIS "Scientific irreproducibility — the inability to repeat others' experiments and reach the same conclusion” (Nature 2016) Baker (2016) 1,500 scientists lift the lid on reproducibility. Nature. doi:10.1038/533452a
  14. 14. Baker (2016) 1,500 scientists lift the lid on reproducibility. Nature. doi:10.1038/533452a Scientific irreproducibility is a growing concern. Open Science solution: researchers share their methods, data, computer code and results in central data repositories. For physical real-world samples we also need herbarium specimens and bio-repositories (museums).
  15. 15. WILL ANYBODY TRUST CLOSED SCIENCE AGAIN? ● Studies (1,2) indicate that p-hacking is a significant problem, sometimes without the scientist even being aware of doing it. ● Pre-registered (open) data provide good insurance against suspicion of both data dredging and plain data falsification. ● “p-hacking” = occurs when researchers collect or select data or statistical analyses until nonsignificant results become significant (data fishing, …) (1) Head et al. (2015) The Extent and Consequences of P-Hacking in Science. PLoS Biol. doi:10.1371/journal.pbio.1002106 (2) Ioannidis (2005). "Why Most Published Research Findings Are False". PLoS Medicine. doi:10.1371/journal.pmed.0020124.
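The effect described on slide 15 can be illustrated with a small simulation. This is only a sketch under a simplifying assumption that is not in the slides: each study tests several independent outcomes where no real effect exists, so every p-value is uniformly distributed on [0, 1], and only the smallest one is reported. The function names and the 0.05 threshold are illustrative choices.

```python
import random


def familywise_error_rate(n_tests: int, alpha: float = 0.05) -> float:
    """Probability that at least one of n independent null tests has p < alpha."""
    return 1.0 - (1.0 - alpha) ** n_tests


def simulate_selective_reporting(n_tests: int, n_studies: int = 20_000,
                                 alpha: float = 0.05, seed: int = 1) -> float:
    """Fraction of simulated null studies that can report a 'significant'
    result when only the smallest of n_tests p-values is reported."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        # Under the null hypothesis a p-value is uniform on [0, 1].
        smallest_p = min(rng.random() for _ in range(n_tests))
        if smallest_p < alpha:
            hits += 1
    return hits / n_studies


if __name__ == "__main__":
    for k in (1, 5, 20):
        print(k, round(familywise_error_rate(k), 3),
              round(simulate_selective_reporting(k), 3))
```

With a single pre-registered test the false-positive rate stays at the nominal level; trying more analyses and reporting the best one raises it quickly, which is why pre-registration and open data help.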
  16. 16. TOOLS FOR REUSE OF RESEARCH DATA
  17. 17. DATA MANAGEMENT PLAN (DMP) • A formal document that describes how data are to be handled during a research project, and after the project is completed. • The goal is to plan data management before the project begins. • Including a plan for the costs of data management and archiving. • This saves time in the long run and promotes data fitness for reuse. • Reduce duplication of existing scientific studies. • Reduce the loss of data. https://innsida.ntnu.no/wiki/-/wiki/English/Data+management+plan Illustration CC BY Jørgen Stamp
  18. 18. WHAT IS METADATA? • Metadata, literally “data about data”, are an essential component of a data management system. They describe the “what, where, when, who and how” of a resource.
  19. 19. METADATA SHOULD ALLOW A USER OF THE DATA TO • Identify & discover its existence • Learn how to access or acquire the data • Understand its fitness-for-use • Learn how to obtain a copy of the data • Learn how the data should be used
  20. 20. METADATA REDUCES “DATA ENTROPY” Illustration from: The Loss of Information about Data (Metadata) Over Time, Michener et al, 1997
  21. 21. WHAT IS A DATA PAPER? • A data paper is a peer reviewed document describing a dataset, published in a peer reviewed journal. • It takes effort to prepare, curate and describe data. • Data papers provide recognition for this effort by means of a scholarly article. • Getting scholarly recognition for improving the fitness for reuse for your own datasets. • Getting scholarly recognition for preparing and making legacy research data available for reuse. https://www.gbif.org/data-papers
  22. 22. Intergovernmental network and research infrastructure Provides anyone, anywhere, free and open access to data about all types of life on Earth Voluntary collaboration through Memorandum of Understanding (MoU) Participant nodes, Secretariat in Copenhagen, Denmark WHAT IS GBIF? https://www.gbif.org
  23. 23. GBIF PARTICIPANT COUNTRIES https://www.gbif.org/the-gbif-network 41 Voting Participants 21 Associate Countries 39 Other Associate 1704 Data publishers
  24. 24. A WINDOW ON EVIDENCE ABOUT WHERE SPECIES HAVE LIVED, AND WHEN https://www.gbif.org/occurrence/search Digitized specimens Observations Literature Remote-sensing Environmental DNA Common standards (DwC) Data publishing and indexing Data discovery and use
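The occurrence search on slide 24 is also available programmatically. The sketch below builds a query URL for the public GBIF occurrence search API; the helper names are hypothetical, and the filters used (`country`, `basisOfRecord`, `limit`) should be checked against the current API documentation before relying on them. Only `count_records` touches the network.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

GBIF_OCCURRENCE_SEARCH = "https://api.gbif.org/v1/occurrence/search"


def build_search_url(**filters) -> str:
    """Return an occurrence search URL for the given filters (no network access)."""
    # Sorting the filters makes the URL deterministic and easy to cite or cache.
    return f"{GBIF_OCCURRENCE_SEARCH}?{urlencode(sorted(filters.items()))}"


def count_records(**filters) -> int:
    """Ask the API how many records match the filters (requires network access).

    Do not pass `limit` here; it is set to 0 so that no records are returned.
    """
    with urlopen(build_search_url(limit=0, **filters), timeout=30) as response:
        return json.load(response)["count"]


if __name__ == "__main__":
    # Preserved specimens published with Norway as the country of occurrence.
    print(build_search_url(country="NO", basisOfRecord="PRESERVED_SPECIMEN", limit=5))
```

For analyses that will be published, a registered download with its own DOI is preferable to ad hoc API queries, because the download can be cited.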
  25. 25. Darwin Core Research data portals GBIF: MULTIPLE-PURPOSE DATA PUBLISHING SERVICES portal Bio-Collections & ecology datasets Norwegian Red List EU Directive reporting (proposed workflow)
  26. 26. BY THE NUMBERS | 27 JANUARY 2021 62 Country Participants 39 Organizational Participants 5 373 Peer-review papers using data 1 650 849 882 Species occurrence records 55 902 Datasets 1 634 Publishers 23.6 billion Average records downloaded per month (2020)
  27. 27. BY THE NUMBERS | 27 JANUARY 2021 -- NORWAY 124 Peer-review papers using data (co-author from Norway) 40 894 819 Species occurrence records (published from) 301 Datasets (published from) 38 Publishers (from Norway)
  28. 28. DATA TRENDS ON GBIF.org https://www.gbif.org/analytics/global % specimens
  29. 29. DATA RICHNESS LEVELS SUPPORTED BY GBIF https://www.gbif.org/dataset-classes ● Dataset metadata (M): dataset description, taxonomic/geographic/temporal scope. ● Species checklists (C): list of taxa, regional or thematic (e.g. invasive, medicinal). ● Occurrence-only data (O): species occurrences; dates, coordinates, basis of record. ● Sampling-event data (SE): species occurrences and sampling events; dates, coordinates, sampling effort / protocol, abundance.
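To make the occurrence-only level on slide 29 concrete, here is a minimal sketch of a Darwin Core occurrence record and a completeness check. The term names are standard Darwin Core terms, but the choice of which terms to treat as required is an assumption made for this example, and the record itself (identifier, species, coordinates) is made up.

```python
# Terms treated as mandatory in this sketch (an assumption; check the
# publisher guidelines for the authoritative list).
REQUIRED_TERMS = ("occurrenceID", "basisOfRecord", "scientificName", "eventDate")


def missing_terms(record: dict) -> list:
    """Return the required Darwin Core terms that are absent or blank."""
    missing = []
    for term in REQUIRED_TERMS:
        value = record.get(term)
        if value is None or not str(value).strip():
            missing.append(term)
    return missing


def has_valid_coordinates(record: dict) -> bool:
    """True if decimalLatitude/decimalLongitude are present and within range."""
    try:
        lat = float(record["decimalLatitude"])
        lon = float(record["decimalLongitude"])
    except (KeyError, TypeError, ValueError):
        return False
    return -90 <= lat <= 90 and -180 <= lon <= 180


# Hypothetical record for illustration only.
example_record = {
    "occurrenceID": "urn:example:occurrence:0001",
    "basisOfRecord": "HumanObservation",
    "scientificName": "Betula pubescens Ehrh.",
    "eventDate": "2021-01-27",
    "decimalLatitude": "63.42",
    "decimalLongitude": "10.40",
    "countryCode": "NO",
}

if __name__ == "__main__":
    print(missing_terms(example_record), has_valid_coordinates(example_record))
```

A sampling-event dataset would add an event table with sampling protocol and effort, linked to occurrence rows like this one through a shared event identifier.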
  30. 30. SPECIES OCCURRENCE RECORDS WITH MULTIMEDIA EVIDENCE 27th January 2021 75 million records with taxonomically identified images (1.8 million from Norway) • 41.1 million specimens (Norway: 884 763) • 31.3 million human observations (Norway: 923 763) • 1.4 million material samples (Norway: 38 386) 685 806 audio files (Norway: 3 566) 2 825 videos (Norway: 4) https://www.gbif.org/occurrence/gallery
  31. 31. SOURCES OF DATA IN GBIF: CITIZEN SCIENCE OBSERVATIONS
  32. 32. SOURCES OF DATA IN GBIF: DIGITIZED SPECIMENS FROM MUSEUM COLLECTIONS
  33. 33. SOURCES OF DATA IN GBIF: TAXONOMIC LITERATURE, OLD AND NEW Data liberation
  34. 34. GLOBAL BIODIVERSITY VS. DIGITALLY AVAILABLE DATA Image: FL Fawcett in Wheller Ann. Entomol. Soc. Am. 1990 Troudet et al. Nature Scientific Reports 2017 1200 mill. animals 300 m plants 20 m fungi 16 m bacteria 0,04 m virus
  35. 35. LATIN NAMES ARE RULED BY THE CODES Domain (Eukarya) Kingdom (Animalia) Phylum (Chordata) Class (Mammalia) Order (Primates) Family (Hominidae) Genus (Homo) Species (Homo sapiens) PhyloCode ICN ICZN
  36. 36. OTU = Operational Taxonomic Unit. OTU = SH, Species Hypothesis number [DOI] (shown as SH ABC0001). OTU = BIN, Barcode Index Number (shown as BIN DEF0002). Species, DNA barcode, GBIF backbone taxonomy.
  37. 37. SOURCES OF DATA IN GBIF: DNA SEQUENCE-DERIVED OCCURRENCE DATA MGnify -- https://www.gbif.org/publisher/ab733144-7043-4e88-bd4f-fca7bf858880
  38. 38. NEW GBIF GUIDE: PUBLISHING SEQUENCE-DERIVED DATA THROUGH BIODIVERSITY DISCOVERY PLATFORMS • Authors from Australia, Norway, Sweden, Denmark, UNITE, and GBIF • Based on practical mapping and data publishing experiences • Cross-platform • A “cookbook” of about 40 pages: ● Introduction: refresh your “data culinary” knowledge ● Categorization: what “data ingredients” do you have to publish? ● Mapping: choose and follow the “recipe” ● Visuals: clarity and guidelines ● Future prospects ● Resources: glossary, links, references Based on Darwin Core and MIxS data standards https://doi.org/10.35035/doc-vf1a-nr22
  39. 39. POLICY LINKS International treaties – national progress based on national GBIF data publication
  40. 40. POLICY LINKS: AICHI TARGETS - Trend in invasive alien species introductions (through Global Register of Introduced and Invasive Species) - Species Protection Index - Protected Area Representativeness Index - Comprehensiveness of conservation of socioeconomically/culturally valuable species - Agrobiodiversity Index - Crop Wild Relative Index - Growth in species occurrence records accessible through GBIF - Species Status Information Index https://www.cbd.int/cooperation/csp/gbif.shtml | https://www.cbd.int/csp/survey/GBIF.pdf
  41. 41. A DATA RESOURCE TO SUPPORT RESEARCH AND SUSTAINABLE DEVELOPMENT ● Conservation: protected areas; threatened species; invasive species risk. ● Food security: crop wild relatives; in situ and ex situ conservation of genetic diversity; fisheries planning. ● Climate change: modelling impacts on species ranges; adaptation strategies; mitigation benefits and risks. ● Human health: disease risk based on occurrence of vectors, hosts, reservoirs; medicinal plants; hazards, e.g. snakebite. https://www.gbif.org/science-review
  42. 42. PEER-REVIEWED PUBLICATIONS USING GBIF-MEDIATED DATA September 2020 https://www.gbif.org/resource/search?contentType=literature&literatureType=journal&relevance=GBIF_USED&peerReview=true [Bar chart: annual totals of peer-reviewed publications, 2008 to 2020, showing the year-to-date count and a projected annual total for 2020. Bar labels as extracted: 626; 52, 89, 148, 169, 229, 249, 350, 407, 428, 696, 676, 743, 938.] ~ 2-3 papers a day #CiteTheDOI
  43. 43. DATA USE IN PEER-REVIEWED JOURNALS https://www.gbif.org/resource/search?contentType=literature&year=2020&literatureType=journal&relevance=GBIF_USED&countriesOfResearcher=NO&peerReview=true GBIF-mediated data use citations (2020 / 2019): 1 United States 260 / 215; 2 China 192 / 175; 3 Brazil 119 / 95; 4 United Kingdom 107 / 89; 5 Mexico 92 / 64; 6 Germany 79 / 63; 7 Spain 77 / 51; 8 Canada 63 / 44; 9 Australia 55 / 53; 10 France 52 / 48; -- Norway 20 / 19. [Bar chart: peer-reviewed uses by region, 2020 vs 2019, for Oceania, Africa, Asia, North America, Latin America and Europe. Bar labels as extracted: 78, 58, 220, 261, 255, 597; 47, 54, 176, 182, 214, 398.]
  44. 44. HOW TO CITE DATA MEDIATED BY GBIF 1. Download data from GBIF.org 2. Receive a recommended citation with a download DOI 3. Cite the DOI in published research or other work Example: GBIF.org (12 October 2020) GBIF Occurrence Download https://doi.org/10.15468/dl.xxxxxx https://www.gbif.org/citation-guidelines #CiteTheDOI
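The citation pattern shown on slide 44 can be assembled mechanically once the download date and DOI are known. The following is a minimal sketch that reproduces only the format of the slide's example; the function name is hypothetical, and the DOI suffix is the same placeholder used on the slide, not a real download. In practice the recommended citation supplied with the download should be used as given.

```python
from datetime import date

# Month names are spelled out here so the output does not depend on the locale.
MONTHS = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")
DOI_RESOLVER = "https://doi.org/"


def format_download_citation(download_date: date, doi: str) -> str:
    """Build a citation string in the style of the example on the slide."""
    # Accept either a bare DOI or a full resolver URL.
    bare_doi = doi[len(DOI_RESOLVER):] if doi.startswith(DOI_RESOLVER) else doi
    day_month_year = (f"{download_date.day} {MONTHS[download_date.month - 1]} "
                      f"{download_date.year}")
    return (f"GBIF.org ({day_month_year}) GBIF Occurrence Download "
            f"{DOI_RESOLVER}{bare_doi}")


if __name__ == "__main__":
    # "10.15468/dl.xxxxxx" is the placeholder from the slide.
    print(format_download_citation(date(2020, 10, 12), "10.15468/dl.xxxxxx"))
```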
  45. 45. WHY CITE DATA? • Good academic practice for transparent and reproducible research • Credit institutions who shared data and supported your research • Help data publishing institutions to demonstrate value of digitization and data publication through research • Correct citation encourages data sharing • Data accessed through GBIF is free for all – but not free of obligations: see the user agreement https://www.gbif.org/citation-guidelines #CiteTheDOI
  46. 46. Filter Source dataset #1 Source dataset #2 Source dataset #3 GBIF download Process Archive Final state of data Dataset DOIs Download DOI Archive DOI Bibliographic DOI Analyze & publish
  48. 48. Filter Source dataset #1 Source dataset #2 Source dataset #3 GBIF download Process Archive Final state of data Dataset DOIs Download DOI Archive DOI Bibliographic DOI Analyze & publish (possible with persistent object identifiers)
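The chain of identifiers in the diagram above (dataset DOIs, download DOI, archive DOI, bibliographic DOI) can be modelled as a small record that lets a reader walk from a paper back to its source data. This is a sketch only: the class and field names are hypothetical, the dataset DOI is the NTNU vascular plants dataset cited in these slides, and the download DOI is the slide's own placeholder.

```python
from dataclasses import dataclass


@dataclass
class CitationChain:
    """Identifiers linking a publication back to its source datasets."""
    dataset_dois: list           # DOIs of the source datasets in the download
    download_doi: str            # DOI of the filtered GBIF download
    archive_doi: str = ""        # DOI of the archived, processed final data
    bibliographic_doi: str = ""  # DOI of the resulting publication

    def trace(self) -> list:
        """List the known steps from the publication back to the sources."""
        steps = [
            ("publication", self.bibliographic_doi),
            ("archived final data", self.archive_doi),
            ("GBIF download", self.download_doi),
        ]
        steps += [("source dataset", doi) for doi in self.dataset_dois]
        # Steps without an identifier yet (e.g. unpublished work) are skipped.
        return [(label, doi) for label, doi in steps if doi]


if __name__ == "__main__":
    chain = CitationChain(
        dataset_dois=["10.15468/zrlqok"],   # NTNU Vascular plants (from the slides)
        download_doi="10.15468/dl.xxxxxx",  # placeholder from the slides
    )
    for label, doi in chain.trace():
        print(f"{label}: https://doi.org/{doi}")
```

Because every step has a persistent identifier, credit can flow back to the original data publishers when the download DOI is cited.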
  49. 49. DOI BASED DATA CITATION AT GBIF.ORG NTNU Vascular plants: https://doi.org/10.15468/zrlqok citations papers dataset
  50. 50. THE RESEARCH DATA LIFECYCLE https://library.sydney.edu.au/research/data-management/research-data-management.html GBIF.org
  51. 51. THANK YOU www.gbif.org Dag Endresen | GBIF Norway helpdesk@gbif.no
