Natural history research as a
replicable data science
Rutger Vos
Natural history museums
and collections
• Main goal is not to
exhibit but to collect and
curate specimens
• Usually multiple
specimens per species,
sometimes many more
• Specimens are research
and reference materials
Natural history research
To understand the patterns and processes of biodiversity
Biodiversity is expressed and studied in multiple ways:
- Species diversity, e.g.
counts of species, maybe
taking abundances into
account
- Phylogenetic diversity,
i.e. the evolutionary
distances between
species
- Functional diversity, i.e.
the ecological roles
species play, and the
characteristics associated
with that role
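As a minimal sketch of the first measure (species diversity), the snippet below computes species richness and the Shannon index, which takes abundances into account. The sample data and function names are illustrative assumptions, not taken from the talk.

```python
import math
from collections import Counter

def species_richness(observations):
    """Number of distinct species among the observed individuals."""
    return len(set(observations))

def shannon_index(observations):
    """Shannon diversity H' = -sum(p_i * ln(p_i)) over species proportions p_i."""
    counts = Counter(observations)
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total) for n in counts.values())

# Hypothetical sample: one species name per observed individual
sample = ["Carabus auratus"] * 5 + ["Carabus nemoralis"] * 5
richness = species_richness(sample)
h = shannon_index(sample)
print(richness, round(h, 3))
```

Two equally abundant species give H' = ln 2; a sample dominated by one species would score lower at the same richness.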
Natural history research
To understand the patterns and processes of biodiversity
The patterns and processes of biodiversity are classified by the
scale at which they occur:
- Within a given system (α
diversity), e.g. a biome
- Across systems (β
diversity, turnover)
- Among systems (γ
diversity, totality)
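One common way to make these three levels concrete is sketched below, using species richness per site for α, total richness across all sites for γ, and Whittaker's multiplicative β = γ / mean(α) for turnover. This is one of several formulations in use; the site names and species labels are hypothetical.

```python
def partition_diversity(sites):
    """sites: non-empty dict mapping site name -> set of species names.

    Returns (alphas, beta, gamma) where
      alphas = richness per site,
      gamma  = richness of all sites pooled,
      beta   = gamma / mean alpha (Whittaker's multiplicative beta).
    """
    alphas = {name: len(species) for name, species in sites.items()}
    gamma = len(set().union(*sites.values()))
    mean_alpha = sum(alphas.values()) / len(alphas)
    beta = gamma / mean_alpha
    return alphas, beta, gamma

# Hypothetical species lists for two sites
sites = {
    "dune": {"sp_A", "sp_B", "sp_C"},
    "heath": {"sp_B", "sp_C", "sp_D"},
}
alphas, beta, gamma = partition_diversity(sites)
print(alphas, beta, gamma)
```

A β of 1 would mean every site holds the same species; higher values indicate more turnover between sites.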
Natural history data
High dimensionality:
- Sequential
- Geospatial
- Morphological
DNA barcoding
• Some genes are variable enough that a
few hundred bases suffice to identify
species
• In addition, barcodes are useful for
studying evolution and phylogeny
• Taking the barcode of a specimen (by
Sanger sequencing) is part of the workflow of
indexing collection specimens
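The identification idea can be sketched as a nearest-match lookup: compare a query barcode against a reference library and report the closest species by uncorrected p-distance. The sketch assumes the sequences are already aligned to equal length; the toy sequences are far shorter than real barcodes, and the species labels are placeholders.

```python
def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap or ambiguity code are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = [
        (x, y) for x, y in zip(a.upper(), b.upper())
        if x in "ACGT" and y in "ACGT"
    ]
    if not compared:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in compared) / len(compared)

def identify(query, reference):
    """Return (species, distance) for the closest reference barcode."""
    return min(
        ((species, p_distance(query, seq)) for species, seq in reference.items()),
        key=lambda pair: pair[1],
    )

# Hypothetical reference library and query
reference = {"Species A": "ACGTACGTAC", "Species B": "ACGTTCGTTC"}
best = identify("ACGTACGTAA", reference)
print(best)
```

In practice a distance threshold would also be needed, so that a query with no close relative in the library is reported as unidentified rather than assigned to the least distant species.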
Barcoding example: species boundaries in beetles
Pentinsaari, Vos & Mutanen. 2016. Algorithmic single-locus species
delimitation: effects of sampling effort, variation and nonmonophyly in four
methods and 1870 species of beetles. Molecular Ecology Resources 17(3):
393-404
Metabarcoding
• The species contents of organic mixtures can also be determined using
marker genes
• This is typically done using multiplexed, high-throughput (“next
generation”) sequencing
• Consequently, data storage and processing requirements are higher
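A heavily simplified sketch of the two core steps, demultiplexing reads by sample tag and tallying taxa per sample, is given below. It assumes each read starts with a short sample tag and that marker sequences match the reference exactly; real pipelines add quality filtering, primer trimming, and clustering or denoising. The tags, sequences, and taxon assignments are hypothetical.

```python
from collections import Counter, defaultdict

def demultiplex(reads, tags):
    """Assign each read to a sample by its leading tag and strip the tag.

    Reads that carry no known tag are dropped.
    """
    by_sample = defaultdict(list)
    for read in reads:
        for tag, sample in tags.items():
            if read.startswith(tag):
                by_sample[sample].append(read[len(tag):])
                break
    return by_sample

def tally_taxa(sequences, reference):
    """Count reads whose sequence exactly matches a reference marker."""
    return Counter(reference[s] for s in sequences if s in reference)

# Hypothetical tags, reference markers, and multiplexed reads
tags = {"AAGG": "sample1", "CCTT": "sample2"}
reference = {"ACGTAC": "Poaceae", "TTGACA": "Salix"}
reads = ["AAGGACGTAC", "AAGGACGTAC", "AAGGTTGACA", "CCTTTTGACA", "GGGGACGTAC"]

samples = demultiplex(reads, tags)
profile = {name: tally_taxa(seqs, reference) for name, seqs in samples.items()}
print(profile)
```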
Metabarcoding
examples: gut contents
of Ice Age grazers
Species distribution modelling
• Collection specimens are (ideally) stored with their collection locality
recorded as lat/lon coordinates
• Based on the localities where specimens were found and on geospatial
data layers (climate, land use, soil, etc.), a correlative model of the
species' affinities can be constructed
• With such a model, habitat suitability and predictive scenarios (e.g.
climate change) can be projected
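The simplest correlative model of this kind is a rectangular climate envelope: the range of each environmental variable observed at the occurrence localities. The sketch below assumes the layer values have already been extracted at each specimen's coordinates; the layer names, values, and the +2 degree scenario are illustrative only, and real studies typically use richer methods.

```python
def fit_envelope(occurrence_values):
    """Fit a rectangular envelope from occurrence data.

    occurrence_values: non-empty list of dicts, one per locality,
    mapping layer name -> value at that locality.
    Returns a dict mapping layer name -> (min, max).
    """
    layers = occurrence_values[0].keys()
    return {
        layer: (
            min(o[layer] for o in occurrence_values),
            max(o[layer] for o in occurrence_values),
        )
        for layer in layers
    }

def suitable(envelope, cell):
    """A cell is suitable if every layer value falls inside the envelope."""
    return all(lo <= cell[layer] <= hi for layer, (lo, hi) in envelope.items())

# Hypothetical layer values at three specimen localities
occurrences = [
    {"temp": 8.0, "precip": 700},
    {"temp": 10.0, "precip": 900},
    {"temp": 9.0, "precip": 800},
]
envelope = fit_envelope(occurrences)

cell_now = {"temp": 9.5, "precip": 750}
cell_warmed = {**cell_now, "temp": cell_now["temp"] + 2.0}  # scenario: +2 degrees
print(suitable(envelope, cell_now), suitable(envelope, cell_warmed))
```

Projecting a scenario then amounts to re-evaluating the fitted model on altered layers, as the warmed cell shows.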
Biogeographic example: vulnerability of
European butterflies
Shapes, traits, and phenotypes
Natural history data
• Highest data volumes come from HTS
(high-throughput sequencing),
3D scanning, and images
• High dimensionality at multiple
scales
• Many biases in species/locality
sampling
• Many axes are messy:
- Species names have been
changing for centuries
- Likewise place names
- Trait descriptions are often
ambiguous
3d.naturalis.nl
The Reproducibility Crisis
• More than 70%
of researchers
(n=1576) have
tried and failed to
reproduce
another
scientist's
experiments
• More than half
have failed to
reproduce their
own experiments
Reproducible data science
and cultural change
1. “Data available from the author upon request”
No: data are open, as FAIR as possible
2. “Data were processed with custom scripts”
No: scripts/workflows are open source
3. “Data were analyzed on a Pentium III 450 MHz…”
No: the environment can be cloned as a VM
1. FAIR data management
Findable: increasing attention to
metadata, and discoverability
and indexing of data
Accessible: implementation of
resolvable identifiers, e.g.
PURLs and DOIs
Interoperable: increasing
attention to open community
standards (syntax) and
semantics
Re-usable: increasing attention
to data ownership and
licensing
2. Open source
Analytical code is no longer a folder on a postdoc’s laptop; it’s a code
repository with specific versions, documentation, tests, and a license
3. Virtualization
- Analyses are not run on dedicated hardware, e.g.
workstations, clusters, but in the (private) cloud
- Complex workflows are distributed as virtual machines,
docker containers, or deployed with devops tools
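A small complement to these three practices is to store a provenance record next to every result, tying together a checksum of the input data, the code version, and the runtime environment. The sketch below uses only the Python standard library; the field names and the version string are assumptions, and such a record supplements rather than replaces a VM or container image.

```python
import hashlib
import json
import platform

def sha256_of(data: bytes) -> str:
    """Hex digest identifying the exact bytes of an input dataset."""
    return hashlib.sha256(data).hexdigest()

def provenance_record(data: bytes, code_version: str) -> dict:
    """Bundle data checksum, code version, and runtime details."""
    return {
        "data_sha256": sha256_of(data),
        "code_version": code_version,  # e.g. a tag or commit from the repository
        "python": platform.python_version(),
        "platform": platform.platform(),
    }

# Hypothetical input bytes and version tag
record = provenance_record(b"specimen_id,lat,lon\n", "v1.0.0")
print(json.dumps(record, indent=2))
```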
In closing
Thank you for
your attention

Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
 
The ASGCT Annual Meeting was packed with exciting progress in the field advan...
The ASGCT Annual Meeting was packed with exciting progress in the field advan...The ASGCT Annual Meeting was packed with exciting progress in the field advan...
The ASGCT Annual Meeting was packed with exciting progress in the field advan...
 
Orion Air Quality Monitoring Systems - CWS
Orion Air Quality Monitoring Systems - CWSOrion Air Quality Monitoring Systems - CWS
Orion Air Quality Monitoring Systems - CWS
 
general properties of oerganologametal.ppt
general properties of oerganologametal.pptgeneral properties of oerganologametal.ppt
general properties of oerganologametal.ppt
 

Natural history research as a replicable data science

  • 1. Natural history research as a replicable data science Rutger Vos
  • 2. Natural history museums and collections
      • Main goal is not to exhibit but to collect and curate specimens
      • Usually multiple specimens per species, sometimes many more
      • Specimens are research and reference materials
  • 3. Natural history research: to understand the patterns and processes of biodiversity. Biodiversity is expressed and studied in multiple ways:
      - Species diversity, e.g. counts of species, possibly taking abundances into account
      - Phylogenetic diversity, i.e. the evolutionary distances between species
      - Functional diversity, i.e. the ecological roles species play, and the characteristics associated with those roles
  • 4. Natural history research: to understand the patterns and processes of biodiversity. The patterns and processes of biodiversity are systematized as taking place:
      - Within a given system (α diversity), e.g. a biome
      - Across systems (β diversity, turnover)
      - Among systems (γ diversity, totality)
  • 5. Natural history data. High dimensionality:
      - Sequential
      - Geospatial
      - Morphological
  • 6. DNA barcoding
      • Some genes are variable enough that a few hundred letters suffice to identify species
      • In addition, barcodes are useful for studying evolution and phylogeny
      • Taking the barcode of a specimen (by Sanger sequencing) is part of the workflow of indexing collection specimens
  • 7. Barcoding example: species boundaries in beetles
      Pentinsaari, Vos & Mutanen. 2016. Algorithmic single-locus species delimitation: effects of sampling effort, variation and nonmonophyly in four methods and 1870 species of beetles. Molecular Ecology Resources 17(3): 393-404
  • 8. Metabarcoding
      • The species contents of organic mixtures can also be identified using marker genes
      • This is typically done using multiplexed, high-throughput (“next generation”) sequencing
      • Consequently, data storage and processing requirements are higher
  • 10. Species distribution modelling
      • Collection specimens are (ideally) stored with their collection locality recorded as lat/lon coordinates
      • Based on the localities where specimens were found, and geospatial data layers (climate, land use, soil, etc.), a correlative model of the species' affinities can be constructed
      • With such a model, habitat suitability can be projected, including under predictive scenarios (e.g. climate change)
  • 11. Biogeographic example: vulnerability of European butterflies
  • 12. Shapes, traits, and phenotypes
  • 13. Natural history data
      • The highest data volumes come from high-throughput sequencing (HTS), 3D scanning, and images
      • High dimensionality at multiple scales
      • Many biases in species/locality sampling
      • Many axes are messy:
          - Species names have been changing for centuries
          - Likewise place names
          - Trait descriptions are often ambiguous
      3d.naturalis.nl
  • 14. The Reproducibility Crisis
      • More than 70% of researchers (n=1576) have tried and failed to reproduce another scientist's experiments
      • More than half have failed to reproduce their own experiments
  • 15. Reproducible data science and cultural change
      1. “Data available from the author upon request”. No: data are open, as FAIR as possible
      2. “Data were processed with custom scripts”. No: scripts/workflows are open source
      3. “Data were analyzed on a Pentium III 450 MHz…”. No: the environment can be cloned as a VM
  • 16. 1. FAIR data management
      - Findable: increasing attention to metadata, and discoverability and indexing of data
      - Accessible: implementation of resolvable identifiers, e.g. PURLs and DOIs
      - Interoperable: increasing attention for open community standards (syntax) and semantics
      - Re-usable: increasing attention for data ownership and licensing
  • 17. 2. Open source: analytical code is no longer a folder on a postdoc’s laptop; it is a code repository with specific versions, documentation, tests, and a license
  • 18. 3. Virtualization
      - Analyses are not run on dedicated hardware (e.g. workstations, clusters) but in the (private) cloud
      - Complex workflows are distributed as virtual machines or Docker containers, or deployed with DevOps tools
  • 20. Thank you for your attention
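
Slide 3 describes species diversity as counts of species, possibly taking abundances into account. A minimal sketch of both ideas follows, using species richness for the plain count and the Shannon index as one common abundance-weighted measure. The choice of the Shannon index and the sample data are assumptions for illustration; the slides do not name a specific index.

```python
import math
from collections import Counter


def species_richness(observations):
    """Number of distinct species in a list of per-specimen species names."""
    return len(set(observations))


def shannon_index(observations):
    """Shannon index H' = -sum(p_i * ln(p_i)) over species proportions p_i."""
    counts = Counter(observations)
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total) for n in counts.values())


# Hypothetical sample: one species name per collected specimen.
sample = ["Carabus", "Carabus", "Agonum", "Agonum"]
print(species_richness(sample))
print(shannon_index(sample))
```

Two equally abundant species give a Shannon index of ln 2; a sample dominated by one species scores lower even if richness is unchanged.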
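
The α, β and γ levels on slide 4 can be sketched with species sets per site. The sketch below assumes richness as the diversity measure and Whittaker's multiplicative formulation (β = γ / mean α) for turnover; other formulations of β exist, and the site data are hypothetical.

```python
def alpha_diversity(site):
    """Richness within a single system (one site's species list)."""
    return len(set(site))


def gamma_diversity(sites):
    """Richness of the pooled species list over all systems."""
    return len(set().union(*sites))


def whittaker_beta(sites):
    """Turnover across systems as gamma divided by mean alpha (an assumed choice)."""
    mean_alpha = sum(alpha_diversity(s) for s in sites) / len(sites)
    return gamma_diversity(sites) / mean_alpha


# Hypothetical species lists for two sites sharing one species.
sites = [["A", "B"], ["B", "C"]]
print([alpha_diversity(s) for s in sites])
print(gamma_diversity(sites))
print(whittaker_beta(sites))
```

With this formulation, β equals 1 when all sites hold the same species and grows as sites share fewer of them.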
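
Slide 6 notes that a few hundred letters of a variable gene can suffice to identify a species. As a rough illustration, the sketch below compares a query barcode against reference barcodes using the uncorrected p-distance (proportion of differing sites) and returns the closest reference. It assumes pre-aligned sequences of equal length; the sequences and species labels are made up, and real identification pipelines use more careful methods than a nearest match.

```python
def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap or ambiguity code are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "ACGT" and b in "ACGT":
            compared += 1
            if a != b:
                differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def identify(query, references):
    """Name of the reference barcode with the smallest p-distance to the query."""
    return min(references, key=lambda name: p_distance(query, references[name]))


# Hypothetical, very short "barcodes"; real ones are a few hundred letters long.
references = {"species_1": "ACGTACGT", "species_2": "TTTTACGT"}
print(identify("ACGTACGA", references))
```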
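
The correlative modelling idea on slide 10 can be illustrated with a deliberately simple stand-in: a rectangular environmental envelope, i.e. the minimum and maximum of each variable observed at the localities where specimens were found. This is only a sketch under stated assumptions, not the modelling method used in the talk; the variable names and values are hypothetical, and in practice the values would be extracted from geospatial layers at each specimen's lat/lon.

```python
def fit_envelope(records):
    """Per-variable (min, max) range over environmental values at occurrence localities."""
    variables = records[0].keys()
    return {
        v: (min(r[v] for r in records), max(r[v] for r in records))
        for v in variables
    }


def is_suitable(envelope, conditions):
    """True if every variable in the envelope falls inside its observed range."""
    return all(lo <= conditions[v] <= hi for v, (lo, hi) in envelope.items())


# Hypothetical climate values sampled at three specimen localities.
occurrences = [
    {"temp": 8.0, "precip": 600},
    {"temp": 12.0, "precip": 900},
    {"temp": 10.0, "precip": 750},
]
envelope = fit_envelope(occurrences)

# Project suitability for a grid cell now and under an assumed +5 degree scenario.
cell_now = {"temp": 10.0, "precip": 700}
cell_future = {"temp": cell_now["temp"] + 5.0, "precip": cell_now["precip"]}
print(is_suitable(envelope, cell_now))
print(is_suitable(envelope, cell_future))
```

The same fitted envelope is reused for the scenario, which mirrors how a model built from present-day localities is projected onto changed conditions.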
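
The FAIR points on slide 16 (metadata, resolvable identifiers, licensing) can be made concrete with a small metadata record that travels with a data file. The field names, identifier URL and license string below are hypothetical and do not follow any particular community standard; the checksum is added so that others can verify they obtained the same bytes.

```python
import hashlib
import json


def describe_dataset(data, identifier, title, license_id):
    """Build a minimal metadata record for a data file given as bytes."""
    return {
        "identifier": identifier,  # ideally a resolvable identifier such as a DOI or PURL
        "title": title,
        "license": license_id,
        "sha256": hashlib.sha256(data).hexdigest(),
        "size_bytes": len(data),
    }


# Hypothetical data and identifier, for illustration only.
record = describe_dataset(
    b"specimen_id,lat,lon\n",
    "https://example.org/dataset/1",
    "Example occurrence table",
    "CC-BY-4.0",
)
print(json.dumps(record, indent=2, sort_keys=True))
```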