Biodiversity Discovery and Documentation in the Information and Attention Age
Presented by: Rob Guralnick
Authors: Rob Guralnick and Andrew Hill
Contributors: Meredith Lane, Dan Janies, Walter Jetz, and lots of other folks
Funding support: Global Biodiversity Information Facility, National Biological Information Infrastructure, Defense Advanced Research Projects Agency, National Science Foundation
#ievobio
WHAT IS BIODIVERSITY DISCOVERY AND DOCUMENTATION?
IMPEDIMENTS
- Linnean shortfall: too few taxonomists; an antiquated and laborious process
- Wallacean shortfall: very coarse resolution, scattered data, no integration
- Darwinian shortfall: trees scattered in the literature; no “mother of all trees”
- Multiple repositories storing genetic and phenotypic data that do not communicate well; phenotypic knowledge-bases lag behind
ACTION
- Discovering and documenting new units of biodiversity
- Discovering and documenting distributions of lineages
- Discovering and documenting relationships among lineages
- Discovering and documenting lineage traits, from genomes to phenotypes
Pace of Species Description and Documentation for 2008
Approx. 1.922 million named species (all taxa); an estimated 4 to 30 million species remain undiscovered.
From the State of Observed Species report, http://species.asu.edu/files/SOS2010.pdf
Assuming a relatively conservative number (e.g., 10 million undescribed species), it will take another 360 years to discover and document them at our current pace.
Why are discovery and documentation so slow?
- Taxonomists proceed in the same manner today as they did one hundred years ago.
- Few intermediate products are generated along the way, which also leaves the process vulnerable to losses, from the loss of computers to the loss of taxonomists themselves.
- Discovery and documentation are coupled: a new species is not claimed until it is formally documented.
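The 360-year figure is a simple rate calculation. The slide does not state the yearly description rate, so the sketch below back-calculates the rate implied by 10 million species in 360 years (about 27,800 species per year) and reuses it for the 4 to 30 million range quoted above. It assumes a constant pace; `years_to_describe` and `IMPLIED_RATE` are illustrative names.

```python
# Numbers taken from the slides: 10 million undescribed species, 360 years.
UNDESCRIBED_EXAMPLE = 10_000_000
YEARS_EXAMPLE = 360

# Description rate implied by those two numbers (species per year).
# Back-calculated here; it is not a figure quoted on the slide.
IMPLIED_RATE = UNDESCRIBED_EXAMPLE / YEARS_EXAMPLE


def years_to_describe(undescribed, rate_per_year=IMPLIED_RATE):
    """Years needed to describe `undescribed` species at a constant yearly rate."""
    if rate_per_year <= 0:
        raise ValueError("rate must be positive")
    return undescribed / rate_per_year


if __name__ == "__main__":
    print(f"implied rate: {IMPLIED_RATE:,.0f} species/year")
    # The previous slide gives roughly 4 to 30 million undiscovered species.
    for n in (4_000_000, 10_000_000, 30_000_000):
        print(f"{n:,} species -> {years_to_describe(n):,.0f} years")
```

At that implied pace, the 4 to 30 million range corresponds to roughly 144 to 1,080 years; a faster real-world rate would shorten every estimate proportionally.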
State and Scale of Knowledge in Environmental Sciences
Slide from Walter Jetz (thanks, Walter)
Figure: A view of the world at different resolutions. 90-meter resolution SRTM elevation data for a portion of Colorado, shown alongside the same area at 100 times and 1000 times as coarse.
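One simple way to derive the coarser views of the 90 m SRTM grid is block averaging. The sketch below assumes "times as coarse" refers to the linear cell size (so a factor of 100 turns 90 m cells into 9 km cells); the function name and the toy elevation values are made up for illustration. In practice a raster library or NumPy would do this far more efficiently.

```python
def coarsen(grid, factor):
    """Block-average a 2-D grid (a list of equal-length rows) by an integer factor.

    Each output cell is the mean of a factor x factor block of input cells, so
    the cell size grows by `factor` along each axis (e.g. 90 m -> 9 km for 100).
    """
    rows, cols = len(grid), len(grid[0])
    if factor < 1 or rows % factor or cols % factor:
        raise ValueError("grid dimensions must be divisible by factor")
    coarse = []
    for r in range(0, rows, factor):
        out_row = []
        for c in range(0, cols, factor):
            block = [grid[i][j]
                     for i in range(r, r + factor)
                     for j in range(c, c + factor)]
            out_row.append(sum(block) / len(block))
        coarse.append(out_row)
    return coarse


if __name__ == "__main__":
    # Toy 4 x 4 "elevation" grid in metres (made-up values).
    elevation = [
        [1500, 1520, 1600, 1650],
        [1510, 1530, 1620, 1700],
        [1800, 1850, 2000, 2100],
        [1820, 1900, 2050, 2200],
    ]
    print(coarsen(elevation, 2))
```

Averaging smooths away local relief such as ridges and valleys, which is exactly why species and environmental patterns look different at coarse resolution.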
Documenting our biodiversity matters because it is under increasing threat.
“Overall, we are locked into a race. We must hurry to acquire the knowledge on which a wise policy of conservation and development can be based for centuries to come.” - E. O. Wilson
HOW DO WE DO THIS?
- DEVELOP KNOWLEDGE-BASES OF SPECIES DISTRIBUTIONS AND SPECIES RELATIONSHIPS
- PROVIDE MEANS TO INTEGRATE ACROSS THESE KNOWLEDGE-BASES
- PROVIDE TOOLS TO RAPIDLY AND EASILY EXPLORE THESE DATA ACROSS SPACE AND TIME
- MAKE THIS A COMMUNITY EFFORT: LEVERAGE COMMUNITY SOURCING
Figure: Phylo-, Biodiversity and Ecological Informatics. Elements of the diagram: analytical methods (means to summarize data and select hypotheses); a growing toolbox of tools encoding analytical methods and application services (automated workflows for biodiversity science); concepts and ideas; and growing data and information repositories holding raw global data (lineage, occurrence, environmental) and processed global data (species, distributions, new environmental layers). Initial research questions lead to new research questions.
Growing data repositories and formats, growing toolbox, and concepts and ideas, by domain:
- Tree of Life. Tools: PAUP, PhyML/RAxML, MrBayes, BEAST, Mesquite, etc. Repositories: GenBank, TreeBASE (Nexus/Newick/PhyloXML, etc.).
- Population genetics. Tools: TCS/NCA, msBayes, BayeSSC, Structure, etc.
- Earth surface and climate. Tools: satellites (MODIS/GOES/Landsat, etc.); historical and current in-situ data; GCMs, etc. Repositories: satellite image repositories, WorldClim, PRISM, PMIP (ERDDAP, netCDF, GIS formats).
- Ecosystem fluxes. Tools: infrared imaging spectrometers, etc.
- Species named. Tools: TaxonX, automated species-name extraction. Repositories: ITIS, Catalog of Life, ZooBank, ZooKeys, etc.
- Species traits. Tools: Lucid, ontologies, RDF. Repositories: Morphbank, TraitNet, etc.
- Species distributions. Tools: GIS, habitat suitability models, SDMs/ENMs, Survey Gap, etc. Repositories: GBIF, VertNet, OBIS (species occurrence), Map of Life, IUCN.
Data types range across inference-based, instrument-based (raw), statistical/inferential, and observation- and model-based sources.
The Interconnected Nature of Biodiversity Ideas, Outputs, and Repositories
From Peterson et al., in press, Systematics and Biodiversity
DECOUPLING SPECIES DISCOVERY AND DOCUMENTATION
(OR: GET IT OUT THERE FOR OTHERS TO USE AND REPURPOSE)
(OR: CLAIM NEW BIODIVERSITY, PROVISIONALLY, BEFORE FORMAL PUBLICATION)
Workflow diagram:
1. Generate new data from specimens.
2. Publish step 1: deposit the data in community-sourced repositories such as GenBank, Scratchpads, Morphbank, LifeDesks, and TreeBASE.
3. Comparative analyses: link the new unit of biodiversity onto the tree of life (claim discovery).
4. Publish step 2: formal publication (documentation).
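The decoupled workflow on this slide can be read as a small state machine for a new unit of biodiversity: data are deposited first, the unit is then provisionally claimed by linking it onto the tree of life, and formal description comes last. The sketch below is a minimal, hypothetical model of that ordering; `ProvisionalUnit`, `Stage`, and the accession-style identifiers are invented for illustration and do not correspond to any repository's API.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    DATA_PUBLISHED = 1      # publish step 1: specimen data deposited
    CLAIMED = 2             # new unit linked onto the tree of life (provisional)
    FORMALLY_DESCRIBED = 3  # publish step 2: formal publication


@dataclass
class ProvisionalUnit:
    """Hypothetical record for a newly discovered unit of biodiversity."""
    label: str                                   # provisional label, not a formal name
    records: list = field(default_factory=list)  # e.g. repository accession IDs
    stage: Stage = Stage.DATA_PUBLISHED
    tree_node: str = ""
    formal_name: str = ""

    def claim(self, tree_node):
        # A claim only makes sense once others can inspect and reuse the data.
        if not self.records:
            raise ValueError("deposit data before claiming a discovery")
        self.tree_node = tree_node
        self.stage = Stage.CLAIMED

    def describe(self, formal_name):
        # Formal documentation follows the provisional claim.
        if self.stage is not Stage.CLAIMED:
            raise ValueError("link the unit onto the tree before formal description")
        self.formal_name = formal_name
        self.stage = Stage.FORMALLY_DESCRIBED


if __name__ == "__main__":
    unit = ProvisionalUnit("sp. nov. A", records=["ACC-0001"])  # hypothetical accession
    unit.claim("clade-42")          # discovery is claimed, usable by others
    print(unit.stage)
    unit.describe("Genus species")  # documentation can follow much later
    print(unit.stage, unit.formal_name)
```

The design point is that the unit is visible and reusable from the `CLAIMED` stage onward, long before `describe` is called.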
TAKE HOME MESSAGE 1:
- We need to use the web as a collaborative work environment for generating biodiversity knowledge.
- We need to claim knowledge of the existence of new species before all of the formal steps to document them are complete.
- We need to publish new data about species soon after they are generated and before formal publication.
What about monitoring an evolving Earth system? Example: tracking the spread of disease lineages that carry known important mutations through time and space.
Questions:
How are drug resistant strains of H5N1 circulating around the globe?
How did drug resistance arise in the H5N1 population?
Are mutations that give rise to drug resistance in H5N1 under positive selection?
Can we provide ways for researchers and the general public to track this spread in near real time?
Figure: Hosts and strains of avian influenza A; viral structure.
Test whether mutations in the M2 gene (L26I, V27A/I, A30S, S31N) that confer resistance to adamantanes (a class of drugs used to treat influenza A) are under positive selection, under purifying selection, or neutral, across the full sampled population of H5N1 influenza A.
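A common way to ask whether coding sequence is under positive, purifying, or no selection is the ratio ω = dN/dS of nonsynonymous to synonymous substitution rates (ω > 1 suggests positive selection, ω < 1 purifying selection, ω ≈ 1 neutrality). The slide does not say which method was used. The sketch below assumes the Nei-Gojobori counting method with a Jukes-Cantor correction for a single pair of aligned, uppercase DNA sequences without stop codons. It yields one gene-wide value; testing individual M2 sites across a whole sampled population would call for site-level, phylogeny-based codon models (for example in PAML or HyPhy). All function names and the toy sequences are illustrative, and the codons used are not claimed to be the actual M2 codons.

```python
import itertools
import math

BASES = "TCAG"
# Standard genetic code, codons in TCAG order; '*' marks stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def synonymous_sites(codon):
    """Each single-base change that keeps the amino acid adds 1/3 of a site
    (changes to a stop codon count as nonsynonymous)."""
    aa = CODE[codon]
    s = 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == aa:
                    s += 1 / 3
    return s


def codon_differences(c1, c2):
    """(synonymous, nonsynonymous) differences, averaged over every order in
    which the differing positions could change; stop-codon pathways are skipped."""
    positions = [p for p in range(3) if c1[p] != c2[p]]
    if not positions:
        return 0.0, 0.0
    syn_total = non_total = 0.0
    n_paths = 0
    for order in itertools.permutations(positions):
        current, syn, non = c1, 0, 0
        for pos in order:
            nxt = current[:pos] + c2[pos] + current[pos + 1:]
            if CODE[nxt] == "*":
                break
            if CODE[nxt] == CODE[current]:
                syn += 1
            else:
                non += 1
            current = nxt
        else:  # pathway completed without passing through a stop codon
            syn_total += syn
            non_total += non
            n_paths += 1
    if n_paths == 0:
        raise ValueError(f"no stop-free pathway between {c1} and {c2}")
    return syn_total / n_paths, non_total / n_paths


def jukes_cantor(p):
    """Jukes-Cantor correction for multiple hits; undefined for p >= 0.75."""
    if p >= 0.75:
        raise ValueError("proportion too high for Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def dn_ds(seq1, seq2):
    """Return (dN, dS, omega) for two aligned, in-frame coding sequences."""
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    syn_sites = syn_diffs = non_diffs = 0.0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3], seq2[i:i + 3]
        syn_sites += (synonymous_sites(c1) + synonymous_sites(c2)) / 2
        sd, nd = codon_differences(c1, c2)
        syn_diffs += sd
        non_diffs += nd
    if syn_sites == 0:
        raise ValueError("no synonymous sites in these sequences")
    non_sites = len(seq1) - syn_sites
    d_s = jukes_cantor(syn_diffs / syn_sites)
    d_n = jukes_cantor(non_diffs / non_sites)
    if d_s == 0:
        omega = math.nan if d_n == 0 else math.inf
    else:
        omega = d_n / d_s
    return d_n, d_s, omega


if __name__ == "__main__":
    # Toy sequences: AGT (Ser) -> AAT (Asn) is nonsynonymous, the kind of change
    # behind S31N; CTG -> CTA (both Leu) is synonymous.
    d_n, d_s, omega = dn_ds("AGTCTG", "AATCTA")
    print(f"dN={d_n:.3f} dS={d_s:.3f} omega={omega:.3f}")
```

With only two codons the estimate is meaningless; the example just shows the bookkeeping. Real analyses use many sequences, a phylogeny, and a statistical test rather than a bare ratio.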
TAKE HOME MESSAGES 3:
- Geophylogenies provide rich visualizations of multidimensional data that can be examined at multiple spatial (and temporal) scales.
- Such visualizations may appeal beyond our community of evolutionary biologists to the broader scientific and policy communities.
- Automated approaches and workbench-oriented tools allow continually updated, community-driven content to be generated.
- Our ultimate goal should be an ever-growing “mother of all trees” to which we can attach new “twigs” as we discover them.
Can We Really Track Distributions of Lineages Through Space and Time?
Map of Life connections: improvement through both modeling and community involvement
Integrating phylogenetic and distributional data in Google Earth™
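Google Earth reads KML, so one way to integrate phylogenetic and distributional data there is to write a geophylogeny as KML: each node becomes a point placed at its location, each branch becomes a line from parent to child, and node depth sets the altitude so the tree floats above the map. The sketch below assumes node coordinates are already available (for internal nodes these would come from some ancestral-location inference, not shown). The function name, the altitude scaling, and the toy coordinates are hypothetical; only standard KML 2.2 elements (`Placemark`, `Point`, `LineString`, `altitudeMode`, `TimeStamp`) are used.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"
ET.register_namespace("", KML_NS)  # serialize KML as the default namespace


def _k(tag):
    """Qualify a tag name with the KML namespace."""
    return f"{{{KML_NS}}}{tag}"


def geophylogeny_kml(nodes, edges, metres_per_unit=100_000, years=None):
    """Build a KML document that draws a tree over the globe.

    nodes: name -> (lon, lat, height); height 0 for sampled tips, larger for
           deeper ancestors, so the tree is drawn above the map.
    edges: (parent, child) pairs.
    years: optional name -> sampling year, written as a KML TimeStamp so the
           Google Earth time slider can show spread through time.
    """
    years = years or {}
    kml = ET.Element(_k("kml"))
    doc = ET.SubElement(kml, _k("Document"))

    def coords(name):
        lon, lat, height = nodes[name]
        return f"{lon},{lat},{height * metres_per_unit}"

    for name in nodes:
        pm = ET.SubElement(doc, _k("Placemark"))
        ET.SubElement(pm, _k("name")).text = name
        if name in years:
            stamp = ET.SubElement(pm, _k("TimeStamp"))
            ET.SubElement(stamp, _k("when")).text = str(years[name])
        point = ET.SubElement(pm, _k("Point"))
        ET.SubElement(point, _k("altitudeMode")).text = "absolute"
        ET.SubElement(point, _k("coordinates")).text = coords(name)

    for parent, child in edges:
        pm = ET.SubElement(doc, _k("Placemark"))
        ET.SubElement(pm, _k("name")).text = f"{parent} to {child}"
        line = ET.SubElement(pm, _k("LineString"))
        ET.SubElement(line, _k("altitudeMode")).text = "absolute"
        ET.SubElement(line, _k("coordinates")).text = f"{coords(parent)} {coords(child)}"

    return ET.tostring(kml, encoding="unicode")


if __name__ == "__main__":
    # Made-up toy tree: two sampled tips and one inferred ancestor.
    nodes = {"anc": (105.0, 30.0, 2), "tipA": (100.5, 13.7, 0), "tipB": (106.7, 10.8, 0)}
    edges = [("anc", "tipA"), ("anc", "tipB")]
    print(geophylogeny_kml(nodes, edges, years={"tipA": 2004, "tipB": 2005}))
```

Saving the printed output to a `.kml` file and opening it in Google Earth shows the tree; regenerating the file as new sequences arrive is what makes community-updated geophylogenies feasible.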
Workflows combining phylogenetic approaches, conservation status, and species occurrence
TAKE HOME MESSAGES 4:
- Map of Life fills a critical gap in our global biodiversity knowledge by integrating different sources of species distribution data into high-resolution range maps for community use.
- The ultimate goal is to integrate such species distribution knowledge with knowledge about relationships among species and with conservation knowledge.
- Such integration, at a global scale and across large taxonomic groups, is the next step forward.
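Integrating different sources of distribution data starts with putting them on a common grid. The sketch below is a hypothetical, set-based illustration (not Map of Life's actual method): occurrence points are binned into grid cells and cross-tabulated against an expert range map expressed as cells on the same grid. The function names, the cell indexing scheme, and the toy data are assumptions made for illustration.

```python
import math


def occupied_cells(points, cell_deg):
    """Grid cells (column, row indices) of size cell_deg degrees that contain
    at least one (lon, lat) occurrence point."""
    return {(math.floor(lon / cell_deg), math.floor(lat / cell_deg))
            for lon, lat in points}


def compare_sources(points, expert_cells, cell_deg):
    """Cross-tabulate point occurrences against an expert range map,
    both expressed on the same grid."""
    observed = occupied_cells(points, cell_deg)
    return {
        "supported": observed & expert_cells,      # in expert range, with records
        "unsampled": expert_cells - observed,      # in expert range, no records yet
        "outside_range": observed - expert_cells,  # records outside the expert range
    }


if __name__ == "__main__":
    # Made-up toy data on a 1-degree grid.
    points = [(0.5, 0.5), (1.5, 0.5), (-0.5, 2.5)]
    expert = {(0, 0), (0, 1)}
    for key, cells in compare_sources(points, expert, 1.0).items():
        print(key, sorted(cells))
```

Cells in `outside_range` are where community input matters most: they may flag a genuine range extension or a georeferencing error, and either way they point to where the map can be improved.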
Community Sourcing and the Attention Age
At the heart of today's message is also a challenge: the vision presented here suggests that data publishing and “sharing” are as important as academic “kudos.” Can we act for the collective good of our community and, by so doing, see gains for all? Let's change our model of credit!