RPG iEvoBio 2010 Keynote

Biodiversity Discovery and Documentation in the Information and Attention Age
Presented by: Rob Guralnick
Authors: Rob Guralnick and Andrew Hill
Contributors: Meredith Lane, Dan Janies, Walter Jetz, and lots of other folks.
Funding support: Global Biodiversity Information Facility, National Biological Information Infrastructure, Defense Advanced Research Projects Agency, National Science Foundation.
#ievobio

WHAT IS BIODIVERSITY DISCOVERY AND DOCUMENTATION?
IMPEDIMENT: Linnean shortfall (too few taxonomists, antiquated and laborious process). ACTION: Discovering and documenting new units of biodiversity.
IMPEDIMENT: Wallacean shortfall (very coarse resolution, scattered data, no integration). ACTION: Discovering and documenting distributions of lineages.
IMPEDIMENT: Darwinian shortfall (trees scattered in literature, no “mother of all trees”). ACTION: Discovering and documenting relationships among lineages.
IMPEDIMENT: Multiple repositories storing genetic and phenotypic data that do not communicate well; phenotypic knowledge-bases lag behind. ACTION: Discovering and documenting lineage traits from genomes to phenotype.

Pace of Species Description and Documentation for 2008
Approx. 1.922 million named species (all taxa)
~4-30 million undiscovered
From the State of Observed Species report, http://species.asu.edu/files/SOS2010.pdf

Pace of Species Description and Documentation for 2008
Assuming a relatively conservative number (e.g., 10 million undescribed species), it will take another 360 years to discover and document them at our current pace.
Why is discovery and documentation so slow?
Taxonomists proceed in the same manner today as they did one hundred years ago.
Few products are generated along the way. This also means the process is vulnerable (from the loss of computers to the loss of taxonomists themselves).
Discovery and documentation are coupled.
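The 360-year figure is a backlog divided by a yearly rate. The slide does not state the rate it used, so this sketch back-calculates the rate implied by the slide's own numbers (10 million species over 360 years). The function names are illustrative assumptions, not part of the talk.

```python
def implied_rate(undescribed_species, years):
    """Species described per year implied by clearing a backlog in `years`."""
    return undescribed_species / years


def years_to_document(undescribed_species, species_per_year):
    """Years needed to describe a backlog at a constant yearly rate."""
    return undescribed_species / species_per_year


# Rate implied by the slide's figures (assumption: constant pace).
rate = implied_rate(10_000_000, 360)

# Apply the same rate to the low, middle, and high backlog estimates.
for backlog in (4_000_000, 10_000_000, 30_000_000):
    years = years_to_document(backlog, rate)
    print(f"{backlog:>12,} undescribed: {years:,.0f} years")
```

At that implied rate, the 4 to 30 million range of undiscovered species corresponds to roughly 144 to 1,080 years.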

State and Scale of Knowledge in Environmental Sciences
Slide from Walter Jetz (thanks Walter)

A view of the world at different resolutions
90 meter resolution SRTM elevation data for a portion of Colorado
100 times as coarse
1000 times as coarse
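One way to picture the coarsening is to average blocks of fine cells into single coarse cells. This is an assumption about how such views could be produced, not the method used for the slide. The `coarsen` helper and the tiny sample grid are hypothetical, and real SRTM tiles would be read with a raster library.

```python
def coarsen(grid, factor):
    """Block-average a 2D grid (list of equal-length rows) by an integer factor.

    Rows and columns that do not fill a complete block are dropped.
    """
    rows = len(grid) // factor
    cols = len(grid[0]) // factor
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            # Collect every fine cell that falls inside this coarse cell.
            block = [
                grid[r * factor + i][c * factor + j]
                for i in range(factor)
                for j in range(factor)
            ]
            out_row.append(sum(block) / len(block))
        out.append(out_row)
    return out


# Hypothetical 4 x 4 elevation grid in meters (not real SRTM values).
elevation = [
    [1600, 1620, 2400, 2420],
    [1610, 1630, 2410, 2430],
    [3000, 3020, 1500, 1520],
    [3010, 3030, 1510, 1530],
]

print(coarsen(elevation, 2))  # four coarse cells
print(coarsen(elevation, 4))  # one cell; the terrain contrast disappears
```

At a factor of 100, 90 m cells become 9 km cells. At a factor of 1000 they become 90 km cells, so one value stands in for a million original measurements.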

Distribution Knowledge Is Scattered
Example species: Microtus montanus
... data portal
... map from IUCN Red List
... habitat preferences (cropland, meadows, mountain valleys)

Documenting our biodiversity matters because it is under increasing threat.

“Overall, we are locked into a race. We must hurry to acquire the knowledge on which a wise policy of conservation and development can be based for centuries to come.” - E. O. Wilson

HOW DO WE DO THIS?
DEVELOP KNOWLEDGE-BASES OF SPECIES DISTRIBUTIONS AND SPECIES RELATIONSHIPS
PROVIDE MEANS TO INTEGRATE ACROSS THESE KNOWLEDGE-BASES
PROVIDE TOOLS TO RAPIDLY AND EASILY EXPLORE THESE DATA ACROSS SPACE AND TIME
MAKE THIS A COMMUNITY EFFORT: LEVERAGE COMMUNITY SOURCING

Phylo-, Biodiversity and Ecological Informatics
[Diagram: three interacting components]
Concepts and ideas: initial research questions, new research questions
Growing toolbox: analytical methods (means to summarize data & select hypotheses); tools (encoding analytical methods); application services (automated workflow for biodiversity science)
Growing data and information repositories: raw global data (lineage, occurrence, environmental); processed global data (species, distributions, new environmental layers)

Concepts and ideas / Growing toolbox / Growing data (repositories, formats)

Inference-based
Tree of Life. Toolbox: Paup, Phyml/Raxml, MrBayes, Beast, Mesquite, etc.
Population Genetics. Toolbox: TCS/NCA, MsBayes, BayesSCC, Structure, etc.
Repositories: GenBank, TreeBase (Nexus/Newick/PhyloXML, etc.)

Instrument-based raw; statistical/inferential
Earth Surface. Toolbox: Satellites (Modis/GOES/Landsat, etc.)
Climate. Toolbox: Satellites; historical, current in-situ, GCMs, etc.
Ecosystem fluxes. Toolbox: Infrared Imaging Spectrometer, etc.
Repositories: Satellite image repositories, Worldclim, PRISM, PMIP (erdapp, netCDF, GIS formats)

Observations and model-based
Species named. Toolbox: TaxonX, automated species name extraction. Repositories: ITIS, Catalog of Life, Zoobank, Zookeys, etc.
Species traits. Toolbox: Lucid, Ontologies, RDF. Repositories: Morphbank, TraitNet, etc.
Species distributions. Toolbox: GIS, habitat suitability models, SDMs/ENMs, Survey Gap, etc. Repositories: GBIF, VertNet, OBIS (species occurrence), Map of Life, IUCN

The Interconnected Nature of Biodiversity Ideas, Outputs, Repositories
From Peterson et al., in press, Systematics and Biodiversity

DECOUPLING SPECIES DISCOVERY AND DOCUMENTATION
(OR GET IT OUT THERE FOR OTHERS TO USE AND REPURPOSE)
(OR CLAIM NEW BIODIVERSITY, PROVISIONALLY, BEFORE FORMAL PUBLICATION)
Generate new data from specimens
Publish step 1: repositories and community sourcing (GenBank, Morphbank, TreeBase, Scratchpads, LifeDesks)
Comparative analyses
Link new unit of biodiversity onto tree of life (claim discovery)
Publish step 2: formal publication (documentation)

TAKE HOME MESSAGE 1:
We need to use the web as a collaborative work environment for biodiversity knowledge generation.
We need to claim knowledge of the existence of new species before all of the formal steps to document it are complete.
We need to publish new data about species soon after generation and prior to publication.

What about monitoring an evolving Earth System?
Tracking the spread of disease lineages with known important mutations through time & space
Questions:
How did drug resistance arise in the H5N1 population?
Are mutations that give rise to drug resistance in H5N1 under positive selection?
Can we provide ways for researchers and the general public to track this spread in near real time?
[Figures: Hosts and strains of avian influenza A; Viral structure]

Methods:
... influenza (676 full genomes).
... phylogenetic analysis of data
Make Google Earth™ visualizations available.

Global View of Spread of H5N1 (blue branches are lineages with mutation for higher transmissibility among mammals)

Resistant mutant found at position 31 of the M2 protein (colored red below)
Altitude of node X = a + [(n − 1) × b]
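The altitude rule can be written as a small function. The slide does not define its symbols, so this sketch assumes `a` is a base altitude, `b` is a fixed increment, and `n` is a node's level counted upward from the tips. Those readings, the function name, and the sample values are all assumptions.

```python
def node_altitude(n, a, b):
    """Altitude of node X = a + (n - 1) * b.

    n: node level, starting at 1 (assumed meaning)
    a: altitude assigned to level 1 (assumed meaning)
    b: altitude added per additional level (assumed meaning)
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return a + (n - 1) * b


# Hypothetical values in meters, chosen only for illustration.
for level in (1, 2, 5):
    print(level, node_altitude(level, a=10_000, b=5_000))
```

Under this reading, each step deeper into the tree lifts a node by a constant amount, which keeps internal branches visually separated above the globe.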

dN/dS measurements across the M2 protein (high dN/dS ratios (>1) suggest that more non-synonymous substitutions are occurring than expected, and that these substitutions are therefore likely being maintained in the population)
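To make the threshold concrete, here is a minimal sketch of how the ratio is read. It assumes the per-site rates dN and dS have already been estimated by a codon-based method; the slide does not say which method was used. The function names and example numbers are hypothetical, not results from the talk.

```python
def dn_ds(dn, ds):
    """Ratio of non-synonymous (dN) to synonymous (dS) substitution rates."""
    if ds <= 0:
        raise ValueError("dS must be positive to form the ratio")
    return dn / ds


def interpret(omega):
    """Conventional reading of a dN/dS ratio."""
    if omega > 1:
        return "consistent with positive selection"
    if omega < 1:
        return "consistent with purifying selection"
    return "consistent with neutral evolution"


# Hypothetical rates for two sites (illustration only).
for site, dn, ds in [("site A", 0.04, 0.02), ("site B", 0.01, 0.04)]:
    omega = dn_ds(dn, ds)
    print(site, omega, interpret(omega))
```

A ratio is only as reliable as the substitution counts behind it, so sites with very few synonymous changes are usually treated with caution.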

Table 2. Amantadine use in chicken farms in Northern China in 1 year (from October 2004 to September 2005)
From He, 2007, Antiviral Research

So What Did We Find Out?
... for at least some mutations (S31N and V27A/I).
Emergence of drug resistance has been through mutation, not recombination and hitch-hiking (results not shown).
... continued monitoring of evolution and spread of resistance to adamantanes and oseltamivir (Tamiflu)

TAKE HOME MESSAGES 2:
... evolving lineages.
... evolution, selection and adaptation.

Developing such a system means creating automated workflows.

WHAT ABOUT ALLOWING OTHERS TO MAKE THEIR OWN GEOPHYLOGENY?
http://geophylo.appspot.com/
Hill and Guralnick, in press, Ecography
Google App Engine application

GeoPhylo Engine: written in Python, open source, and deployed on Google App Engine.
Advantages of cloud-based deployment:
All versioning kept intact so developers can easily link to latest and greatest
Storage of persistent KMLs for users who want to share and modify their KMLs
Easily deployable as a web service
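For readers unfamiliar with KML, here is a minimal sketch of how one tree branch could be written as a KML line using only the Python standard library. It is not GeoPhylo Engine code. The `branch_kml` function, the coordinates, and the altitudes are hypothetical.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def branch_kml(name, parent, child):
    """Return a KML document string with one branch drawn as a LineString.

    parent and child are (longitude, latitude, altitude_m) tuples.
    """
    # Serialize KML elements without a namespace prefix.
    ET.register_namespace("", KML_NS)
    kml = ET.Element(f"{{{KML_NS}}}kml")
    doc = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    placemark = ET.SubElement(doc, f"{{{KML_NS}}}Placemark")
    ET.SubElement(placemark, f"{{{KML_NS}}}name").text = name
    line = ET.SubElement(placemark, f"{{{KML_NS}}}LineString")
    # "absolute" makes the viewer honor the altitude values as given.
    ET.SubElement(line, f"{{{KML_NS}}}altitudeMode").text = "absolute"
    ET.SubElement(line, f"{{{KML_NS}}}coordinates").text = " ".join(
        f"{lon},{lat},{alt}" for lon, lat, alt in (parent, child)
    )
    return ET.tostring(kml, encoding="unicode")


# Hypothetical branch from an elevated internal node down to a tip.
print(branch_kml("example branch", (-105.27, 40.01, 15000), (-104.99, 39.74, 0)))
```

A full geophylogeny would repeat this for every branch in the tree. Because the result is plain text, it can be stored, shared, and edited, which suits the persistent-KML use described above.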

Can We Really Track Distributions of Lineages Through Space and Time?

Map of Life Will:
Provide means for the community to annotate those maps
Assemble point occurrences, habitat preference data and environmental data (e.g. climate, landcover, soil, etc.)
Provide a modeling approach to generate much finer scale distribution models (on the order of a kilometer resolution)
