Biodiversity Discovery and Documentation in the Information and Attention Age<br />Presented by: Rob Guralnick<br />Authors: Rob Guralnick and Andrew Hill<br />Contributors: Meredith Lane, Dan Janies, Walter Jetz, and lots of other folks.<br />Funding support: Global Biodiversity Information Facility, National Biological<br />Information Infrastructure, Defense Advanced Research Projects Agency, <br />National Science Foundation.<br />#ievobio<br />
WHAT IS BIODIVERSITY DISCOVERY AND DOCUMENTATION?<br />ACTION: Discovering and documenting new units of biodiversity<br />IMPEDIMENT: Linnean shortfall (too few taxonomists, antiquated and laborious process)<br />ACTION: Discovering and documenting distributions of lineages<br />IMPEDIMENT: Wallacean shortfall (very coarse resolution, scattered data, no integration)<br />ACTION: Discovering and documenting relationships among lineages<br />IMPEDIMENT: Darwinian shortfall (trees scattered in literature, no “mother of all trees”)<br />ACTION: Discovering and documenting lineage traits from genomes to phenotype<br />IMPEDIMENT: Multiple repositories storing genetic and phenotypic data that do not communicate well. Phenotypic knowledge-bases lag behind.<br />
Pace of Species Description and Documentation for 2008<br />Approx. 1.922 million named species (all taxa)<br />~4-30 million undiscovered<br />From the State of Observed Species report, http://species.asu.edu/files/SOS2010.pdf<br />
Pace of Species Description and Documentation for 2008<br />Assuming a relatively conservative number (e.g., 10 million undescribed species), it will take another 360 years to discover and document them at our current pace. Why are discovery and documentation so slow?<br />Taxonomists proceed in the same manner today as they did one hundred years ago.<br />Few products are generated along the way. This also means the process is vulnerable (from the loss of computers to the loss of taxonomists themselves).<br />Discovery and documentation are coupled.<br />
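The 360-year estimate on this slide is a backlog divided by an annual description rate. The sketch below reproduces that arithmetic for the low, middle, and high backlog figures quoted in the talk. The annual rate is not stated on the slide, so `IMPLIED_RATE` is back-calculated from the slide's own numbers (10 million species over 360 years) and should be read as an assumption, not a figure from the report.

```python
def years_to_document(undescribed_species, species_per_year):
    """Years needed to work through a backlog at a constant annual rate."""
    if species_per_year <= 0:
        raise ValueError("species_per_year must be positive")
    return undescribed_species / species_per_year


# Assumption: rate implied by the slide (10 million species in 360 years),
# not a number quoted from the State of Observed Species report.
IMPLIED_RATE = 10_000_000 / 360

if __name__ == "__main__":
    # Backlog estimates from the talk: roughly 4 to 30 million undiscovered species.
    for backlog in (4_000_000, 10_000_000, 30_000_000):
        years = years_to_document(backlog, IMPLIED_RATE)
        print(f"{backlog:>12,} species: about {round(years)} years")
```

Because the rate is held constant, the result scales linearly with the backlog; any speed-up in description rate shortens every scenario by the same factor.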
State and Scale of Knowledge in Environmental Sciences<br />Slide from Walter Jetz (thanks Walter) <br />
A view of the world at different resolutions<br />90-meter-resolution SRTM elevation data for a portion of Colorado<br />100 times as coarse<br />1000 times as coarse<br />
Distribution Knowledge Is Scattered<br /><ul><li>Points are from the GBIF data portal</li><li>Expert opinion range map from the IUCN Red List</li><li>IUCN also lists some habitat preferences (cropland, meadows, mountain valleys)</li></ul>Microtus montanus<br />
Documenting our biodiversity matters because it is under increasing threat.<br />
“Overall, we are locked into a race. We must hurry to acquire the knowledge on which a wise policy of conservation and development can be based for centuries to come.”<br />- E. O. Wilson<br />
HOW DO WE DO THIS?<br />DEVELOP KNOWLEDGE-BASES OF SPECIES DISTRIBUTIONS AND SPECIES RELATIONSHIPS.<br />PROVIDE MEANS TO INTEGRATE ACROSS THESE KNOWLEDGE-BASES<br />PROVIDE TOOLS TO RAPIDLY AND EASILY EXPLORE THESE DATA ACROSS SPACE AND TIME<br />MAKE THIS A COMMUNITY EFFORT – LEVERAGE COMMUNITY SOURCING<br />
Phylo-, Biodiversity and Ecological Informatics<br />Analytical Methods: means to summarize data & select hypotheses<br />Growing Toolbox<br />Application Services: automated workflow for biodiversity science<br />Tools: encoding analytical methods<br />Initial Research Questions<br />Raw global data: lineage, occurrence, environmental<br />New Research Questions<br />Concepts and ideas<br />Processed global data: species, distributions, new environmental layers<br />Growing data and information repositories<br />
DECOUPLING SPECIES DISCOVERY AND DOCUMENTATION<br />(OR GET IT OUT THERE FOR OTHERS TO USE AND REPURPOSE)<br />(OR CLAIM NEW BIODIVERSITY, PROVISIONALLY, BEFORE FORMAL PUBLICATION)<br />Repositories<br />Community sourcing<br />Publish step 1<br />Generate new data from specimens<br />GenBank<br />Scratchpads<br />Morphbank<br />LifeDesks<br />Comparative analyses<br />TreeBASE<br />Link new unit of biodiversity onto tree of life (claim discovery)<br />Publish step 2<br />Formal publication (documentation)<br />
TAKE HOME MESSAGE 1:<br />We need to use the web as a collaborative work environment for biodiversity knowledge generation<br />We need to claim knowledge of the existence of new species before all of the formal steps to document them are complete<br />We need to publish new data about species soon after generation and prior to formal publication<br />
What about monitoring an evolving Earth System?<br />Tracking the spread of disease lineages with known important mutations through time & space<br />Questions:<br /><ul><li>How are drug-resistant strains of H5N1 circulating around the globe?</li>
<li>How did drug resistance arise in the H5N1 population?</li>
<li>Are mutations that give rise to drug resistance in H5N1 under positive selection?</li>
<li>Can we provide ways for researchers and the general public to track this spread in near real time?</li></ul>Hosts and strains of avian influenza A<br />Viral structure<br />
Methods:<br /><ul><li>Collect public genome data for H5N1 avian influenza (676 full genomes).</li>
<li>Use tools for more efficient alignment and phylogenetic analysis of the data.</li>
<li>Test whether mutations in the M2 gene (L26I, V27A/I, A30S, S31N) that provide resistance to adamantanes (a class of drugs used to treat influenza A) are under positive selection, under purifying selection, or neutral (across the full sampled population of H5N1 influenza A).</li>
<li>Make Google Earth™ visualizations available.</li></ul><br />
Global View of Spread of H5N1 (blue branches are lineages with mutation for higher transmissibility among mammals) <br />
Resistant mutant found at position 31 of the M2 protein – colored red below<br />Altitude of node X = a + [(n − 1) × b]<br />
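The altitude rule on this slide lifts internal tree nodes above the globe so that branches can be drawn in three dimensions. The slide does not define its symbols, so the sketch below assumes that `a` is the altitude of the lowest level of internal nodes, `b` is the vertical spacing between levels, and `n` is the node's level counted from 1. The default values and the helper `kml_coordinates` are hypothetical and for illustration only; the one fixed convention used is that KML writes coordinates as longitude, latitude, altitude.

```python
def node_altitude(n, a=10_000, b=50_000):
    """Altitude of a tree node: a + (n - 1) * b.

    Assumed meanings (not defined on the slide):
      n: level of the node, counted from 1
      a: altitude of the lowest level, in meters
      b: altitude added per additional level, in meters
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return a + (n - 1) * b


def kml_coordinates(lon, lat, alt):
    """Format one KML coordinate tuple (order is lon,lat,alt)."""
    return f"{lon},{lat},{alt}"


if __name__ == "__main__":
    # Hypothetical node placed over Colorado at increasing levels of the tree.
    for level in (1, 2, 3):
        print(kml_coordinates(-105.27, 40.01, node_altitude(level)))
```

With this rule, nodes at the same level sit at the same altitude, so deeper splits in the tree appear higher above the surface than recent ones.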
dN/dS measurements across the M2 protein (high dN/dS ratios (>1) suggest that more non-synonymous substitutions are occurring than expected and are therefore likely being maintained in the population)<br />
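The interpretation of dN/dS on this slide can be written as a simple rule: a ratio above 1 points toward positive selection, a ratio below 1 toward purifying selection, and a ratio near 1 toward neutrality. The sketch below is only that rule of thumb. The `tolerance` band around 1 is an assumption made for illustration, the input values are hypothetical, and a real analysis would rely on a statistical test rather than a fixed cutoff.

```python
def classify_selection(dn, ds, tolerance=0.1):
    """Label a site or gene by its dN/dS ratio.

    dn: non-synonymous substitution rate
    ds: synonymous substitution rate (must be positive)
    tolerance: assumed half-width of the band around 1 treated as neutral
    """
    if ds <= 0:
        raise ValueError("ds must be positive to form a ratio")
    omega = dn / ds
    if omega > 1 + tolerance:
        return "positive"
    if omega < 1 - tolerance:
        return "purifying"
    return "neutral"


if __name__ == "__main__":
    # Hypothetical rates, not values from the H5N1 analysis.
    for dn, ds in ((0.30, 0.10), (0.01, 0.10), (0.10, 0.10)):
        print(dn, ds, classify_selection(dn, ds))
```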
Table 2. Amantadine use in chicken farms in Northern China in 1 year (from October 2004 to September 2005)<br />From He, 2007, Antiviral Research<br />
So What Did We Find Out?<br /><ul><li>Drug resistance to adamantanes is under positive selection for at least some mutations (S31N and V27A/I).</li>
<li>Drug-resistant lineages can spread quickly across the globe.</li>
<li>Drug resistance has emerged through mutation, not recombination and hitch-hiking (results not shown).</li>
<li>Effectively treating a potential H5N1 pandemic depends on continued monitoring of the evolution and spread of resistance to adamantanes and oseltamivir (Tamiflu).</li></ul><br />
TAKE HOME MESSAGES 2:<br /><ul><li>It is possible to develop observing systems not just for species but also for evolving lineages.</li>
<li>These monitoring or observing systems can provide a unique view into evolution, selection and adaptation.</li>
<li>Such systems are essential for more accurate forecasting.</li>
<li>Developing such a system means creating automated workflows. </li></li></ul><li>
WHAT ABOUT ALLOWING OTHERS TO MAKE THEIR OWN GEOPHYLOGENY?<br />http://geophylo.appspot.com/<br />Hill and Guralnick, in press, Ecography<br />Google App Engine application<br />
GeoPhylo Engine - Written in Python, open source, and deployed on Google App Engine.<br />Advantages of cloud-based deployment:<br /><ul><li>Scalable (near-infinite computational resources)
All versions are kept intact, so developers can easily link to the latest and greatest
Storage of persistent KMLs for users who want to share and modify their KMLs.
Easily deployable as a web service </li></li></ul><li>
TAKE HOME MESSAGES 3:<br />Geophylogenies provide rich visualizations of multidimensional data that can be examined at multiple spatial (and temporal) scales<br />Such visualizations may appeal beyond our community of evolutionary biologists to the broader scientific and policy community<br />Automated approaches and workbench-oriented tools allow updatable, community-driven content to be generated<br />Our ultimate goal should be an ever-growing “mother of all trees” to which we can attach new “twigs” as we discover them.<br />
Can We Really Track Distributions of Lineages Through Space and Time?<br />
Map of Life Will:<br /><ul><li>Provide expert opinion range maps for almost all terrestrial vertebrates (and means to accumulate more maps for other taxa)
Provide means for the community to annotate those maps
Assemble point occurrences, habitat preference data and environmental data (e.g. climate, land cover, soil, etc.)
Provide a modeling approach to generate much finer scale distribution models (on the order of a kilometer resolution)</li></li></ul><li>
Map of Life Connections<br /><ul><li>Common data model for range maps</li><li>Web-services based for sharing maps</li><li>Focus on improvement through both modeling and community involvement</li></ul>
Integrating phylogenetic and distributional data in Google Earth™<br />
TAKE HOME MESSAGES 4:<br />Map of Life fills a critical gap in our global biodiversity knowledge by integrating different sources of species distribution data into high-resolution range maps for community use.<br />The ultimate goal is to integrate such species distribution knowledge with knowledge about relationships among species and conservation knowledge<br />Such integration, at a global scale and across large taxonomic groups, is the next step forward<br />
Community Sourcing and the Attention Age<br />At the heart of the message here today is also a challenge:<br />The vision here suggests that data publishing and “sharing” are as important as academic “kudos”<br />Can we act for the collective good of our community and by so doing see gains for all?<br />Let's change our model of credit!<br />