Data Intensive Challenges in Biodiversity Conservation Steve Kelling
Environmental Science Challenges: Climate Change, Biodiversity Loss, Invasive Species, Water Depletion, Disease Spread, Green Energy, Habitat Loss
Habitat Loss From: University of California Press Blog Earth Day 2010 Habitat loss is the major issue for Biodiversity Conservation.
The increasing availability of massive volumes of scientific data requires new synthetic analysis techniques to explore and identify interesting patterns that were otherwise not apparent. For biodiversity studies a “data driven” approach is necessary due to the complexity of ecological systems, particularly when viewed at large spatial and temporal scales.
Presentation Goals:
Observation Networks: description of eBird (http://www.ebird.org)
Species Distribution Models: description of the Avian Knowledge Network (http://avianknowledge.net)
Data Intensive Science: description of the outcomes of the DataONE Exploration, Visualization, and Analysis Working Group
eBird is a global online program that gathers bird observations from citizen scientists, predominantly across the Western Hemisphere. eBird gathers checklists of birds with associated effort information from well-defined locations, passing each record through a two-tiered verification system. eBird is a joint project between the Cornell Lab of Ornithology and the National Audubon Society, and has more than two dozen regional partners. Sullivan, B.L., C.L. Wood, M.J. Iliff, R.E. Bonney, D. Fink, and S. Kelling. 2009. eBird: A citizen-based bird observation network in the biological sciences. Biological Conservation 142: 2282-2292.
eBird uses crowdsourcing techniques to gather observations of birds. Crowdsourcing is the act of outsourcing tasks, traditionally performed by an employee or contractor, to an undefined, large group of people or community (a "crowd"), through an open call. Jeff Howe, one of the first authors to employ the term, established that the concept of crowdsourcing depends essentially on the fact that, because it is an open call to an undefined group of people, it gathers those who are most fit to perform tasks, solve complex problems, and contribute the most relevant and fresh ideas. For example, the public may be invited to develop a new technology, carry out a design task, refine or carry out the steps of an algorithm, or help capture, systematize, or analyze large amounts of data (CITIZEN SCIENCE). (From Wikipedia)
eBird Checklists: Volunteers submit checklists of bird observations from specific locations using protocols that collect information on date, time, and distance traveled.
Flagged Records: 4% of submitted records were flagged for review; 60% of those records were reviewed and validated. eBird contains a two-stage verification system: (1) instantaneous automated evaluation of submissions based on species count limits for a given date and location; (2) a growing network of more than 500 regional editors, composed of local experts, who vet records flagged by the automated filters.
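The first, automated stage can be pictured with a minimal sketch. This is not eBird's actual filter code: the `COUNT_LIMITS` table, its region and month keys, the invented limit values, and the `needs_review` function are all assumptions made for illustration. The sketch only shows the general idea of comparing a reported count against a limit for that place and time of year.

```python
from dataclasses import dataclass

# Hypothetical filter table: maximum plausible count per (region, month, species).
# The limits below are invented for illustration only.
COUNT_LIMITS = {
    ("US-NY", 1, "Indigo Bunting"): 0,
    ("US-NY", 6, "Indigo Bunting"): 15,
}


@dataclass
class Record:
    region: str
    month: int
    species: str
    count: int


def needs_review(record, limits=COUNT_LIMITS):
    """Return True when a record should be passed to a regional editor.

    A record is flagged when its count exceeds the limit for that region
    and month. A species with no entry at all is flagged too (an assumption),
    so that unexpected species always reach a human reviewer.
    """
    limit = limits.get((record.region, record.month, record.species))
    return limit is None or record.count > limit


winter = Record("US-NY", 1, "Indigo Bunting", 1)
summer = Record("US-NY", 6, "Indigo Bunting", 3)
print(needs_review(winter), needs_review(summer))
```

Records that the function flags would then enter the second stage, review by a regional editor.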
Understanding our Audience eBird is building a web‐enabled community of bird watchers who collect, manage, and store their observations in a globally accessible unified database. Through its development as a tool that addresses the needs of the birding community, eBird sustains and grows participation. Give Birders What They Want!
eBird contains an array of data visualization and analysis tools that provide birders, land managers, and scientists with summary information about bird distribution.
Sooty Shearwater eBird data can be used to examine the timing of migration across large geographic areas. Because each eBird observation is recorded at a specific location, eBird can generate maps depicting species distribution at multiple spatio‐temporal scales.
Bird Occurrence Patterns in Upstate New York: eBird provides "bar charts" (i.e., frequency histograms) based on frequency of detection for individual species. These visualizations provide users with occurrence information at specific locations at 1-week increments and indicate the likelihood of detecting a species based on its frequency in that area (darker and wider bars indicate increased frequency).
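As a rough sketch of the quantity behind such a bar chart, frequency of detection can be computed as the share of checklists in each week that report the species. The function name, the input layout (pairs of date and species set), and the week-numbering rule are assumptions for illustration, not eBird's implementation.

```python
from collections import defaultdict
from datetime import date


def weekly_frequency(checklists, species):
    """Fraction of checklists in each week that report `species`.

    `checklists` is an iterable of (date, set_of_species) pairs.
    Weeks are numbered 0..51 by day of year; the last one or two days
    of the year are folded into week 51 (an assumption of this sketch).
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for day, seen in checklists:
        week = min((day.timetuple().tm_yday - 1) // 7, 51)
        totals[week] += 1
        if species in seen:
            hits[week] += 1
    return {week: hits[week] / totals[week] for week in sorted(totals)}


sample = [
    (date(2010, 1, 1), {"Indigo Bunting"}),
    (date(2010, 1, 2), {"Sooty Shearwater"}),
    (date(2010, 1, 9), {"Indigo Bunting"}),
]
print(weekly_frequency(sample, "Indigo Bunting"))
```

Weeks with no checklists are simply absent from the result, which keeps "never detected" distinct from "never surveyed".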
Growth in eBird Observations and Checklists [Figure: observations and checklists over time through 2011, with the eBird 2.0 launch marked]
Statistics 2010. More than: 18,214,480 observations submitted; 1,300,029 hours collecting bird observations; 1,293,480 checklists entered; 22,136 contributors; 351,000 unique visitors to eBird; 20 million page views
Estimating Species Distributions: Determining the patterns of species occurrence through time and space, and understanding their links with features of the environment, are central themes in ecology. Identifying the factors that influence species distributions is a complex task, requiring the examination of multiple facets of a species' natural history and their relationships with the complex and variable environments in which they live. Fink, D., W. M. Hochachka, D. Winkler, B. Shaby, G. Hooker, B. Zuckerberg, M. A. Munson, D. Sheldon, M. Riedewald, and S. Kelling. 2010. Spatiotemporal exploratory models for large-scale survey data. Ecological Applications 20:2131-2147.
Observational Data Model: The most crucial aspect of predicting species occurrence is to learn a model, called the observation model, from observed measurements and then make probabilistic inferences over regions or variables where measurements were not made. This approach joins organism observations with a multitude of "drivers", covariates that could potentially influence the occurrence of the organism. While a single source (or a few sources) of noisy observations may not be sufficient to model distributions accurately, combining many measurements (e.g., species occurrence, weather, organism occurrence, landscape mosaic, human population data, etc.) greatly improves the accuracy of the models.
Munson, M. A., K. Webb, D. Sheldon, D. Fink, W. M. Hochachka, M. J. Iliff, M. Riedewald, D. Sorokina, B. L. Sullivan, C. L. Wood, and S. Kelling. 2009. The eBird Reference Dataset (http://www.avianknowledge.net/content/features/archive/eBird_Ref).
The Multi-scale Modeling Challenge. Goal: analysis at broad scale with fine resolution. Challenge: spatiotemporal patterning at multiple scales. Local scale: fine-scale spatial and temporal resource patterns. Large scale: regional and seasonal variation in species' habitat utilization.
SpatioTemporal Exploratory Model (STEM). Current nonparametric SDMs are very good at local-scale modeling, relating environmental predictors (X) to observed occurrences (y). Multi-scale strategy: differentiate between local-scale and global-scale spatiotemporal (ST) structure. Make time (t) and location (s) explicit. "Regionalize" by restricting support. Predictions at time (t) and location (s) are made by averaging across the set of local models whose support contains that time and location. [Equation not reproduced; its labels: restricted support set (q); ith ST-explicit base model; number of models supporting (s,t)]
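The averaging step can be sketched as follows. This is an illustration under stated assumptions, not the published STEM code: the `BaseModel` class, its rectangular lon/lat/day support, and the `stem_predict` function are hypothetical names, and temporal windows that wrap around the end of the year are ignored for brevity.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class BaseModel:
    """A local model plus the restricted support it was fitted on."""
    lon_min: float
    lon_max: float
    lat_min: float
    lat_max: float
    day_start: int
    day_end: int
    predict: Callable[[Sequence[float]], float]  # maps predictors X to an occurrence estimate

    def supports(self, lon, lat, day):
        """True when location (lon, lat) and time (day) fall inside the support."""
        return (self.lon_min <= lon <= self.lon_max
                and self.lat_min <= lat <= self.lat_max
                and self.day_start <= day <= self.day_end)


def stem_predict(models, x, lon, lat, day):
    """Average the predictions of every base model whose support contains (s, t).

    Returns None when no base model supports the requested location and time.
    """
    local = [m.predict(x) for m in models if m.supports(lon, lat, day)]
    if not local:
        return None
    return sum(local) / len(local)


models = [
    BaseModel(-80, -68, 38, 47, 100, 140, lambda x: 0.25),
    BaseModel(-78, -66, 40, 49, 120, 160, lambda x: 0.75),
    BaseModel(-120, -108, 30, 39, 100, 140, lambda x: 1.0),
]
print(stem_predict(models, x=[0.3, 0.7], lon=-76.5, lat=42.4, day=130))
```

Only the first two base models cover the query point in the usage lines, so the third one does not influence the result.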
The ST Ensemble. "Slice and dice" the ST extent into stixels, with sufficient overlap, to adapt to different dynamics. Temporal design: 40-day intervals; 80 evenly spaced windows throughout the year. Spatial design: for each time interval, randomly sample rectangles (12 deg lon x 9 deg lat); minimum 25 unique locations.
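The design above can be sketched as two small helpers. The window width, window count, rectangle size, and minimum number of unique locations come from the slide; the function names, the `extent` tuple, the number of rectangles drawn per interval, and the use of uniform random placement are assumptions for illustration only.

```python
import random


def temporal_windows(n_windows=80, width_days=40, year_days=365):
    """Evenly spaced (start, end) day offsets; `end` is exclusive.

    An end value beyond `year_days` means the window wraps into the next year.
    """
    step = year_days / n_windows
    return [(int(i * step), int(i * step) + width_days) for i in range(n_windows)]


def sample_rectangles(locations, n_rect, extent, width=12.0, height=9.0,
                      min_locations=25, rng=None):
    """Randomly place rectangles and keep those with enough unique locations.

    `locations` is an iterable of (lon, lat) pairs for one time interval.
    `extent` is (lon_min, lon_max, lat_min, lat_max) of the study area.
    Returns rectangles as (west, south, east, north) tuples.
    """
    rng = rng or random.Random()
    lon_lo, lon_hi, lat_lo, lat_hi = extent
    kept = []
    for _ in range(n_rect):
        west = rng.uniform(lon_lo, lon_hi - width)
        south = rng.uniform(lat_lo, lat_hi - height)
        inside = {(lon, lat) for lon, lat in locations
                  if west <= lon < west + width and south <= lat < south + height}
        if len(inside) >= min_locations:
            kept.append((west, south, west + width, south + height))
    return kept


windows = temporal_windows()
print(len(windows), windows[:3])
```

Because consecutive windows start about 4.5 days apart but last 40 days, each day of the year is covered by several overlapping windows, which is what lets the ensemble average across stixels.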
Exploratory Inference: SpatioTemporal Variation of Local-scale Predictor Effects. Non-stationarity of species-habitat associations: although many ecological processes are known or expected to vary in space and time, the vast majority of SDM is done for a single region and/or season. Our goal is therefore to develop techniques to explore patterns of variation in space and time, to provide ecologists and land managers with more accurate information about how species-habitat associations (requirements) change.
Taking a data intensive science approach requires a data management and research environment that supports the entire data life cycle, from acquisition, storage, management, and integration to data exploration, analysis, visualization, and other computing and information processing services. Kelling, S., W. M. Hochachka, D. Fink, M. Riedewald, R. Caruana, G. Ballard, and G. Hooker. 2009. Data-intensive Science: A New Paradigm for Biodiversity Studies. BioScience 59:613-620.
Scientific Exploration, Visualization, and Analysis Working Group
Steve Kelling (co-chair), Cornell Lab of Ornithology; Bob Cook (co-chair), Oak Ridge National Lab; John Cobb, Oak Ridge National Lab; Theo Damoulas, Cornell University; Tom Dietterich, Oregon State; Juliana Freire, University of Utah; Daniel Fink, Cornell Lab of Ornithology; Damian Gesler, iPlant; Bill Michener, University of New Mexico; Jeff Morisette, USGS; Patrick O'Leary, U of Idaho; Alyssa Rosemartin, NPN; Suresh SanthanaVannan, Oak Ridge National Lab; Claudio Silva, University of Utah; Kevin Webb, Cornell Lab of Ornithology. Kelling, S., R. Cook, T. Damoulas, D. Fink, J. Freire, W. M. Hochachka, W. K. Michener, K. Rosenberg, and C. Silva. 2011, IN PRESS. Estimating species distributions, across space, through time, and with features of the environment.
Observational Data Sources Sensors, sensor networks, and remote sensing gather observations Photo courtesy of www.carboafrica.net
Data Interoperability: Our major data interoperability challenge was rectifying object-based models (i.e., vector entities such as locations where birds are observed) with field-based models (i.e., raster imagery composed of attribute values gridded in space) of storing geographic information. To make the data interoperable, we had to conflate point-location-based observations (e.g., bird observations) with raster attribute data at the resolution of the raster data. For each observation location, we determine the cell in the raster grid into which the observation's location falls. We use the value of that cell's attribute as the attribute value for each observation.
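The point-to-cell lookup described above can be sketched in a few lines. The grid layout is an assumption made for illustration: a north-up raster stored as a list of rows, square cells, and an origin at the upper-left corner. The function name and the land cover values are hypothetical.

```python
import math


def raster_value_at(grid, origin_lon, origin_lat, cell_size, lon, lat):
    """Return the attribute of the raster cell containing (lon, lat).

    `grid[row][col]` holds cell attributes; row 0 is the northernmost row.
    (`origin_lon`, `origin_lat`) is the upper-left corner of the raster and
    `cell_size` is the cell edge length in the same units (degrees here).
    Returns None when the point falls outside the raster.
    """
    col = math.floor((lon - origin_lon) / cell_size)
    row = math.floor((origin_lat - lat) / cell_size)
    if 0 <= row < len(grid) and 0 <= col < len(grid[0]):
        return grid[row][col]
    return None


# Hypothetical 2 x 2 land cover raster with 1-degree cells.
land_cover = [
    ["forest", "water"],
    ["crop", "urban"],
]
observations = [(-79.5, 44.5), (-78.2, 43.1), (-81.0, 44.5)]
for lon, lat in observations:
    print((lon, lat), raster_value_at(land_cover, -80.0, 45.0, 1.0, lon, lat))
```

Each observation then carries the cell's attribute as one more covariate, at the resolution of the raster rather than of the observation.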
Patterns in Bird Species Occurrence Explored through Data Intensive Analysis and Visualization. Bird observations and environmental data from more than 100,000 locations in the US were integrated and analyzed using high performance computing resources. [Figure labels: eBird; Land Cover; Meteorology; Model results: Occurrence of Indigo Bunting (2008), with panels for Jan, Apr, Jun, Sep, Dec] Potential Uses-
Spatio-Temporal Exploratory Models predict the probability of occurrence of bird species across the United States on a 35 km x 35 km grid. MODIS: remote sensing data
Observations from bird watchers (citizen scientists): a huge number of birders collect 16 million observations each year. These are combined with environmental factors such as land cover, landscape fragmentation, topography, human population, weather, and remote sensing data (green-ness of terrestrial vegetation). Integrating the data into one database is a challenge. This huge amount of data can only be analyzed on supercomputers, using NSF TeraGrid high performance computing. The models were used in the creation of the 2011 United States of America State of the Birds Report, entitled Birds in Public Lands and Waters.
Biodiversity Research and Conservation in a Digital World: Gaining insight into the complexities and processes of natural systems is no longer an exclusive realm of theory and experiment; computation and access to large quantities of data are now equal and indispensable partners for advances in scientific knowledge, land management, and informed decision making.
Funding and Acknowledgements National Science Foundation Leon Levy Foundation Wolf Creek Foundation The volunteers who contributed millions of hours gathering bird observations.
Acknowledgements. eBird and the Avian Knowledge Network: Art Munson - CU; Daniel Fink - CU; Wesley Hochachka - CU; Denis Lepage - BSC; Rich Caruana - MS; Mirek Riedewald - NEU; Daria Sorokina - CMU; Kevin Webb - CU; Giles Hooker - CU; Brian Sullivan - CU; Chris Wood - CU; Marshall Iliff - CU. Computational Sustainability: Carla Gomes - CU; Tom Dietterich - OSU; Daniel Sheldon - OCU; Ken Rosenberg - CU; Rebecca Hutchinson - OSU; Weng-Keen Wong - OSU; Megan MacDonald - CU; Stefan Hames - CU; Theo Damoulas - CU; Bistra Dilkina - CU. DataONE: Bill Michener - UNM; Bob Cook - ORNL; Jeff Morisette - USGS; Juliana Freire - UUT; Claudio Silva - UUT; Matt Jones - UCSB; Suresh SanthanaVannan - ORNL