Our data for modeling bird distributions will come from eBird, a global online program that gathers bird observations from citizen scientists, predominately across the Western Hemisphere. eBird gathers checklists of birds with associated effort information from well-defined locations, passing each record through a two-tiered data verification system. In 2009, over 48,000 participants volunteered over 780,000 hours to collect more than one million bird observations per month from nearly 300,000 unique locations across the Western Hemisphere. Data from eBird have been used to reveal biological patterns across large spatio-temporal regions, allowing researchers to better understand patterns of bird occurrence and to explore species–habitat relationships. Quantities of data sufficient for analysis are available from 2005.
Determining the patterns of species occurrence through time and space, and understanding their links with features of the environment, are central themes in ecology. Identifying the factors that influence species distributions is a complex task, requiring the examination of multiple facets of a species’ natural history and their relationships with the complex and variable environments in which they live.
The SpatioTemporal Exploratory Model (STEM) adds essential spatiotemporal structure to existing techniques for developing species distribution models through a simple parametric structure, without requiring a detailed understanding of the underlying dynamic processes. STEM uses a multi-scale strategy to differentiate between local and global-scale spatiotemporal structure. A user-specified species distribution model accounts for spatial and temporal patterning at the local level. These local patterns are then allowed to “scale up” via ensemble averaging to larger scales. This makes STEMs especially well suited for exploring distributional dynamics arising from a variety of processes.
Although many ecological processes are known or expected to vary in space and time, the vast majority of species distribution modeling (SDM) is done for a single region and/or season. Our goal is therefore to develop techniques to explore patterns of variation in space and time, to provide ecologists and land managers with more accurate information about how species-habitat associations (requirements) change.
Taking a data-intensive science approach requires a data management and research environment that supports the entire data life cycle: from acquisition, storage, management, and integration to data exploration, analysis, visualization, and other computing and information processing services.
We describe the assembly, exploration, visualization, and analysis of data using examples that are based on the synthesis and modeling of large datasets collected by researchers in multiple domains.
Data Access and Synthesis: Identifying data sources and formatting them for analysis.
Model Development: Advancing species distribution modeling of large-scale migration.
Managing Computational Requirements: Handling computationally intensive models.
Exploring and Visualizing Model Results: The amount of information obtainable from a model of spatiotemporal variation in species’ distribution is enormous, and tools for exploring and visualizing the data are required.
Examples of Results: Providing results from the analysis done for the State of the Birds (SOTB) report.
There is a wide variety of sensors and sensor networks that gather observational data. Each of these sensors and networks has its own particular biases in calibration, data gathering, and detecting phenomena. While autonomous sensors gather accurate data of many types, they are not useful for collecting species data, a task for which human observers are still needed. The one sensor network that I am most familiar with is a global network of volunteer observers (citizen scientists) who gather observations of birds.
John Cobb is from TeraGrid and Computer Science and Math, ORNL.
Observations from bird watchers (citizen scientists): a huge number of birders collect 16 million observations each year.
Combine these with environmental factors like land cover, landscape fragmentation, topography, human population, weather, and remote sensing data (green-ness of terrestrial vegetation).
Integrating the data into one database is a challenge.
This huge amount of data can only be analyzed on supercomputers, using TeraGrid high-performance computing.
Models can be used to examine how bird migration may change with changes in climate: input to IPCC Working Group 2 (Impacts, Adaptation, and Vulnerability).
We also plan to use these observations and model results to publish the State of the Birds Report for 2010 (the 2009 version of State of the Birds was released by Secretary Salazar, Dept. of the Interior).
Indigo Bunting winters in Central and South America.
Model predictions show population timing and location through migration:
Early April: concentration on the Gulf of Mexico coast
Mid April: concentration along the Mississippi River valley
Mid May: breeding distribution
References:
http://www.nature.com/news/2010/100810/full/news.2010.395.html
http://www.scientificamerican.com/article.cfm?id=satellites-and-supercomputing-improve-bird-watching
Transcript of "Keynote Speaker 1 - Data Intensive Challenges in Biodiversity Conservation: a multi-scale approach to estimating species distributions - Steve Kelling"
Data Intensive Challenges in Biodiversity Conservation<br />Steve Kelling<br />
Habitat Loss<br />From: University of California Press Blog Earth Day 2010<br />Habitat loss is the major issue for Biodiversity Conservation.<br />
The increasing availability of massive volumes of scientific data requires new synthetic analysis techniques to explore and identify interesting patterns that were otherwise not apparent. For biodiversity studies a “data driven” approach is necessary due to the complexity of ecological systems, particularly when viewed at large spatial and temporal scales. <br />
Presentation Goals:<br />Observation Networks<br />Description of eBird: http://www.ebird.org<br />Species Distribution Models<br />Description of the Avian Knowledge Network: http://avianknowledge.net<br />Data Intensive Science<br />Description of the outcomes of the DataONE Exploration, Visualization, and Analysis Working Group<br />
eBird is a global online program that gathers bird observations from citizen scientists, predominately across the Western Hemisphere. eBird gathers checklists of birds with associated effort information from well-defined locations, passing each record through a two-tiered verification system.<br />eBird is a joint project between the Cornell Lab of Ornithology and the National Audubon Society, and has more than two dozen regional partners.<br />Sullivan, B.L., C.L. Wood, M.J. Iliff, R.E. Bonney, D. Fink, and S. Kelling. 2009. eBird: A citizen-based bird observation network in the biological sciences. Biological Conservation 142: 2282-2292.<br />
eBird uses crowdsourcing techniques to gather observations of birds.<br />Crowdsourcing is the act of outsourcing tasks, traditionally performed by an employee or contractor, to an undefined, large group of people or community (a "crowd"), through an open call.<br />Jeff Howe, one of the first authors to employ the term, established that the concept of crowdsourcing depends essentially on the fact that, because it is an open call to an undefined group of people, it gathers those who are most fit to perform tasks, solve complex problems, and contribute the most relevant and fresh ideas.<br />For example, the public may be invited to develop a new technology, carry out a design task, refine or carry out the steps of an algorithm, or help capture, systematize, or analyze large amounts of data (CITIZEN SCIENCE).<br />(From Wikipedia)<br />
eBird Checklists<br />Volunteers submit checklists of bird observations from specific locations using protocols that collect information on date, time, and distance traveled.<br />
Flagged Records<br />4% of submitted records were flagged for review<br />60% of those records were reviewed and validated<br />eBird contains a two-stage verification system:<br />Instantaneous automated evaluation of submissions based on species count limits for a given date and location;<br />A growing network of more than 500 regional editors, composed of local experts, who vet records flagged by the automated filters.<br />
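The first, automated stage can be pictured as a lookup of each record against a count limit for its species, place, and time of year. The sketch below is only an illustration of that idea: the `COUNT_LIMITS` table, its region codes, its monthly granularity, and the `needs_review` function are all hypothetical and are not taken from eBird's actual filters.

```python
from datetime import date

# Hypothetical filter table: (region, species, month) -> maximum plausible count.
# The limits below are invented for illustration only.
COUNT_LIMITS = {
    ("US-NY", "Indigo Bunting", 6): 30,
    ("US-NY", "Indigo Bunting", 1): 0,
}

def needs_review(region, species, obs_date, count):
    """Return True when a record should be passed on to a regional editor."""
    limit = COUNT_LIMITS.get((region, species, obs_date.month))
    if limit is None:
        # No entry for this species, region, and month: treat as unexpected.
        return True
    return count > limit

records = [
    ("US-NY", "Indigo Bunting", date(2009, 6, 14), 4),   # within the June limit
    ("US-NY", "Indigo Bunting", date(2009, 1, 3), 1),    # exceeds the January limit of 0
]
flagged = [r for r in records if needs_review(*r)]
```

Records that pass go straight into the database; the flagged ones are the small share that the second stage, the human editors, would need to look at.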
Understanding our Audience<br />eBird is building a web‐enabled community of bird watchers who collect, manage,<br />and store their observations in a globally accessible unified database. Through its<br />development as a tool that addresses the needs of the birding community,<br />eBird sustains and grows participation.<br />Give Birders What They Want!<br />
eBird contains an array of data visualization and analysis tools<br />that provide birders, land managers, and scientists with summary<br />information about bird distribution.<br />
Sooty Shearwater<br />eBird data can be used to examine the timing of migration across large geographic areas.<br />Because each eBird observation is recorded at a specific location,<br />eBird can generate maps depicting species distribution at multiple<br />spatio‐temporal scales.<br />
Bird Occurrence Patterns in Upstate New York<br />eBird provides “bar charts” (i.e., frequency histograms) based on frequency of detection for individual species.<br />These visualizations provide users with occurrence information at specific locations at 1-week increments and indicate the likelihood of detecting a species based on its frequency in that area (darker and wider bars indicate increased frequency).<br />
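The quantity behind such a bar chart is the fraction of checklists in each week that report the species. A minimal sketch follows, assuming checklists are available as (date, set of species) pairs and that weeks are simple 7-day bins of the day of year; both the data layout and the binning rule are assumptions made here, not eBird's own implementation.

```python
from collections import defaultdict
from datetime import date

def weekly_frequency(checklists, species):
    """Fraction of checklists in each week of the year that report `species`.

    `checklists` is an iterable of (date, set_of_species) pairs.
    Weeks are numbered 0-51 as (day_of_year - 1) // 7, capped at 51.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for day, species_seen in checklists:
        week = min((day.timetuple().tm_yday - 1) // 7, 51)
        totals[week] += 1
        if species in species_seen:
            hits[week] += 1
    return {week: hits[week] / totals[week] for week in sorted(totals)}

# Toy checklists, invented for illustration.
checklists = [
    (date(2009, 5, 1), {"Indigo Bunting", "American Robin"}),
    (date(2009, 5, 2), {"American Robin"}),
    (date(2009, 1, 3), {"American Robin"}),
]
freq = weekly_frequency(checklists, "Indigo Bunting")
```

Dividing by the number of checklists, rather than counting birds, is what lets weeks with very different observer effort be compared on one chart.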
Growth in eBird Observations and Checklists<br />[Figure: growth chart with series Observations and Checklists; annotations: eBird 2.0 launch, 2011]<br />
Estimating Species Distributions<br />Determining the patterns of species occurrence through time and space, and understanding their links with features of the environment, are central themes in ecology. Identifying the factors that influence species distributions is a complex task, requiring the examination of multiple facets of a species’ natural history and their relationships with the complex and variable environments in which they live.<br />Fink, D., W. M. Hochachka, D. Winkler, B. Shaby, G. Hooker, B. Zuckerberg, M. A. Munson, D. Sheldon, M. Riedewald, and S. Kelling. 2010. Spatiotemporal Exploratory Models for Large-scale Survey Data. Ecological Applications 20:2131-2147.<br />
Observational Data Model<br />The most crucial aspect of predicting species occurrence is to learn a model, called the observation model, from observed measurements and to make probabilistic inferences over regions or variables where measurements were not made. This approach joins organism observations with a multitude of "drivers", covariates that could potentially influence the occurrence of the organism. While a single source (or a few sources) of noisy observations may not be sufficient to accurately model distributions, combining many measurements (e.g., species occurrence, weather, organism occurrence, landscape mosaic, human population data, etc.) greatly improves the accuracy of the models.<br />
Munson, M. A., K. Webb, D. Sheldon, D. Fink, W. M. Hochachka, M. J. Iliff, M. Riedewald,<br />D. Sorokina, B. L. Sullivan, C. L. Wood, and S. Kelling. 2009.<br />The eBird Reference Dataset<br />(http://www.avianknowledge.net/content/features/archive/eBird_Ref).<br />
The Multi-scale Modeling Challenge<br />Goal: Analysis at broad-scale with fine resolution<br />Challenge: spatiotemporal patterning at multiple scales<br />Local-scale<br />Fine-scale spatial and temporal resource patterns<br />Large-scale<br />Regional & seasonal variation in species’ habitat utilization <br />
SpatioTemporal Exploratory Model (STEM)<br />Current nonparametric SDMs are very good for local-scale modeling, relating environmental predictors (X) to observed occurrences (y).<br />Multi-scale strategy: differentiate between local and global-scale ST structure.<br />Make time (t) and location (s) explicit<br />“Regionalize” by restricting support<br />Predictions at time (t) and location (s) are made by averaging across a set of local models containing that time and location<br />Restricted Support Set (θ)<br />ith ST explicit base model<br />Number of models supporting (s,t)<br />
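The averaging step described on this slide can be sketched in a few lines: a prediction at location s and time t is the mean of the predictions from every local model whose restricted support contains (s, t). The `LocalModel` class, its rectangular lon/lat/day support, and the constant stand-in predictors below are assumptions made for illustration; in practice each base model would be a fitted SDM, and days near the turn of the year would need wrap-around handling that is omitted here.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

@dataclass
class LocalModel:
    """One base model with restricted spatiotemporal support (a lon/lat/day box)."""
    lon_min: float
    lon_max: float
    lat_min: float
    lat_max: float
    day_min: int
    day_max: int
    predict: Callable[[Sequence[float]], float]  # maps predictors X to P(occurrence)

    def supports(self, lon, lat, day):
        """True when (lon, lat, day) lies inside this model's support box."""
        return (self.lon_min <= lon <= self.lon_max
                and self.lat_min <= lat <= self.lat_max
                and self.day_min <= day <= self.day_max)

def stem_predict(models, x, lon, lat, day):
    """Average the predictions of every local model whose support contains (s, t)."""
    supporting = [m for m in models if m.supports(lon, lat, day)]
    if not supporting:
        return None  # no local model covers this location and time
    return sum(m.predict(x) for m in supporting) / len(supporting)

# Three toy base models with constant predictions standing in for fitted SDMs.
models = [
    LocalModel(-90, -78, 36, 45, 100, 140, lambda x: 0.25),
    LocalModel(-84, -72, 40, 49, 120, 160, lambda x: 0.75),
    LocalModel(-120, -108, 30, 39, 100, 140, lambda x: 1.0),
]
p = stem_predict(models, x=[0.4], lon=-80.0, lat=42.0, day=130)
```

Here two of the three boxes contain the query point, so the result is the mean of their two predictions; the third model, fitted elsewhere, has no influence on it.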
The ST Ensemble<br />“Slice and dice” ST extent into stixels<br />With sufficient overlap<br />Adapt to different dynamics<br />Temporal Design:<br />40-day intervals<br />80 evenly spaced windows throughout year<br />Spatial Design:<br />For each time interval<br />Random sample of rectangles<br />(12 deg lon x 9 deg lat)<br />Minimum 25 unique locations.<br />
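The design on this slide can be sketched as two small functions: one that lays out the 80 overlapping 40-day windows, and one that draws random 12 x 9 degree rectangles within a window and keeps those containing at least 25 unique locations. The function names, the bounding box for the study extent, the number of rectangles drawn, and the synthetic locations are all assumptions for illustration; windows that run past the end of the year are not wrapped around in this sketch.

```python
import random

def temporal_windows(n_windows=80, width_days=40, year_days=365):
    """Start and end day-of-year for evenly spaced, overlapping windows."""
    step = year_days / n_windows
    return [(i * step, i * step + width_days) for i in range(n_windows)]

def sample_stixels(locations, window, n_rectangles, rng,
                   width_lon=12.0, height_lat=9.0, min_locations=25,
                   extent=(-125.0, -66.0, 24.0, 50.0)):
    """Draw random rectangles and keep those with enough unique locations.

    `locations` is a list of (lon, lat, day_of_year) tuples.
    `extent` is (lon_min, lon_max, lat_min, lat_max); the default is a rough
    bounding box chosen only for illustration.
    """
    start, end = window
    lon_min, lon_max, lat_min, lat_max = extent
    stixels = []
    for _ in range(n_rectangles):
        west = rng.uniform(lon_min, lon_max - width_lon)
        south = rng.uniform(lat_min, lat_max - height_lat)
        inside = {
            (lon, lat) for lon, lat, day in locations
            if west <= lon <= west + width_lon
            and south <= lat <= south + height_lat
            and start <= day <= end
        }
        if len(inside) >= min_locations:
            stixels.append((west, south, start, end, len(inside)))
    return stixels

rng = random.Random(0)
# Synthetic observation locations, uniformly scattered over the extent.
locations = [(rng.uniform(-125, -66), rng.uniform(24, 50), rng.randint(1, 365))
             for _ in range(20000)]
windows = temporal_windows()
stixels = sample_stixels(locations, windows[20], n_rectangles=10, rng=rng)
```

With 80 windows of 40 days, consecutive windows start about 4.6 days apart, so any given day is covered by several windows; that overlap is what lets the ensemble average vary smoothly through the year.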
Exploratory Inference:<br />SpatioTemporal Variation of Local-scale Predictor Effects<br />Non-stationarity of species-habitat associations<br />Although many ecological processes are known or expected to vary in space and time, the vast majority of species distribution modeling (SDM) is done for a single region and/or season. Our goal is therefore to develop techniques to explore patterns of variation in space and time, to provide ecologists and land managers with more accurate information about how species-habitat associations (requirements) change.<br />
Taking a data-intensive science approach requires a data management and research environment that supports the entire data life cycle: from acquisition, storage, management, and integration to data exploration, analysis, visualization, and other computing and information processing services.<br />Kelling, S., W. M. Hochachka, D. Fink, M. Riedewald, R. Caruana, G. Ballard, and G. Hooker. 2009. Data-intensive Science: A New Paradigm for Biodiversity Studies. BioScience 59:613-620.<br />
Scientific Exploration, Visualization, and <br />Analysis Working Group<br /><ul><li>Data Discovery, Access, and Synthesis
Examples</li></ul>Steve Kelling (co-chair), Cornell Lab of Ornithology <br />Bob Cook (co-chair), Oak Ridge National Lab <br />John Cobb, Oak Ridge National Lab<br />Theo Damoulas, Cornell University<br />Tom Dietterich, Oregon State <br />Juliana Freire, University of Utah<br />Daniel Fink, Cornell Lab of Ornithology<br />Damian Gesler, iPlant<br />Bill Michener, University of New Mexico <br />Jeff Morisette, USGS <br />Patrick O’Leary, U of Idaho<br />Alyssa Rosemartin, NPN<br />Suresh SanthanaVannan, Oak Ridge National Lab <br />Claudio Silva, University of Utah <br />Kevin Webb, Cornell Lab of Ornithology<br />Kelling, S., R. Cook, T. Damoulas, D. Fink, J. Freire, W. M. Hochachka, W. K. Michener, K. Rosenberg, and C. Silva. 2011 (in press).<br />Estimating species distributions, across space through time and with features of the environment.<br />
Observational Data Sources<br />Sensors, sensor networks, and <br />remote sensing gather observations<br />Photo courtesy of www.carboafrica.net<br />
Data Interoperability<br />Our major data interoperability challenge was reconciling object-based models (i.e. vector entities such as locations where birds are observed) with field-based models (i.e. raster imagery comprising attribute values gridded in space) of storing geographic information. To make the data interoperable, we had to apply methods that conflate point-location-based observations (e.g. bird observations) to match raster attribute data at the resolution of the raster data. For each observation location, we determine the cell in the raster grid into which the observation's location falls. We use the value of that cell's attribute as the attribute value for each observation.<br />
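The point-to-cell lookup described above amounts to a little arithmetic on the raster's origin and cell size. The sketch below assumes a north-up raster stored as a list of rows with square cells in degrees, its origin at the upper-left corner; the function names, the `land_cover` field, and the toy class codes are hypothetical.

```python
import math

def cell_index(lon, lat, origin_lon, origin_lat, cell_size):
    """Row and column of the raster cell containing (lon, lat).

    The origin is the raster's upper-left (north-west) corner; rows increase
    southward and columns increase eastward.
    """
    col = math.floor((lon - origin_lon) / cell_size)
    row = math.floor((origin_lat - lat) / cell_size)
    return row, col

def attach_raster_value(observations, raster, origin_lon, origin_lat, cell_size):
    """Return each observation with the attribute of the cell it falls in."""
    joined = []
    for obs in observations:
        row, col = cell_index(obs["lon"], obs["lat"], origin_lon, origin_lat, cell_size)
        inside = 0 <= row < len(raster) and 0 <= col < len(raster[0])
        value = raster[row][col] if inside else None  # None marks points off the grid
        joined.append({**obs, "land_cover": value})
    return joined

# Toy 2 x 3 land-cover raster with 1-degree cells; the class codes are made up.
raster = [[11, 41, 41],
          [82, 82, 21]]
observations = [
    {"id": "obs1", "lon": -76.45, "lat": 42.48},
    {"id": "obs2", "lon": -80.00, "lat": 42.48},  # west of the raster's edge
]
joined = attach_raster_value(observations, raster,
                             origin_lon=-78.0, origin_lat=43.0, cell_size=1.0)
```

Every observation that falls in the same cell receives the same attribute value, so the raster's resolution sets the finest habitat detail the models can see.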
Patterns in Bird Species Occurrence Explored through Data Intensive Analysis and Visualization<br />Bird observations and environmental data from > 100,000 locations in the US, integrated and analyzed using High Performance Computing Resources<br />[Figure labels: Model results; Occurrence of Indigo Bunting (2008), panels Jan, Apr, Jun, Sep, Dec; eBird; Land Cover; Meteorology; MODIS – Remote sensing data]<br />Potential Uses:<br /><ul><li>Examine patterns of migration</li><li>Measure population trends</li></ul>Spatio-Temporal Exploratory Models predict the probability of occurrence of bird species across the United States on a 35 km x 35 km grid.<br />
Observations from bird watchers (citizen scientists): a huge number of birders collect 16 million observations each year.<br />Combine these with environmental factors like land cover, landscape fragmentation, topography, human population, weather, and remote sensing data (green-ness of terrestrial vegetation).<br />Integrating the data into one database is a challenge.<br />This huge amount of data can only be analyzed on supercomputers, using NSF TeraGrid high-performance computing.<br />The models were used in the creation of the 2011 United States of America State of the Birds Report, entitled Birds in Public Lands and Waters.<br />
Biodiversity Research and Conservation in a Digital World <br />Gaining insight into the complexities and processes of natural systems is no longer an exclusive realm of theory and experiment; computation and access to large quantities of data are now equal and indispensable partners for advances in scientific knowledge, land management, and informed decision making.<br />
Funding and Acknowledgements<br />National Science Foundation<br />Leon Levy Foundation<br />Wolf Creek Foundation<br />The volunteers who contributed millions of hours<br />gathering bird observations.<br />