Keynote Speaker 1 - Data Intensive Challenges in Biodiversity Conservation: a multi-scale approach to estimating species distributions - Steve Kelling
 


TERN Symposium 2011



Usage Rights

© All Rights Reserved

  • Our data for modeling bird distributions will come from eBird, a global online program that gathers bird observations from citizen scientists, predominately across the Western Hemisphere. eBird gathers checklists of birds with associated effort information from well-defined locations, passing each record through a two-tiered data verification system. In 2009, over 48,000 participants volunteered over 780,000 hours to collect more than one million bird observations per month from nearly 300,000 unique locations across the Western Hemisphere. Data from eBird have been used to reveal biological patterns across large spatio-temporal regions, allowing researchers to better understand patterns of bird occurrence and to explore species–habitat relationships. Quantities of data sufficient for analysis are available from 2005.
  • Rick Bonney
  • Determining the patterns of species occurrence through time and space, and understanding their links with features of the environment, are central themes in ecology. Identifying the factors that influence species distributions is a complex task, requiring the examination of multiple facets of a species’ natural history and their relationships with the complex and variable environments in which they live.
  • The SpatioTemporal Exploratory Model (STEM) adds essential spatiotemporal structure to existing techniques for developing species distribution models through a simple parametric structure, without requiring a detailed understanding of the underlying dynamic processes. STEM uses a multi-scale strategy to differentiate between local and global-scale spatiotemporal structure. A user-specified species distribution model accounts for spatial and temporal patterning at the local level. These local patterns are then allowed to “scale up” via ensemble averaging to larger scales. This makes STEMs especially well suited for exploring distributional dynamics arising from a variety of processes.
  • Although many ecological processes are known or expected to vary in space and time, the vast majority of species distribution modeling (SDM) is done for a single region and/or season. Our goal is to develop techniques to explore patterns of variation in space and time, to provide ecologists and land managers with more accurate information about how species-habitat associations (requirements) change.
  • Taking a data intensive science approach requires a data management and research environment that supports the entire data life cycle: from acquisition, storage, management, and integration, to data exploration, analysis, visualization, and other computing and information processing services.
  • We describe the assembly, exploration, visualization, and analysis of data using examples that are based on the synthesis and modeling of large datasets collected by researchers in multiple domains.
    Data Access and Synthesis: Identifying data sources and formatting them for analysis.
    Model Development: Advancing species distribution modeling of large-scale migration.
    Managing Computational Requirements: Handling computationally intensive models.
    Exploring and Visualizing Model Results: The amount of information obtainable from a model of spatiotemporal variation in species’ distribution is enormous, and tools for exploring and visualizing the data are required.
    Examples of Results: Providing results from the analysis done for SOTB.
  • There is a wide variety of sensors and sensor networks that gather observational data. Each of these sensors and networks has its own particular biases in calibration and data gathering, as well as in detecting phenomena. While autonomous sensors gather accurate data of many types, they are not useful for collecting species data, a task for which human observers are still needed. The one sensor network that I am most familiar with is a global network of volunteer observers (citizen scientists) who gather observations of birds.
  • John Cobb is from TeraGrid and Computer Science and Math, ORNL.
    Observations from bird watchers (citizen scientists): a huge number of birders collect 16 million observations each year.
    Combine with environmental factors like land cover, landscape fragmentation, topography, human population, weather, and remote sensing data (greenness of terrestrial vegetation).
    Integrating the data into one database is a challenge.
    This huge amount of data can only be analyzed on supercomputers, using TeraGrid high performance computing.
    Models can be used to examine how bird migration may change with changes in climate: input to IPCC Working Group 2 (Impacts, Adaptation, and Vulnerability).
    Also, plan to use these observations and model results to publish the State of the Birds Report for 2010 (the 2009 version of State of the Birds was released by Secretary Salazar, Dept. of the Interior).
    Indigo Bunting winters in Central and South America.
    Model predictions show population timing and location through migration:
    Early April: concentration on the Gulf of Mexico coast
    Mid April: concentration along the Mississippi River valley
    Mid May: breeding distribution
    References:
    http://www.nature.com/news/2010/100810/full/news.2010.395.html
    http://www.scientificamerican.com/article.cfm?id=satellites-and-supercomputing-improve-bird-watching

Presentation Transcript

  • Data Intensive Challenges in Biodiversity Conservation
    Steve Kelling
  • Environmental Science Challenges
    Climate Change
    Biodiversity Loss
    Invasive Species
    Water Depletion
    Disease Spread
    Green Energy
    Habitat Loss
  • Habitat Loss
    From: University of California Press Blog Earth Day 2010
    Habitat loss is the major issue for Biodiversity Conservation.
  • The increasing availability of massive volumes of scientific data requires new synthetic analysis techniques to explore and identify interesting patterns that were otherwise not apparent. For biodiversity studies a “data driven” approach is necessary due to the complexity of ecological systems, particularly when viewed at large spatial and temporal scales.
  • Presentation Goals:
    Observation Networks
    Description of eBird: http://www.ebird.org
    Species Distribution Models
    Description of the Avian Knowledge Network: http://avianknowledge.net
    Data Intensive Science
    Description of the outcomes of the DataONE Exploration, Visualization, and Analysis Working Group
  • eBird is a global online program that gathers bird observations from citizen scientists, predominately across the Western Hemisphere. eBird gathers checklists of birds with associated effort information from well-defined locations, passing each record through a two-tiered verification system.
    eBird is a joint project between the Cornell Lab of Ornithology and the National Audubon Society, and has more than two dozen regional partners.
    Sullivan, B.L., C.L. Wood, M.J. Iliff, R.E. Bonney, D. Fink, and S. Kelling. 2009. eBird: A citizen-based bird observation network in the biological sciences. Biological Conservation 142: 2282-2292.
  • eBird uses crowdsourcing techniques to gather observations of birds.
    Crowdsourcing is the act of outsourcing tasks, traditionally performed by an employee or contractor, to an undefined, large group of people
    or community (a "crowd"), through an open call.
    Jeff Howe, one of the first authors to employ the term, established that the concept of crowdsourcing depends essentially on the fact that
    because it is an open call to an undefined group of people, it gathers those who are most fit to perform tasks, solve complex problems and
    contribute with the most relevant and fresh ideas.
    For example, the public may be invited to develop a new technology, carry out a design task, refine or carry out the steps of an algorithm,
    or help capture, systematize or analyze large amounts of data (CITIZEN SCIENCE).
    (From Wikipedia)
  • eBird Checklists
    Volunteers submit checklists of bird observations from specific locations using protocols that collect information on date, time, and distance traveled.
  • Flagged Records
    4% of submitted records were flagged for review
    60% of those records were reviewed and validated
    eBird contains a two-stage verification system:
    Instantaneous automated evaluation of submissions based on species count limits for a given date and location;
    A growing network of more than 500 regional editors composed of local experts who vet records flagged by the automated filters.
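The first stage of this verification system can be illustrated with a minimal sketch. It is not eBird's actual filter: the `Record` fields, the `COUNT_LIMITS` table, its keys (region, species, month), and the rule that unknown combinations are flagged are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Record:
    species: str
    count: int
    region: str
    observed_on: date


# Hypothetical limits: (region, species, month) -> largest count accepted
# without review. Real filters would be maintained by regional editors.
COUNT_LIMITS = {
    ("NY", "Indigo Bunting", 6): 40,
    ("NY", "Indigo Bunting", 1): 0,
}


def needs_review(record, limits=COUNT_LIMITS):
    """Return True when a record should be passed to a regional editor."""
    key = (record.region, record.species, record.observed_on.month)
    limit = limits.get(key)
    if limit is None:
        # Assumption: a species with no filter entry is always flagged.
        return True
    return record.count > limit
```

Records for which `needs_review` returns True would then go to the second stage, where a regional editor vets them.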
  • Understanding our Audience
    eBird is building a web‐enabled community of bird watchers who collect, manage,
    and store their observations in a globally accessible unified database. Through its
    development as a tool that addresses the needs of the birding community,
    eBird sustains and grows participation.
    Give Birders What They Want!
  • eBird contains an array of data visualization and analysis tools
    that provide birders, land managers, and scientists with summary
    information about bird distribution.
  • Sooty Shearwater
    eBird data can be used to examine the timing of migration across large geographic areas.
    Because each eBird observation is recorded at a specific location,
    eBird can generate maps depicting species distribution at multiple
    spatio‐temporal scales.
  • Bird Occurrence Patterns in Upstate New York
    eBird provides “bar charts” (i.e., frequency histograms) based on frequency of detection for individual species.
    These visualizations provide users with occurrence information at specific locations at 1‐week increments and indicate the likelihood of detecting a species based on its frequency in that area (darker and wider bars indicate increased frequency).
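The frequency behind such a bar chart can be sketched as the share of checklists in each week that report the species. This is an illustrative sketch only: the checklist representation (a date plus a set of species names) and the use of consecutive 7-day blocks counted from 1 January are assumptions, not eBird's documented binning.

```python
from collections import defaultdict
from datetime import date


def weekly_frequency(checklists, species):
    """Map week number (1..53) to the fraction of checklists reporting species.

    checklists: iterable of (observed_on, set_of_species_names) pairs.
    Weeks are consecutive 7-day blocks counted from 1 January (an assumption).
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for observed_on, seen in checklists:
        week = (observed_on.timetuple().tm_yday - 1) // 7 + 1
        totals[week] += 1
        if species in seen:
            hits[week] += 1
    return {week: hits[week] / totals[week] for week in sorted(totals)}


example = [
    (date(2010, 1, 1), {"Wood Thrush"}),
    (date(2010, 1, 3), {"Chimney Swift"}),
    (date(2010, 1, 8), {"Wood Thrush", "Chimney Swift"}),
]
frequencies = weekly_frequency(example, "Wood Thrush")
```

Dividing by the number of checklists, rather than counting raw detections, keeps weeks with heavy birding effort from looking artificially rich.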
  • Growth in eBird Observations and Checklists
    eBird 2.0 launch
    Observations
    Checklists
    2011
  • Statistics 2010
    More than…
    18,214,480 observations submitted
    1,300,029 hours collecting bird observations
    1,293,480 checklists entered
    22,136 contributors
    351,000 unique visitors to eBird
    20 million page views
  • Introducing BirdsEye—an eBird powered iPhone app
  • Estimating Species Distributions
    Determining the patterns of species occurrence through time and space, and understanding their links with features of the environment, are central themes in ecology. Identifying the factors that influence species distributions is a complex task, requiring the examination of multiple facets of a species’ natural history and their relationships with the complex and variable environments in which they live.
    Fink, D., W. M. Hochachka, D. Winkler, B. Shaby, G. Hooker, B. Zuckerberg, M. A. Munson, D. Sheldon, M. Riedewald, and S. Kelling. 2010. Spatiotemporal Exploratory models for Large‐scale Survey Data. Ecological Applications 20:2131‐2147.
  • Observational Data Model
    The most crucial aspect of predicting species occurrence is to learn a model, called the observation model, from observed measurements and to make probabilistic inferences over regions or variables where measurements were not made. This approach joins organism observations with a multitude of "drivers", covariates that could potentially influence the occurrence of the organism. While a single source (or a few sources) of noisy observations may not be sufficient to accurately model distributions, combining many measurements (e.g., species occurrence, weather, landscape mosaic, human population data, etc.) greatly improves the accuracy of the models.
  • Munson, M. A., K. Webb, D. Sheldon, D. Fink, W. M. Hochachka, M. J. Iliff, M. Riedewald,
    D. Sorokina, B. L. Sullivan, C. L. Wood, and S. Kelling. 2009.
    The eBird Reference Dataset
    (http://www.avianknowledge.net/content/features/archive/eBird_Ref).
  • The Multi-scale Modeling Challenge
    Goal: Analysis at broad-scale with fine resolution
    Challenge: spatiotemporal patterning at multiple scales
    Local-scale
    Fine-scale spatial and temporal resource patterns
    Large-scale
    Regional & seasonal variation in species’ habitat utilization
  • Wood Thrush
  • SpatioTemporal Exploratory Model (STEM)
    Current nonparametric SDMs are very good for local-scale modeling by relating environmental predictors (X) to observed occurrences (y)
    Multi-scale strategy: differentiate between local and global-scale ST structure.
    Make explicit time (t) and location (s)
    “Regionalize” by restricting support
    Predictions at time (t) and location (s) are made by averaging across a set of local models containing that time and location
    Restricted support set (θ)
    i-th ST explicit base model
    Number of models supporting (s,t)
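The averaging described on this slide can be written compactly. The notation below is a sketch inferred from the slide's labels rather than copied from it: $f_i$ is the i-th ST explicit base model, $\theta_i$ its restricted support set, $m$ the total number of base models, and $n(s,t)$ the number of models supporting $(s,t)$.

```latex
\hat{f}(X, s, t) \;=\; \frac{1}{n(s,t)} \sum_{i=1}^{m} f_i(X, s, t)\,
  \mathbf{1}\!\left[(s,t) \in \theta_i\right],
\qquad
n(s,t) \;=\; \sum_{i=1}^{m} \mathbf{1}\!\left[(s,t) \in \theta_i\right]
```

Each base model contributes only where its support contains the prediction point, so the ensemble estimate is a plain mean over the overlapping local models.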
  • The ST Ensemble
    “Slice and dice” ST extent into stixels
    With sufficient overlap
    Adapt to different dynamics
    Temporal Design:
    40-day intervals
    80 evenly spaced windows throughout year
    Spatial Design
    For each time interval
    Randomly sample rectangles
    (12 deg lon x 9 deg lat)
    Minimum 25 unique locations.
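The temporal and spatial design above can be sketched as follows. Only the numbers (80 windows, 40 days, 12 x 9 degree rectangles, minimum 25 unique locations) come from the slide. The function names, the `(lon, lat, day_of_year)` point format, the wrap-around of windows at the year boundary, and the default bounding box are assumptions for illustration.

```python
import random


def temporal_windows(n_windows=80, width_days=40, year_days=365):
    """Return (start_day, width) pairs with evenly spaced start days."""
    step = year_days / n_windows
    return [(round(i * step) % year_days, width_days) for i in range(n_windows)]


def in_window(day, start, width, year_days=365):
    """True if day (0-based day of year) falls in the window, wrapping at year end."""
    return (day - start) % year_days < width


def sample_stixel(points, start, width, rng, lon_size=12.0, lat_size=9.0,
                  min_locations=25, bounds=(-125.0, -66.0, 24.0, 50.0)):
    """Draw one random rectangle for a time window.

    points: iterable of (lon, lat, day_of_year) observations.
    bounds: (lon_min, lon_max, lat_min, lat_max), a hypothetical study extent.
    Returns a dict describing the stixel, or None if it holds too few
    unique locations to support a base model.
    """
    lon_min, lon_max, lat_min, lat_max = bounds
    west = rng.uniform(lon_min, lon_max - lon_size)
    south = rng.uniform(lat_min, lat_max - lat_size)
    members = [
        (lon, lat, day) for lon, lat, day in points
        if west <= lon < west + lon_size
        and south <= lat < south + lat_size
        and in_window(day, start, width)
    ]
    if len({(lon, lat) for lon, lat, _ in members}) < min_locations:
        return None
    return {"west": west, "south": south, "start": start,
            "width": width, "points": members}
```

A base model would then be fitted to the points of each stixel that is returned, and rectangles that come back as None would be skipped or redrawn.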
  • Western Meadowlark
  • Exploratory Inference:
    SpatioTemporal Variation of Local-scale Predictor Effects
    Non-stationarity of species-habitat associations
    Although many ecological processes are known or expected to vary in space and time, the vast majority of SDM is done for a single region and/or season. Our goal is to develop techniques to explore patterns of variation in space and time, to provide ecologists and land managers with more accurate information about how species‐habitat associations (requirements) change.
  • Chimney Swift
    Indigo Bunting
  • Taking a data intensive science approach requires a data management and research environment that supports the entire data life cycle: from acquisition, storage, management, and integration, to data exploration, analysis, visualization, and other computing and information processing services.
    Kelling, S., W. M. Hochachka, D. Fink, M. Riedewald, R. Caruana, G. Ballard, and G. Hooker. 2009. Data‐intensive Science: A New Paradigm for Biodiversity Studies. BioScience 59:613‐620.
  • Scientific Exploration, Visualization, and
    Analysis Working Group
    • Data Discovery, Access, and Synthesis
    • Model Development
    • Managing Computational Requirements
    • Exploring and Visualizing Model Results
    • Examples
    Steve Kelling (co-chair), Cornell Lab of Ornithology
    Bob Cook (co-chair), Oak Ridge National Lab
    John Cobb, Oak Ridge National Lab
    Theo Damoulas, Cornell University
    Tom Dietterich, Oregon State
    Juliana Freire, University of Utah
    Daniel Fink, Cornell Lab of Ornithology
    Damian Gesler, iPlant
    Bill Michener, University of New Mexico
    Jeff Morisette, USGS
    Patrick O’Leary, U of Idaho
    Alyssa Rosemartin, NPN
    Suresh Santhana Vannan, Oak Ridge National Lab
    Claudio Silva, University of Utah
    Kevin Webb, Cornell Lab of Ornithology
    Kelling, S., R. Cook, T. Damoulas, D. Fink, J. Freire, W. M. Hochachka, W. K. Michener, K. Rosenberg, and C. Silva, 2011 IN PRESS.
    Estimating species distributions, across space through time and with features of the environment.
  • Observational Data Sources
    Sensors, sensor networks, and
    remote sensing gather observations
    Photo courtesy of www.carboafrica.net
  • Data Interoperability
    Our major data interoperability challenge is reconciling object‐based models (i.e. vector entities such as
    locations where birds are observed) with field‐based models (i.e. raster imagery composed of attribute
    values gridded in space) of storing geographic information. To make the data interoperable we had to apply
    methods that conflate point‐location based observations (e.g. bird observations) to match raster attribute data
    at the resolution of the raster data. For each observation location, we determine the cell in the raster
    grid into which the observation's location falls. We use the value of that cell's attribute as the attribute
    value for each observation.
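The point-to-cell lookup described here can be sketched in a few lines. This is a minimal illustration under stated assumptions: a north-up grid with square cells, an origin at the top-left (west, north) corner, and a raster held as a list of rows. The function names are hypothetical and are not taken from the project's code.

```python
import math


def cell_index(lon, lat, west, north, cell_size):
    """Return (row, col) of the raster cell containing the point.

    Assumes a north-up grid of square cells whose top-left corner
    is at (west, north); row 0 is the northernmost row.
    """
    col = math.floor((lon - west) / cell_size)
    row = math.floor((north - lat) / cell_size)
    return row, col


def attach_raster_value(observations, raster, west, north, cell_size):
    """Pair each (lon, lat) observation with the value of its raster cell.

    Observations falling outside the grid get None instead of a value.
    """
    result = []
    for lon, lat in observations:
        row, col = cell_index(lon, lat, west, north, cell_size)
        inside = 0 <= row < len(raster) and 0 <= col < len(raster[0])
        result.append((lon, lat, raster[row][col] if inside else None))
    return result


land_cover = [[1, 2],
              [3, 4]]
joined = attach_raster_value(
    [(0.5, 1.5), (1.5, 0.5), (5.0, 5.0)],
    land_cover, west=0.0, north=2.0, cell_size=1.0,
)
```

The bounds check matters because a negative row or column would otherwise silently index from the end of a Python list and return a value from the wrong cell.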
  • Patterns in Bird Species Occurrence Explored through Data Intensive Analysis and Visualization
    Bird observations and environmental data from > 100,000 locations in US integrated and analyzed using High Performance Computing Resources
    [Figure: data inputs (eBird, Land Cover, Meteorology, MODIS remote sensing data) and model results, Occurrence of Indigo Bunting (2008), with panels for Jan, Apr, Jun, Sep, Dec]
    Potential Uses:
    • Examine patterns of migration
    • Infer impacts of climate change
    • Measure patterns of habitat usage
    • Measure population trends
    Spatio-Temporal Exploratory Models predict the probability of occurrence of bird species across the United States on a 35 km x 35 km grid.
  • Observations from bird watchers (citizen scientists): a huge number of birders collect 16 million observations each year.
    Combine with environmental factors like land cover, landscape fragmentation, topography, human population, weather, and remote sensing data (greenness of terrestrial vegetation).
    Integrating the data into one database is a challenge.
    This huge amount of data can only be analyzed on supercomputers, using NSF TeraGrid high performance computing.
    Models were used in the creation of the 2011 United States of America State of the Birds Report, entitled Birds in Public Lands and Waters.
  • Biodiversity Research and Conservation in a Digital World
    Gaining insight into the complexities and processes of natural systems is no longer an exclusive realm of theory and experiment; computation and access to large quantities of data are now an equal and indispensable partner for advances in scientific knowledge, land management, and informed decision making.
  • Funding and Acknowledgements
    National Science Foundation
    Leon Levy Foundation
    Wolf Creek Foundation
    The volunteers who contributed millions of hours
    gathering bird observations.
  • Acknowledgements
    eBird and the Avian Knowledge Network
    Art Munson - CU
    Daniel Fink - CU
    Wesley Hochachka - CU
    Denis Lepage - BSC
    Rich Caruana - MS
    Mirek Riedewald - NEU
    Daria Sorokina - CMU
    Kevin Webb - CU
    Giles Hooker - CU
    Brian Sullivan - CU
    Chris Wood - CU
    Marshall Iliff - CU
    Computational Sustainability
    Carla Gomes - CU
    Tom Dietterich - OSU
    Daniel Sheldon - OCU
    Ken Rosenberg - CU
    Rebecca Hutchinson - OSU
    Weng-Keen Wong - OSU
    Megan MacDonald - CU
    Stefan Hames - CU
    Theo Damoulas - CU
    Bistra Dilkina - CU
    DataONE
    Bill Michener - UNM
    Bob Cook - ORNL
    Jeff Morisette - USGS
    Juliana Freire - UUT
    Claudio Silva - UUT
    Matt Jones - UCSB
    Suresh Santhana Vannan - ORNL