DATA-DEPENDENT MODELS OF SPECIES-HABITAT RELATIONSHIPS<br />D. Todd Jones-Farrand<br />Presented on 10/7/2009<br />As part of the NABCI Regional Population Objectives Workshop<br />Purpose<br />The purpose of this presentation was to review existing statistical methods for linking bird populations to habitat conditions (quantity & quality). Because this is a broad topic on which much has been written, this presentation focused on techniques currently used or potentially useful to Joint Ventures (JV) in the process of setting population-based habitat objectives. Further, this presentation was developed as a starting point for discussions on how to treat statistical models in a guidance document for JVs. As a consequence, the presentation raised many more questions than it answered.<br />Introduction<br />The comments & examples in this presentation were biased towards breeding season habitat for forest birds in the Midwest & Eastern US. This bias is in part due to the research experience of the author. However, most JVs in the Western US have used database models instead of statistical models. The Prairie Pothole JV has made extensive use of statistical models to support conservation planning for grassland birds and waterbirds. <br />The selection of a modeling approach or technique results from balancing many competing interests. These tradeoffs include strength of inference (natural history vs. ecological theory), extent of explanation (pattern vs. process), data requirements (small vs. large), costs (small vs. large), scale (fine vs. coarse), and scope (generality vs. precision). Although these are often presented as dichotomies, each represents a continuum of possibilities. Along these continua, statistical models fall somewhere in the middle, between qualitative models that require few inputs and demographic models that require many. 
Some additional considerations include model purpose (explanation, prediction, or decision support), whether to collect your own data, and selection of an appropriate response variable (habitat quality, abundance, or viability).<br />Using statistical models has particular advantages and caveats. Statistical models allow estimates of detection probability (some techniques), precision, and error propagation (although rarely investigated). On the other hand, they perform better for habitat specialists than generalists, require that a species be broadly distributed (to get adequate sample size), and produce predictions that are restricted to the range of input values.<br />Using statistical models for linking populations to habitat conditions entails 2 primary assumptions – habitat is limiting, and structure (measurable) is a good surrogate for habitat (which includes hard-to-measure aspects such as food, competition, etc.). Statistical models come in 2 main varieties – those based on statistical theory (e.g. regression) and those based on machine learning algorithms [e.g. classification and regression trees (C&RT)].<br />Input Data Considerations<br />Building statistical models requires 2 types of data – population (response/dependent variable) & habitat (predictors/independent variables). There are also 2 ways to get these data – get them from someone else or generate them yourself. However, some forms of habitat data (e.g. land cover) should only be self-generated under exceptional circumstances, especially at the scale of a BCR or JV.<br />Population data can take the form of presence, presence/absence, counts (relative abundance), or density. The 2 primary sources of population data for JVs are the Breeding Bird Survey (BBS) and point count data. The advantages and caveats of BBS data have been commented on widely in the literature. 
Of particular interest to JVs are its broad spatial & temporal coverage, set against its lack of a protocol for estimating detection probability and the difficulty of using it to test site-scale habitat relationships. Point counts have the opposite advantages and caveats, though temporal & spatial coverage concerns can be addressed through coordinated monitoring projects (e.g. R8Bird).<br />There are many sources of habitat data available for regions or continental extents. These cover scales from site to landscape. The presentation focused on 4 datasets – National Land Cover Data (NLCD), LANDFIRE, GAP, & Forest Inventory and Analysis (FIA) data. The 3 remotely sensed datasets (NLCD, LANDFIRE, & GAP) vary in their spatial extent (regional to continental), classification systems (ecological systems vs. broad/general cover types), and update frequency, but all 3 suffer from classification accuracy issues. FIA is a statistically valid point sample across all ownerships that yields information on vegetation structure as well as general information about landscape composition (e.g. acres of forest, proportion of forests in a particular condition). However, variable plot densities across states (some states pay for extra plots) and landowner privacy issues are important caveats.<br />Examples of Select Modeling Techniques<br />The presentation included 4 examples of using statistical models for decision support & prediction. Two examples used BBS data [C&RT and Hierarchical Spatial Count (HSC) models], whereas 2 examples used point count data (occupancy & time removal models).<br />The examples were chosen to highlight tradeoffs mentioned earlier. C&RT & HSC used existing data and modern techniques to predict relative abundance per route. However, the modelers mapped outputs at different scales (county vs. 100-m pixel). Although the C&RT models are somewhat easier to build, the categorical nature of C&RT outputs provides little information for decision support. 
These models would likely produce different population estimates using the Rosenberg & Blancher (2005) approach.<br />The occupancy and time removal approaches involved implementing point count surveys. Although each technique produced estimates of detection probability that could be used to estimate population size more accurately, their small spatial coverage limits inferences to the area sampled. Point count data are typically costly and labor intensive to collect, which reduces spatial coverage. However, the time removal model example was based on a coordinated monitoring project wherein data were collected using donated state agency staff time, reducing costs. Further, occupancy models have been applied to BBS data (though not in support of conservation planning), which could greatly reduce costs and increase spatial extent.<br />Modeling Issues<br />Two important topics span all statistical modeling approaches and were highlighted by the examples. The first topic concerns data issues relating to model input variables, data quality, and error. The choice of model input variables depends upon the purpose of the model (explanation, prediction, or decision support). Model purpose is rarely stated explicitly; rather we tend to develop 1 model and attempt to use it for all 3 purposes. This lack of clarity creates a disconnect between model development and use of model outputs. For example, a northern bobwhite habitat model developed for the Central Hardwoods JV included “shape index of pastures” as an input variable. Whereas this variable might enhance the predictive accuracy of the model, it seems highly unlikely that planners will get many producers to change the shape of their pastures to improve bobwhite habitat. Data quality is an extremely important issue that affects model outputs. Although classification accuracy of land cover datasets was mentioned previously, all geospatial datasets contain some error. 
Data quality of the BBS (in terms of JV goals) might be improved through changes in the protocol (adding distance or time measurements) or via analysis techniques (e.g. occupancy modeling). The issue of data quality also raises the issue of error propagation. Error in geospatial datasets combines in a multiplicative way with error from models to reduce the predictive ability of models. This issue is known but its impact is rarely investigated.<br />The second topic concerns issues of scale regarding extent, resolution, and interpretation. To date, application of statistical modeling techniques by JVs has spanned extents ranging from small landscapes (e.g. a wildlife management area) to entire BCRs or even full JV administrative areas (multiple BCRs). At the larger end of the range, these models cover portions of multiple states but rarely entire states. Many states are partners in multiple (2-5) JVs, which presents them with the problem of having information for only part of their planning jurisdiction. In a worst-case scenario, a state could end up with different models produced by different JVs that imply different management scenarios. Coordinating objectives was a goal of this workshop, and synergy among JVs was recently identified as a goal in the NAWMP Assessment. We may need to start extending models to cover entire breeding ranges of priority species. Resolution and interpretation are related problems of input data and model outputs. Although NLCD is mapped at a 30-m pixel resolution, the minimum mapping unit for accuracy is 1 ha. Similarly, although the C&RT and HSC examples given earlier each predict bird counts per BBS route, each was mapped at a different resolution (county & 100-m pixel respectively). Careful attention should be paid to input data resolution so that model outputs are interpreted properly. Although model outputs can be mapped at nearly any resolution, they are best mapped at a resolution consistent with the coarsest resolution of input data. 
This reduces the temptation to overinterpret model results.<br />Recommendations for Selecting a Statistical Modeling Technique<br />Seven general recommendations were made relative to developing and using statistical models to support the development of population objectives.<br /><ul><li>Assemble a modeling team
“Population objectives depend on opinions and technical capacity” – R. Dettmers. Enlist folks with the proper modeling expertise when necessary, and try to enlist species experts and folks who will be using the models early in the model development process. When in doubt go with the technique you/they are most familiar with. Modeling is as much art as science and experience with a modeling technique is valuable. This step will help make the model more useful.
It is important to keep your modeling objective clearly in mind during model development. This includes how you are going to use the model, but also how you are going to test the model. Careful selection of input variables will be important, especially if you plan to use the model for decision support (are the variables manageable?). If you have good-quality existing data, choose an approach appropriate to those data. If you are going to collect data, develop your modeling approach in parallel with your data collection design.
Evaluate models prior to using them for planning
Model evaluation increases credibility. Model evaluation techniques (e.g. cross-validation) are plentiful, but assessment with independent data is best when possible (see recommendation #2). Even if your model evaluates well, you can still improve it and learn about your system by conducting a sensitivity analysis. It is important to implement this step early. In the SHC diagram, model evaluation occurs twice – during biological planning and during monitoring and evaluation of conservation actions. Focusing efforts on evaluation at the first point reduces the chance that you’ve wasted a lot of time and money when you get to the second point.
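As a minimal sketch of the cross-validation idea mentioned above, the following Python example fits a simple straight-line model to all sites except a held-out fold, predicts the held-out sites, and reports the resulting prediction error. The data, the variable names (forest_cover, bird_counts), and the choice of a straight-line model are assumptions made purely for illustration; they do not come from the presentation.

```python
import random
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_bar - b * x_bar, b


def k_fold_rmse(xs, ys, k=5, seed=0):
    """Root-mean-square error of predictions made for held-out folds."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)       # randomise fold membership
    folds = [idx[i::k] for i in range(k)]  # k folds of roughly equal size
    sq_errors = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        # Score only the sites the model did not see during fitting.
        sq_errors.extend((ys[i] - (a + b * xs[i])) ** 2 for i in fold)
    return mean(sq_errors) ** 0.5


# Hypothetical data: proportion of forest cover and birds counted per site.
forest_cover = [0.10, 0.15, 0.25, 0.30, 0.40, 0.45, 0.55, 0.60, 0.75, 0.90]
bird_counts = [1, 2, 2, 4, 5, 4, 7, 6, 9, 10]

rmse = k_fold_rmse(forest_cover, bird_counts)
print(f"5-fold cross-validated RMSE: {rmse:.2f} birds per site")
```

Because every site is scored by a model that never saw it, the error is a more honest guide to predictive ability than the fit to the training data, though it remains weaker than a test against independently collected data.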
Models are formal statements of hypotheses. Assumptions inherent in the model should be stated explicitly and tested. These are the tenets of assumption-driven research. But there is a tension between having enough information to make a decision and perfecting knowledge & understanding. We’ve all heard the adage, “All models are wrong, some models are useful.” At what point does the model become “useful?” While the answer to this question is likely case-specific, it needs to be asked & answered. Having a useful model does not mean the iterations are over – a model may outlive its useful life. As new data and techniques become available, consider increasing the sophistication of your modeling approach – go from moving in circles (e.g. SHC) to progressing forward in spirals.
Use multiple modeling approaches whenever possible
Several recent papers (2009) have shown the value of assessing data with multiple statistical models simultaneously. Thuiller et al. developed BIOMOD, an extension for the R statistical environment that automates the process for some datasets. Jones-Farrand et al. (unpublished) are comparing statistical models to theoretical models, essentially comparing where the birds are with where we think is best for them. Concurrence and disagreement between models can yield valuable information about the system, even when the approaches are built on different datasets and assumptions.
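One simple way to look at concurrence and disagreement between two models is to tabulate, site by site, where their predictions match. The Python sketch below does this for predicted presence/absence. The site names, the predictions, and the function compare_predictions are hypothetical and are not taken from BIOMOD or from the work cited above.

```python
def compare_predictions(model_a, model_b):
    """Summarise agreement between two presence/absence predictions.

    Each argument maps a site name to True (predicted present) or False.
    Only sites predicted by both models are compared.
    """
    shared = sorted(set(model_a) & set(model_b))
    summary = {
        "both_present": [s for s in shared if model_a[s] and model_b[s]],
        "both_absent": [s for s in shared if not model_a[s] and not model_b[s]],
        "only_a": [s for s in shared if model_a[s] and not model_b[s]],
        "only_b": [s for s in shared if model_b[s] and not model_a[s]],
    }
    agree = len(summary["both_present"]) + len(summary["both_absent"])
    summary["agreement"] = agree / len(shared)
    return summary


# Hypothetical predictions from a statistical and a theoretical model.
statistical = {"site1": True, "site2": True, "site3": False,
               "site4": False, "site5": True}
theoretical = {"site1": True, "site2": False, "site3": False,
               "site4": True, "site5": True}

result = compare_predictions(statistical, theoretical)
print("Proportion of sites in agreement:", result["agreement"])
print("Present only in statistical model:", result["only_a"])
print("Present only in theoretical model:", result["only_b"])
```

The sites where the models disagree are often the most informative: they point to places where birds occur in habitat thought to be poor, or are missing from habitat thought to be good.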
Be careful converting model outputs to population estimates
Although statistical models can be a key component of the process of setting population objectives, population objectives are neither inputs nor outputs of statistical models (as they are for energetic models). Neither are habitat objectives. Statistical models can only provide information to support partnerships' decisions about tradeoffs between alternative management scenarios. Statistical models can be particularly helpful in setting population objectives when they account for variability in habitat quality and predictive errors.
Time and resources are almost always limiting factors in any modeling effort. Some techniques require more time and technical capacity than others. Also, the environmental variables used as independent variables in statistical models are only part of habitat. If the model performs poorly, is it because you are missing an important (but measurable) factor? Another consideration that is of particular interest to the author of this presentation is that statistical modeling approaches link declining populations to the habitat conditions that presumably caused their declines. Can they then predict the conditions necessary to reverse declines? This seeming logical inconsistency may be a function & limitation of using static habitat models. However, because statistical models presume habitat is limiting, they may be incapable of testing that assumption (i.e. can they point to migratory or wintering habitat limitations?).</li></ul>Discussion Questions<br />The presentation ended with a series of questions to spark thought & discussion among attendees at the workshop.<br /><ul><li>Is quick and easy worth doing?
This question was asked recently. Basically, the questioner was asking, “if you can do an estimate quick and easy based on a lot of assumptions that then you have to go out and test, wouldn’t it be more efficient to just go out and collect the necessary data in the first place and build a better model on that?” Arguments can be made for both sides.