Application Form
NARS Campus Research Challenge
Please answer all questions applicable to your project as specifically as possible (print or type). The
Application Form and supporting documents must be sent to NARS Campus Research Challenge at
nars-challenge@epa.gov
University/ College: SUNY-ESF
University Department or Program: Graduate Program in Environmental Science
Address: 219 Onondaga Ave
City: Syracuse State: NY Zip: 27614
Primary Contact: John Lombardi
Position / Title: Graduate Student
Email: jalomb01@syr.edu
Phone: 7708550377
Project Start Date: July 15th, 2014
Project Completion Date: January 20th, 2015
Please select one or more of the following data sets that the proposed project will use:
[X] National Lakes Assessment 2007
Combining Citizen Science (Volunteer) Data with the National Lakes Assessment (NLA)
Probability Sample to Improve Precision of NLA Estimates
Submitted by John Lombardi (jalomb01@syr.edu)
SUNY College of Environmental Science and Forestry
Proposed approach
This project will investigate how to combine volunteer-contributed citizen science data with data
from a probability sampling design to improve the precision of statistical estimates of aquatic resource
condition. Specifically, the project will focus on quantifying the potential benefit of using citizen science
data collected from lakes in combination with the data collected by the National Lakes Assessment
(NLA). The goal is to use citizen science data within a statistically rigorous protocol in which the basis of
the estimates is a probability sampling design (such as those underlying the National Aquatic Resource
surveys). The hypothesis is that citizen science data can be used to reduce standard errors of estimates
produced from the NARS data.
The context for the project is as follows. Citizen science data collected by volunteers have become
increasingly prominent in environmental monitoring (e.g., Bruhn and Soranno 2005; Conrad and Hilchey
2011). However, citizen science data are typically not collected from a statistically rigorous sampling
design, where a “rigorous” sampling design is defined as a probability sampling design in which the
probability of including a specific unit (e.g., a lake or stream reach) is known and nonzero. Probability sampling
designs are implemented with a randomization protocol, and this randomization is typically lacking in
citizen science data. Citizen science data are typically collected at locations that are convenient to access
or of high interest to the volunteers collecting the data. While citizen science data are valuable for a qualitative
understanding of patterns, these data are questionable when used to infer conditions of a population (e.g.,
the proportion of lakes rated as “poor” for the stressor phosphorus). Peterson et al. (1999) demonstrated
the importance of sample representativeness for regional estimates of lake condition. One of the main
approaches for using data from a non-probability sample (such as citizen science data) is to implement an
analysis that attempts to weight the data in a manner that makes the sample representative of the
population. Overton et al. (1993) was an early application of this approach for environmental monitoring
(stream discharge) and the general approach is captured by methods using propensity scores (Lee 2006;
Lee and Valliant 2009; Rosenbaum and Rubin 1983).
Methods based on propensity scores do not use data from a probability sample, and these methods
therefore would not take advantage of the NARS data. Fortunately, Brus and de Gruijter (2003) devised
an approach in which data from a probability sampling design form the basis of the inference and data
from a non-probability sampling design are incorporated as “auxiliary information” to reduce the standard
errors of the estimates. The approach is called “model-assisted estimation” because the auxiliary
information (in this case the citizen science data) is incorporated via a model of the relationship between
one or more auxiliary variables and the target variable of interest. Model-assisted estimators are design-unbiased
(exactly so for the difference estimator and approximately so for the regression estimator), and they
incorporate the per-unit estimation weights associated with the sampling design.
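For concreteness, the difference estimator can be written in the general form given by Särndal et al. (1992). The notation is assumed here for illustration (it is not taken from the NLA documentation): U is the population of N lakes, s is the NLA probability sample, π_k > 0 is the inclusion probability of lake k with estimation weight w_k = 1/π_k, and ŷ_k is a prediction for lake k built from the citizen science data alone. The first line compares the ordinary weighted (Horvitz-Thompson) mean with the difference estimator; the second shows why the latter is design-unbiased.

```latex
\hat{\bar{y}}_{\mathrm{HT}} = \frac{1}{N}\sum_{k \in s} w_k\, y_k ,
\qquad
\hat{\bar{y}}_{\mathrm{diff}} = \frac{1}{N}\Big[\sum_{k \in U} \hat{y}_k + \sum_{k \in s} w_k\,(y_k - \hat{y}_k)\Big]

E_p\Big[\sum_{k \in s} w_k\,(y_k - \hat{y}_k)\Big]
  = \sum_{k \in U} \pi_k \cdot \frac{1}{\pi_k}\,(y_k - \hat{y}_k)
  = \sum_{k \in U} (y_k - \hat{y}_k)
\;\;\Longrightarrow\;\;
E_p\big[\hat{\bar{y}}_{\mathrm{diff}}\big] = \frac{1}{N}\sum_{k \in U} y_k = \bar{y}_U
```

Because the ŷ_k do not depend on which lakes enter the NLA sample, only the residual term is random, so the standard error of the difference estimator is governed by the spread of y_k − ŷ_k rather than by the spread of y_k itself.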
An example illustrates the model-assisted estimation strategy as well as the approach that will be
used to achieve the project objectives. The project will focus on use of the National Lakes Assessment
data. Sources of citizen science lake data will be identified by first searching the Volunteer Water Quality
Monitoring website (http://www.usawaterquality.org/volunteer/links.html). For example, New York has
a volunteer program called the Citizens Statewide Lake Assessment Program (CSLAP) that collects data on lake
variables such as phytoplankton and water clarity. Suppose the goal is to estimate the mean of total
phosphorus (a variable collected by CSLAP) for the population of New York lakes. The Brus and de
Gruijter (2003) model-assisted strategy requires developing a model to predict total phosphorus for each
lake in New York that is not included in the CSLAP volunteer network. The prediction model could be
constructed from a spatial approach such as kriging, or a regression approach could be used to predict
total phosphorus (e.g., a model with predictor variables such as lake area and surrounding land cover). A
key feature of this approach is that the prediction model is used to “assist” the estimator (i.e., reduce the
standard error) ultimately obtained from the NLA data. The better the prediction of total phosphorus
from the CSLAP data, the greater the reduction in standard error of the model-assisted estimator based on
the NLA data. Even if the prediction of total phosphorus obtained from the CSLAP data is poor, the
model-assisted estimator will still be unbiased, but no reduction in standard error will be achieved.
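The relationship between prediction quality and standard error can be checked with a small simulation. The sketch below is illustrative only: it builds a made-up population of lakes with synthetic total phosphorus values, a "good" and a "poor" set of predictions standing in for a citizen-science model, and a two-stratum design with different sampling rates as a simplified stand-in for the NLA's unequal-probability design (the NLA itself used a spatially balanced design; see Stevens and Olsen 2004). All function names, stratum sizes, and distributions are hypothetical.

```python
import random
import statistics

def weighted_mean(y, w, N):
    """Horvitz-Thompson estimate of the population mean: sum of w_k * y_k over the sample / N."""
    return sum(wk * yk for yk, wk in zip(y, w)) / N

def difference_mean(y, yhat, w, yhat_pop_total, N):
    """Difference estimator: population total of predictions plus weighted sample residuals, / N."""
    resid_total = sum(wk * (yk - pk) for yk, pk, wk in zip(y, yhat, w))
    return (yhat_pop_total + resid_total) / N

def simulate(n_reps=2000, seed=1):
    rng = random.Random(seed)
    # Hypothetical lakes in two strata sampled at different rates, so inclusion
    # probabilities are unequal: stratum -> (N_h, n_h, log-scale mean of TP).
    design = {"small": (400, 20, 3.0), "large": (200, 30, 2.6)}
    members = {}
    for h, (N_h, _, mu) in design.items():
        lakes = []
        for _ in range(N_h):
            tp = rng.lognormvariate(mu, 0.6)          # synthetic "true" total phosphorus
            pred_good = tp + rng.gauss(0.0, 2.0)      # informative prediction
            pred_poor = rng.lognormvariate(3.0, 0.6)  # prediction unrelated to tp
            lakes.append((tp, pred_good, pred_poor))
        members[h] = lakes
    population = [lake for lakes in members.values() for lake in lakes]
    N = len(population)
    true_mean = statistics.fmean(lake[0] for lake in population)
    good_total = sum(lake[1] for lake in population)
    poor_total = sum(lake[2] for lake in population)

    estimates = {"ht": [], "diff_good": [], "diff_poor": []}
    for _ in range(n_reps):
        y, good, poor, w = [], [], [], []
        for h, (N_h, n_h, _) in design.items():
            # Simple random sample without replacement within stratum h.
            for tp, g, p in rng.sample(members[h], n_h):
                y.append(tp)
                good.append(g)
                poor.append(p)
                w.append(N_h / n_h)  # estimation weight = 1 / inclusion probability
        estimates["ht"].append(weighted_mean(y, w, N))
        estimates["diff_good"].append(difference_mean(y, good, w, good_total, N))
        estimates["diff_poor"].append(difference_mean(y, poor, w, poor_total, N))

    # Average and standard deviation of each estimator over the repeated samples.
    summary = {k: (statistics.fmean(v), statistics.stdev(v)) for k, v in estimates.items()}
    return true_mean, summary

if __name__ == "__main__":
    true_mean, summary = simulate()
    print(f"true mean TP: {true_mean:.2f}")
    for name, (avg, sd) in summary.items():
        print(f"{name:>9}: average estimate {avg:.2f}, SD across samples {sd:.2f}")
```

All three estimators should average out near the true mean. With the informative predictions the spread of the difference estimator shrinks sharply relative to the weighted estimator; with the uninformative predictions it does not shrink and, because the prediction noise enters the residuals, can even grow. A regression estimator, which estimates the slope from the NLA sample instead of fixing it at 1, is less exposed to that inflation.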
The regression and difference estimators described by Särndal et al. (1992) accommodate the unequal
probability sampling design of the NLA; that is, they take into account the estimation weights provided with
the NLA data, and incorporating these weights into the estimation strategy is critical. Brus and de Gruijter
(2003) describe two general model-assisted estimators, a regression estimator and a difference estimator,
and both will be evaluated in this project.
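A minimal sketch of the two estimators for a population mean follows, assuming that a citizen-science prediction x_k is available for every lake in the population and that each NLA sample lake carries an estimation weight w_k = 1/π_k. The regression estimator is written as the one-auxiliary-variable form of the generalized regression (GREG) estimator in Särndal et al. (1992), with intercept and slope fitted by survey-weighted least squares on the probability sample. This is our reading of the general approach, not code from Brus and de Gruijter (2003) or from spsurvey, and the function names are hypothetical.

```python
from typing import Sequence

def difference_estimator(y: Sequence[float], x_s: Sequence[float],
                         w: Sequence[float], x_pop: Sequence[float]) -> float:
    """Difference estimator of the population mean (coefficient on the prediction fixed at 1).

    y, x_s, w: observed value, citizen-science prediction, and weight for each NLA sample lake.
    x_pop: citizen-science prediction for every lake in the population (length N).
    """
    resid_total = sum(wk * (yk - xk) for yk, xk, wk in zip(y, x_s, w))
    return (sum(x_pop) + resid_total) / len(x_pop)

def regression_estimator(y: Sequence[float], x_s: Sequence[float],
                         w: Sequence[float], x_pop: Sequence[float]) -> float:
    """GREG estimator of the mean, with y ~ a + b*x fitted by survey-weighted least squares."""
    sw = sum(w)
    xbar = sum(wk * xk for wk, xk in zip(w, x_s)) / sw
    ybar = sum(wk * yk for wk, yk in zip(w, y)) / sw
    sxx = sum(wk * (xk - xbar) ** 2 for wk, xk in zip(w, x_s))
    sxy = sum(wk * (xk - xbar) * (yk - ybar) for wk, xk, yk in zip(w, x_s, y))
    b = sxy / sxx
    a = ybar - b * xbar
    fitted_total = sum(a + b * xk for xk in x_pop)  # model part, summed over all N lakes
    resid_total = sum(wk * (yk - (a + b * xk)) for yk, xk, wk in zip(y, x_s, w))
    return (fitted_total + resid_total) / len(x_pop)

if __name__ == "__main__":
    x_pop = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
    idx = [0, 2, 5]                       # lakes that fell in the probability sample
    y = [2.0 + 0.5 * x_pop[i] for i in idx]
    print(regression_estimator(y, [x_pop[i] for i in idx], [1.5, 2.5, 2.0], x_pop))
```

The difference estimator is exactly design-unbiased because x_k does not depend on the NLA sample. The regression estimator re-estimates a and b from the sample, which lets it adapt when the citizen-science predictions are systematically shifted or compressed, at the cost of being only approximately design-unbiased.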
The approach to meeting the project objectives will be the following. The first step will be to
identify which states have useable citizen science lake data and what specific lake variables are collected.
Then for each variable, a regression or spatial prediction model will be constructed to predict that variable
for all lakes in the state. These predictions from the citizen science data will then be incorporated in a
model-assisted estimator that uses the NLA sample data as the foundation for the lake condition
estimates. Standard errors of the model-assisted estimators will be computed and compared to standard
errors of NLA estimates that do not incorporate the citizen science data. The process will be repeated for
five to seven states to develop a general methodology that can be applied to other states.
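The standard-error comparison in this step can be sketched as follows. The NLA analysis in spsurvey uses a local neighborhood variance estimator suited to spatially balanced samples (Stevens and Olsen 2003), which is not reproduced here. As a simplifying assumption, this sketch uses the with-replacement approximation to the variance of a weighted total, applied to y_k for the standard NLA estimator and to the residuals y_k − x_k for the difference estimator (only the residual term of that estimator is random). The function names and the four-lake example are hypothetical.

```python
import math

def wr_se_of_mean(values, w, N):
    """SE of (1/N) * sum(w_k * v_k) under the with-replacement approximation.

    v(t_hat) = n/(n-1) * sum_k (w_k * v_k - t_hat/n)^2; dividing the SE by N gives the mean scale.
    """
    n = len(values)
    terms = [wk * vk for wk, vk in zip(w, values)]
    t_hat = sum(terms)
    var_total = n / (n - 1) * sum((t - t_hat / n) ** 2 for t in terms)
    return math.sqrt(var_total) / N

def compare_standard_errors(y, x_s, w, N):
    """Return (SE of weighted NLA mean, SE of difference estimator, ratio of the two)."""
    se_nla = wr_se_of_mean(y, w, N)
    residuals = [yk - xk for yk, xk in zip(y, x_s)]
    se_diff = wr_se_of_mean(residuals, w, N)
    return se_nla, se_diff, se_diff / se_nla

if __name__ == "__main__":
    y = [30.0, 12.0, 45.0, 20.0]    # hypothetical NLA sample values
    x_s = [27.0, 14.0, 41.0, 22.0]  # hypothetical citizen-science predictions at those lakes
    w = [50.0, 50.0, 80.0, 80.0]    # hypothetical estimation weights (sum = 260)
    print(compare_standard_errors(y, x_s, w, N=260))
```

A ratio well below 1 would indicate that the citizen science predictions are adding precision; in practice the same comparison would be run with the variance estimator used for the NLA rather than this approximation.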
Timeline
July 15th, 2014 – Identify which states have citizen science lake monitoring programs and identify five to
seven states that have data most amenable to use for creating the auxiliary data to be incorporated in the
model-assisted estimators. Create a GIS layer of the population of lakes for each state that will be used in
the project and locate the lakes monitored by citizen science. The early phase of the work will be devoted
to identifying the most useable citizen science data.
August 2014 – Become familiar with the R code used for analysis of the NLA data (Kincaid and Olsen
2013) and modify the code to accommodate the model-assisted estimators that will be used to incorporate
the citizen science data.
September 2014 – For each state and for as many lake variables as possible, develop regression models
using the citizen science data to predict lake condition at unsampled locations. Use these prediction
models to obtain the auxiliary variables for all lakes in the population of each state.
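A minimal sketch of this step follows, assuming a single hypothetical predictor (log lake area) and an ordinary least squares fit to the volunteer-monitored lakes; real models would likely use several predictors (for example, surrounding land cover) or a spatial method such as kriging. The record layout and field names are invented for illustration. Because the model is fitted to citizen science data only, the resulting predictions are fixed with respect to the NLA sample, which is what the model-assisted estimators require.

```python
import math
from dataclasses import dataclass
from typing import Optional

@dataclass
class Lake:
    lake_id: str                    # hypothetical ID from the state lake GIS layer
    area_ha: float                  # lake surface area in hectares
    tp_obs: Optional[float] = None  # volunteer-measured total phosphorus (ug/L), if monitored

def fit_ols(x, y):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx
    return ybar - b * xbar, b

def predict_all_lakes(lakes):
    """Fit log(TP) ~ log(area) on volunteer-monitored lakes, then predict TP for every lake."""
    monitored = [lk for lk in lakes if lk.tp_obs is not None]
    a, b = fit_ols([math.log(lk.area_ha) for lk in monitored],
                   [math.log(lk.tp_obs) for lk in monitored])
    # Naive back-transform from the log scale (no retransformation bias correction).
    return {lk.lake_id: math.exp(a + b * math.log(lk.area_ha)) for lk in lakes}

if __name__ == "__main__":
    lakes = [Lake("A", 4.0, 50.0), Lake("B", 16.0, 25.0), Lake("C", 64.0, 12.5), Lake("D", 25.0)]
    print(predict_all_lakes(lakes))
```

Skipping the retransformation bias correction does not bias the model-assisted estimators, since any systematic error in the predictions is absorbed by the weighted residual term; it can only cost precision.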
October/November 2014 – Compare the standard errors of the model-assisted estimators incorporating the
citizen science data to the standard errors of the estimators obtained from the NLA sample data alone.
Identify variables and general scenarios in which the model-assisted estimators provide a substantial
reduction in standard error relative to the basic NLA estimators. These situations will illustrate the most
striking cases of the benefit of incorporating citizen science data into an NLA model-assisted estimation strategy.
December 2014 – Develop recommendations for increasing the value of citizen science data to improve
precision of estimates for lake monitoring. Recommendations will focus on the suite of variables
measured and guidance for expanding the sample of lakes included in the citizen science network.
Project outcomes
The two primary outcomes of the project would be the statistical methodology for incorporating
citizen science data into NLA sample-based estimators of lake condition and an assessment of the benefit
of incorporating citizen science data. Specifically, we will produce standard errors for the estimates
incorporating the citizen science data and compare those to the standard errors produced by the
conventional estimates that do not use citizen science data. The reduction in standard errors achieved by
incorporating the citizen science data will quantify the potential benefit of these data. The comparison of
standard errors will be the primary metric for demonstrating the success of the project. The research and
results of this project will be the basis of a chapter in my M.S. thesis and I will submit a condensed
version of the thesis chapter to a peer-reviewed journal.
Demonstrating that citizen science monitoring data can enhance the precision of estimates
based on the NLA probability sample will add credence to the value of citizen science beyond just the
benefit of public education and participation. In many cases the work and effort of citizen science
programs comes at a low cost; however, the lack of an underlying probability sampling design constrains
the utility of citizen science data for application in a statistically rigorous setting. If volunteer data can be
used for more than just educational purposes, then volunteer programs could spring up in other states and
cities. This would further increase education and awareness of local environmental issues as well as
provide more data for scientific analysis. Results from this project can also be used to help improve
future volunteer monitoring projects as recommendations will be provided for how to improve the data
collected for better utility in the model-assisted estimators.
Partner capabilities
I am interested in the proposed project because I have had experience with the educational side of
volunteer monitoring and would like to see more projects like CSLAP funded and supported. I am also
interested in the application of statistical methods to water-related issues and, more generally, the
use of quantitative methodology in environmental and natural science applications. I have experience
with model-based approaches for determining water quality but would like to gain experience working
with data from a complex sampling design via a design-based approach. This project would also use
ArcGIS and R, tools with which I hope to gain more experience.
The primary faculty advisor will be Dr. Steve Stehman. Dr. Stehman’s research focus is
environmental sampling and he teaches a graduate level applied sampling methods course at SUNY ESF.
Dr. Stehman will supervise the theoretical statistical developments needed to derive the appropriate
estimators incorporating the citizen science data. Dr. Karin Limburg, an aquatic ecologist at SUNY ESF,
has agreed to provide guidance and oversight on the scientific issues related to use and interpretation of
the NLA variables (please see accompanying letter of support from Dr. Limburg). In particular, Dr.
Limburg will assist with constructing credible models needed to predict lake condition variables from the
citizen science data.
References
Bruhn, L. C., and P. A. Soranno. (2005). Long Term (1974-2001) Volunteer Monitoring of
Water Clarity Trends in Michigan Lakes and Their Relation to Ecoregion and Land Use/Cover.
Lake and Reservoir Management 21(1): 10-23. Michigan State University - Patricia Soranno.
Web. 5 May 2014. http://www.fw.msu.edu/~soranno/documents/BruhnandSoranno2005.pdf
Brus, D. J., and J. J. de Gruijter. (2003). A Method to Combine Non-probability Sample Data
with Probability Sample Data in Estimating Spatial Means of Environmental
Variables. Environmental Monitoring and Assessment 83: 303-317. Springer Link. Web. 5 May
2014. http://link.springer.com/article/10.1023%2FA%3A1022618406507
Conrad, C. C., and K. G. Hilchey. (2011). A Review of Citizen Science and Community-based
Environmental Monitoring: Issues and Opportunities. Environmental Monitoring and
Assessment 176(1-4): 273-291. Springer Link. Web. 5 May 2014.
http://link.springer.com/article/10.1007%2Fs10661-010-1582-5
Kincaid, T. M., and A. R. Olsen. (2013). spsurvey: Spatial Survey Design and Analysis. R
package version 2.6. URL: http://www.epa.gov/nheerl/arm/.
Lee, S. (2006). Propensity Score Adjustment as a Weighting Scheme for Volunteer Panel Web
Surveys. Journal of Official Statistics 22(2): 329-349. JOS Online. Web. 5 May 2014.
http://www.jos.nu/Articles/abstract.asp?article=222329
Lee, S., and R. Valliant. (2009). Estimation for Volunteer Panel Web Surveys Using Propensity
Score Adjustment and Calibration Adjustment. Sociological Methods & Research 37(3): 319-
343. Sage. Web. 5 May 2014. http://dx.doi.org/10.1177/0049124108329643
Overton, J. M., T. C. Young, and W. S. Overton. (1993). Using ‘Found’ Data to Augment a
Probability Sample: Procedure and Case Study. Environmental Monitoring and Assessment 26:
65-83.
Peterson, S. A., N. S. Urquhart, and E. B. Welch. (1999). Sample Representativeness: A Must
for Reliable Regional Lake Condition Estimates. Environmental Science and Technology 33:
1559-1565.
Rosenbaum, P. R., and D. B. Rubin. (1983). The Central Role of the Propensity Score in
Observational Studies for Causal Effects. Biometrika 70(1): 41-55. Oxford Journals. Web. 5 May
2014. http://biomet.oxfordjournals.org/content/70/1/41.full.pdf
Särndal, C-E., B. Swensson, and J. H. Wretman. (1992). Model Assisted Survey Sampling.
New York: Springer-Verlag.
Stevens, D. L., Jr. and A. R. Olsen. (2003). Variance Estimation for Spatially Balanced Samples
of Environmental Resources. Environmetrics 14: 593-610.
Stevens, D. L., Jr. and A. R. Olsen. (2004). Spatially-balanced Sampling of Natural Resources.
Journal of the American Statistical Association 99(465): 262-278.
May 15, 2014
Letter of Support for John Lombardi’s application to the National Aquatic Resource Survey
(NARS) Campus Research Challenge
Dear Evaluation Committee:
I am writing to document that I will be supervising John Lombardi if his project titled
“Combining Citizen Science (Volunteer) Data with the National Lakes Assessment (NLA)
Probability Sample to Improve Precision of NLA Estimates” is selected for the National Aquatic
Resource Survey (NARS) Campus Research Challenge. I am John’s major professor for his
M.S. degree in the Graduate Program of Environmental Science at SUNY ESF. John has
completed one year of Master’s level coursework and beginning in June will start the research
for his Master’s thesis. John’s original plan was a research topic focusing on citizen science data
for forest monitoring, but when we encountered the NARS challenge, we saw a great opportunity
to extend his work to also include the potential utility of citizen science data for lake monitoring.
John’s undergraduate degree is in Mathematics and he possesses excellent analytic, GIS, and
computing skills to accomplish the project objectives specified. His ability to conduct statistical
analyses in R will be particularly helpful in this project because the basic
estimation procedures used for the NLA data have been coded in R.
On a personal note, I find the NARS challenge highly appealing because the statistical
issues addressed in my PhD research arose from the circa 1985 EPA national lake and stream
surveys. EPA’s early adoption of these probability sampling methods represented a breakthrough
in statistically rigorous monitoring strategies. I would be pleased to be able to supervise John’s
work using these more recent 2007 NLA data nearly 30 years after my introduction to the ideas
of probability sampling in environmental monitoring. John Lombardi is a very capable and
conscientious student, and with statistics supervision provided by me and aquatic systems
supervision provided by Dr. Limburg, I am fully confident that he will successfully achieve the
objectives of his NARS Challenge project.
Sincerely,
Dr. Steve Stehman
Professor of Biometry, SUNY ESF
svstehma@syr.edu

More Related Content

Similar to Lombardi_EPA_NARS_Proposal_Final

United States Geological Survey, Dr. William Guertal
United States Geological Survey, Dr. William GuertalUnited States Geological Survey, Dr. William Guertal
United States Geological Survey, Dr. William GuertalTWCA
 
Spatial data analysis 1
Spatial data analysis 1Spatial data analysis 1
Spatial data analysis 1Johan Blomme
 
FUNDING FOR ENVIRONMENTAL RESEARCH AND DEVELOPMENT BY NASA
FUNDING FOR  ENVIRONMENTAL RESEARCH AND DEVELOPMENT  BY NASA FUNDING FOR  ENVIRONMENTAL RESEARCH AND DEVELOPMENT  BY NASA
FUNDING FOR ENVIRONMENTAL RESEARCH AND DEVELOPMENT BY NASA Lyle Birkey
 
urpl969-group2-paper-03May06
urpl969-group2-paper-03May06urpl969-group2-paper-03May06
urpl969-group2-paper-03May06Wintford Thornton
 
2003-12-02 Environmental Information Systems for Monitoring, Assessment, and ...
2003-12-02 Environmental Information Systems for Monitoring, Assessment, and ...2003-12-02 Environmental Information Systems for Monitoring, Assessment, and ...
2003-12-02 Environmental Information Systems for Monitoring, Assessment, and ...Rudolf Husar
 
National Geospatial Program Presentation - IMIA Asia Pacific Conference
National Geospatial Program Presentation - IMIA Asia Pacific ConferenceNational Geospatial Program Presentation - IMIA Asia Pacific Conference
National Geospatial Program Presentation - IMIA Asia Pacific ConferenceInternational Map Industry Association
 
UNDERSTANDING LEAST ABSOLUTE VALUE IN REGRESSION-BASED DATA MINING
UNDERSTANDING LEAST ABSOLUTE VALUE IN REGRESSION-BASED DATA MININGUNDERSTANDING LEAST ABSOLUTE VALUE IN REGRESSION-BASED DATA MINING
UNDERSTANDING LEAST ABSOLUTE VALUE IN REGRESSION-BASED DATA MININGIJDKP
 
Time series forecasting of solid waste generation in arusha city tanzania
Time series forecasting of solid waste generation in arusha city   tanzaniaTime series forecasting of solid waste generation in arusha city   tanzania
Time series forecasting of solid waste generation in arusha city tanzaniaAlexander Decker
 
Gis Based Analysis of Supply and Forecasting Piped Water Demand in Nairobi
Gis Based Analysis of Supply and Forecasting Piped Water Demand in NairobiGis Based Analysis of Supply and Forecasting Piped Water Demand in Nairobi
Gis Based Analysis of Supply and Forecasting Piped Water Demand in Nairobiinventionjournals
 
Spatial data analysis
Spatial data analysisSpatial data analysis
Spatial data analysisJohan Blomme
 
The Gila-Salt-Verde River System: Improving River Forecasts and Emergency Man...
The Gila-Salt-Verde River System: Improving River Forecasts and Emergency Man...The Gila-Salt-Verde River System: Improving River Forecasts and Emergency Man...
The Gila-Salt-Verde River System: Improving River Forecasts and Emergency Man...Douglas B. Blatchford, PE, PH, CEM, CFM
 
Runoff Prediction of Gharni River Catchment of Maharashtra by Regressional An...
Runoff Prediction of Gharni River Catchment of Maharashtra by Regressional An...Runoff Prediction of Gharni River Catchment of Maharashtra by Regressional An...
Runoff Prediction of Gharni River Catchment of Maharashtra by Regressional An...ijtsrd
 
Developing best practice for infilling daily river flow data
Developing best practice for infilling daily river flow dataDeveloping best practice for infilling daily river flow data
Developing best practice for infilling daily river flow datahydrologywebsite1
 
213180005 Seminar presentation.pptx
213180005 Seminar presentation.pptx213180005 Seminar presentation.pptx
213180005 Seminar presentation.pptxKUNDESHWARPUNDALIK
 
A New Multi-Objective Optimization Model of Water Resources Considering Fairn...
A New Multi-Objective Optimization Model of Water Resources Considering Fairn...A New Multi-Objective Optimization Model of Water Resources Considering Fairn...
A New Multi-Objective Optimization Model of Water Resources Considering Fairn...Carlos Gamarra
 
Improving statistical models for flood risk assessment
Improving statistical models for flood risk assessmentImproving statistical models for flood risk assessment
Improving statistical models for flood risk assessmentRoss Towe
 

Similar to Lombardi_EPA_NARS_Proposal_Final (20)

United States Geological Survey, Dr. William Guertal
United States Geological Survey, Dr. William GuertalUnited States Geological Survey, Dr. William Guertal
United States Geological Survey, Dr. William Guertal
 
Spatial data analysis 1
Spatial data analysis 1Spatial data analysis 1
Spatial data analysis 1
 
Floodplain Mapping Status in Wisconsin
Floodplain Mapping Status in WisconsinFloodplain Mapping Status in Wisconsin
Floodplain Mapping Status in Wisconsin
 
FUNDING FOR ENVIRONMENTAL RESEARCH AND DEVELOPMENT BY NASA
FUNDING FOR  ENVIRONMENTAL RESEARCH AND DEVELOPMENT  BY NASA FUNDING FOR  ENVIRONMENTAL RESEARCH AND DEVELOPMENT  BY NASA
FUNDING FOR ENVIRONMENTAL RESEARCH AND DEVELOPMENT BY NASA
 
urpl969-group2-paper-03May06
urpl969-group2-paper-03May06urpl969-group2-paper-03May06
urpl969-group2-paper-03May06
 
2003-12-02 Environmental Information Systems for Monitoring, Assessment, and ...
2003-12-02 Environmental Information Systems for Monitoring, Assessment, and ...2003-12-02 Environmental Information Systems for Monitoring, Assessment, and ...
2003-12-02 Environmental Information Systems for Monitoring, Assessment, and ...
 
National Geospatial Program Presentation - IMIA Asia Pacific Conference
National Geospatial Program Presentation - IMIA Asia Pacific ConferenceNational Geospatial Program Presentation - IMIA Asia Pacific Conference
National Geospatial Program Presentation - IMIA Asia Pacific Conference
 
UNDERSTANDING LEAST ABSOLUTE VALUE IN REGRESSION-BASED DATA MINING
UNDERSTANDING LEAST ABSOLUTE VALUE IN REGRESSION-BASED DATA MININGUNDERSTANDING LEAST ABSOLUTE VALUE IN REGRESSION-BASED DATA MINING
UNDERSTANDING LEAST ABSOLUTE VALUE IN REGRESSION-BASED DATA MINING
 
Time series forecasting of solid waste generation in arusha city tanzania
Time series forecasting of solid waste generation in arusha city   tanzaniaTime series forecasting of solid waste generation in arusha city   tanzania
Time series forecasting of solid waste generation in arusha city tanzania
 
Gis Based Analysis of Supply and Forecasting Piped Water Demand in Nairobi
Gis Based Analysis of Supply and Forecasting Piped Water Demand in NairobiGis Based Analysis of Supply and Forecasting Piped Water Demand in Nairobi
Gis Based Analysis of Supply and Forecasting Piped Water Demand in Nairobi
 
Spatial data analysis
Spatial data analysisSpatial data analysis
Spatial data analysis
 
The Gila-Salt-Verde River System: Improving River Forecasts and Emergency Man...
The Gila-Salt-Verde River System: Improving River Forecasts and Emergency Man...The Gila-Salt-Verde River System: Improving River Forecasts and Emergency Man...
The Gila-Salt-Verde River System: Improving River Forecasts and Emergency Man...
 
Runoff Prediction of Gharni River Catchment of Maharashtra by Regressional An...
Runoff Prediction of Gharni River Catchment of Maharashtra by Regressional An...Runoff Prediction of Gharni River Catchment of Maharashtra by Regressional An...
Runoff Prediction of Gharni River Catchment of Maharashtra by Regressional An...
 
Developing best practice for infilling daily river flow data
Developing best practice for infilling daily river flow dataDeveloping best practice for infilling daily river flow data
Developing best practice for infilling daily river flow data
 
MICHAEL_LARRY
MICHAEL_LARRYMICHAEL_LARRY
MICHAEL_LARRY
 
213180005 Seminar presentation.pptx
213180005 Seminar presentation.pptx213180005 Seminar presentation.pptx
213180005 Seminar presentation.pptx
 
A New Multi-Objective Optimization Model of Water Resources Considering Fairn...
A New Multi-Objective Optimization Model of Water Resources Considering Fairn...A New Multi-Objective Optimization Model of Water Resources Considering Fairn...
A New Multi-Objective Optimization Model of Water Resources Considering Fairn...
 
Improving statistical models for flood risk assessment
Improving statistical models for flood risk assessmentImproving statistical models for flood risk assessment
Improving statistical models for flood risk assessment
 
Q4103103110
Q4103103110Q4103103110
Q4103103110
 
CWEMFPresentation_Bonnie
CWEMFPresentation_BonnieCWEMFPresentation_Bonnie
CWEMFPresentation_Bonnie
 

Lombardi_EPA_NARS_Proposal_Final

  • 1. Application Form NARS Campus Research Challenge Please answer all questions applicable to your project as specifically as possible (print or type). The Application Form and supporting documents must be sent to NARS Campus Research Challenge at nars-challenge@epa.gov University/ College: SUNY-ESF University Department or Program: Graduate Program in Environmental Science Address: 219 Onondaga Ave City: Syracuse State: NY Zip: 27614 Primary Contact: John Lombardi Position / Title: Graduate Student Email: jalomb01@syr.edu Phone: 7708550377 Project Start Date: July 15th, 2014 Project Completion Date: January 20th, 2015 Please select one or more of the following data sets that the proposed project will use:  National Lakes Assessment 2007
  • 2. Application Form NARS Campus Research Challenge Combining Citizen Science (Volunteer) Data with the National Lakes Assessment (NLA) Probability Sample to Improve Precision of NLA Estimates Submitted by John Lombardi (jalomb01@syr.edu) SUNY College of Environmental Science and Forestry Proposed approach This project will investigate how to combine volunteer-contributed, citizen science data with data from a probability sampling design to improve the precision of statistical estimates of aquatic resource condition. Specifically, the project will focus on quantifying the potential benefit of using citizen science data collected from lakes in combination with the data collected by the National Lakes Assessment (NLA). The goal is to use citizen science data within a statistically rigorous protocol in which the basis of the estimates is a probability sampling design (such as those underlying the National Aquatic Resource surveys). The hypothesis is that citizen science data can be used to reduce standard errors of estimates produced from the NARS data. The context for the project is as follows. Citizen science data collected by volunteers has become increasingly prominent in environmental monitoring (e.g.,Bruhn and Sorano 2005; Conrad and Hilchey 2011). However,citizen science data are typically not collected from a statistically rigorous sampling design, where a “rigorous” sampling design is defined as a probability sampling design in which the probability of including a specific unit (e.g.,a lake or stream reach) is known. Probability sampling designs are implemented with a randomization protocol and this randomization is typically lacking in citizen science data. Citizen science data are typically collected at locations that are convenient to access or of high interest to volunteers collecting the data. 
While citizen science data are valuable for qualitative understanding of patterns,these data are questionable when used to infer conditions of a population (e.g., the proportion of lakes rated as “poor” for the stressor phosphorus). Peterson et al. (1999) demonstrated the importance of sample representativeness for regional estimates of lake condition. One of the main approaches for using data from a non-probability sample (such as citizen science data) is to implement an analysis that attempts to weight the data in a manner that makes the sample representative of the population. Overton et al. (1993) was an early application of this approach for environmental monitoring (stream discharge) and the general approach is captured by methods using propensity scores (Lee 2006; Lee and Valliant 2009; Rosenbaum and Rubin 1983). Methods based on propensity scores do not use data from a probability sample and these methods therefore would not take advantage of the NARS data. Fortunately, Brus and de Gruijter (2003) devised an approach in which data from a probability sampling design form the basis of the inference and data from a non-probability sampling design are incorporated as “auxiliary information” to reduce the standard errors of the estimates. The approach is called “model-assisted estimation” because the auxiliary information (in this case the citizen science data) is incorporated via a model of the relationship between one or more auxiliary variables and the target variable of interest. Model-assisted estimators are unbiased and they incorporate the per unit estimation weights associated with the sampling design. An example illustrates the model-assisted estimation strategy as well as the approach that will be used to achieve the project objectives. The project will focus on use of the National Lake Assessment
  • 3. Application Form NARS Campus Research Challenge data. Sources of citizen science lake data will be identified by first searching the Volunteer Water Quality Monitoring website (http://www.usawaterquality.org/volunteer/links.html). For example, New York has a volunteer program called Citizen Science Lake Assessment Program (CSLAP) that collects data on lake variables such phytoplankton and water clarity. Suppose the goal is to estimate the mean of total phosphorus (a variable collected by CSLAP) for the population of New York lakes. The Brus and de Gruijter (2003) model-assisted strategy requires developing a model to predict total phosphorus for each lake in New York that is not included in the CSLAP volunteer network. The prediction model could be constructed from a spatial approach such as kriging, or a regression approach could be used to predict total phosphorus (e.g.,a model with predictor variables such as lake area and surrounding land cover). A key feature of this approach is that the prediction model is used to “assist” the estimator (i.e., reduce the standard error) ultimately obtained from the NLA data. The better the prediction of total phosphorus from the CSLAP data,the greater the reduction in standard error of the model-assisted estimator based on the NLA data. Even if the prediction of total phosphorus obtained from the CSLAP data is poor, the model-assisted estimator will still be unbiased, but no reduction in standard error will be achieved. Model-assisted estimators described by Särndal et al. (1992) can accommodate the unequal probability sampling design of the NLA for regression and difference estimators (i.e., the estimators will take into account the estimation weights provided with the NLA data and incorporating these weights into the estimation strategy is critical). 
Brus and de Gruijter (2003) describe two general model-assisted estimators, a regression estimator and a difference estimator, and both will be evaluated in this project. The approach to meeting the project objectives is as follows. The first step will be to identify which states have usable citizen science lake data and which lake variables are collected. Then, for each variable, a regression or spatial prediction model will be constructed to predict that variable for all lakes in the state. These predictions from the citizen science data will then be incorporated in a model-assisted estimator that uses the NLA sample data as the foundation for the lake condition estimates. Standard errors of the model-assisted estimators will be computed and compared to standard errors of NLA estimates that do not incorporate the citizen science data. The process will be repeated for five to seven states to provide a general methodology for applying the approach to other states.
Timeline
July 15th, 2014 – Identify which states have citizen science lake monitoring programs, and select five to seven states whose data are most amenable to creating the auxiliary data to be incorporated in the model-assisted estimators. Create a GIS layer of the population of lakes for each state that will be used in the project, and locate the lakes monitored by citizen scientists. The early phase of the work will be devoted to identifying the most usable citizen science data.
August 2014 – Become familiar with the R code used for analysis of the NLA data (Kincaid and Olsen 2013), and modify the code to accommodate the model-assisted estimators that will be used to incorporate the citizen science data.
September 2014 – For each state and for as many lake variables as possible, develop regression models using the citizen science data to predict lake condition at unsampled locations. Use these prediction models to obtain the auxiliary variables for all lakes in the population of each state.
October/November 2014 – Compare the standard errors of the model-assisted estimators incorporating the citizen science data to those of the estimators obtained from the NLA sample data alone. Identify variables and general scenarios in which the model-assisted estimators provide a substantial reduction in standard error relative to the basic NLA estimators. These situations will illustrate the most striking cases of the benefit of incorporating citizen science data into an NLA model-assisted estimation strategy.
December 2014 – Develop recommendations for increasing the value of citizen science data to improve the precision of estimates for lake monitoring. Recommendations will focus on the suite of variables measured and on guidance for expanding the sample of lakes included in the citizen science network.
Project outcomes
The two primary outcomes of the project will be the statistical methodology for incorporating citizen science data into NLA sample-based estimators of lake condition and an assessment of the benefit of incorporating citizen science data. Specifically, we will produce standard errors for the estimates incorporating the citizen science data and compare them to the standard errors produced by the conventional estimates that do not use citizen science data. The reduction in standard errors achieved by incorporating the citizen science data will quantify the potential benefit of these data, and this comparison will be the primary metric for demonstrating the success of the project. The research and results of this project will be the basis of a chapter in my M.S. thesis, and I will submit a condensed version of the thesis chapter to a peer-reviewed journal. Demonstrating that citizen science monitoring data can enhance the precision of estimates based on the NLA probability sample will add credence to the value of citizen science beyond the benefit of public education and participation.
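The standard-error comparison at the core of the project outcomes can be sketched for a single sample. In this hypothetical Python example, the function names, the simulated population, and the with-replacement variance approximation are assumptions made for illustration; the NLA analyses in spsurvey use their own estimators and a local neighborhood variance estimator (Stevens and Olsen 2003), which this sketch does not reproduce. The regression estimator uses an intercept plus the citizen-science prediction as its single auxiliary variable, and the number of lakes N is assumed known from the state lake layer.

```python
import numpy as np


def wr_total_and_se(w, z):
    """Weighted total of z and its SE under a with-replacement approximation."""
    n = len(z)
    wz = w * z
    t = wz.sum()
    var = n / (n - 1) * np.sum((wz - t / n) ** 2)
    return t, np.sqrt(var)


def compare_estimators(y_s, w_s, pred_s, pred_pop):
    """Estimate the population mean of y three ways; return {name: (estimate, SE)}.

    y_s, w_s, pred_s: observed values, estimation weights, and predictions at the sampled lakes.
    pred_pop: predictions for every lake in the population (so N = len(pred_pop) is known).
    """
    N = len(pred_pop)
    out = {}

    # Design-weighted estimator: uses the probability sample only.
    t, se = wr_total_and_se(w_s, y_s)
    out["design-weighted"] = (t / N, se / N)

    # Difference estimator: population total of predictions + weighted sum of sample residuals.
    t_r, se_r = wr_total_and_se(w_s, y_s - pred_s)
    out["difference"] = ((pred_pop.sum() + t_r) / N, se_r / N)

    # Regression (GREG) estimator: weighted least squares of y on (1, prediction).
    A_s = np.column_stack([np.ones_like(pred_s), pred_s])
    AtW = A_s.T * w_s                                   # A'W (each column scaled by its weight)
    B = np.linalg.solve(AtW @ A_s, AtW @ y_s)
    resid = y_s - A_s @ B
    t_aux = np.array([N, pred_pop.sum()])               # known population totals of (1, prediction)
    t_g = t_aux @ B + np.sum(w_s * resid)               # residual term is ~0 because of the intercept
    _, se_g = wr_total_and_se(w_s, resid)               # linearization-style SE from the residuals
    out["regression"] = (t_g / N, se_g / N)
    return out


# --- Hypothetical example: simulated lakes, not NLA or CSLAP data ---
rng = np.random.default_rng(7)
N, n = 1000, 100
x = rng.lognormal(0.0, 0.5, size=N)                     # size measure driving selection probabilities
y = 20.0 * x * np.exp(rng.normal(0.0, 0.5, size=N))     # "true" lake variable, e.g. total phosphorus
pred_pop = y + rng.normal(0.0, 3.0, size=N)             # stand-in for citizen-science-based predictions
p = x / x.sum()
idx = rng.choice(N, size=n, replace=True, p=p)          # PPS sample with replacement
w = 1.0 / (n * p[idx])                                  # estimation weights

res = compare_estimators(y[idx], w, pred_pop[idx], pred_pop)
base_se = res["design-weighted"][1]
for name, (est, se) in res.items():
    print(f"{name:16s} mean={est:7.2f}  SE={se:.3f}  SE reduction={100 * (1 - se / base_se):5.1f}%")
print(f"true mean        {y.mean():7.2f}")
```

The "SE reduction" column is the kind of summary the project proposes as its primary metric: the percentage by which incorporating the predictions shrinks the standard error relative to the estimator that uses the probability sample alone.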
In many cases, citizen science programs operate at low cost; however, the lack of an underlying probability sampling design constrains the utility of citizen science data in a statistically rigorous setting. If volunteer data can be used for more than educational purposes, volunteer programs could spring up in other states and cities. This would further increase education and awareness of local environmental issues as well as provide more data for scientific analysis. Results from this project can also help improve future volunteer monitoring projects, because recommendations will be provided for improving the data collected so that they are more useful in the model-assisted estimators.
Partner capabilities
I am interested in the proposed project because I have had experience with the educational side of volunteer monitoring and would like to see more projects like CSLAP funded and supported. I am also interested in statistical methods applied to water-related issues and, more generally, in the use of quantitative methodology in environmental and natural science applications. I have experience with model-based approaches for determining water quality but would like to gain experience working with data from a complex sampling design via a design-based approach. The project will also use ArcGIS and R, tools with which I hope to gain more experience. The primary faculty advisor will be Dr. Steve Stehman. Dr. Stehman’s research focuses on environmental sampling, and he teaches a graduate-level applied sampling methods course at SUNY ESF. Dr. Stehman will supervise the theoretical statistical developments needed to derive the appropriate estimators incorporating the citizen science data. Dr. Karin Limburg, an aquatic ecologist at SUNY ESF, has agreed to provide guidance and oversight on the scientific issues related to use and interpretation of the NLA variables (please see the accompanying letter of support from Dr. Limburg). In particular, Dr. Limburg will assist with constructing credible models needed to predict lake condition variables from the citizen science data.
References
Bruhn, L. C., and P. A. Soranno. (2005). Long Term (1974-2001) Volunteer Monitoring of Water Clarity Trends in Michigan Lakes and Their Relation to Ecoregion and Land Use/Cover. Lake and Reservoir Management 21(1): 10-23. http://www.fw.msu.edu/~soranno/documents/BruhnandSoranno2005.pdf
Brus, D. J., and J. J. de Gruijter. (2003). A Method to Combine Non-probability Sample Data with Probability Sample Data in Estimating Spatial Means of Environmental Variables. Environmental Monitoring and Assessment 83: 303-317. http://link.springer.com/article/10.1023%2FA%3A1022618406507
Conrad, C. C., and K. G. Hilchey. (2011). A Review of Citizen Science and Community-based Environmental Monitoring: Issues and Opportunities. Environmental Monitoring and Assessment 176(1-4): 273-291. http://link.springer.com/article/10.1007%2Fs10661-010-1582-5
Kincaid, T. M., and A. R. Olsen. (2013). spsurvey: Spatial Survey Design and Analysis. R package version 2.6. http://www.epa.gov/nheerl/arm/
Lee, S. (2006). Propensity Score Adjustment as a Weighting Scheme for Volunteer Panel Web Surveys. Journal of Official Statistics 22(2): 329-349. http://www.jos.nu/Articles/abstract.asp?article=222329
Lee, S., and R. Valliant. (2009). Estimation for Volunteer Panel Web Surveys Using Propensity Score Adjustment and Calibration Adjustment. Sociological Methods & Research 37(3): 319-343. http://dx.doi.org/10.1177/0049124108329643
Overton, J. M., T. C. Young, and W. S. Overton. (1993). Using ‘Found’ Data to Augment a Probability Sample: Procedure and Case Study. Environmental Monitoring and Assessment 26: 65-83.
Peterson, S. A., N. S. Urquhart, and E. B. Welch. (1999). Sample Representativeness: A Must for Reliable Regional Lake Condition Estimates. Environmental Science and Technology 33: 1559-1565.
Rosenbaum, P. R., and D. B. Rubin. (1983). The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70(1): 41-55. http://biomet.oxfordjournals.org/content/70/1/41.full.pdf
Särndal, C.-E., B. Swensson, and J. H. Wretman. (1992). Model Assisted Survey Sampling. New York: Springer-Verlag.
Stevens, D. L., Jr., and A. R. Olsen. (2003). Variance Estimation for Spatially Balanced Samples of Environmental Resources. Environmetrics 14: 593-610.
Stevens, D. L., Jr., and A. R. Olsen. (2004). Spatially Balanced Sampling of Natural Resources. Journal of the American Statistical Association 99(465): 262-278.
May 15, 2014
Letter of Support for John Lombardi’s application to the National Aquatic Resource Survey (NARS) Campus Research Challenge
Dear Evaluation Committee:
I am writing to document that I will be supervising John Lombardi if his project titled “Combining Citizen Science (Volunteer) Data with the National Lakes Assessment (NLA) Probability Sample to Improve Precision of NLA Estimates” is selected for the National Aquatic Resource Survey (NARS) Campus Research Challenge. I am John’s major professor for his M.S. degree in the Graduate Program in Environmental Science at SUNY ESF. John has completed one year of Master’s-level coursework and will begin the research for his Master’s thesis in June. John’s original plan was a research topic focusing on citizen science data for forest monitoring, but when we encountered the NARS challenge, we saw a great opportunity to extend his work to include the potential utility of citizen science data for lake monitoring. John’s undergraduate degree is in Mathematics, and he possesses excellent analytic, GIS, and computing skills to accomplish the project objectives specified. His ability to conduct statistical analyses in R will be particularly helpful in this project because the basic estimation procedures used for the NLA data have been coded in R.
On a personal note, I find the NARS challenge highly appealing because the statistical issues addressed in my PhD research arose from the circa 1985 EPA national lake and stream surveys. EPA’s early adoption of these probability sampling methods represented a breakthrough in statistically rigorous monitoring strategies. I would be pleased to supervise John’s work using the more recent 2007 NLA data, nearly 30 years after my introduction to the ideas of probability sampling in environmental monitoring.
John Lombardi is a very capable and conscientious student, and with statistics supervision provided by me and aquatic systems supervision provided by Dr. Limburg, I am fully confident that he will successfully achieve the objectives of his NARS Challenge project.
Sincerely,
Dr. Steve Stehman
Professor of Biometry, SUNY ESF
svstehma@syr.edu