Estimating Uncertainty in HSPF based Water Quality Model:
Application of Monte-Carlo Based Techniques

Anurag Mishra

Diss...
Estimating Uncertainty in HSPF based Water Quality Model:
Application of Monte-Carlo Based Techniques
Anurag Mishra

Abstr...
Acknowledgments
I would like to thank my major advisor Dr. Brian L. Benham for his continued support and
guidance througho...
Table of Contents
List of Figures ...........................................................................................
Chapter 5. Evaluation of the applicability of Generalized Likelihood Uncertainty Estimation and Markov
Chain Monte Carlo t...
List of Figures
Figure 2-1 A two-phase Monte Carlo analysis to illustrate the effect of knowledge uncertainty and
stochast...
Figure 5-9 Gelman-Rubin statistic and variance estimates of (a) Lower zone nominal soil moisture
(LZSN-Cropland), and (b) ...
List of Tables
Table 3-1 Land use distribution of Mossy Creek watershed (Benham et al., 2004) ...............................
Table 5-12 Percent of water quality criterion violations by the average time series and the probability
intervals for two ...
Chapter 1. Introduction
Under section 303(d) of the 1972 Clean Water Act, states, territories, and authorized tribes
are r...
Without a measure of uncertainty in model predictions, one cannot assess the probability of
achieving applicable water qua...
(Helton, 1994; McIntosh et al., 1994), Generalized Likelihood Uncertainty Estimation (GLUE)
(Beven and Binley, 1992), Baye...
of the study’s three objectives. These chapters were developed as papers in the format accepted
for the Transactions of th...
Pappenberger, F. and K.J. Beven. 2006. Ignorance is Bliss: Or seven reasons Not to Use
Uncertainty Analysis. Water Resourc...
Chapter 2. Literature Review
The following sections review the literature pertinent to TMDLs, water quality simulation
mod...
Water quality simulation modeling software can be empirical or a combination of empirical
and process-based. Watershed-sca...
In HSPF, the watershed is divided into subwatersheds that are modeled using modules
that simulate processes in, or on perv...
watershed, but are less accurate when describing similar processes occurring in complex, multidimensional, heterogeneous, ...
parameter space for uncertainty analysis. In the following sections, different techniques for
propagating parameter uncert...
The predefined parameter probability distributions reflect parameter uncertainty.
Parameter distributions can be obtained ...
“Knowledge Uncertain
Parameters”

“Stochastically Variable
Parameters”
n iterations for
parameters x and y

2nd set of val...
investigated; instead, Generalized Likelihood Uncertainty Estimation (GLUE) that is a successor
of RSA was investigated.

...
N = shaping parameter, chosen by the user.

 m (W j ) 


Lm = ∑ 2 
 j =1 σ ej 



N

2.5

where, Lm = likelihood...
evaluate predictive uncertainty using different likelihood measures. They also demonstrated that
using additional data to ...
σ = standard deviation of the data error.
The likelihood varies as the function of data error, number of data points, and ...
(θnew) is sampled near the previous parameter vector (θold) using the symmetric probability
distribution or π(θold|θnew) =...
2.5

Summary
Water quality modeling is often central to TMDL development and other similar watershed

management efforts. ...
Benham, B.L., K. Branna, K. Christophel, T. Dillaha, L. Henry, S. Mostaghimi, R. Wagner, J.
Wynn, G. Yagow, and R. Zeckosk...
Helton, J.C. 1994. Treatment of uncertainty in performance assessment for complex systems.
Risk Analysis. 14(4): 483-511.
...
Setegn, S.G., R. Shrinivasan, A.M. Melesse, B. Dargahi. 2009. SWAT model application and
uncertainty analysis in the Lake ...
Chapter 3. Evaluation of the applicability of single-phase and
two-phase Monte Carlo analysis to estimate uncertainty in
H...
TMDL = ∑WLA + ∑ LA + MOS

3.1

where,
ΣWLA = waste load allocation (point sources), and
ΣLA = load allocation (non-point s...
1993). First-order methods assume linear models, which limit their usability with respect to a
complex model like HSPF (Su...
independently evaluate the knowledge uncertainty and stochastic variability associated with
predicted in-stream IB concent...
n random values are generated for the stochastic parameters, x and y, from the predefined
distributions for each parameter...
Creek TMDL were direct deposition of feces in the stream by cattle (cattle loitering and defecating
in the stream), and ru...
Figure 3.3 Mossy Creek watershed and its subwatersheds (Benham et al., 2004)
Used under fair use guidelines, 2011
Table 3-...
the parameters that were estimated only through calibration were considered knowledge
uncertain.

3.1.3.4 Stochastic Param...
Therefore, IRC was assigned a triangular distribution with lower limit, mode and upper limit as
0.5, 0.7 and 0.7, respecti...
Table 3-3 Distribution of knowledge uncertain hydrology parameters for all land uses
Parameter
LZSN (inches)
AGWRC
DEEPFR
...
Mishra a d_2011
Mishra a d_2011
Mishra a d_2011
Mishra a d_2011
Mishra a d_2011
Mishra a d_2011
Mishra a d_2011
Mishra a d_2011
Mishra a d_2011
Mishra a d_2011
Mishra a d_2011
Mishra a d_2011
Mishra a d_2011
Mishra a d_2011
Mishra a d_2011
Mishra a d_2011
Mishra a d_2011
Mishra a d_2011
Mishra a d_2011
Mishra a d_2011
Mishra a d_2011
Mishra a d_2011
Mishra a d_2011
Mishra a d_2011
Mishra a d_2011
Mishra a d_2011
Mishra a d_2011
Mishra a d_2011
Mishra a d_2011
Mishra a d_2011
Mishra a d_2011
Mishra a d_2011
Mishra a d_2011
Mishra a d_2011
Mishra a d_2011
Mishra a d_2011
Mishra a d_2011
Mishra a d_2011
Mishra a d_2011
Mishra a d_2011
Mishra a d_2011
Mishra a d_2011
Mishra a d_2011
Mishra a d_2011
Mishra a d_2011
Mishra a d_2011
Mishra a d_2011
Mishra a d_2011
Mishra a d_2011
Mishra a d_2011
Mishra a d_2011
Mishra a d_2011
Mishra a d_2011
Mishra a d_2011
Mishra a d_2011
Mishra a d_2011
Mishra a d_2011
Mishra a d_2011
Mishra a d_2011
Mishra a d_2011
Mishra a d_2011
Mishra a d_2011
Mishra a d_2011
Mishra a d_2011
Mishra a d_2011
Mishra a d_2011
Mishra a d_2011
Mishra a d_2011
Mishra a d_2011
Mishra a d_2011
Mishra a d_2011
Mishra a d_2011
Upcoming SlideShare
Loading in …5
×

Mishra a d_2011

359 views
224 views

Published on

0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total views
359
On SlideShare
0
From Embeds
0
Number of Embeds
2
Actions
Shares
0
Downloads
6
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide

Mishra a d_2011

  1. 1. Estimating Uncertainty in HSPF based Water Quality Model: Application of Monte-Carlo Based Techniques Anurag Mishra Dissertation submitted to the faculty of Virginia Polytechnic Institute and State University in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Biological Systems Engineering Brian L. Benham Dan L. Gallagher Kenneth H. Reckhow Eric P. Smith Mary Leigh Wolfe July 28, 2011 Blacksburg, Virginia Keywords: water quality modeling, TMDL, fecal coliform, HSPF, uncertainty analysis, Monte-Carlo, Bayesian techniques, GLUE, MCMC, two-phase Monte Carlo analysis © copyright 2011, Anurag Mishra
  2. 2. Estimating Uncertainty in HSPF based Water Quality Model: Application of Monte-Carlo Based Techniques Anurag Mishra Abstract: To propose a methodology for the uncertainty estimation in water quality modeling as related to TMDL development, four Monte Carlo (MC) based techniques—single-phase MC, two-phase MC, Generalized Likelihood Uncertainty Estimation (GLUE), and Markov Chain Monte Carlo (MCMC) —were applied to a Hydrological Simulation Program–FORTRAN (HSPF) model developed for the Mossy Creek bacterial TMDL in Virginia. Predictive uncertainty in percent violations of instantaneous fecal coliform concentration criteria for the prediction period under two TMDL pollutant allocation scenarios was estimated. The average percent violations of the applicable water quality criteria were less than 2% for all the evaluated techniques. Singlephase MC reported greater uncertainty in percent violations than the two-phase MC for one of the allocation scenarios. With the two-phase MC, it is computationally expensive to sample the complete parameter space, and with increased simulations, the estimates of single and two-phase MC may be similar. Two-phase MC reported significantly greater effect of knowledge uncertainty than stochastic variability on uncertainty estimates. Single and two-phase MC require manual model calibration as opposed to GLUE and MCMC that provide a framework to obtain posterior or calibrated parameter distributions based on a comparison between observed and simulated data and prior parameter distributions. Uncertainty estimates using GLUE and MCMC were similar when GLUE was applied following the log-transformation of observed and simulated FC concentrations. GLUE provides flexibility in selecting any model goodness of fit criteria for calculating the likelihood function and does not make any assumption about the distribution of residuals, but this flexibility is also a controversial aspect of GLUE. MCMC has a robust formulation that utilizes a statistical likelihood function, and requires normal distribution of model errors. However, MCMC is computationally expensive to apply in a watershed modeling application compared to GLUE. Overall, GLUE is the preferred approach among all the evaluated uncertainty estimation techniques, for the application of watershed modeling as related to bacterial TMDL development. However, the application of GLUE in watershed-scale water quality modeling requires further research to evaluate the effect of different likelihood functions, and different parameter set acceptance/rejection criteria.
  3. 3. Acknowledgments I would like to thank my major advisor Dr. Brian L. Benham for his continued support and guidance throughout my PhD program. Dr. Benham gave me great latitude in pursuing the research area of my interest. He had been extremely patient during my learning phase, my rough times, and has been very instrumental in shaping my career. I would also like to thank my committee members Dr. Dan Gallagher, Dr. Kenneth Reckhow, Dr. Eric Smith and Dr. Mary Leigh Wolfe for accepting to serve on my committee and providing valued input and feedback. I am glad to have gotten in touch with Dr. Scotland Leman during my research. Dr. Leman taught me several concepts related to statistical analysis and Markov Chain Monte Carlo. He helped me in learning MATLAB programming as well. I am extremely grateful to the staff at Center for Watershed studies for helping me during my research and Graduate Research Assistantship. I am especially indebted to Dr. Rebecca Zeckoski for teaching me several important details of watershed modeling, and programming in Visual Basic. Kevin Brannan and Gene Yagow were always there to answer my questions, no matter how busy they were, no matter how simple were the questions. I want to thank Denton Yoder for helping me in understanding basic concepts related to programming and database management. I want to thank my employer AQUA TERRA Consultants in their continued encouragement, and support. Finally, I want to thank my wife Vineeta for her love, and constant support during my Graduate studies, and my parents for their patience, support and well wishes during my studies. iii
  4. 4. Table of Contents List of Figures ............................................................................................................................................... vi List of Tables ............................................................................................................................................... viii Chapter 1. Introduction.................................................................................................................................. 1 1.1 Goals and Objectives: ...................................................................................................................... 3 1.2 Dissertation Organization ................................................................................................................. 3 References:............................................................................................................................................... 4 Chapter 2. Literature Review ........................................................................................................................ 6 2.1 Water Quality and Total Maximum Daily Load (TMDL) ................................................................... 6 2.2 Water Quality Modeling and TMDL .................................................................................................. 6 2.2.1 Modeling Bacteria as a Water Quality Constituent.................................................................. 7 2.2.2 Hydrological Simulation Program-FORTRAN (HSPF) ............................................................ 7 2.3 Uncertainty in Water Quality Modeling............................................................................................. 8 2.4 Estimating Uncertainty in Water Quality Modeling ........................................................................... 9 2.4.1 First order approximation ...................................................................................................... 10 2.4.2 Monte Carlo Simulation ......................................................................................................... 10 2.4.3 Two-phase Monte Carlo simulation ....................................................................................... 11 2.4.4 Regionalized Sensitivity Analysis .......................................................................................... 12 2.4.5 Generalized Likelihood Uncertainty Estimation (GLUE) ....................................................... 13 2.4.6 Bayesian Monte Carlo Uncertainty Analysis ......................................................................... 15 2.4.7 Markov Chain Monte Carlo .................................................................................................... 16 2.5 Summary ........................................................................................................................................ 18 References:............................................................................................................................................. 18 Chapter 3. Evaluation of the applicability of single-phase and two-phase Monte Carlo analysis to estimate uncertainty in HSPF based water quality modeling. ................................................................................... 22 Introduction ............................................................................................................................................. 22 3.1 Materials and Methods ................................................................................................................... 25 3.1.1 Monte Carlo Simulation ......................................................................................................... 25 3.1.2 Two phase Monte Carlo simulation ....................................................................................... 25 3.1.3 Modeling ................................................................................................................................ 26 3.2 Results and Discussion .................................................................................................................. 37 3.2.1 Single-phase Monte Carlo Simulation ................................................................................... 37 3.2.2 Two-phase Monte Carlo Simulation ...................................................................................... 39 3.3 Summary and Conclusion .............................................................................................................. 43 References:............................................................................................................................................. 44 Chapter 4. Evaluation of the applicability of using log-transformed in-stream indicator bacteria concentrations to calculate a likelihood function for estimating uncertainty using the Generalized Likelihood Uncertainty Estimation (GLUE) technique with an HSPF model. .............................................. 47 Introduction ............................................................................................................................................. 47 4.1 Materials and Methods ................................................................................................................... 49 4.1.1 Study Area ............................................................................................................................. 49 4.1.2 Mossy Creek Watershed Model ............................................................................................ 50 4.1.3 Generalized Likelihood Uncertainty Estimation (GLUE) ....................................................... 53 4.1.4 Hydrologic and Water Quality Calibration of Mossy Creek Watershed Model ...................... 55 4.1.5 TMDL Pollutant Allocation Scenarios .................................................................................... 55 4.2 Results and Discussions ................................................................................................................ 56 4.3 Summary and Conclusions ............................................................................................................ 65 References:............................................................................................................................................. 66 iv
  5. 5. Chapter 5. Evaluation of the applicability of Generalized Likelihood Uncertainty Estimation and Markov Chain Monte Carlo to estimate uncertainty in HSPF based water quality modeling. ................................. 69 Introduction ............................................................................................................................................. 69 5.1 Materials and Methods ................................................................................................................... 71 5.1.1 Study Area ............................................................................................................................. 71 5.1.2 Mossy Creek Watershed Model ............................................................................................ 72 5.1.3 Generalized Likelihood Uncertainty Estimation..................................................................... 75 5.1.4 Markov Chain Monte Carlo (MCMC) ..................................................................................... 77 5.1.5 Mossy Creek Model Calibration and Validation .................................................................... 80 5.1.6 TMDL Pollutant Allocation Scenarios .................................................................................... 80 5.2 Results and Discussions ................................................................................................................ 81 5.2.1 GLUE ..................................................................................................................................... 81 5.2.2 MCMC ................................................................................................................................... 87 5.3 Summary and Conclusions ............................................................................................................ 95 References:............................................................................................................................................. 96 Chapter 6. Estimating Uncertainty in Indicator Bacteria TMDL Developed Using HSPF: Reflection on the applications of Monte Carlo based techniques. .......................................................................................... 99 References:........................................................................................................................................... 103 v
  6. 6. List of Figures Figure 2-1 A two-phase Monte Carlo analysis to illustrate the effect of knowledge uncertainty and stochastic variability (adapted from Hession et al., 1996). ................................................................. 12 Figure 3-1 A two-phase Monte Carlo analysis to illustrate the effect of knowledge uncertainty and stochastic variability (adapted from Hession et al., 1996). ................................................................. 26 Figure 3-2 Mossy Creek Watershed (Benham et al., 2004) ...................................................................... 27 Figure 3-3 Mossy Creek watershed and its subwatersheds (Benham et al., 2004) .................................. 28 Figure 3-4 Observed and simulated fecal coliform concentrations at the water quality observation station ............................................................................................................................................................ 36 Figure 3-5 For TMDL allocation scenario S1, (a) 80% probability interval, and (b) 95% probability interval; for TMDL allocation scenario S2, (c) 80% probability interval, and (d) 95% probability interval. Representative plots show results for first six months of the simulation period. ................................ 38 Figure 3-6 Distribution of cumulative distribution functions (CDF) resulting due to knowledge uncertainty for TMDL allocation scenario S1. Each individual CDF is a result of stochastic variability. .............. 40 Figure 3-7 Distribution of cumulative distribution functions (CDF) resulting due to knowledge uncertainty for TMDL allocation scenario S2. Each individual CDF is a result of stochastic variability. .............. 40 Figure 3-8 Comparison of TMDL allocation Scenarios by plotting the CDF of median of family of CDFs obtained from Two-phase Monte Carlo Simulation ............................................................................ 41 Figure 3-9 For TMDL allocation scenario S1, (a) 80% probability interval, and (b) 95% probability interval; for TMDL allocation scenario S2, (c) 80% probability interval, and (d) 95% probability interval. Representative plots show results for first six months of simulation period. ...................................... 42 Figure 4-1 Mossy Creek Watershed (Benham et al., 2004) ...................................................................... 50 Figure 4-2 Mossy Creek watershed and its subwatersheds (Benham et al., 2004). ................................. 51 Figure 4-3 Histogram and cumulative distribution function of likelihood functions for hydrologic calibration. .......................................................................................................................................... 55 Figure 4-4 Posterior distribution of two hydrologic parameters, (a) LZSN – Pasture, and (b) DEEPFR obtained using GLUE technique......................................................................................................... 57 Figure 4-5 Posterior distribution of two water quality parameters obtained using GLUE technique with non-transformed fecal coliform concentrations (a and b), and log-transformed fecal coliform concentrations (c and d). .................................................................................................................... 60 Figure 4-6 TMDL allocation scenario S1, 80% probability interval (a) and 95% probability interval (b); TMDL allocation scenario S2, 80% probability interval (c) and 95% probability interval (d); using non-transformed FC concentration. Representative plots showing first six months of simulation period. ................................................................................................................................................. 63 Figure 4-7 TMDL allocation scenario S1, 80% probability interval (a) and 95% probability interval(b); TMDL allocation scenario S2, 80% probability interval (c) and 95% probability interval (d); using logtransformation of FC concentration. Representative plots show results for first six months of simulation period. ............................................................................................................................... 64 Figure 5-1 Mossy Creek Watershed (Benham et al., 2004) ...................................................................... 72 Figure 5-2 Mossy Creek watershed and its subwatersheds (Benham et al., 2004). ................................. 73 Figure 5-3 Histogram and cumulative distribution function of likelihood functions for hydrologic calibration. .......................................................................................................................................... 77 Figure 5-4 Posterior distribution of two hydrologic parameters, (a) LZSN – Pasture, and (b) DEEPFR obtained using GLUE technique......................................................................................................... 82 Figure 5-5 Posterior distributions of two water quality parameters (a) FSTDEC, and (b) Accumulation of fecal coliform in pasture; obtained using GLUE technique. ............................................................... 84 Figure 5-6 For TMDL allocation scenario S1, 80% probability interval (a) and 95% probability interval (b); TMDL allocation scenario S2, 80% probability interval (c) and 95% probability interval (d). Representative plots showing first six months of simulation period. .................................................. 86 Figure 5-7 Markov Chains of two hydrology parameters. The chains illustrated here are for last 50,000 iterations out of 100,000 iterations. .................................................................................................... 87 Figure 5-8 Markov chains of two water quality parameters. The chains illustrated here are for last 50,000 iterations out of 100,000 iterations. .................................................................................................... 88 vi
  7. 7. Figure 5-9 Gelman-Rubin statistic and variance estimates of (a) Lower zone nominal soil moisture (LZSN-Cropland), and (b) fraction of water lost to deep aquifers (DEEPFR). ................................... 89 Figure 5-10 Gelman-Rubin statistic and variance estimates of (a) fecal coliform on pasture per day (ACQOP), and (b) first order decay rate of fecal coliform (FSTDEC). ............................................... 89 Figure 5-11 Posterior distribution of two hydrology and two water quality parameters obtained using MCMC ................................................................................................................................................ 91 Figure 5-12 For TMDL allocation scenario S1, (a) 80% probability interval (b) 95% probability interval; for TMDL allocation scenario S2 (c) 80% probability interval (d) 95% probability interval. Representative plots show results for first six months of simulation period. The posterior distributions were obtained using MCMC. .......................................................................................... 94 Figure 6-1 The percent violations and uncertainty reported by fourfour uncertainty estimation techniques for the two TMDL allocation scenarios in Mossy Creek watershed. ................................................ 101 vii
  8. 8. List of Tables Table 3-1 Land use distribution of Mossy Creek watershed (Benham et al., 2004) .................................. 28 Table 3-2 Distribution of stochastically variable parameter INFILT (index to mean infiltration rate, in/hr) by land use. ............................................................................................................................................. 29 Table 3-3 Distribution of knowledge uncertain hydrology parameters for all land uses ............................ 31 Table 3-4 Distribution of hydrologic parameters which vary according to the land use and time of year, for the month of January. ......................................................................................................................... 31 Table 3-5 Summary of water quality parameters which have been reported as sensitive and are typically calibrated in hydrologic modeling. ...................................................................................................... 33 Table 3-6 Summary statistics for the hydrologic calibration and validation period .................................... 35 Table 3-7 Parameter distribution of hydrologic parameters following model calibration ........................... 35 Table 3-8 TMDL pollutant allocation scenarios for Mossy Creek TMDL resulting in no violations (Benham et al., 2004) ......................................................................................................................................... 37 Table 3-9 Percent of violations of single-sample fecal coliform criteria for the two TMDL allocation scenarios during the prediction period. .............................................................................................. 38 Table 3-10 Example of cumulative probability for numbers of single-sample fecal coliform criterion violations for a given knowledge uncertain simulation ....................................................................... 39 Table 4-1 Land use distribution of Mossy Creek watershed (Benham et al., 2004) .................................. 51 Table 4-2 Distribution of hydrology parameters that apply to all land uses ............................................... 52 Table 4-3 Distribution of hydrologic parameters that vary according to land use and time of year (for the month of January)............................................................................................................................... 52 Table 4-4 Summary of water quality parameters which have been reported as sensitive and are typically calibrated when using HSPF. ............................................................................................................. 53 Table 4-5 TMDL pollutant allocation scenarios for Mossy Creek TMDL resulting in no violations (Benham et al., 2004) ......................................................................................................................................... 56 Table 4-6 Posterior distribution of all the hydrology parameters in Mossy Creek watershed model. ........ 58 Table 4-7 Quantiles of the HSPEXP (Expert system for HSPF) statistics for the validation period when Monte Carlo simulations were conducted with “prior” and “posterior” distributions ........................... 58 Table 4-8 Posterior water quality parameters obtained using GLUE without non log-transformed fecal coliform concentrations ...................................................................................................................... 61 Table 4-9 Posterior water quality parameters obtained using log-transformed fecal coliform concentrations .................................................................................................................................... 62 Table 4-10 Percent of water quality criterion violations by the average time series, and the probability intervals for two TMDL allocation scenarios when GLUE was performed with and without logtransformation of FC concentration. ................................................................................................... 64 Table 5-1 Land use distribution of Mossy Creek watershed (Benham et al., 2004) .................................. 73 Table 5-2 Distribution of hydrology parameters that apply to all and land uses ........................................ 74 Table 5-3 Distribution of hydrologic parameters which vary according to the land use and time of year, for the month of January. ......................................................................................................................... 74 Table 5-4 Summary of water quality parameters which have been reported as sensitive and are typically calibrated when using HSPF .............................................................................................................. 75 Table 5-5 TMDL pollutant allocation scenarios resulting in no violations of instantaneous criteria for indicator bacteria (Benham et al., 2004). ........................................................................................... 81 Table 5-6 Posterior distribution of all the hydrology parameters in Mossy Creek watershed model. ........ 82 Table 5-7 Quantiles of the HSPEXP (expert system for HSPF) statistics for the validation period when Monte Carlo simulations were conducted with “posterior” and “prior” distributions ........................... 83 Table 5-8 Posterior water quality parameters distributions obtained using GLUE. ................................... 85 Table 5-9 Percent of water quality criterion violations by the average time series and the probability intervals for two TMDL allocation scenarios, when GLUE was used for estimating posterior parameter distributions. ...................................................................................................................... 86 Table 5-10 Posterior distributions of hydrology parameters obtained after the application of MCMC technique. ........................................................................................................................................... 92 Table 5-11 Posterior distributions of water quality parameters obtained after the application of MCMC technique. ........................................................................................................................................... 93 viii
  9. 9. Table 5-12 Percent of water quality criterion violations by the average time series and the probability intervals for two TMDL allocation scenarios, when MCMC was used for estimating posterior parameter distributions. ...................................................................................................................... 93 Table 6-1 TMDL pollutant allocation scenarios resulting in no violations of instantaneous criteria for indicator bacteria (Benham et al., 2004) .......................................................................................... 100 ix
  10. 10. Chapter 1. Introduction Under section 303(d) of the 1972 Clean Water Act, states, territories, and authorized tribes are required to develop a list of “impaired” waters. According to the U.S. Environmental Protection Agency (USEPA), over 40% of the assessed waters in the United States (some 60,000 individual river or stream segments, lakes, and estuaries) are impaired, primarily because of nonpoint source pollution (USEPA, 2009). The states, territories and authorized tribes are required to develop a Total Maximum Daily Load (TMDL) for these impaired waters. A TMDL specifies the reductions in the pollutant sources that will bring the impaired waters into compliance with the water quality standards. Mathematically, a TMDL is written as TMDL = ∑WLA + ∑ LA +MOS Where, 1.1 ΣWLA = waste load allocation (point sources) ΣLA = load allocation (non-point sources) MOS = margin of safety Developing a TMDL often includes modeling the processes that contribute to the impairment with the application of a water quality simulation modeling software. The margin of safety (MOS) is often included in the TMDL calculation to account for the inherent uncertainty present in a natural system. Uncertainty is always present when modeling a natural system (Morgan and Henrion, 1990). However, typically no formal calculation is performed to quantify this uncertainty, as there is limited science-based guidance available to estimate the amount of uncertainty that is associated with modeling the processes dealing with water quality. In 2001, the USEPA (2001) estimated the annual average cost for TMDL development to be $63-69 million per year for the next 15 years. The report also estimated that the cost of TMDL development and implementation could exceed $1 billion per year. High concentrations of pathogen indicator organisms (e.g. fecal coliforms, E. coli) are currently the leading cause of impairments, and responsible for 14% of the identified impairments nationwide (USEPA, 2009). In spite of the significant costs associated with developing bacterial TMDLs, there has been little attempt to quantify the uncertainty associated with the modeling that is often conducted when developing this type of TMDL. 1
  11. 11. Without a measure of uncertainty in model predictions, one cannot assess the probability of achieving applicable water quality criteria, nor assess the risk associated with not achieving those criteria. Currently, when developing a TMDL, the MOS (when explicitly defined) is often an arbitrarily set percentage of the TMDL. The additional information about modeling uncertainty would provide decision makers and stakeholders with additional knowledge allowing them to make a more informed judgment when choosing a MOS and when comparing pollutant load allocation scenarios. Beven (1993) described inclusion of uncertainty analysis in the modeling process as “intellectual honesty”, which becomes imperative if significant public resources are at stake in the process. Most water quality simulation modeling software used for TMDL development are a combination of process-based and empirical models that do not include detailed uncertainty analysis capabilities. The ‘Hydrological Simulation Program–FORTRAN (HSPF)’ model (Bicknell et al., 2005), which is supported by USEPA as part of a larger modeling package ‘Better Assessment Science Integrating Point and Nonpoint Sources (BASINS)’- is frequently used for developing bacterial impairment TMDLs. HSPF can simulate hydrology and various water quality constituents like sediment, indicator bacteria (IB), nitrates, phosphorus etc. in watersheds of varying size (Bicknell et al., 2005). HSPF outputs a deterministic time series of hydrology and water quality constituents without quantifying uncertainty. Uncertainty in model predictions can be estimated using two categories of methods, namely Monte Carlo methods and first-order variance propagation (Beck, 1987; Summers et al., 1993). First-order methods assume linear models that limit their usability with respect to complex modeling software like HSPF (Summers et al., 1993). Paul et al. (2004) conducted a first-order analysis to estimate the contribution of sensitive parameters to the fraction of variance of simulated peak in-stream fecal coliform (FC) concentration in a watershed modeled with HSPF. They inferred that small uncertainties in selected water quality parameters could result in large uncertainties in the prediction of in-stream FC concentration. Paul et al. (2004) recommended the use of Monte Carlo based analysis to evaluate uncertainty in bacteria modeling using HSPF. Monte Carlo simulation is a method that involves performing repeated simulations of the model in question using randomly selected parameter values from predetermined input parameter probability distributions. The process is repeated for a number of iterations sufficient to converge on an estimate of the probability distribution of output variables (Gardner and O’Neill, 1983). Monte Carlo simulations can be used to estimate uncertainty in water quality simulation modeling along with other Monte Carlo based methods that include,-two-phase Monte Carlo simulation 2
  12. 12. (Helton, 1994; McIntosh et al., 1994), Generalized Likelihood Uncertainty Estimation (GLUE) (Beven and Binley, 1992), Bayesian Monte Carlo (BMC) (Dilks et al., 1992), and Markov Chain Monte Carlo (MCMC) (Kuczera and Parent, 1998). There have been some applications of Monte Carlo based techniques to estimate uncertainty in hydrologic modeling (e.g., Balin, 2004; Beven and Binley, 1992; Benaman and Shoemaker, 2002; Donigian et al., 2007; Hession et. al, 1996; Makowski et al., 2002; Stow et al., 2007), however, there are very few applications of these techniques to estimate uncertainty in water quality simulation modeling and TMDL development. It has been argued that the presence of too many competing methods for assessing uncertainty makes it difficult for modelers to select a method and interpret the results (Pappenberger and Beven, 1996). The research reported herein attempts to compare different Monte Carlo based techniques that can be used to estimate uncertainty in water quality modeling related to TMDL development. 1.1 Goals and Objectives: The goal of this research was to evaluate selected uncertainty estimation techniques when applied to a HSPF model used to develop a bacterial impairment TMDL. To accomplish this goal, specific objectives of this research were to: 1. compare the applicability of single- and two-phase Monte Carlo in estimating uncertainty in HSPF-based water quality simulation modeling for TMDL development. (Chapter 3) 2. assess the impact of using log-transformed in-stream fecal coliform concentrations on predictive uncertainty when using the Generalized Likelihood Uncertainty Estimation (GLUE) technique with HSPF-based water quality simulation model for TMDL development. (Chapter 4) 3. evaluate the applicability of GLUE and Markov Chain Monte Carlo (MCMC) in estimating uncertainty in the water quality simulation modeling when using HSPF for bacterial TMDL development. (Chapter 5) 1.2 Dissertation Organization Chapter one of this dissertation introduces the general concept of uncertainty analysis in water quality modeling. The second chapter provides a detailed literature review of water quality modeling and uncertainty analysis techniques. Chapters three, four, and five are specific to each 3
  13. 13. of the study’s three objectives. These chapters were developed as papers in the format accepted for the Transactions of the American Society of Agricultural and Biological Engineers and are intended to stand alone. There is some limited repetition in these chapters. The sixth chapter is an overall conclusion chapter. References: Balin, D. 2004. Hydrological Behaviour through Experimental and Modelling Approaches. Application to the Haute-Mentue Catchment. School of Acrhitecture, Civil and Environmental Engineering. Lausanne, Switzerland. Ecole Polytechnique Fédérale de Lausanne. Beck, M.B. 1987. Water quality modeling: A review of the analysis of uncertainty. Water Resources Research 23(8): 1393-1442. Benaman, J., and C.A. Shoemaker. 2002. Sensitivity and uncertainty analysis of a distributed watershed model for the TMDL process. National TMDL Science and Policy 2002 Speciality Conference, Water Environment Federation. Beven, K. 1993. Prophecy, Reality and uncertainty in distributed hydrological modeling. Adv. Water Resources. 16(1): 41-51. Beven, K.. and A. Binley (1992). The Future of Distributed Models: Model Calibration and Uncertainty Prediction. Hydrological Processes 6(3): 279-298. Bicknell, B.R., J.C. Imhoff, J.L. Kittle, Jr. T.H. Jobes, and A.S. Donigian, Jr. 2005. HSPF Version 12.2 User’s Manual. AQUA TERRA Consultants. Mountain View, CA. Dilks, D.W., R.P. Canale, and P.G. Meier. 1992. Development of Bayesian Monte Carlo Techniques for Water Quality Model Uncertainty. Ecological Modeling. 62(1-3): 149-162. Donigian, A.S., and J.T. Love. 2007. The Housatonic River Watershed Model: Model Application and Uncertainty Analyses. 7th International IWA Symposium on System Analysis and Integrated Assessment in Water Management, May 7-9, 2007. Washington, DC. WATERMATEX Proceedings on CD-ROM. Gardner, R.H., and R.V. O’Neil. 1983. Parameter Uncertainty and Model Predictions: A Review of Monte Carlo Results. In Uncertainty and Forecasting of Water Quality, eds. M.B. Beck, and G. Van Straten, 345 -257. Berlin, Germany: Springer-Verlag. Hession, W.C., D.E. Storm, and C.T. Haan. 1996. Two-phase uncertainty analysis: an example using universal soil loss equation. Trans. ASAE. 39(4): 1309-1319. Helton, J.C. 1994. Treatment of uncertainty in performance assessment for complex systems. Risk Analysis. 14(4): 483-511. Kuczera, G., and E. Parent. 1998. Monte Carlo Assessment of Parameter Uncertainty in Conceptual Catchment Models: the Metropolis Algorithm. Journal of Hydrology. 211: 69-85. Makowski, D., D. Wallach, and M. Tremblay. 2002. Using a Bayesian approach to parameter estimation; Comparison of GLUE and MCMC methods. Agronomie. 22: 191-203. MacIntosh, D.L., G.W. Suter., and F.O. Hoffman. 1994. Uses of probabilistic exposure models in ecological risk assessments of contaminated sites. Risk Analysis. 14(4): 405-419. Morgan, M.G., and N. Henrion. 1990. Uncertainty. Cambridge University Press, New York City, NY. 4
  14. 14. Pappenberger, F. and K.J. Beven. 2006. Ignorance is Bliss: Or seven reasons Not to Use Uncertainty Analysis. Water Resources. Research. 42(5). W05302, doi:10.1029/2005WR004820. Paul, S., P.K. Haan, M.D. Matlock, S. Mukhtar, and S.D. Pillai. 2004. Analysis of the HSPF water quality parameter uncertainty in predicting peak in-stream fecal coliform concentrations. Trans. ASAE. 47(1): 69-78. Stow, C.A., K.H. Reckhow, S.S. Qian, E.C. Lamon, G.B. Arhonditsis, M.E. Borsuk, and D. Seo. 2007. Approaches to evaluate water quality model parameter uncertainty for adaptive TMDL implementation. J. Amer. Water Resources Ass. 43(6): 1499-1507. Summers, J.K., H.T. Wilson, and J. Kou. 1993. A method for quantifying the prediction uncertainties associated with water quality models. Ecological Modeling. 65: 161-176 USEPA. 2001. The National Cost of the Total Maximum Daily Load Program (Draft Report). Office of Water, United States Environmental Protection Agency. Washington D.C. USEPA. 2009. National Section 303(d) list fact sheet. United States Environmental Protection Agency. Available from: URL: http://oaspub.epa.gov/waters/national_rept.control. 5
  15. 15. Chapter 2. Literature Review The following sections review the literature pertinent to TMDLs, water quality simulation modeling, and uncertainty analysis. 2.1 Water Quality and Total Maximum Daily Load (TMDL) According to the U.S. Environmental Protection Agency (USEPA), over 40% of the assessed waters in the United States (some 60,000 individual river or stream segments, lakes, and estuaries) are impaired, primarily because of nonpoint source pollution (USEPA, 2009). Pathogens, typically represented by a fecal indicator bacteria (IB), are one of the leading causes of water quality impairments in the US and about 14% of the assessed river length (~150,362 km), and 3.3% of the assessed lakes, reservoir and ponds (~2347 square km) in the US are impaired due to excessive IB (USEPA, 2002). Elevated concentrations of IB are responsible for over 30% of the identified impairments in the state of Virginia (USEPA, 2011). Fecal coliform (FC) and enterococci (primarily for marine waters) are common IB, although E. Coli (EC) are used more frequently as an IB when assessing flowing, fresh water. The 1972 Clean Water Act requires total maximum daily loads (TMDLs) be developed for impaired water bodies. A TMDL is the maximum amount of pollutant a waterbody can receive and still meet its intended use (Benham, 2002). Mathematically, a TMDL is represented as TMDL = ∑WLA + ∑ LA + MOS 2.1 where, ΣWLA = waste load allocations (point sources), and ΣLA = load allocations (non-point sources), MOS = margin of safety 2.2 Water Quality Modeling and TMDL Developing a TMDL often involves a process where the contributions of different pollution sources are quantified and linked to the water quality of the impaired waterbody. In this process, the allowable pollutant load is partitioned among the considered sources. Water quality simulation models are often used to link pollutant sources to water quality. These models use mathematical relationships to represent the fate and transport of pollutants from the source to the waterbody. Once developed, calibrated and validated, a water quality simulation model can be used to determine the needed pollutant reductions to achieve the TMDL, and assess the impact of various pollutant control management strategies. 6
  16. 16. Water quality simulation modeling software can be empirical or a combination of empirical and process-based. Watershed-scale hydrology and water quality simulation modeling software that have been used for TMDL development include Hydrological Simulation Program–FORTRAN (HSPF) (Bicknell et al., 2005), Soil and Water Assessment Tool (SWAT) (Neitsch et al., 2005), Agricultural Non-Point Source Model (AGNPS) (Young et al., 1987), and the Annualized AGNPS (AnnAGNPS) (Bingner and Theurer, 2001). HSPF has been widely used for developing IB impairment TMDLs (eg. Benham et al., 2005; VADCR, 2003; Yagow, 2001). SWAT has also been used in developing IB impairment TMDLs, but fewer TMDLs have been developed using SWAT compared to HSPF. Further, the microbial component of SWAT has not yet been validated against the measured data at watershed scale (Im et al., 2004). 2.2.1 Modeling Bacteria as a Water Quality Constituent As mentioned earlier, pathogens are the second most widespread cause of water quality impairment in the United States. Pathogens however are difficult to identify and count, and are typically quantified in terms of a fecal IB (Rosen, 2000). Presence of IB means that the pathogenic organism may be present. Water quality criteria, therefore, typically specify concentration of a specific IB species. Indicator bacteria fate and transport involves several processes that include, but are not limited to, manure production by animals, transport within the water column and groundwater, dieoff, and regrowth. The type of animal and diet affects the production of IB. Manure management practices, hydrology, and sediment transport all affect IB transport. Soil moisture, temperature, pH, solar radiation, and time affects IB regrowth and die-off in and on the soil, and in the water. Water quality modeling software that have been used to model IB fate and transport include the Agriculture Runoff Management II: Animal Waste Version model (ARM II) (Overcash et al., 1983), the Utah State Model (UTAH) (Springer et al., 1983), the MWASTE model (Moore et al., 1989), the COLI model (Walker et al., 1990), SWAT (Neitsch et al., 2005), and HSPF (Bicknell et al., 2005). These models utilize different levels of complexity when simulating IB fate and transport processes. HSPF has been widely applied to develop IB impairment TMDLs, (Benham et al., 2004; Yagow et al., 2001), and was used for the research reported here. 2.2.2 Hydrological Simulation Program-FORTRAN (HSPF) HSPF is a continuous, watershed-scale modeling software that simulates hydrology and water quality processes (Bicknell et al., 2005). HSPF works on a mass balance approach, where water and water quality constituents are routed through appropriate pathways. HSPF uses several modules and sub-modules to simulate various processes. 7
  17. 17. In HSPF, the watershed is divided into subwatersheds that are modeled using modules that simulate processes in, or on pervious land areas (PERLND), impervious land areas (IMPLND), and reaches and reservoirs (RCHRES). The sub-modules PWATER and IWATER simulate runoff from PERLND and IMPLND, respectively; and the sub-module HYDR routes water through the RCHRES. Soluble water quality constituents are simulated using the PQUAL sub-module on PERLND, IQUAL on IMPLND, and GQUAL in RCHRES. When using HSPF to develop TMDLs for IB impairment, FC is typically simulated as a planktonic (dissolved) constituent. Although FC can move with the water like a dissolved pollutant, it can also be adsorbed to sediments, both on the land surface and in the waterbody (Yagow et al., 2001). At present, there are insufficient data to parameterize HSPF to allow the user to simulate FC as anything but a planktonic constituent. When modeling FC with HSPF, the modeler can specify different buildup/washoff relationships for pervious and impervious areas. FC accumulation rate is input by the user for each PERLND and IMPLND segment as a constant input (ACQOP) or varying monthly (MONACCUM). Users generally calculate this input rate outside of HSPF using tools such as Bacteria Source Load Calculator (BSLC) (Zeckoski et al., 2005). FC can also be input directly into the streams using input time series. FC concentration in groundwater and interflow can be input to HSPF as a constant or monthly variable. Die-off of FC on the land-surface is represented indirectly by providing an asymptotic limit of bacteria build up (SQOLIM-PERLND) in HSPF. Users input the asymptotic bacterial build-up limit value in HSPF for each PERLND and IMPLND. Generally, this limit is calculated assuming a first order die-off rate relationship. This asymptotic limit can vary by PERLND, IMPLND, and month. Release of bacteria in overland runoff is controlled by a user-defined parameter (WSQOP) that specifies the amount of runoff needed to wash off 90% of FC. The released FC is modeled in overland runoff, in streams, and in groundwater as a suspended or planktonic constituent. In-stream die-off is modeled using a temperature-dependent first order relationship (Chick’s law). 2.3 Uncertainty in Water Quality Modeling Models are a simplification of reality. With different models, varying levels of simplification are employed (Morgan and Henrion, 1992). Simplification introduces uncertainty in model output. Beven (1989) observed that the equations used in physically based models are good descriptors of processes that occur in a well-defined, spatially homogenous, and structurally stationary model 8
  18. 18. watershed, but are less accurate when describing similar processes occurring in complex, multidimensional, heterogeneous, temporally variable “real” watersheds. Beck (1987) expressed that uncertainty in water quality modeling is pervasive. The sources of uncertainty in water quality modeling can be broadly classified into knowledge uncertainty and stochastic variability. Knowledge uncertainty results when a modeler does not have complete knowledge about the modeled system or the parameters representing the system. Knowledge uncertainty is a property of the analyst conducting the study and the data available (Helton, 1994). Stochastic variability is a property of the system being modeled and arises because a system can behave in many different ways, as is expected in natural systems (Helton, 1994). In TMDL development, a margin of safety is often included to account for the inherent uncertainty in water quality modeling. However, typically no formal calculation is performed to estimate the model or modeling uncertainty. As the cost of developing and implementing a TMDL is projected to be more than $1 billion per year (USEPA, 2001), it is important that the modelers quantify the uncertainty that is present in modeling estimates. 2.4 Estimating Uncertainty in Water Quality Modeling Rigorous uncertainty analysis in water quality modeling is rare (Stow et al., 2007). Existence of too many competing methods of conducting uncertainty analysis and interpreting the results is considered a hindrance in rigorous water quality modeling uncertainty analysis (Pappenberger and Beven, 2006). One of the few attempts at estimating the propagation of parameter uncertainty when predicting in-stream FC concentration modeled using HSPF was conducted by Paul et al. (2004). They used first-order variance analysis to estimate the fraction of variance in simulated peak in-stream FC concentration that could be attributed to input parameters. They interpreted that small uncertainties in model input parameters can result in large uncertainties in predicted FC concentration. They realized the limitations of first-order variance methodology in uncertainty analysis and recommended using Monte Carlo based methods to estimate predictive uncertainty in in-stream FC concentration. Donigian and Love (2007) used Monte Carlo simulations to estimate uncertainty in hydrology and sediment modeling in a HSPF model developed for Housatonic river watershed. Stow et al. (2007) used various Monte-Carlo techniques in combination with a simple Streeter-Phelps dissolved oxygen model to estimate input parameters and predictive uncertainty. Stow and his colleagues concluded that as a model becomes more complex, with more and more parameters, it becomes increasingly difficult to effectively and efficiently sample appropriate 9
  19. 19. parameter space for uncertainty analysis. In the following sections, different techniques for propagating parameter uncertainty are discussed. 2.4.1 First order approximation In first order approximation (FOA), variance of the output, Var(O) is estimated as N Var (O) = ∑ Si2Var ( Pi ) 2.2 i =1 Where, Si is the absolute sensitivity of the model output with respect to the parameter Pi and N is the number of sensitive parameters. The fraction of the total variance of the output, Fi can be attributed to a particular input parameter as Fi = S i2V ar ( Pi ) N ∑S 2 i V ar ( Pi ) 2.3 i =1 FOA is computationally simpler to apply than other uncertainty estimations techniques and, therefore, has been widely used for uncertainty analysis (Tyagi and Haan, 2001). FOA assumes that the model has linear functional relations, small coefficients of variations of sensitive parameters, and near normal parameter distributions (Tyagi and Haan, 2001). In hydrologic and water quality modeling, these assumptions are rarely satisfied. Despite the shortcomings, FOA has been used by researchers to obtain information about various models and their parameters, and the effects on uncertainty in the model output. However, researchers also recommended using Monte Carlo simulation for uncertainty analysis. In this research, FOA was not investigated as an uncertainty quantification technique. 2.4.2 Monte Carlo Simulation In a Monte Carlo (MC) simulation, repeated runs of the model in question are executed using randomly selected input parameter values. The parameter values are chosen randomly for each simulation from a predetermined parameter-specific probability distribution. The process is repeated for a number of runs sufficient to converge on an estimate of the probability distribution of output variables (Gardner and O’Neill, 1983). As a contrast to deterministic modeling where only a single set of input parameter values is used to simulate water quality output, MC simulations can be used to estimate the water quality output resulting from a set of parameters derived from predefined parameter distributions. 10
  20. 20. The predefined parameter probability distributions reflect parameter uncertainty. Parameter distributions can be obtained from a review of the pertinent literature, historical data, professional judgment, or other uncertainty estimation techniques like Generalized Likelihood Uncertainty Estimation (GLUE), Bayesian Monte Carlo (BMC), and Markov Chain Monte Carlo (MCMC), which are discussed later. Depending upon the model and existing knowledge of the parameters, the modeler may need to provide the covariance among the parameters to sample the parameter values effectively. As discussed later, the parameter distributions may reflect the covariance implicitly if they are obtained from techniques like GLUE and MCMC. 2.4.3 Two-phase Monte Carlo simulation A two-phase Monte Carlo approach can be used to propagate and analyze stochastic variability and knowledge uncertainty separately in hydrology and water quality models (Hession et al., 1996). In a two-phase MC (TPMC) procedure, model parameters are classified as either knowledge uncertain or stochastically variable. Parameters about which knowledge is limited, or there is insufficient field data available to estimate their values, are considered knowledge uncertain. Values for these parameters are typically obtained through a model-calibration process. Stochastic parameters are those that vary spatially and/or temporally. Information about these parameters (typically a probability distribution) is generally estimated using available data and/or best professional judgment. Some parameters may be classified as both, knowledge uncertain and stochastically variable. Suppose a model has sensitive parameters a, b, x, and y of which the parameters a and b are knowledge uncertain and the parameters x and y are stochastically variable. To perform a TPMC analysis, i sets of knowledge uncertain parameters are generated by randomly sampling from predefined parameter probability distributions (figure 2.1). For each set of a and b parameter values, a set of n random values are generated for the stochastic parameters, x and y, from their respective predefined distributions (figure 2.1). The model is run for the n stochastically variable parameter values and the output is plotted as a cumulative distribution function (CDF). The CDF defines the probability of a given output (Helton, 1994). Each CDF represents the output distribution due to stochastic variability. Similar CDFs are generated for the i sets of knowledge uncertain parameters. The resulting family of CDF curves describes both the knowledge uncertainly and stochastic variability. 11
  21. 21. “Knowledge Uncertain Parameters” “Stochastically Variable Parameters” n iterations for parameters x and y 2nd set of values for parameters a and b . . . n iterations for parameters x and y . . . . . . . i th set of values for parameters a and b n iterations for parameters x and y . Probability 1st set of values for parameters a and b CDFs resulting from stochastic variability Family of CDFs that describe both knowledge uncertainty and stochastic variability. Figure 2.1 A two-phase Monte Carlo analysis to illustrate the effect of knowledge uncertainty and stochastic variability (adapted from Hession et al., 1996). Used under fair use guidelines, 2011 2.4.4 Regionalized Sensitivity Analysis Regionalized Sensitivity Analysis (RSA) or Generalized Sensitivity Analysis (GSA) is a MC sampling approach to evaluate the sensitivity of model parameters suggested by Hornberger and Spear (1981). RSA can be used for selecting future sampling parameter distributions. To conduct RSA, the modeler must first define the range of key response variables as behavioral (within an acceptable/reasonable range) or non-behavioral (outside an acceptable range). The modeler then samples parameters from a set of predefined parameter distributions that are called “prior distributions.” These prior distributions reflect the knowledge of the modeler about the parameters that define the system. Parameter samples generated from prior distributions are used in the model to simulate key response variables. The parameter sets generating responses in the acceptable behavioral range are accepted and the remaining outputs are rejected. The CDFs of parameter sets that generated both behavioral and non-behavioral response variables are compared. If the two CDFs are significantly different, then the key response variables are sensitive to the parameters, and viceversa. This exercise is conducted to identify critical uncertainties in the present knowledge of the system that can be used to better plan future research. In this research, RSA was not 12
  22. 22. investigated; instead, Generalized Likelihood Uncertainty Estimation (GLUE) that is a successor of RSA was investigated. 2.4.5 Generalized Likelihood Uncertainty Estimation (GLUE) Generalized Likelihood Uncertainty Estimation (GLUE) is a successor of RSA proposed by Beven and Binley (1992). The basic premise of GLUE is that there is not a single optimum set of parameters for a hydrologic model. Instead, there are multiple sets of parameters that acceptably represent a hydrologic model – a phenomena known as “equifinality.” In the GLUE approach, MC simulation is performed by generating different sets of parameters from prior distributions. In the majority of previous GLUE applications found in the literature, the prior distributions of parameters were uniform (Beven, 2001). A likelihood weight is assigned to all the parameter sets depending upon their ability to be a simulator of the system. When using GLUE, the likelihood term can be evaluated using any “goodness of fit” criterion that is used to compare observed and simulated response variables (Stow et al., 2007). This likelihood definition is different from the statistical definition of “likelihood function” and is a controversial aspect of GLUE (Stedinger et al., 2008). In GLUE, the likelihood can be calculated using several different methods and can take into account one or more response variables. Beven and Binley (1992) illustrated several ways to calculate likelihood values. The likelihood values can be based on a single or multiple observed responses as illustrated in equations 2.4 and 2.5, respectively (for a more detailed list, refer to Beven and Binley (1992)). Le = (σ e2 ) − N 2.4 Where,  n  σ e 2 = 1 n  ∑ (Yi − Qi )   i =1  2 Le = likelihood value, σe2 = variance of the residuals or mean square error n = number of data points Yi = observed data point Qi = simulated data point 13
  23. 23. N = shaping parameter, chosen by the user.  m (W j )    Lm = ∑ 2   j =1 σ ej    N 2.5 where, Lm = likelihood function based on m observed responses Wj = weight of response variable j σej = error variance of jth response variable When applying GLUE, a parameter vector θ is generated for each model run, and each model run results in a likelihood value. All the parameter sets having likelihood values that are in the acceptable range are retained for consideration. The likelihood values of the retained parameters sets are normalized so that the sum of likelihood values is unity. The normalized likelihood values can be treated as the probabilistic weighting function for the predicted variables and can be used to assess the uncertainty associated with the predictions. A distribution function of the predicted output may be calculated by plotting the predicted values against the likelihood of each prediction. Defining the uncertainty limits as the 5th and 95th percentile of the cumulative likelihood distribution yields a 90% probability interval. Likelihood weights can be used to update the prior parameter distribution of input parameters using Bayesian equation (equation 2.6) (Fisher, 1922). The distribution of parameters resulting due to updating with available data is called the “posterior distribution”. L p (θ | y ) = L y (θ | y ) L o (θ ) 2.6 where, LO(θ) = prior distribution of parameters Ly(θ|y) = calculated likelihood function of the parameter sets, and Lp(θ|y) = posterior likelihood distribution of parameter sets As evident from the equation 2.6, posterior distributions are a result of modeler’s prior knowledge about the system and the observed data. Beven and Binley (1992) showed that the uncertainty is reduced when the likelihoods are updated with new observations. However, they also note that uncertainty cannot decline indefinitely, and may increase, as the hydrological parameters are stochastic in nature and behave differently in different storm events in the same watershed. This property limits the possibility of finding one optimum parameter set for a watershed model. Freer at al. (1996) applied GLUE to a simple hydrologic model, TOPMODEL to 14
  24. 24. evaluate predictive uncertainty using different likelihood measures. They also demonstrated that using additional data to update the likelihood function could help to constrain the uncertainty bound of model prediction. The GLUE approach has been widely used to conduct uncertainty analysis using different hydrologic modeling software (Balin, 2004; Beven and Binley, 1992; Freer et al., 1996). There have been a few attempts to quantify uncertainty in water quality modeling using GLUE (e.g. Benaman and Shoemaker, 2002; Setegn el al., 2009; Stow et al., 2007; Zheng and Keller, 2007) and GLUE has been suggested as a viable approach to estimate uncertainty in TMDLs (Stow et al., 2007). The application of GLUE for estimating uncertainty in a watershed model developed using HSPF or for TMDL development is practically non-existent. 2.4.6 Bayesian Monte Carlo Uncertainty Analysis Bayesian Monte Carlo (BMC) uncertainty analysis (Dilks et al., 1992) is also a successor of the RSA technique illustrated by Hornberger and Spear (1981). However, as opposed to RSA, BMC does not categorize model outputs as acceptable or non-acceptable. In BMC, the likelihood function of each parameter set is used to weight the parameter set. The parameter sets with greater likelihood values have greater weight than the parameter sets with lower likelihood value. In this approach, the model assumes an error (ε) such as 2.7 Y = g [ x, θ ] + ε Where, Y = response variable g = model that is a function of state variable x, and input parameter θ ε = model error that is normally distributed The likelihood function for each parameter set in BMC is defined as (Dilks et al., 1992). L(θ | Y ) = 1  1 n  ε 2  exp  − ∑  i   2πσ  2 i =1  σ     2.8 where, L(θ|Y) = likelihood function, εi = error term at the individual data point, i, n = number of observed data points, and 15
  25. 25. σ = standard deviation of the data error. The likelihood varies as the function of data error, number of data points, and the standard deviation of data error. With high standard deviation, the value of likelihood remains constant over a wide range of data error, however, with low standard deviation, the likelihood value decreases as the model error increases. For multiple state variables, likelihood can be calculated as n m L(θ | e1 , e2 .....en ) = ∏∏ i =1 j =1  1 n m e 1 ij exp  − ∑∑   2 i =1 j =1  σ j 2πσ j       2     2.9 Dilks et al. (1992) applied the BMC technique to a Grand River dissolved oxygen model in Michigan. When the resulting parameter posterior distributions were used, uncertainty decreased significantly. Two of the applications of BMC include analysis of estimates for managing Lake Erie levels (Venkatesh and Hobbs, 1999), and a biochemical oxygen demand (BOD) decay model (Qian et al., 2003). No applications of BMC with HSPF have been published. 2.4.7 Markov Chain Monte Carlo Markov Chain Monte Carlo (MCMC) is a commonly used technique for Bayesian inference by statisticians (Kass et al., 1998). It can also be termed as a special case of BMC. The MCMC method generates samples of parameter values from the posterior distribution by constructing a Markov Chain that has the posterior distribution as its equilibrium distribution (Robert and Casella, 2004). Metropolis et al. (1953) proposed an algorithm to build a Markov chain. An important step in building a Markov chain is the choice of a statistical likelihood function. The statistical likelihood function in MCMC is similar to the likelihood function used in BMC. For n observations, as is the case with time-series output, the likelihood function is given by L(θ | Y ) = 1  1 n  exp  − 2 ∑ (Yi − Qi )2  n n ( 2π ) σ  2σ i =1  2.10 Where, σ = variance of residuals, Yi = i th observed data point, and Qi = i th simulated data point This equation assumes that the residuals between observed and simulated values, or the errors, are normally distributed. To build a Markov chain where a new parameter value is sampled using the previous value, a jump specification is required. The new parameter vector 16
  26. 26. (θnew) is sampled near the previous parameter vector (θold) using the symmetric probability distribution or π(θold|θnew) = π(θnew|θold). This symmetric distribution is centered on the last accepted parameter value by the relationship θnew|θold = N(θold ,s•I), where s is the variance scaling factor and I is the identity matrix. The variance scaling factor affects the movement of Markov chain towards equilibrium. A high variance scaling factor might lead to slow chain movement and a very small variance scaling factor can result in haphazard parameter chain movement in all the possible parameter space. Although there is guidance to estimate the scaling factor, it is typically obtained by trial and error (Gelman et al., 2000). Once the new parameter set is obtained, it is either accepted or rejected. This step is central point to the Metropolis algorithm. Acceptance or rejection of the new parameter set is determined by the ratio of the posterior probability density functions from the new and old parameter sets (equation 2.11). r= π (θnew | Y ) π (θold | Y ) 2.11 The Metropolis algorithm rule is used to accept or reject a new parameter: If r > 1, accept the new parameter set If r <1, generate a random number u from a uniform distribution [0,1] If r > u, accept the new parameter set If r < u, reject the new parameter set There are several special cases of the Metropolis algorithm, including Metropolis-Hastings Algorithm (Hastings, 1970), Gibbs algorithm (Geman and Geman, 1984), and Metropolis within Gibbs Algorithm (Gelfand and Smith, 1990). These algorithms provide different ways to sample new parameter values from the old parameters. The MCMC approach has been used by many researchers to estimate posterior distributions and uncertainty with various hydrologic modeling software (Balin, 2004; Kuczera and Parent, 1998; Makowski et al., 2002; Marshall et al., 2004), and it has been suggested as a viable approach for estimating uncertainty in water quality modeling (Stow et al., 2007). However, application of the MCMC approach to estimate uncertainty in a watershed-scale water quality model has been limited. In this research, MCMC was used as a Bayesian technique to estimate uncertainty in a water quality model developed using HSPF. 17
  27. 27. 2.5 Summary Water quality modeling is often central to TMDL development and other similar watershed management efforts. It is widely recognized that the added information about the inherent modeling uncertainty in water quality modeling can aid stakeholders and decision makers in making more informed watershed management decisions. Stakeholders and decision makers can use uncertainty information to help decide among different water quality management plans and/or to direct planning efforts towards specific pollution sources. With the advent of faster computers, Monte Carlo methods have gained popularity as viable techniques for uncertainty analysis. However, the presence of many competing uncertainty estimation techniques makes it difficult to conduct the uncertainty analysis and interpret the results. Bayesian uncertainty estimation techniques have been shown to be particularly useful since they allow the model parameters to be updated as the new data becomes available. Stow et al. (2007) compared several of these Monte Carlo based techniques using a simple Streeter-Phelps model. They suggested MCMC as a viable technique to conduct uncertainty analysis for complex watershed models. Qian et al. (2003) compared MCMC and BMC on a simple BOD decay model and suggested MCMC as a better uncertainty estimations approach for higher dimensional models. Most of these techniques have, however, rarely been used on watershed-scale water quality models that are often used in developing watershed management plans, like HSPF. The research presented here performs uncertainty analysis associated with HSPF modeling of FC in a small watershed in Virginia using single-phase MC simulation, two-phase MC, GLUE, and MCMC. These techniques and their results were compared with each other and suggestions were made for future research and applications of these techniques. References: Balin, D. 2004. Hydrological Behaviour through Experimental and Modelling Approaches. Application to the Haute-Mentue Catchment. School of Acrhitecture, Civil and Environmental Engineering. Lausanne, Switzerland. Ecole Polytechnique Fédérale de Lausanne. Beck, M.B. 1987. Water quality modeling: A review of the analysis of uncertainty. Water Resources Research 23(8): 1393-1442. Benaman, J., and C.A. Shoemaker. 2002. Sensitivity and uncertainty analysis of a distributed watershed model for the TMDL process. National TMDL Science and Policy 2002 Speciality Conference, Water Environment Federation. Benham, B.L., C. Baffaut, R.W. Zeckoski, K.R. Mankin, Y.A. Pachepsky, A.M. Sadeghi, K.M. Brannan, M.L. Soupir, and M.J. Habersack. 2006. Modeling bacteria fate and transport in watersheds to support TMDLs. Tran. ASABE. 49(4): 987-1002. 18
  28. 28. Benham, B.L., K. Branna, K. Christophel, T. Dillaha, L. Henry, S. Mostaghimi, R. Wagner, J. Wynn, G. Yagow, and R. Zeckoski. 2004. Total maximum daily load development for Mossy Creek and Long Glade Run: Bacteria and general standard (Benthic) impairments. Richmond, Va.: Virginia Department of Environmental Quality. Available at http://www.deq.state.va.us/tmdl/homepage.html. Assessed Jan 25, 2005. Benham, B.L., K. Brannan, T. Dillaha, S. Mostaghimi, G. Yagow. 2002. TMDLs (Total Maximum Daily Loads) – Terms and Definitions. Virginia Cooperative Extension. Pub No. 442-550. Beven, K.J. 2001. Rainfall-Runoff Modeling – The Primer, John Wiley and Sons, New York, 360p. Beven, K. 1993. Prophecy, Reality and uncertainty in distributed hydrological modeling. Adv. Water Resources. 16(1): 41-51. Beven, K.. and A. Binley (1992). The Future of Distributed Models: Model Calibration and Uncertainty Prediction. Hydrological Processes 6(3): 279-298. Beven, K. 1989. Changing Ideas in Hydrology – The case of Physically Based Models. Journal of Hydrology. 105(1-2): 157-172. Bicknell, B.R., J.C. Imhoff, J.L. Kittle, Jr. T.H. Jobes, and A.S. Donigian, Jr. 2005. HSPF Version 12.2 User’s Manual. AQUA TERRA Consultants. Mountain View, CA. Bingner, R.L., and F.D. Theurer. 2001. AnnAGNPS Technical Processes: Documentation Version 2. Unpublished report. Oxford, Miss.: USDA-ARS National Sedimentation Laboratory. Dilks, D.W., R.P. Canale, and P.G. Meier. 1992. Development of Bayesian Monte Carlo Techniques for Water Quality Model Uncertainty. Ecological Modeling. 62(1-3): 149-162. Donigian, A.S., and J.T. Love. 2007. The Housatonic River Watershed Model: Model Application and Uncertainty Analyses. 7th International IWA Symposium on System Analysis and Integrated Assessment in Water Management, May 7-9, 2007. Washington, DC. WATERMATEX Proceedings on CD-ROM. Fisher, R.A. 1922. On the Mathematic Foundation of Theoretical Statistics. Phil. Trans. Roy. Soc. London, A. 222: 309-368. Freer, J., K. Beven, and B. Ambroise. 1996. Bayesian Estimation of Uncertainty in Runoff Prediction and the Value of Data: An Application of the GLUE Approach. Water Res. Res. 32(7): 2161-2173. Gardner, R.H., and R.V. O’Neil. 1983. Parameter Uncertainty and Model Predictions: A Review of Monte Carlo Results. In Uncertainty and Forecasting of Water Quality, eds. M.B. Beck, and G. Van Straten, 345 -257. Berlin, Germany: Springer-Verlag. Gelfand, A.E., and A.F.M. Smith. 1990. Sampling Based Approaches to Calculating Marginal Densities. J. Amer. Statistical Ass.85: 398-409. Gelman, A., J.B. Carlin, H.S. Stern, and R.D.B. 2000. Bayesian Data Analysis. Boca Raton, London, New York, Washington D.C., Chapman&Hall/CRC. Gelman A., and D. Rubin. 1992. Inference from Iterative Simulation using Multiple Sequences. Statistical Science. 7: 457-511. Geman, S., and D. Geman. 1984. Stochastic Relaxation, Gibbs’ distribution and Bayesian restoration of images. IEEE trans. PAMI 6: 721-741. Hastings, W.K. 1970. Monte Carlo methods using Markov chains and their applications. Biometrika.57: 97-109. 19
  29. 29. Helton, J.C. 1994. Treatment of uncertainty in performance assessment for complex systems. Risk Analysis. 14(4): 483-511. Hession, W.C., D.E. Storm, and C.T. Haan. 1996. Two-phase uncertainty analysis: an example using universal soil loss equation. Trans. ASAE. 39(4): 1309-1319. Hornberger, G.M., and R.C. Spear. 1981. An approach to the preliminary analysis of environmental systems. Journal of Env. Mgmt (12): 7-18. Im, Sangjun, K.M. Brannan, S.M. Mostaghimi, and Jaepil Cho. 2004. Simulating Fecal Coliform Bacteria Loading from an Urbanizing Watershed. Journal of Environmental Science and Health. A39(3): 663-679. Kass, R.E., B.P. Carlin, A. Gelman, R.M. Neal. 1998. Markov Chain Monte Carlo in Practice. A roundtable Discussion. The American Statistician. 52(2): 93-100. Kuczera, G., and E. Parent. 1998. Monte Carlo Assessment of Parameter Uncertainty in Conceptual Catchment Models: the Metropolis Algorithm. Journal of Hydrology. 211: 69-85. Makowski, D., D. Wallach, and M. Tremblay. 2002. Using a Bayesian approach to parameter estimation; Comparison of GLUE and MCMC methods. Agronomie. 22: 191-203. Marshall, L. D. Nott, and A. Sharma. 2004. A comparative study of Markov chain Monte Carlo methods for conceptual rainfall-runoff modeling. Water Res. Research. 40(2). doi:10.1029/2003WR002378, 2004. MacIntosh, D.L., G.W. Suter., and F.O. Hoffman. 1994. Uses of probabilistic exposure models in ecological risk assessments of contaminated sites. Risk Analysis. 14(4): 405-419. Metropolis, N., A.W. Rosenbluth, M.N. Rosenbluth, A.H. Teller, and E. Teller. 1953. Equation of state calculations by fast computing machines. J. Chemical Phys. 21(6): 1087-1092. Moore, J.A., J.D. Smythe, E.S. Baker, J.R. Miner, and D.C. Moffitt. 1989. Modeling Bacteria Movement in Livestock Manure Systems. Transactions of the ASAE, 32(3): 1049-1053. Morgan, M.G., and M. Henrion. 1992. Uncertainty. Cambridge University Press, Cambridge, UK. Neitsch, S.L., J.G. Arnold, J.R. Kiniry, and J.R. Williams. 2005. Soil and Water Assessment Tool Theoretical Documentation. Version 2005. Grassland Soil and Water Research Laboratory, Agriculture Research Service. Temple, TX. Overcash, M.R., K.R. Reddy, and R. Khaleel. 1983. Chemical processes and transport of animal waste pollutants. In Agricultural Management and Water Quality, 109-125. Ames, Iowa: Iowa State University Press. Pappenberger, F. and K.J. Beven. 2006. Ignorance is Bliss: Or seven reasons Not to Use Uncertainty Analysis. Water Res. Res. 42(5). W05302, doi:10.1029/2005WR004820. Paul, S., P.K. Haan, M.D. Matlock, S. Mukhtar, and S.D. Pillai. 2004. Analysis of the HSPF water quality parameter uncertainty in predicting peak in-stream fecal coliform concentrations. Trans. ASAE. 47(1): 69-78. Qian, S.S., C.A. Stow, and M.E. Borsuk. 2003. On Monte Carlo methods for Bayesian inference. Ecol. Modeling. 159(2-3): 269-277. Robert, C.P., and G. Casella. 2004. Monte Carlo Statistical Methods. Springer-Verlag, New York City, NY. Rosen, B.H. 2000. Waterborne Pathogens in Agriculture Watersheds. NRCS, Watershed Science Institute, Burlington, VT. 20
  30. 30. Setegn, S.G., R. Shrinivasan, A.M. Melesse, B. Dargahi. 2009. SWAT model application and uncertainty analysis in the Lake Tana Basin, Ethiopia. Hydological processes. 24(3): 357367 Springer, E.P., G.F. Gillford, M.P. Windham, R. Thelin, M. Kress. 1983. Fecal coliform release studies and development of a preliminary nonpoint source transport model for indicator bacteria. Logan, Utah: Utah State University, Utah Water Research Laboratory. Stedinger, J.R., R.M. Vogel, S.U. Lee, and R. Betchelder. 2008. Appraisal of the Generalized Likelihood Uncertainty Estimation (GLUE) Method. Water Res. Res. 44. W00B06, doi:10.1029/2008WR006822. Stow, C.A., K.H. Reckhow, S.S. Qian, E.C. Lamon, G.B. Arhonditsis, M.E. Borsuk, and D. Seo. 2007. Approaches to evaluate water quality model parameter uncertainty for adaptive TMDL implementation. J. Amer. Water Resources Ass. 43(6): 1499-1507. Tyagi, A., and C.T. Haan. 2001. Uncertainty Analysis using Corrected First Order Approximation method. Water Resources Research. 37(6): 1847-1858. USEPA (United States Environmental Protection Agency). 2011. Virginia Water Quality Assessment Report. Available: http://iaspub.epa.gov/waters10/attains_state.control?p_state=VA (Accessed 11 Feb 2011) USEPA. 2009. National Section 303(d) list fact sheet. [cited 2009 Nov]. Available from: URL: http://oaspub.epa.gov/waters/national_rept.control. USEPA. 2002. 2000 National Water Quality Inventory. USEPA Office of Water. Available at: http://www.epa.gov/305b/2000report Accessed May, 2004. USEPA. 2001. The National Cost of the Total Maximum Daily Load Program (Draft Report). Office of Water, United States Environmental Protection Agency. Washington D.C. VADCR. 2003. Bacteria TMDLs for Abrams Creek and Upper and Lower Opequon Creek Located in Frederick and Clarke County, Virgnia. Virginia Department of Environmenatal Quality, and Virginia Department of Conservation and Recreation. Richmond, VA, USA. Venkatesh, B.N., and B.F. Hobbs. 1999. Analyzing investments for Managing Lake Erie Levels Under Climate Uncertainty. Water Resources Research. 35(5): 1671-1684. Walker, S.E., S.M. Mostaghimi, T.A. Dillaha, F.E. Woeste. 1990. Modeling animal waste management practices: Impacts on bacteria levels in runoff from agricultural lands. Trans. ASAE 33(3): 807-817. Yagow, G.W., T.A. Dillaha., S.M. Mostaghimi, K.M. Brannan, C.D. Heatwole, and M.L. Wolfe. 2001. ASAE Annual International Meeting, Sacramento, CA. Paper – 01-2066. Young, R.A., C.A. Onstad., D.D. Bosch, and W.P. Anderson. 1987. AGNPS, Agricultural Nonpoint Source Pollution Model: A Watershed Analytic Tool. Conservation Research Report 35. Washington, D.C.,: USDA. Zeckoski, R. W., B.L. Benham, S. B. Shah, M.L. Wolfe, K.M. Brannan, M. Al-Smadi, T.A. Dillaha, S. Mostaghimi, C.D. Heatwole. 2005. BSLC: A tool for bacteria source characterization for watershed management. Applied Engg. In Agriculture.21(5): 879-889 Zheng, Y., and A.A. Keller. 2007. Uncertainty Assessment in Watershed-Scale Water Quality Modeling and Management:1. Framework and Application of Generalized Likelihood Uncertainty Estimation (GLUE) Approach. Water Res. Res. 43. doi: 10.1029/2006WR005345 21
  31. 31. Chapter 3. Evaluation of the applicability of single-phase and two-phase Monte Carlo analysis to estimate uncertainty in HSPF based water quality modeling. Abstract. Single-phase and two-phase Monte Carlo (MC) simulations were performed to estimate overall predictive uncertainty in violations of in-stream fecal coliform (FC) concentration for a Hydrological Simulation Program–FORTRAN (HSPF) model developed for Mossy Creek Total Maximum Daily Load (TMDL) in Virginia. Additionally, two-phase MC was also used to partition the effects of knowledge uncertainty and stochastic variability on output uncertainty. The two techniques were used in conjunction with two FC pollutant allocation scenarios presented in the Mossy Creek bacterial TMDL. The scenarios differed in the reductions specified from cattle directly depositing FC in the stream, and FC loadings from cropland. As estimated by the two techniques, the instantaneous FC criterion was violated less than one percent of the time (on daily basis) during the prediction period for both the allocation scenarios. However, the violations increased as high as 8% (two-phase MC) and 14% (single-phase MC) when 97.5% quantile of the output FC concentration was plotted for the scenario allowing a greater amount of direct deposit of FC in the stream. The two-phase MC results illustrated that cattle direct deposit of FC is a greater source of knowledge uncertainty than cropland FC loadings. Decision makers can use the results of an assessment like this to choose their level of confidence in achieving a water quality standard, selecting among the scenarios or prioritizing implementation efforts. Among single- and two-phase MC simulation, single-phase Monte Carlo is more computationally efficient while two-phase MC simulation can provide additional information about the effect of knowledge uncertainty and stochastic variability. With respect to watershed-scale water quality modeling, a satisfactory and unambiguous model parameter categorization may be difficult to achieve limiting the applicability of two-phase Monte Carlo simulation in these kinds of applications. Keywords. Water quality modeling, HSPF, Monte Carlo, two-phase Monte Carlo, TMDL, indicator bacteria, fecal coliform. Introduction Water quality models are often used to develop total maximum daily loads (TMDLs). A TMDL quantifies the amount of a given pollutant a waterbody can receive and still meet water quality standards. It includes pollution from permitted point sources, nonpoint, and natural background sources and a margin of safety (Benham et al., 2002). Mathematically, a TMDL can be represented as 22
  32. 32. TMDL = ∑WLA + ∑ LA + MOS 3.1 where, ΣWLA = waste load allocation (point sources), and ΣLA = load allocation (non-point sources), MOS = margin of safety A margin of safety is included within a TMDL to account for the inherent uncertainty present in determining the TMDL. Uncertainty, which is always present when simulating a natural system, is a result of limited knowledge of the process being modeled and inherent stochastic (spatial or temporal) variability within that system (Beck, 1987; Suter et al., 1987). Without some measure of the uncertainty, one cannot accurately assess the probability of achieving a given water quality criteria or the risk of violating it. Although needed, there is limited science-based guidance available on how to estimate the amount of uncertainty associated with a TMDL. In 2001, the U.S. Environmental Protection Agency (USEPA) estimated the annual average cost of developing TMDLs to be $63-69 million per year for the next fifteen years and the cost of implementing TMDLs to be between $1 and 3.4 billion per year for the next decade (USEPA, 2001). Pathogens, typically represented by a surrogate indicator bacteria (IB), being the second most widespread cause of water quality impairments (USEPA, 2006) will be responsible for a significant share of this expense. In spite of the potentially significant costs associated with developing bacterial impairment TMDLs, there have been only few attempts (e.g. Stow et al., 2007) to quantify the uncertainty associated with the modeling often performed to develop these TMDLs. Most water quality modeling software currently used when developing TMDLs includes modules that are empirical or a mix of empirical and process-based. These software do not typically include uncertainty analysis capabilities. Hydrological Simulation Program–FORTRAN (HSPF) is a continuous-simulation model that simulates various hydrologic and water quality processes (Bicknell et al., 2005). It is a lumped parameter, watershed scale model, and produces a deterministic time-series of hydrology and water quality. HSPF has been used to develop a significant number of IB impairment TMDLs in Virginia (e.g. Benham et al., 2005; VADCR, 2003; Yagow, 2001). Uncertainty in model predictions can be estimated using two categories of methods, namely Monte Carlo methods and first-order variance propagation (Beck, 1987; Summers et al., 23
  33. 33. 1993). First-order methods assume linear models, which limit their usability with respect to a complex model like HSPF (Summers et al., 1993). Paul et al. (2004) conducted a first-order analysis to estimate the contribution of sensitive parameters to the fraction of variance in simulated peak in-stream fecal coliform (FC) concentrations (a common IB) in a watershed modeled with HSPF. They inferred that small uncertainties in selected water quality parameters could result in large uncertainties in the prediction of in-stream FC concentration. Paul et al. (2004) realized the limitations of first order variance methods for uncertainty analysis and recommended Monte Carlo based analysis to evaluate uncertainty in IB modeling using HSPF. Monte Carlo simulation is a method in which repeated simulations of the model in question are performed using randomly selected input parameter sets. Parameter values are selected from parameter-specific probability distributions. The process is repeated for a number of simulations (iterations) sufficient to converge on an estimate of the probability distribution of output variables (Gardner and O’Neill, 1983). The results from these iterations can be aggregated to obtain relevant statistics about model output. Donigian and Love (2007) used Monte Carlo simulations to estimate uncertainty in hydrology and sediment modeling in a HSPF model developed for Housatonic river watershed. Two-phase Monte Carlo analysis is a MC-based technique that propagates stochastic variability and knowledge uncertainty separately based on the methodology proposed by Helton (1994) and MacIntosh et al. (1994). Stochastic variability is the property of a natural system and can be further divided into spatial and temporal variability. Knowledge uncertainty is due to incomplete understanding of the system being modeled and can also be termed as subjective uncertainty. Separating knowledge uncertainty and stochastic variability is important to draw useful insights into the model (Helton, 1994; MacIntosh et al., 1994). Knowledge uncertainty can be reduced by collecting more information about the system and hence it can be used as an indicator of the beneficial effects of collecting additional data (Hession et al., 1996). Stochastic variability normally cannot be reduced as it is the natural property of the system, but it can be quantified. Information about uncertainty in water quality modeling can be used by decision makers and stakeholders to choose their level of confidence in achieving a particular water quality standard and the associated pollutant reductions needed to achieve that confidence level. Thus, an understanding about the source and amount of uncertainty is needed to effectively compare TMDL pollutant allocation scenarios. For this study, a two-phase MC analysis was used to 24
  34. 34. independently evaluate the knowledge uncertainty and stochastic variability associated with predicted in-stream IB concentrations from a watershed model that was developed using HSPF. We also conducted simple Monte Carlo analysis on the same watershed model to compare the two uncertainty estimation techniques and evaluate their applicability. 3.1 Materials and Methods 3.1.1 Monte Carlo Simulation In a simple or single-phase Monte Carlo (MC) simulation, repeated model runs are performed using parameter values that are randomly selected from a predetermined probability distribution for each simulation. The predetermined parameter-specific probability distributions used in MC are reflective of parameter uncertainty. Parameter distributions can be obtained from a review of the pertinent literature, existing data, best professional judgment, or other uncertainty estimation techniques like Generalized Likelihood Uncertainty Estimation (GLUE), Bayesian Monte Carlo (BMC), and Markov Chain Monte Carlo (MCMC). The application of these other uncertainty estimation techniques is beyond the scope of the research reported here. Depending upon the model and existing knowledge about the parameters, the modeler may also need to provide the covariance among the parameters to sample the parameter values effectively. The parameter distributions that are obtained using some techniques (like GLUE, and BMC) account for covariance implicitly. In this application, the parameters were assumed independent and covariance relationship was not provided. 3.1.2 Two phase Monte Carlo simulation In a two-phase MC procedure (TPMC), model parameters are classified as either knowledge uncertain or stochastically variable. Parameters about which knowledge is limited, or there are insufficient field data available to estimate their values, are considered knowledge uncertain. Values for these parameters are typically obtained through a model-calibration process. Stochastic parameters are those that vary spatially and/or temporally. Information about these parameters (typically a probability distribution) is generally estimated using the data available. Some parameters may be classified as both, knowledge uncertain and stochastically variable. Suppose a model has parameters a, b, x, and y of which the parameters a and b are knowledge uncertain and parameters x and y are stochastically variable. To perform a TPMC analysis, i sets of knowledge uncertain parameters are generated by randomly sampling from a predefined parameter probability distribution (figure 3.1). For each set of a and b values, a set of 25
  35. 35. n random values are generated for the stochastic parameters, x and y, from the predefined distributions for each parameter. The model is run for the n stochastically variable parameter values and the output is plotted as a cumulative distribution function (CDF). The CDF defines the probability of a given output (Helton, 1994). Each CDF represents the output distribution due to stochastic variability. Similar CDFs are generated for the i sets of knowledge uncertain parameter random values. The resulting family of CDF curves describes both the knowledge uncertainly and stochastic variability. “Knowledge Uncertain Parameters” “Stochastically Variable Parameters” n iterations for parameters x and y 2nd set of values for parameters a and b . . . n iterations for parameters x and y . . . . . . . i th set of values for parameters a and b n iterations for parameters x and y . Probability 1st set of values for parameters a and b CDFs resulting from stochastic variability Family of CDFs that describe both knowledge uncertainty and stochastic variability. Figure 3.1 A two-phase Monte Carlo analysis to illustrate the effect of knowledge uncertainty and stochastic variability (adapted from Hession et al., 1996). Used under fair use guidelines, 2011 3.1.3 Modeling 3.1.3.1 Study Area Mossy Creek, located in Rockingham and Augusta counties in Virginia (figure 3.2), was selected for this research. Mossy Creek was listed as impaired in 1996 due to violations of the instantaneous FC criterion and a TMDL was developed for Mossy Creek by the Department of Biological Systems Engineering (BSE) at Virginia Tech (Benham et al., 2004). The Mossy Creek watershed (4076 ha) is characterized as a rolling valley with the Blue Ridge Mountains to the east and the Appalachian Mountains to the west. The predominant land uses in Mossy Creek watershed are forest, pasture, and croplands. The primary sources of FC identified in the Mossy 26
  36. 36. Creek TMDL were direct deposition of feces in the stream by cattle (cattle loitering and defecating in the stream), and runoff from pastures where grazing animals defecate. Figure 3.2 Mossy Creek Watershed (Benham et al., 2004). Used under fair use guidelines, 2011 Mossy Creek was monitored monthly by the Virginia Department of Environmental Quality (DEQ) between July 1992 and March 2003 for FC concentration and other selected water quality constituents at the station ID 1BMSS001.35 located near the outlet of the Mossy Creek watershed. BSE monitored Mossy Creek semi-monthly between February 1998 and December 2001 for selected water quality constituents including FC concentration near the DEQ site (Site QMA in figure 3.3 ). Daily flow data were also collected from May 1998 to December 2002 at the same site. 3.1.3.2 Mossy Creek Watershed Model HSPF was used to develop the Mossy Creek bacterial impairment TMDL (Benham et. al., 2004). Mossy Creek was divided into eight subwatersheds for modeling and land use identification purposes (figure 3.3, table 3-1). Other data required by the model included rainfall, FC loading from cattle and wildlife, inflows from springs, solar radiation, and temperature as time series. TMDL modeling data development and acquisitions are described in the Mossy Creek TMDL (Benham et. al., 2004). 27
  37. 37. Figure 3.3 Mossy Creek watershed and its subwatersheds (Benham et al., 2004) Used under fair use guidelines, 2011 Table 3-1 Land use distribution of Mossy Creek watershed (Benham et al., 2004) Land use Area (ha) Percent of total area (%) Forest Cropland Pasture Farmstead Low Density Residential High Density Residential Loafing Lot 1025.1 556.0 2347.6 55.0 87.0 3.6 1.6 25.15 13.64 57.59 1.35 2.13 0.09 0.04 Used under fair use guidelines, 2011 3.1.3.3 Hydrologic Parameters Simulation of FC by HSPF requires information about several hydrologic and water quality parameters. BASINS Technical Note 6 (USEPA, 2000) describes the hydrologic parameters and provides typical values and possible limits for all the hydrologic parameters. Typically, the values of these parameters are refined through model calibration. Al-Abed and Whiteley (2002), and Lawson (2003) listed several hydrological parameters that are typically calibrated and thus considered sensitive. This subset of sensitive parameters as described in subsequent sections was used in the MC and TPMC analysis. For the TPMC analysis, the parameters that could be estimated using GIS and field data were considered as stochastic (stochastically variable) and 28
  38. 38. the parameters that were estimated only through calibration were considered knowledge uncertain. 3.1.3.4 Stochastic Parameters Index to mean infiltration rate (INFILT) was the only model parameter considered to be stochastic as it can be estimated using GIS data for soil and land use. To estimate the distribution of INFILT for each land use, the Mossy Creek watershed was divided into 30 m by 30 m cells. A value of INFILT was assigned to each cell according to the land use and soil type based on guidance from BASINS technical note 6 (USEPA, 2000). A histogram of INFILT values of cells within each land use were then plotted to estimate land use-specific INFILT probability distributions. The INFILT histograms suggested a triangular distribution for all land uses except loafing lot (table 3-2). We assigned a uniform distribution to the INFILT parameter for the loafing lot land use, as it was a small area within the watershed. The limits of the distribution were set as the range of the observed loafing lot INFILT. Table 3-2 Distribution of stochastically variable parameter INFILT (index to mean infiltration rate, in/hr) by land use. Land use Forest Pasture High Density Residential Cropland Farmstead Low Density Residential Loafing Lot Distribution Triangular (0.05, 0.1, 1)† Triangular (0.04, 0.09, 0.9) Triangular (0.01, 0.01, 0.1) Triangular (0.03, 0.17. 0.24) Triangular (0.03, 0.15, 0.23) Triangular (0.03, 0.17, 0.26) Uniform (0.15, 0.23)‡ †Numbers in parentheses show lower limit, mode, and upper limit of the triangular distribution, respectively. ‡Numbers in parentheses show lower and upper limit of the uniform distribution, respectively. 3.1.3.5 Knowledge Uncertain Parameters The hydrologic parameters that were considered knowledge uncertain are listed in table 3-3 and table 3-4. Table 3-3 contains the parameters that were not varied according to the land use or time of the year. All the parameters listed in table 3-3, except interflow recession coefficient (IRC), were assigned a uniform distribution. Lower and upper limits for the distributions correspond to the typical minimum and maximum limits for these parameters from BASINS technical note 6 (USEPA, 2000). A uniform distribution was the most obvious choice for most parameters, as no additional information is available about these parameters in the BASINS technical note (USEPA, 2000). For IRC, while the typical upper limit is 0.7, it is also chosen as the starting value for calibration process, and can be taken to be the most probable value. 29
  39. 39. Therefore, IRC was assigned a triangular distribution with lower limit, mode and upper limit as 0.5, 0.7 and 0.7, respectively. 30
  40. 40. Table 3-3 Distribution of knowledge uncertain hydrology parameters for all land uses Parameter LZSN (inches) AGWRC DEEPFR BASETP AGWETP IRC INTFW † Parameter Description Lower zone nominal soil moisture storage Groundwater recession rate The fraction of infiltrating water lost to deep aquifers Evapotranspiration by riparian vegetation as active groundwater enters streambed Fraction of model segment that is subject to direct evaporation from groundwater storage Interflow recession coefficient Coefficient that determines the amount of water which enters the ground from surface detention and becomes interflow ‡Numbers Numbers in parentheses show lower and upper limit of the uniform distribution, respectively. the triangular distribution, respectively. Type of Distribution Uniform (3,8)† Uniform (0.92, 0.99) Uniform (0.0, 0.2) Uniform (0.0, 0.05) Uniform (0.0, 0.05) Triangular (0.5, 0.7, 0.7)‡ Uniform (1.0, 3.0) in parentheses show lower limit, mode, and upper limit of Knowledge uncertain parameters that were varied according to land use and time of year are listed in table 3-4. Again, upper and lower parameter distribution limits were assigned using BASINS technical note 6 (USEPA, 2000). The parameter distribution limits for the month of January are shown in table 3-4. Parameter values for the other months were calculated by multiplying the January distribution by a monthly adjustment factor that was generated based on similar TMDLs developed by BSE, and expert opinion. Table 3-4 Distribution of hydrologic parameters which vary according to the land use and time of year, for the month of January. Parameter UZSN (inches) Nominal upper zone soil moisture storage CEPSC (inches) Interception Storage Capacity LZETP Index to lower zone evapotranspiration †Numbers Land use Forest Cropland Pasture Farmstead, low and high density residential areas and loafing lots Forest Cropland Pasture Farmstead, low and high density residential areas and loafing lots Forest Cropland Pasture Farmstead, low and high density residential areas and loafing lots Distribution Uniform (0.2, 0.3)† Uniform (0.06, 0.1) Uniform (0.06, 0.1) Uniform (0.06, 0.1) Uniform (0.05, 0.075) Uniform (0.05, 0.075) Uniform (0.05, 0.075) Uniform (0.05, 0.075) Uniform (0.1, 0.2) Uniform (0.1, 0.2) Uniform (0.1, 0.2) Uniform (0.1, 0.2) in parentheses show lower and upper limit of the uniform distribution, respectively. 3.1.3.6 Water Quality Parameters Simulation of in-stream FC concentrations with HSPF requires the estimation of daily FC loading rates to the land surface (ACQOP), the asymptotic limit of accumulation of FC on the land surface (SQOLIM), and the FC loading rate directly deposited in the streams (direct deposit time series). The daily loading rates and asymptotic limits can be input as tables for monthly varying 31

×