The document outlines methods for automating the extraction of watershed characteristics from raster datasets for use in regional environmental modeling. It describes developing custom ArcGIS tools that calculate descriptive statistics for multiple raster layers across thousands of watersheds in a batch process, avoiding manual, error-prone operations. A case study demonstrates extracting characteristics from raster datasets representing topography, climate, soil, and land use across 1,466 watersheds. The automated process is estimated to save at least 95% of the labor time compared to manual methods, making regional environmental studies more efficient.
1. Automating regional descriptive statistic computations for environmental modeling Satoshi Hirabayashi Environmental Resources Engineering SUNY College of Environmental Science and Forestry, Syracuse, NY USA
2.
3. Low Streamflow Regional Regression (Kroll et al., 2004): Background. [Chart: % standard error (0 to 700) of regression models built with three variable sets: USGS; USGS and digital; and USGS, digital, and hydrogeology. Entire US, 29 regions, 930 HCDN sites. The digital variables are the focus of this talk.]
I am going to talk a little bit about my past research, titled "Automating regional descriptive statistic computations for environmental modeling."
This is the same chart as in Chuck's talk, showing a comparison of low streamflow regression models constructed with three different sets of explanatory variables. My talk focuses on the digitally derived watershed characteristics.
Low streamflow regression models generally take this form: Q7,10 is the 7-day, 10-year low streamflow statistic, the betas are model parameters, and the X's are watershed characteristics, like topography, climate, and soil information. The models are constructed by first deriving the Xi's from raster datasets using the ArcGIS zonal statistics tool, and then inputting Q7,10 and the potential X's into a statistical software package, SAS. We input a large number of Xi's as potential explanatory variables, and SAS selects the Xi's that best estimate Q7,10.
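The general form described above can be sketched as follows (a hedged reconstruction from the narration; the exact specification in Kroll et al., 2004, for example whether the variables are log-transformed, may differ):

```latex
Q_{7,10} = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p + \varepsilon
```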
This is how the tool works. Here is a watershed layer; each polygon represents a watershed boundary.
Then, we overlay this layer on top of a raster dataset.
The tool takes the cells that fall within each watershed, calculates descriptive statistics of those cell values, and stores the results in a table. In this table, each row represents a watershed boundary, and the columns hold the descriptive statistics for this raster dataset. When you process another raster dataset, ideally the results would be appended to the same table, because eventually we want one table to input to SAS. But the zonal statistics tool can't do that; instead, separate tables are created for the multiple raster datasets.
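The per-watershed computation just described can be sketched in plain NumPy. This is an illustration of what a zonal statistics operation does, not the ArcGIS implementation; the function name, the zone/value arrays, and the nodata convention are all assumptions for the example.

```python
import numpy as np

def zonal_statistics(zones, values, nodata=-9999):
    """Compute descriptive statistics of `values` cells within each zone.

    zones  : 2-D integer array; each cell holds a watershed ID (0 = outside).
    values : 2-D float array of the raster being summarized (same shape).
    Returns {zone_id: {"mean": ..., "std": ..., "min": ..., "max": ...}}.
    """
    stats = {}
    for zone_id in np.unique(zones):
        if zone_id == 0:          # skip cells outside every watershed
            continue
        cells = values[(zones == zone_id) & (values != nodata)]
        stats[zone_id] = {
            "mean": float(cells.mean()),
            "std": float(cells.std()),
            "min": float(cells.min()),
            "max": float(cells.max()),
        }
    return stats

# Two toy watersheds over a 2x4 elevation grid (made-up values)
zones = np.array([[1, 1, 2, 2],
                  [1, 1, 2, 2]])
elev = np.array([[10., 20., 100., 200.],
                 [30., 40., 300., 400.]])
result = zonal_statistics(zones, elev)
print(result[1]["mean"])  # 25.0
print(result[2]["max"])   # 400.0
```

The real tool works on georeferenced rasters, but the core idea is the same: mask the value grid by each watershed's cells, then summarize.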
This is a shortcoming of the zonal statistics tool. So what you need to do is merge these tables, created for multiple raster datasets, into one table. This can be done by just copying and pasting columns, but there is another problem: the columns in these tables all have the same names, like mean or standard deviation, whereas in the merged table the column names should be identifiable per raster dataset, like mean of elevation, standard deviation of precipitation, and so on. So you also need to change the column names. When there are only 10 raster datasets, the tables can be merged manually with relative ease.
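The merge-and-rename step being described can be sketched in pandas. This is an illustrative sketch of the bookkeeping the custom tool automates, not its actual code; the function name and the ELEV/PRECIP table names are hypothetical.

```python
import pandas as pd

def merge_zonal_tables(tables):
    """Merge per-raster zonal statistics tables into one wide table.

    tables : {raster_name: DataFrame indexed by watershed ID, with
              generic columns such as MEAN and STD}.
    Each table's columns are prefixed with its raster name, so MEAN
    becomes ELEV_MEAN, PRECIP_MEAN, etc., then joined on watershed ID.
    """
    renamed = [df.add_prefix(f"{name}_") for name, df in tables.items()]
    merged = renamed[0]
    for df in renamed[1:]:
        merged = merged.join(df)    # join on the shared watershed index
    return merged

# Two per-raster tables with identical generic column names (toy values)
elev = pd.DataFrame({"MEAN": [120.0, 95.0], "STD": [12.0, 8.0]},
                    index=pd.Index([1, 2], name="WATERSHED"))
precip = pd.DataFrame({"MEAN": [1100.0, 900.0], "STD": [50.0, 40.0]},
                      index=pd.Index([1, 2], name="WATERSHED"))
table = merge_zonal_tables({"ELEV": elev, "PRECIP": precip})
print(list(table.columns))  # ['ELEV_MEAN', 'ELEV_STD', 'PRECIP_MEAN', 'PRECIP_STD']
```

Done by hand in a GIS table view, this renaming and pasting is exactly what must be repeated thousands of times.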
But in our studies, we employ many more raster datasets. Here, in my master's thesis, fourteen hundred raster datasets were used, and I had three different watershed layers, each with 35 watersheds, so the number of tables I needed to merge was more than four thousand. In Chuck's talk today, 28 rasters and 112 tables needed to be merged. In my paper here, again fourteen hundred raster tables, and in this paper, 162 tables. So, for the first case, you would need to manually copy and paste columns 4,000 times, and change the column names 4,000 times.
So manual operation is very tedious, time-consuming, and prone to human error. Motivated by these problems, we decided to develop a custom ArcGIS toolset.
Here is the user interface of that tool. Actually, it is just one tool in the GIS toolset we developed, named Arc Watershed Classification. In this toolset, most of the GIS operations for our research are customized and integrated; I will only show this one tool today. Using this window, you specify parameter files and other inputs to the tool. Then press OK, and everything is done automatically.
Here is a case study, in the same study region as Chuck's talk, with 144 watersheds.
We used the HYDRO1k DEM,
slope derived from the DEM,
13 raster datasets from PRISM, representing monthly and yearly precipitation,
12 raster datasets of soil classification from STATSGO,
and land cover from the National Land Cover Dataset.
Using these raster datasets, we ran the developed tool and created a watershed characteristics database. This table can be input to SAS to construct the regression equations.
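Once the merged characteristics table exists, the final regression step can be sketched with ordinary least squares in NumPy (in the study this step was done in SAS with variable selection; the numbers below are made up purely for illustration):

```python
import numpy as np

# Hypothetical merged table: each row is a watershed; the columns are
# two watershed characteristics (e.g. mean elevation, mean precipitation)
# and a response value such as ln(Q7,10). All values are invented.
X = np.array([[120.0, 1100.0],
              [ 95.0,  900.0],
              [140.0, 1250.0],
              [ 80.0,  850.0]])
y = np.array([2.1, 1.4, 2.6, 1.1])   # illustrative response per watershed

# Add an intercept column and solve for the betas by least squares.
A = np.column_stack([np.ones(len(X)), X])
beta, residuals, rank, _ = np.linalg.lstsq(A, y, rcond=None)
print(beta.shape)  # (3,): intercept plus one coefficient per characteristic
```

SAS additionally searches over the large pool of candidate Xi's to pick the best subset; this sketch only shows the fit for one fixed set of characteristics.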
The developed tool saved at least 95% of the manual labor time. The GIS toolset is versatile and can aid a wide variety of environmental studies: the polygons don't need to be watershed boundaries; they can be any boundaries, like states, counties, or towns, and any raster dataset can be processed.