Understanding Soil Moisture Dynamics at the Small Catchment Scale:
A Geostatistical Approach
1. INTRODUCTION
The spatial and temporal variability of soil moisture is well documented [Western and Blöschl, 1999]. The redistribution of soil moisture involves complex, nonlinear processes [Famiglietti et al., 1998] controlled by factors such as soil texture, grain size distribution and atmospheric forcing. At the same time, the successful modelling of many environmental processes, including subsurface flow and terrestrial energy exchange, benefits greatly from a better understanding of soil moisture dynamics [Ivanov et al., 2010]. In this study, data from the SoilNet wireless sensor network (WSN) [Bogena et al., 2010] was used to understand soil moisture dynamics at the catchment scale. The sensors are installed in the Wüstebach catchment, a subcatchment of the river Rur situated within the Eifel National Park. The catchment is part of the TERENO Eifel/Lower Rhine Valley Observatory. A detailed description of the geographical location, topography, soil types and local climate can be found in Rosenbaum et al. (2012) and Bogena et al. (2010).
Geostatistical analysis was performed on these data in order to identify the various influences on soil moisture dynamics.
2. METHODS
As an initial step, the data was prepared as a time series extending over a period of three years and nine months, from 1 July 2009 to 11 March 2013. Although SoilNet data is available at a temporal resolution of 15 minutes, this study used daily data, since its objective was to understand the seasonal dynamics of soil moisture. Further, only 111 of the 150 end devices installed in the Wüstebach catchment were used. These 111 end devices were chosen because their output was more consistent over the study period and they were least affected by instrumental errors. On many occasions, end devices or router units did not perform as expected, mainly due to maintenance issues [Rosenbaum et al., 2012].
The following procedures were employed in order to perform geostatistical analysis on the soil
moisture data.
2.1. Outlier Detection
The presence of outliers in spatially referenced data can strongly influence geostatistical analysis, mainly because spatial outliers, being inconsistent data points, can violate the stationarity that geostatistical analysis assumes [Deutsch and Journel, 1998]. Spatial outliers are local aberrations: they are detected as outliers only with respect to their spatial neighbours [Kou et al., 2006], not with respect to the ensemble of all data. The procedure adopted to detect spatial outliers is discussed in detail in Kou et al. (2006) and Chen et al. (2007).
It was assumed that the data was normally distributed. Under this assumption, a probability distribution can be constructed for the neighbourhood in which the data point in question is located. Confidence intervals are then inferred from this distribution, and the confidence level associated with the value of the data point under consideration is evaluated. The data point is accepted or rejected based on this confidence level. In this study, the neighbourhood was defined by the ten nearest neighbours of the data point in question. This definition was adopted so that the estimate at each point would consistently be made from the same number of neighbours. It can, however, be a disadvantage for points located at the boundaries of the study area, since the ten nearest neighbours of such points will be quite distant from the point itself.
A data point was discarded if it lay outside the 98% confidence interval; in other words, it was discarded if the probability of obtaining such a value from the adopted normal distribution was less than 2%. On average, around 3 to 4 data points per time slice were rejected in this manner.
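A minimal sketch of the neighbourhood test described above, assuming that the neighbourhood distribution is a normal distribution whose mean and standard deviation are estimated from the values at the ten nearest neighbours. This is a simplified reading of the procedure, not the weighted algorithms of Kou et al. (2006) or Chen et al. (2007); the function name `spatial_outliers` and its arguments are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def spatial_outliers(points, values, k=10, confidence=0.98):
    """Flag points whose value is implausible given their k nearest neighbours.

    points: list of (x, y) coordinates in metres
    values: soil moisture values (vol%) at those points for one time slice
    Returns the indices of points outside the two-sided confidence interval.
    """
    # Two-sided critical value: about 2.326 for a 98% interval
    z_crit = NormalDist().inv_cdf(0.5 + confidence / 2)
    flagged = []
    for i, (xi, yi) in enumerate(points):
        # Sort all other points by distance and keep the k nearest
        dists = sorted(
            (math.hypot(xj - xi, yj - yi), j)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
        neighbours = [values[j] for _, j in dists[:k]]
        mu, sigma = mean(neighbours), stdev(neighbours)
        if sigma == 0:
            continue  # identical neighbours: no usable distribution
        # Standardised distance of the point from its neighbourhood distribution
        if abs(values[i] - mu) / sigma > z_crit:
            flagged.append(i)
    return flagged
```

Because the neighbourhood always contains exactly k points, boundary points are compared with more distant neighbours, which is the disadvantage noted above.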
Additionally, a few sampling points were consistently reported as outliers. In many cases, the sampling points that performed poorly in the outlier test (i.e. they were reported as outliers for a large portion of the time series) were located in groundwater-influenced areas. The outlier test therefore has its limitations. It is suggested that outlier detection be performed on a regional basis, i.e. separately for groundwater-influenced and groundwater-distant areas. Such a distinction could possibly be made from the soil map.
2.2. Geostatistical Analysis
Geostatistical analysis has been used in the past [Rosenbaum et al., 2012; Western et al., 1999; Huaxing et al., 2009] to determine the spatial structure of soil moisture distribution. Geostatistics provides a rigorous, well-defined method for studying the distribution of any form of spatially referenced data.
In this study, the data was analyzed with programs from the Geostatistical Software Library (GSLIB) [Deutsch and Journel, 1998].
The gamv routine of GSLIB was used to calculate semivariances. The number of lags was fixed at 7, with a lag distance of 30 meters and a lag tolerance of 15 meters. An exponential model was generally found to give the best fit, and it was therefore adopted uniformly in all cases (as in Rosenbaum et al., 2012):
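The lag settings above can be illustrated with a plain-Python version of the classical semivariogram estimator, γ(h) = 1/(2N(h)) Σ (z_i − z_j)², summed over the N(h) pairs falling in each lag bin. This is a sketch in the spirit of gamv, not a reimplementation of it (for instance, it ignores direction and assigns each pair to a single bin); `empirical_semivariogram` is a hypothetical name.

```python
import math


def empirical_semivariogram(points, values, n_lags=7, lag=30.0, tol=15.0):
    """Classical (Matheron) semivariogram estimator with distance-lag bins.

    A pair separated by distance d is assigned to the first lag k (k = 1..n_lags)
    with |d - k * lag| <= tol. Returns a list of (h_k, gamma_k, n_pairs_k).
    """
    sums = [0.0] * n_lags
    counts = [0] * n_lags
    n = len(points)
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(points[i], points[j])
            for k in range(1, n_lags + 1):
                if abs(d - k * lag) <= tol:
                    sums[k - 1] += (values[i] - values[j]) ** 2
                    counts[k - 1] += 1
                    break  # each pair contributes to one bin only
    result = []
    for k in range(n_lags):
        # Half the mean squared difference; NaN if the bin holds no pairs
        gamma = sums[k] / (2 * counts[k]) if counts[k] else float("nan")
        result.append(((k + 1) * lag, gamma, counts[k]))
    return result
```

With these defaults the fifth bin (h = 150 m) collects all pairs separated by 135 to 165 m, which is relevant to the anomaly at 150 m discussed in Section 3.1.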
$$\gamma(h) = c_0 + c_1\left[1 - \exp\left(-\frac{3h}{a}\right)\right] \qquad (1)$$
Here $\gamma(h)$ is the model semivariance as a function of the lag $h$, $c_0$ is the nugget, $c_1$ is the structural semivariance (the partial sill; the total sill is $c_0 + c_1$), and $a$ is the range.
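Because of the factor 3 in the exponent of Eq. (1), $a$ acts as a practical range rather than a strict cut-off: at $h = a$ the model has reached about 95% of the structural semivariance, and it approaches $c_0 + c_1$ only asymptotically:

$$\gamma(a) = c_0 + c_1\left(1 - e^{-3}\right) \approx c_0 + 0.950\,c_1, \qquad \lim_{h \to \infty} \gamma(h) = c_0 + c_1$$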
It should be noted that, unlike Rosenbaum et al. (2012), no accurate estimate of the “true” nugget variance could be made, as the data set in question lacked the paired sensors (separated by 0.05 meters) available to Rosenbaum et al. (2012). Due to this limitation, the fitted parameters were often of poor quality and in many cases unrealistic, since the data frequently displayed unbounded behaviour. In many cases, the fitted model had ranges of the order of 10⁶ meters and sills of a similarly unrealistic magnitude, more than three orders of magnitude larger than expected. To avoid this, the fitting algorithm was constrained so that the nugget variance was never negative (it was curtailed to a minimum value of zero) and the range never exceeded 300 meters, as it was assumed that spatial autocorrelation cannot exist beyond 300 meters. By contrast, Rosenbaum et al. (2012) discarded data for which the model range exceeded 300 meters and did not use it in further analysis.
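A minimal sketch of a constrained least-squares fit of Eq. (1), assuming an unweighted fit to the experimental semivariances and a simple grid search over the range in 1 m steps. For a fixed range the model is linear in c0 and c1, so these are solved in closed form; a negative nugget is clipped to zero and the sill refitted, and the range is capped at 300 m. The names `exp_model` and `fit_exponential` are hypothetical; the fitting routine actually used in the study is not described in detail.

```python
import math


def exp_model(h, c0, c1, a):
    """Exponential semivariogram, Eq. (1): c0 + c1 * (1 - exp(-3h/a))."""
    return c0 + c1 * (1.0 - math.exp(-3.0 * h / a))


def fit_exponential(lags, gammas, max_range=300.0, step=1.0):
    """Least-squares fit of Eq. (1) with c0 >= 0 and a <= max_range.

    Returns (c0, c1, a, sse) for the best trial range. c1 is not constrained.
    """
    best = None
    a = step
    while a <= max_range:
        # For fixed a the model is c0 + c1 * f(h): ordinary linear regression
        f = [1.0 - math.exp(-3.0 * h / a) for h in lags]
        n = len(f)
        sf, sg = sum(f), sum(gammas)
        sff = sum(x * x for x in f)
        sfg = sum(x * y for x, y in zip(f, gammas))
        det = n * sff - sf * sf
        if det != 0:
            c1 = (n * sfg - sf * sg) / det
            c0 = (sg - c1 * sf) / n
            if c0 < 0:
                # Nugget curtailed to zero; refit the sill through the origin
                c0, c1 = 0.0, sfg / sff
            sse = sum((c0 + c1 * x - y) ** 2 for x, y in zip(f, gammas))
            if best is None or sse < best[3]:
                best = (c0, c1, a, sse)
        a += step
    return best
```

When the experimental variogram keeps rising over all seven lags, the best trial range tends to end up at or near the 300 m cap, which is the kind of artificial curtailment whose side effects are discussed in Section 3.2.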
To understand the spatial distribution of the data, kriging interpolation was performed. Both ordinary kriging and external drift kriging were applied, using the GSLIB routine kt3d. The covariable for external drift kriging was a combination of the wetness index [Beven and Kirkby, 1979] and the soil texture class. To compare the results of ordinary kriging and external drift kriging, cross-validation was performed and the root-mean-square error (RMSE) between the estimated and measured values at the measurement points was calculated.
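For reference, the wetness index of Beven and Kirkby (1979) is usually written as below, where $a_s$ is the upslope contributing area per unit contour length and $\beta$ is the local surface slope (symbols introduced here; $a_s$ is unrelated to the variogram range $a$). How the index was combined with the soil texture class to form the covariable is not specified here.

$$\mathrm{TWI} = \ln\left(\frac{a_s}{\tan\beta}\right)$$

Large values correspond to locations with a large contributing area and a gentle slope, which are expected to be wetter.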
3. RESULTS AND DISCUSSION
3.1. Variogram Analysis
As mentioned earlier, because data from paired sensors separated by 0.05 meters at each sampling location (as used by Rosenbaum et al., 2012) was unavailable, a rigorous estimate of the ‘true’ nugget variance was not possible. The nugget instead had to be obtained by extrapolating the model fitted to the data. In Rosenbaum et al. (2012), the nugget variance rarely approached 50 (vol%)² and was often around 30 (vol%)² for the sensors placed at 5 cm depth. When the nugget is estimated by extrapolation, however, the nugget variance is much higher: it was consistently above 50 (vol%)², except for short periods between late January and early March, during which it took a constant value of 0 (vol%)². This constant zero value is a consequence of the fitting algorithm, which forces the nugget to zero if the initial estimate is negative.
Further, of the 1403 days for which the data was analyzed, around 900 days displayed a model variogram that was unbounded in nature, i.e. the estimated range was well above 300 meters. This incorrect estimate can also be attributed to the lack of a ‘true’ nugget.
Figure 1: An example of the zero nugget phenomenon (19.02.2011)
Figure 2: An example of an unbounded variogram (30.07.2009)
The variogram is an important tool in geostatistics, and the fitted variogram model is subsequently used in kriging. It is therefore important that the variogram is as accurate as possible and represents the physical reality of the study area. In this regard, the approach of Rosenbaum et al. (2012) appears to be the only rigorous and accurate way of estimating the true nugget, and in light of the lower-quality fits generated in this study, it must be emphasized that an estimate of the true nugget is indispensable.
Another observation made in most of the variograms was a sudden, anomalous increase in the semivariance at a lag distance of 150 meters.
Figure 3: Increased semivariance at 150 meters lag distance (27.08.2010)
This high semivariance was a consistent phenomenon, but its intensity varied with time: it was prominent during the dry summer months and almost nonexistent during the wet winter months. It is proposed that this increase can be attributed to the hillslope length. Since the valley regions are groundwater-influenced, they are expected to remain wet during the summer months, while the rest of the catchment is relatively dry. Since the 150 meter lag bin (i.e. separation distances of 135 to 165 meters, given the lag tolerance of 15 meters) would consist mainly of pairs formed from the valley bottom and the hillslope top (very wet and very dry, respectively), this semivariance is expected to be particularly high. Further, the total sill (a measure of overall variance) is higher during the summer months than during the winter months, which suggests that overall variance increases in summer.
3.2. Kriging Analysis
As a second step in the geostatistical analysis, kriging was performed to understand the spatial distribution of soil moisture.
Ordinary kriging was performed first. The area was discretized into a grid of 10 m × 10 m cells. The estimate at each grid node was made from all data points in the catchment, with weights derived from the semivariances given by the variogram model; this is the principle of kriging interpolation.
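The weighting principle can be made concrete with a small ordinary kriging solver in the semivariogram form, using the exponential model of Eq. (1). This NumPy sketch is illustrative and is not the kt3d implementation; the function names and the example configurations mentioned below are hypothetical.

```python
import numpy as np


def exp_gamma(h, c0, c1, a):
    """Exponential semivariogram of Eq. (1); gamma(0) = 0 by definition."""
    h = np.asarray(h, dtype=float)
    g = c0 + c1 * (1.0 - np.exp(-3.0 * h / a))
    return np.where(h > 0, g, 0.0)


def ordinary_kriging(xy, z, x0, c0, c1, a):
    """Ordinary kriging estimate at x0 from all data points.

    Solves [[Gamma, 1], [1^T, 0]] [w, mu] = [gamma0, 1]: the weights sum to one
    (unbiasedness) and minimise the estimation variance. Returns (estimate, w).
    """
    xy = np.asarray(xy, dtype=float)
    n = len(xy)
    # Semivariances between all pairs of data points
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = exp_gamma(d, c0, c1, a)
    A[n, n] = 0.0
    # Semivariances between the data points and the target location
    b = np.ones(n + 1)
    b[:n] = exp_gamma(np.linalg.norm(xy - np.asarray(x0, dtype=float), axis=1), c0, c1, a)
    w = np.linalg.solve(A, b)[:n]
    return float(w @ np.asarray(z, dtype=float)), w
```

Comparing the weights for a zero-nugget, short-range model with those for a high-nugget, long-range model at the same location illustrates the contrast discussed for figs. 4 and 5: the former concentrates weight on the nearest data point, while the latter spreads it almost evenly.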
The variogram evidently had a strong influence on the final kriging output. Variograms that were unbounded and showed very high sills and ranges, as well as higher-than-expected nuggets, produced a distinct amount of smoothing, resulting in kriging maps devoid of the expected variability in surface soil moisture. Further, the correction that restricted nuggets to a minimum value of zero was found not to be physically sound, since consecutive time slices showed marked differences in the kriging map even though there was no significant precipitation event.
Figure 4: Kriging map (15.11.2009); note the unbounded variogram and the high degree of smoothing
Figure 5: Kriging map (16.11.2009); note the zero nugget and the high local variability
The runoff-precipitation graphs in both fig. 4 and fig. 5 show that there was no significant precipitation event during this period, yet the kriging outputs look surprisingly different. Fig. 4 shows a marked smoothing effect, with all local variability averaged out; this is due to the unbounded variogram, which gives almost equal weight to all data points. Fig. 5 shows a high degree of local variability, which can be attributed to the zero nugget variance: such a model gives very high weight to nearby points and very low weight to distant points when calculating the interpolated estimate. Neither situation, an unbounded variogram with a high nugget or a zero-nugget variogram, is representative of the physical reality, which can only be closely matched with an estimate of the true nugget.
To improve the estimate at a given point, external drift kriging was performed with a covariable composed of a combination of the wetness index and the soil texture class. To quantify the improvement, cross-validation was performed with a resampling routine that compares the estimated value with the measured value at each measurement point. This cross-validation can be used to calculate the average root-mean-square error (RMSE) for a given period.
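The cross-validation comparison can be sketched as leave-one-out re-estimation with a compact solver that performs ordinary kriging or, when a covariable is supplied, kriging with an external drift (one extra unbiasedness constraint linking the weights to the covariable). This assumes that the resampling routine removes one measurement point at a time; kt3d's own cross-validation option may differ in detail, and all names here are hypothetical.

```python
import numpy as np


def exp_gamma(h, c0, c1, a):
    """Exponential semivariogram of Eq. (1), with gamma(0) = 0."""
    h = np.asarray(h, dtype=float)
    return np.where(h > 0, c0 + c1 * (1.0 - np.exp(-3.0 * h / a)), 0.0)


def krige(xy, z, x0, params, drift=None, drift0=None):
    """Ordinary kriging, or kriging with an external drift if `drift` is given.

    params = (c0, c1, a). With a drift, the extra constraint
    sum(w * drift) == drift0 makes the estimate follow the covariable.
    """
    xy = np.asarray(xy, dtype=float)
    n = len(xy)
    m = n + 1 if drift is None else n + 2
    A = np.zeros((m, m))
    b = np.zeros(m)
    dist = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    A[:n, :n] = exp_gamma(dist, *params)
    A[:n, n] = A[n, :n] = 1.0  # weights sum to one
    b[:n] = exp_gamma(np.linalg.norm(xy - np.asarray(x0, dtype=float), axis=1), *params)
    b[n] = 1.0
    if drift is not None:
        A[:n, n + 1] = A[n + 1, :n] = drift  # weighted drift reproduces drift0
        b[n + 1] = drift0
    w = np.linalg.solve(A, b)[:n]
    return float(w @ np.asarray(z, dtype=float))


def loo_rmse(xy, z, params, drift=None):
    """Leave-one-out cross-validation RMSE: re-estimate each point from all others."""
    xy = np.asarray(xy, dtype=float)
    z = np.asarray(z, dtype=float)
    cov = None if drift is None else np.asarray(drift, dtype=float)
    errors = []
    for i in range(len(z)):
        keep = np.arange(len(z)) != i
        if cov is None:
            est = krige(xy[keep], z[keep], xy[i], params)
        else:
            est = krige(xy[keep], z[keep], xy[i], params, cov[keep], cov[i])
        errors.append(est - z[i])
    return float(np.sqrt(np.mean(np.square(errors))))
```

If the covariable were a perfect linear predictor of soil moisture, the external drift estimate would reproduce each left-out value exactly; with the wetness index and soil texture class, the gain in this study was only marginal.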
External drift kriging performed only marginally better in the cross-validation tests. Its advantage, however, lies in the reintroduction of local variability in the data, as seen in fig. 6 (a, b, c).
The increased variability permitted by external drift kriging is in fact representative of the input data, since some measurement stations showed a moisture content well above 50 vol.% or well below 30 vol.%; values in both ranges are missing from the ordinary kriging map of fig. 6(a). External drift kriging thus appears to preserve the statistics of the data better than ordinary kriging, which could be tested by comparison with stochastic simulations.
As seen in fig. 7, the RMSE for external drift kriging is only marginally better. It is worth noting, however, that Famiglietti et al. (1998) found other factors, such as specific contributing area, porosity and relative elevation, to be more strongly correlated with soil water content; these could be used as more effective covariables for external drift kriging in future work.
A further, curious observation is the variation of the root-mean-square error with time and with the average soil water content. In the first half of the data (prior to 1 June 2011), the root-mean-square error appears to be negatively correlated with the mean soil water content. Beyond 1 June 2011, however, it appears to be positively correlated with the mean soil water content.
This relationship is far more evident in fig. 8, which plots the sum of squared errors as a function of time together with the time series of the mean soil water content for the same period. This plot was smoothed with a one-month moving average to remove the ‘noisy’ nature of fig. 7. This ‘inversion’ of the correlation between the estimation error and the mean soil water content raises the important question of whether such a correlation exists at all. To answer this question, a scatterplot of the sum of squared errors against the mean soil water content was developed. Fig. 9 is quite similar to the plots of the standard deviation of θ against the mean soil water content in Rosenbaum et al. (2012). This appears to further confirm the observations of Vereecken et al. (2007), which suggest that the relationship between the mean soil water content and the standard deviation of the soil water content is unimodal.
Figure 8: Time series of the sum of squared errors (smoothed with a one-month moving average)
Previous studies of soil moisture dynamics on hillslopes [Famiglietti et al., 1998] suggest that areas with higher topographic curvature tend to have a higher moisture content, since the natural depressions in the soil surface favour water storage. However, the region circled in red in fig. 10 appears to be anomalous. This region was observed to have a persistently below-average moisture content, even though its topography suggests that it ought to be wetter. Other factors are evidently at work in this region; for example, throughfall patterns and vegetation cover may cause it to have a lower soil water content.
The importance of this anomaly is that it provides evidence that topography and the wetness index alone cannot serve as reliable covariables. A more rigorous, better-correlated parameter must be introduced to improve kriging results that rely on a covariable.
4. CONCLUSIONS
In this study, an effort was made to understand the principal geostatistical techniques employed in the analysis of spatially referenced data. The analysis was performed on soil moisture data from the SoilNet wireless sensor network installed in the Wüstebach catchment as part of the TERENO project.
The study revealed the importance of estimating a true nugget effect by installing paired sensor nodes in close proximity to each other, as was done by Rosenbaum et al. (2012). It was shown that a poor estimate of the geostatistical parameters resulted in a kriging result that was not representative of the physical reality: the kriging output was highly smoothed, with all local variability lost. The results also showed that artificially curtailing the model parameters to realistic values does not appear to be a viable solution, since such a curtailment can cause implausibly dramatic changes in the kriging outputs within short spans of time, even when there has been no external change in the environment such as precipitation.
The study also revealed the nature of the relationship between the kriging interpolation error and the mean soil water content, suggesting that the kriging error is closely related to the statistical variability of soil water content, specifically its standard deviation. Further studies are required to confirm this conjecture.
The poor performance of the wetness index as a covariable for external drift kriging was also highlighted. Although topography plays an important role in the redistribution of soil moisture, it was shown that the soil water content at a point can often be persistently and prominently different from what its topography would suggest. As such, it is not surprising that the wetness index does not perform as an ideal covariable.
APPENDIX
In the above study, the geostatistical analysis produced poor results in recreating the spatial and temporal variability of soil water content in the Wüstebach catchment. The main reason for this unsatisfactory result was the lack of an accurate estimate of the true nugget effect. As mentioned earlier, the data used by Rosenbaum et al. (2012) included a paired sensor configuration at each measuring station, with the two sensors separated by 0.05 m. The semivariance between these ‘paired’ sensors was considered to be the nugget.
However, the present study did not include data from both sensors of each pair at each sampling location. To be able to estimate the true nugget, the entire, unprocessed data set was obtained from the TERENO database. This ‘raw data’ is recorded at a temporal resolution of approximately 15 minutes. However, the raw data contained numerous errors, and extensive pre-processing was necessary. This section details the initial processing applied to the data and possible techniques to be applied subsequently.
Initially, it was found that over half of the nodes (node IDs 056 to 109) reported a constant, erroneous value for more than half of the period for which SoilNet has been installed. Therefore, the processing was performed only on the later half of the time series (i.e. from 06.09.2011 to 03.05.2013).
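Nodes that report a constant value can be flagged automatically by searching for runs of identical readings. The sketch below assumes that a run lasting at least one day (96 samples at the 15-minute resolution) is erroneous; this threshold and the function name `constant_runs` are hypothetical, and the threshold would need to be checked against the actual SoilNet data.

```python
def constant_runs(values, min_length=96):
    """Return (start, end) index pairs of runs where the reading does not change.

    `end` is exclusive. min_length=96 corresponds to one day of 15-minute
    samples (an assumed threshold, not taken from the SoilNet documentation).
    """
    runs = []
    start = 0
    for i in range(1, len(values) + 1):
        # A run ends at the end of the series or when the value changes
        if i == len(values) or values[i] != values[start]:
            if i - start >= min_length:
                runs.append((start, i))
            start = i
    return runs
```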
The first processing step identified situations in which one of the paired nodes reported an erroneous value while the other reported an acceptable value. In such situations, a correlation plot was developed for a short period of time, as the long-term correlation was not good enough to make a prediction.
When a set of erroneous data was found, a correlation plot was generated from the data up to 1 month either side of it in time. A linear relationship was found to be a good model, with the Pearson correlation coefficient well above 0.90 in most cases.
Figure 11: Correlation between paired nodes for short time periods of up to 1 month (Pearson R = 0.97)
Once this correction was made, the bigger problem of correlating different nodes was encountered, for cases where both sensors at a sampling location returned erroneous values.
This correction could not be implemented due to time constraints, but a basic strategy has been devised.
Figure 12: Time series of erroneous readings for all nodes
Fig. 12 shows, as coloured bars, all the periods during which both sensors at a particular node returned erroneous readings. Disconcertingly, the errors show a periodicity, and there are periods during which essentially all nodes return wrong values, which means that no correlation is possible.
However, intermediate periods show only a few erroneous nodes, and the correlation technique can be used effectively to recover their data.
It is proposed that a particular node be chosen based on its performance in a given period, and that all erroneous nodes be correlated with it. As long as the correlation coefficient reaches an acceptable value, say 0.90, the data set is considered fairly well correlated and the predictions are accepted. If a good correlation with the selected node is not possible, another node is chosen and the correlation plots are developed again.
However, such a method may lose local variability, because the correlation need not hold, especially over long periods. To overcome this, nodes could instead be correlated with the nearest neighbour that performs satisfactorily in the given period.
ACKNOWLEDGEMENTS
I would like to place on record my sincere thanks and gratitude to Dr. Heye Bogena and Dr.
Michael Herbst, who played instrumental roles in guiding me during my stay at the
Forschungszentrum Jülich. They were extremely approachable and were encouraging at all
stages of the project. I hope I have repaid their investment of time and effort in me in some
small way.
I would also like to thank Dr. Harry Vereecken, who was kind enough to permit my stay at the institute IBG-3.
I am extremely indebted to Bernd Schilling, Inge Wiekenkamp, Nina Gottselig, Dr. Sander Huisman, Anna Missong, Dr. Thomas Putz, Dr. Lutz Weihermüller, Roland Baatz and the rest of the team that was part of the soil sampling campaign at Wüstebach, who welcomed me into the IBG-3 family through their warm hospitality and jovial camaraderie.
Lastly, my sincere gratitude to the DAAD for its financial and logistical support, without which this experience would not have been possible.
Arjun Narayanan
REFERENCES
Beven, K.J., Kirkby, M.J. 1979. A physically based variable contributing area model of basin hydrology. Hydrological Sciences Bulletin, 24:1, 43–69, doi:10.1080/02626667909491834.
Bogena, H.R., Herbst, M., Huisman, J.A., Rosenbaum, U., Weuthen, A., Vereecken, H. 2010. Potential of wireless sensor networks for measuring soil water content variability. Vadose Zone J. 9:1002–1013, doi:10.2136/vzj2009.0173.
Chen, D., Lu, C., Kou, Y., Chen, F. 2007. On detecting spatial outliers. GeoInformatica (2008) 12:455–475, doi:10.1007/s10707-007-0038-8.
Deutsch, C.V., Journel, A.G. 1998. GSLIB: Geostatistical software library and user’s guide.
Oxford University Press. ISBN 0195100158.
Famiglietti, J.S., Rudnicki, J.W., Rodell, M. 1998. Variability in surface moisture content along a hillslope transect: Rattlesnake Hill, Texas. Journal of Hydrology, 210, 259–281.
Huaxing, B., Xiaoyin, L., Xin, L., Mengxia, G., Jun, L. 2009. A case study of spatial
heterogeneity of soil moisture in the Loess plateau, western China: A geostatistical approach.
International Journal of Sediment Research, 24, 63–73.
Ivanov, V.Y., Fatichi, S., Jenerette, G.D., Espeleta, J.F., Troch, P.A., Huxman, T.E. 2010.
Hysteresis of soil moisture spatial heterogeneity and the “homogenizing” effect of vegetation.
Water Resources Research, 46, W09521, doi:10.1029/2009WR008611.
Kou, Y., Lu, C., Chen, D. 2006. Spatial weighted outlier detection. Proceedings of the 2006 SIAM International Conference on Data Mining.
Rosenbaum, U., Bogena, H., Herbst, M., Huisman, J.A., Peterson, T.J., Weuthen, A., Vereecken, H. 2012. Seasonal and event dynamics of spatial soil moisture patterns at the small catchment scale. Water Resources Research, doi:10.1029/2011WR011518.
Vereecken, H., Kamai, T., Harter, T., Kasteel, R., Hopmans, J., Vanderborght, J. 2007. Explaining soil moisture variability as a function of mean soil moisture: A stochastic unsaturated flow perspective. Geophysical Research Letters 34:L22402, doi:10.1029/2007GL031813.
Western, A.W., Blöschl, G. 1999. On the spatial scaling of soil moisture. Journal of Hydrology, 217, 203–224.