Shahid Lecture-12- MKAG1273

MAL1303: STATISTICAL HYDROLOGY
Stochastic Methods in Hydrology
Dr. Shamsuddin Shahid
Department of Hydraulics and Hydrology
Faculty of Civil Engineering, Universiti Teknologi Malaysia
Room No.: M46-332; Phone: 07-5531624; Mobile: 0182051586
Email: sshahid@utm.my
11/23/2015 Shamsuddin Shahid, FKA, UTM

Markov Transition Matrix

Cumulative Distribution Functions
For four classes, there will be four cumulative distribution functions.
The cumulative distribution function for each class is calculated as
Fj(x) = P[next-day rainfall < x, when rainfall today belongs to class Cj].
For example,
FR5(x) = P[next-day rainfall < x, when rainfall today belongs to class R5].
P [next day rainfall < 5] = 2
Rainfall
10
5
1
6
23
4
3
2
0
20
5
2
3
0
4
3
1
0

Find the distribution and its parameters for FR5(x).
Suppose the fitted distribution is exponential, with density
$f_{R5}(x) = \lambda \exp(-\lambda x)$
Where,
$\lambda = 0.105$

Calculate Daily Monsoon Rainfall
First, we need to define the initial condition.
Consider an initial condition with equal probability for each class:
R5     R10    R20    R>20
0.25   0.25   0.25   0.25
Multiplying the initial vector by the transition matrix P gives the class probabilities for day 1:
[0.25 0.25 0.25 0.25] × P = [0.39 0.21 0.27 0.14]

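R5     R10    R20    R>20
0.39   0.21   0.27   0.14
The exponential density for class R5 is
$f_{R5}(x) = \lambda \exp(-\lambda x)$, where $\lambda = 0.105$
The corresponding cumulative distribution is
$F_{R5}(x) = 1 - \exp(-\lambda x)$
Rainfall on day 1 (x) is found by solving
$0.39 = 1 - \exp(-0.105x)$
which gives $x = -\ln(1 - 0.39)/0.105 \approx 4.7$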
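The inversion step can be sketched in Python. This is a minimal illustration, assuming the standard exponential cumulative distribution F(x) = 1 - exp(-λx); the function name `rainfall_from_probability` is hypothetical and not part of the lecture.

```python
import math

def rainfall_from_probability(p, lam):
    """Invert the exponential CDF F(x) = 1 - exp(-lam * x) to get x.

    Assumption: the fitted distribution is a standard exponential
    with rate `lam`, as in the R5 example (lam = 0.105).
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must lie in [0, 1)")
    if lam <= 0.0:
        raise ValueError("lam must be positive")
    return -math.log(1.0 - p) / lam

# Probability 0.39 and rate 0.105 are the values used in the slide.
x = rainfall_from_probability(0.39, 0.105)
print(round(x, 2))
```

The same function could be called with a uniform random number instead of 0.39 to generate a random rainfall amount for the class.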

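The class probabilities for day 2 are obtained in the same way:
[0.39 0.21 0.27 0.14] × P = [0.41 0.24 0.24 0.11]

The general equation is
$u(n) = u\,P^n$
or
$u(n) = u(n-1)\,P$
where u is the initial probability vector and P is the transition matrix.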
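The recursion u(n) = u(n-1) P can be sketched as follows. The transition matrix below is purely hypothetical (the lecture's matrix is not reproduced in this text), so the numbers will not match the slide; only the mechanics are illustrated.

```python
def step(u, P):
    """One Markov step: row vector u times row-stochastic matrix P."""
    n = len(P)
    return [sum(u[i] * P[i][j] for i in range(n)) for j in range(n)]

def propagate(u0, P, n_steps):
    """Apply u(n) = u(n-1) P repeatedly, starting from u0."""
    u = list(u0)
    for _ in range(n_steps):
        u = step(u, P)
    return u

# HYPOTHETICAL transition matrix for classes R5, R10, R20, R>20.
# Each row sums to 1. These are NOT the lecture's values.
P = [
    [0.5, 0.2, 0.2, 0.1],
    [0.4, 0.3, 0.2, 0.1],
    [0.3, 0.2, 0.3, 0.2],
    [0.2, 0.2, 0.3, 0.3],
]
u0 = [0.25, 0.25, 0.25, 0.25]   # equal initial probabilities, as in the slide
u1 = propagate(u0, P, 1)        # class probabilities for day 1
u2 = propagate(u0, P, 2)        # class probabilities for day 2
print(u1, u2)
```

Because every row of P sums to 1, each propagated vector also sums to 1, which is a quick sanity check on any transition matrix estimated from data.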

Stochastic Process
Stochastic refers to systems whose behaviour is intrinsically non-deterministic.
In a stochastic process, a system's subsequent state is determined
both by the process's predictable actions and by a random element.
Stochastic hydrology is mainly concerned with the assessment of
uncertainty in model predictions.

Application of Stochastic Process in Hydrology
Stochastic hydrology is an essential base of water resources
systems analysis, due to the inherent randomness of the input,
and consequently of the results.
Stochastic processes are applied to forecast hydrological
phenomena such as floods and droughts.
Stochastic processes are also applied to forecast rainfall, river
discharge, etc.
Stochastic hydrology is very important in the decision-making
process for the planning and management of water
systems.

A stationary time series is one whose statistical properties such as
mean, variance, autocorrelation, etc. are all constant over time.
Most statistical forecasting methods are based on the assumption
that the time series can be rendered approximately stationary through
the use of mathematical transformations.
A stationarized series is relatively easy to predict: you simply predict
that its statistical properties will be the same in the future as they
have been in the past.
Stationary Time Series

Linear Stochastic Models
1. Moving Average (MA)
2. Auto Regression (AR)
3. Auto Regressive Moving Average (ARMA)
4. Auto Regressive Integrated Moving Average (ARIMA)
Stochastic Models

Moving Average
The concept underlying the moving average is that the average of the k most
recent observations is a good predictor of the current and next period
values.
The process is called a moving average because each average is
calculated by dropping the oldest observation and including the
next observation.

• The moving average averages out some of the random variation in the data.
• Therefore, the moving average merely smooths the fluctuations in the
data.
• The moving average technique is a good forecasting approach to use if
the data are stationary.
$F_{t+1} = \frac{Y_t + Y_{t-1} + Y_{t-2} + Y_{t-3} + \dots + Y_{t-k+1}}{k}$

Where $F_{t+1}$ is the forecast for period t+1, and
$Y_t$ is the actual value in period t.
Moving Average

Moving Average
48.0
59.0
69.3 53.5
68.0 64.2
67.3 68.7
59.0 67.7
51.0 63.1
41.0 55.0
30.7 46.0
31.0 35.8
30.7 30.8
39.0 30.8
49.0 34.9
61.0 44.0
68.3 55.0
68.0 64.7
65.3 68.2
59.0 66.7
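The forecast column can be reproduced with a short sketch. It assumes k = 2, which matches the first tabulated forecast, (48.0 + 59.0)/2 = 53.5; the function name is illustrative only.

```python
def moving_average_forecast(y, k):
    """Forecast each period as the mean of the k preceding observations.

    Returns a list aligned with y; entries are None where fewer than
    k earlier observations exist.
    """
    forecasts = [None] * len(y)
    for t in range(k, len(y)):
        forecasts[t] = sum(y[t - k:t]) / k
    return forecasts

# First six observations from the table above.
data = [48.0, 59.0, 69.3, 68.0, 67.3, 59.0]
f = moving_average_forecast(data, k=2)
print(f)
```

A larger k gives a smoother forecast but reacts more slowly to changes in the series.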

Moving Average
$L'_t = \frac{Y_t + Y_{t-1} + Y_{t-2} + Y_{t-3} + \dots + Y_{t-k+1}}{k}$

Moving Average, Lt
48.0
59.0 53.5
69.3 64.2
68.0 68.7
67.3 67.7
59.0 63.1
51.0 55.0
41.0 46.0
30.7 35.8
31.0 30.8
30.7 30.8
39.0 34.9
49.0 44.0
61.0 55.0
68.3 64.7
68.0 68.2
65.3 66.7
59.0 62.1

Double Moving Average
$L''_t = \frac{L'_t + L'_{t-1} + L'_{t-2} + L'_{t-3} + \dots + L'_{t-k+1}}{k}$
48.0
59.0 53.5
69.3 64.2 58.8
68.0 68.7 66.4
67.3 67.7 68.2
59.0 63.1 65.4
51.0 55.0 59.1
41.0 46.0 50.5
30.7 35.8 40.9
31.0 30.8 33.3
30.7 30.8 30.8
39.0 34.9 32.8
49.0 44.0 39.4
61.0 55.0 49.5
68.3 64.7 59.8
68.0 68.2 66.4
65.3 66.7 67.4
59.0 62.1 64.4
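The two smoothing passes can be sketched as below, assuming k = 2 as in the table (for example, L' = (48.0 + 59.0)/2 = 53.5). The helper name `trailing_mean` is hypothetical.

```python
def trailing_mean(values, k):
    """k-term trailing mean; None until k valid values are available."""
    out = []
    for t in range(len(values)):
        window = values[max(0, t - k + 1): t + 1]
        if len(window) < k or any(v is None for v in window):
            out.append(None)
        else:
            out.append(sum(window) / k)
    return out

data = [48.0, 59.0, 69.3, 68.0]
l1 = trailing_mean(data, 2)   # first moving average, L'
l2 = trailing_mean(l1, 2)     # moving average of L', i.e. L''
print(l1)
print(l2)
```

Applying the same function twice mirrors the definition: the double moving average is simply the moving average of the first moving average.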

The difference between the actual value and the first moving average is called Lag1.
The second lag, Lag2, is calculated as
/
k
t
/
t LLlag





 


2
12
For example, if the first moving average is calculated for k = 3, then
/
t
/
t
/
t
/
t LLLLlag 1
2
132 





 



Data Forecast Error MA Lag1 Lag2
10.0
12.0 11.0
14.0 11.0 3.0 13.0 1.0 2.0
16.0 13.0 3.0 15.0 1.0 2.0
18.0 15.0 3.0 17.0 1.0 2.0
20.0 17.0 3.0 19.0 1.0 2.0
22.0 19.0 3.0 21.0 1.0 2.0
24.0 21.0 3.0 23.0 1.0 2.0
26.0 23.0 3.0 25.0 1.0 2.0
28.0 25.0 3.0 27.0 1.0 2.0
30.0 27.0 3.0 29.0 1.0 2.0
32.0 29.0 3.0 31.0 1.0 2.0
34.0 31.0 3.0 33.0 1.0 2.0
36.0 33.0 3.0 35.0 1.0 2.0
38.0 35.0 3.0 37.0 1.0 2.0
40.0 37.0 3.0 39.0 1.0 2.0
42.0 39.0 3.0 41.0 1.0 2.0
44.0 41.0 3.0 43.0 1.0 2.0
For a constant trend, the error is constant.
The double moving average is used to remove the constant trend.
The error is the sum of lag1 and lag2.
Therefore,
$F_{t+1} = MA_t + lag_1 + lag_2$

Double Moving Average: Forecasting
The double moving average can be used for forecasting with the following
formulas:
$F_{t+m} = a_t + b_t\,m$
Where m is the number of periods ahead (m = 1 for the next period),
$a_t = L'_t + [L'_t - L''_t]$
and
$b_t = \frac{2}{k-1}\,(L'_t - L''_t)$

Data L'(t) L"(t) Lag2 Trend Forecast Error
10.0
12.0 11.0
14.0 13.0 12.0 1.0 2.0
16.0 15.0 14.0 1.0 2.0 16.0 0.0
18.0 17.0 16.0 1.0 2.0 18.0 0.0
20.0 19.0 18.0 1.0 2.0 20.0 0.0
22.0 21.0 20.0 1.0 2.0 22.0 0.0
24.0 23.0 22.0 1.0 2.0 24.0 0.0
26.0 25.0 24.0 1.0 2.0 26.0 0.0
28.0 27.0 26.0 1.0 2.0 28.0 0.0
30.0 29.0 28.0 1.0 2.0 30.0 0.0
32.0 31.0 30.0 1.0 2.0 32.0 0.0
34.0 33.0 32.0 1.0 2.0 34.0 0.0
36.0 35.0 34.0 1.0 2.0 36.0 0.0
38.0 37.0 36.0 1.0 2.0 38.0 0.0
40.0 39.0 38.0 1.0 2.0 40.0 0.0
42.0 41.0 40.0 1.0 2.0 42.0 0.0
44.0 43.0 42.0 1.0 2.0 44.0 0.0
 //
t
/
tt
//
t
/
t
/
tt
LL
k
b
and
]LL[La




1
2
ttt baF 1

Data L'(t) L"(t) Lag2 Trend Forecast
10
11
13
16
18
21
22 15.9
25 18.0
27 20.3
28 22.4
30 24.4
31 26.3
35 28.3 22.2 6.1 2.0
36 30.3 24.3 6.0 2.0 36.4
38 32.1 26.3 5.8 1.9 38.3
39 33.9 28.2 5.6 1.9 39.9
43 36.0 30.2 5.8 1.9 41.3
44 38.0 32.1 5.9 2.0 43.8
47 40.3 34.1 6.2 2.1 45.8
48 42.1 36.1 6.0 2.0 48.5
50 44.1 38.1 6.1 2.0 50.2
51 46.0 40.1 5.9 2.0 52.2
54 48.1 42.1 6.0 2.0 53.9
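The forecasting formulas can be checked with a small sketch. It applies a_t and b_t to a series with a constant trend (10, 12, 14, ...) and k = 2, the case tabulated earlier, where the forecast error is zero. Function names are illustrative assumptions.

```python
def trailing_mean(values, k):
    """k-term trailing mean; None until k valid values are available."""
    out = []
    for t in range(len(values)):
        window = values[max(0, t - k + 1): t + 1]
        if len(window) < k or any(v is None for v in window):
            out.append(None)
        else:
            out.append(sum(window) / k)
    return out

def double_ma_forecast(y, k, m=1):
    """Return {target index: forecast} using F(t+m) = a_t + b_t * m."""
    if k < 2:
        raise ValueError("k must be at least 2")
    l1 = trailing_mean(y, k)    # L'
    l2 = trailing_mean(l1, k)   # L''
    forecasts = {}
    for t in range(len(y)):
        if l2[t] is None:
            continue
        a = l1[t] + (l1[t] - l2[t])
        b = 2.0 / (k - 1) * (l1[t] - l2[t])
        forecasts[t + m] = a + b * m
    return forecasts

data = [10.0 + 2.0 * i for i in range(18)]   # 10, 12, ..., 44
forecasts = double_ma_forecast(data, k=2)
errors = {t: data[t] - f for t, f in forecasts.items() if t < len(data)}
print(errors)
```

For noisy data such as the second table, the errors will not vanish, but the trend term b_t still corrects the systematic lag of a single moving average.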

Autocorrelation
Autocorrelation is the correlation of a series with itself. This is
unlike cross-correlation, which is the correlation of two different
series.
Autocorrelation is useful for finding repeating patterns in a time
series, such as determining the presence of a periodic signal or
cycle.
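A lag-k autocorrelation coefficient can be computed with the sketch below. It uses the common sample estimator (lagged covariance divided by the total variance); this is an assumption, since the lecture does not state which estimator it uses, and a Pearson correlation between the overlapping segments would give slightly different values.

```python
def autocorrelation(y, lag):
    """Sample autocorrelation of y at the given lag.

    Assumption: numerator sums over the n - lag overlapping pairs,
    denominator is the total sum of squares about the mean.
    """
    n = len(y)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(y)")
    mean = sum(y) / n
    denom = sum((v - mean) ** 2 for v in y)
    num = sum((y[t] - mean) * (y[t + lag] - mean) for t in range(n - lag))
    return num / denom

# A strictly alternating series has a cycle of length 2.
series = [1.0, -1.0] * 10
r = [autocorrelation(series, k) for k in range(4)]
print(r)
```

The alternating series gives strongly negative coefficients at odd lags and strongly positive ones at even lags, the signature of a repeating cycle.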

Autocorrelation (figures)
Example 1: lag t = 1, r = 0.9; t = 3, r = 0.5; t = 5, r = 0.0
Example 2: lag t = 0 or t = 20, r = 1.0; t = 15, r = 0.0; t = 10, r = -1.0

Autocorrelation
Test for significance of autocorrelation coefficient:
Where,
t is the lag
r is the autocorrelation coefficient at that lag, and
n is the number of observations

Autocorrelation
Hypothesis Testing:
H0: r is attributable to randomness. No cycle is present in the time
series.
HA: A cycle is present in the time series.
If the calculated value of Z > 1.96, the null hypothesis is rejected.

Overall Significance: Ljung-Box Statistics
Null hypothesis: all autocorrelations up to lag h are zero.
Alternative hypothesis: at least one autocorrelation is non-zero.
Test statistic:
$Q_h = n(n+2)\sum_{k=1}^{h}\frac{r_k^2}{n-k}$
Where,
h is the number of autocorrelation coefficients being tested,
$r_k$ is the autocorrelation coefficient at lag k, and
n is the number of observations.
If $Q_h > \chi^2(0.05, h)$, the null hypothesis is rejected.

Auto Regression (AR)
Example series (n = 23):
10.0, 11.5, 10.0, 16.5, 11.0, 12.5, 14.0, 14.5, 16.0, 14.5, 21.0, 15.5,
15.0, 16.5, 17.0, 20.5, 18.0, 25.5, 18.0, 17.5, 20.0, 20.5, 24.0

Trend = 0.5 per time step
The detrended value is
xdt = x - (rank × Trend), where rank = 0, 1, 2, ... is the position of the observation
For example,
xdt = 10 - (0 × 0.5) = 10
xdt = 11.5 - (1 × 0.5) = 11
Detrended series:
10, 11, 9, 15, 9, 10, 11, 11, 12, 10, 16, 10,
9, 10, 10, 13, 10, 17, 9, 8, 10, 10, 13
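The detrending rule xdt = x - (rank × Trend) can be sketched as follows, using the series and the trend of 0.5 from the example. The function name is illustrative.

```python
def detrend(x, trend):
    """Subtract a linear trend: value at position i is reduced by i * trend."""
    return [value - i * trend for i, value in enumerate(x)]

original = [10.0, 11.5, 10.0, 16.5, 11.0, 12.5, 14.0, 14.5, 16.0, 14.5,
            21.0, 15.5, 15.0, 16.5, 17.0, 20.5, 18.0, 25.5, 18.0, 17.5,
            20.0, 20.5, 24.0]
detrended = detrend(original, 0.5)
print(detrended)
```

Adding `i * trend` back to a forecast made on the detrended series restores it to the original scale.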

Autocorrelation coefficients of the detrended series:
lag-1 -0.34061
lag-2 -0.01525
lag-3 -0.14931
lag-4 -0.15717
lag-5 0.0482
lag-6 -0.30402
lag-7 0.940332
lag-8 -0.28836
lag-9 -0.10714

Ljung-Box test for the detrended series:
$Q_h = n(n+2)\sum_{k=1}^{h}\frac{r_k^2}{n-k}$
h = 9
$r_k$ is the autocorrelation coefficient at lag k
n = 23
Null hypothesis (H0): all autocorrelations up to lag h are zero.
Alternative hypothesis (HA): at least one autocorrelation is non-zero.
$Q_h$ = 42.59
$\chi^2(0.05, h)$ = 16.92
$Q_h > \chi^2$, so H0 is rejected:
at least one autocorrelation is non-zero.
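The Ljung-Box statistic for these nine coefficients can be computed with a short sketch. The critical value 16.92 is taken from the slide rather than computed; the result should be close to the lecture's 42.59, with small differences possible from rounding.

```python
def ljung_box_q(r, n):
    """Ljung-Box Q for autocorrelations r[0..h-1] at lags 1..h, sample size n."""
    return n * (n + 2) * sum(rk ** 2 / (n - k) for k, rk in enumerate(r, start=1))

# Autocorrelation coefficients at lags 1 to 9 from the example.
r = [-0.34061, -0.01525, -0.14931, -0.15717, 0.0482,
     -0.30402, 0.940332, -0.28836, -0.10714]
n = 23
q = ljung_box_q(r, n)

CHI2_CRITICAL = 16.92   # chi-square (0.05, h = 9), value quoted in the slide
reject_h0 = q > CHI2_CRITICAL
print(q, reject_h0)
```

Most of the statistic comes from the lag-7 term, which is why a single strong coefficient is enough to reject the hypothesis of no autocorrelation.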

Confidence interval of the correlogram:
$Z_{\alpha/2} / \sqrt{n}$
Z at p = 0.05 = 1.96
n = 23
$Z_{\alpha/2} / \sqrt{n}$ = 0.408
Only the lag-7 coefficient exceeds this limit, so Lag = 7.

$Y_t = 0.778\,Y_{t-7} + 2.337$

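Detrended series (n = 23):
10, 11, 9, 15, 9, 10, 11, 11, 12, 10, 16, 10,
9, 10, 10, 13, 10, 17, 9, 8, 10, 10, 13
Applying $Y_t = 0.778\,Y_{t-7} + 2.337$ recursively gives the forecast for the next 21 time steps:
10.12, 15.56, 9.34, 8.56, 10.12, 10.12, 12.45,
10.21, 14.45, 9.60, 9.00, 10.21, 10.21, 12.02,
10.28, 13.58, 9.81, 9.34, 10.28, 10.28, 11.69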
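The recursive use of the lag-7 equation can be sketched as below. The coefficients 0.778 and 2.337 are those given in the slide; the function name and argument names are illustrative.

```python
def ar_forecast(history, lag, slope, intercept, n_steps):
    """Extend a series with Y(t) = slope * Y(t - lag) + intercept.

    Forecasts beyond `lag` steps ahead reuse earlier forecasts.
    """
    series = list(history)
    for _ in range(n_steps):
        series.append(slope * series[-lag] + intercept)
    return series[len(history):]

detrended = [10, 11, 9, 15, 9, 10, 11, 11, 12, 10, 16, 10,
             9, 10, 10, 13, 10, 17, 9, 8, 10, 10, 13]
forecast = ar_forecast(detrended, lag=7, slope=0.778, intercept=2.337, n_steps=21)
print([round(v, 2) for v in forecast])
```

Because the slope is below 1, each seven-step cycle of the forecast is damped towards a constant level as the horizon increases.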

48
59.00365
69.32472
68
67.31207
58.98174
50.97471
40.99635
30.67528
31
30.68793
39.01826
49.02529
61.00365
68.32472
68
65.31207
58.98174
48.97471
-
-


Confidence interval of the correlogram:
$Z_{\alpha/2} / \sqrt{n}$
Z at p = 0.05 = 1.96
n = 73
$Z_{\alpha/2} / \sqrt{n}$ = 0.2294
Significant lags: 1, 2, 3, 5, 6, 7, 8, 9

Y(t) Y(t-1) Y(t-2) Y(t-3) Y(t-5) Y(t-6) Y(t-7) Y(t-8) Y(t-9)
31.00 30.68 41.00 50.97 67.31 68.00 69.32 59.00 48.00
30.69 31.00 30.68 41.00 58.98 67.31 68.00 69.32 59.00
39.02 30.69 31.00 30.68 50.97 58.98 67.31 68.00 69.32
49.03 39.02 30.69 31.00 41.00 50.97 58.98 67.31 68.00
61.00 49.03 39.02 30.69 30.68 41.00 50.97 58.98 67.31
68.32 61.00 49.03 39.02 31.00 30.68 41.00 50.97 58.98
68.00 68.32 61.00 49.03 30.69 31.00 30.68 41.00 50.97
65.31 68.00 68.32 61.00 39.02 30.69 31.00 30.68 41.00
58.98 65.31 68.00 68.32 49.03 39.02 30.69 31.00 30.68
48.97 58.98 65.31 68.00 61.00 49.03 39.02 30.69 31.00
38.00 48.97 58.98 65.31 68.32 61.00 49.03 39.02 30.69
32.68 38.00 48.97 58.98 68.00 68.32 61.00 49.03 39.02
32.00 32.68 38.00 48.97 65.31 68.00 68.32 61.00 49.03
33.69 32.00 32.68 38.00 58.98 65.31 68.00 68.32 61.00
41.02 33.69 32.00 32.68 48.97 58.98 65.31 68.00 68.32
51.03 41.02 33.69 32.00 38.00 48.97 58.98 65.31 68.00
58.00 51.03 41.02 33.69 32.68 38.00 48.97 58.98 65.31
69.32 58.00 51.03 41.02 32.00 32.68 38.00 48.97 58.98
70.00 69.32 58.00 51.03 33.69 32.00 32.68 38.00 48.97
65.31 70.00 69.32 58.00 41.02 33.69 32.00 32.68 38.00
- - - - - - - - -
- - - - - - - - -

$Y_t = b_0 + b_1 Y_{t-1} + b_2 Y_{t-2} + b_3 Y_{t-3} + b_4 Y_{t-5} + b_5 Y_{t-6} + b_6 Y_{t-7} + b_7 Y_{t-8} + b_8 Y_{t-9}$

Autocorrelation
Limitations of Autocorrelation:
1. The observations must be regularly spaced through time.
2. Any linear trend in the data should be removed in advance. Linear
trends will cause a gradual decline in peaks on the
autocorrelogram with increasing lag.
3. In order for there to be sufficient comparisons in the calculation of
the coefficient, the rules of thumb are: (a) there should be at least
50 observations in the time series; and (b) the lag should not
exceed n/4
4. Significantly high r values at small lags may not reflect cyclicity but
just smoothness in the data.
5. Although significantly negative Z values are possible, these are
not important as they correspond to negative autocorrelations,
themselves due to peak-trough correspondences in the data;
these will inevitably occur in association with high positive
(peak-peak; trough-trough) autocorrelations and offer no additional
information.

Auto Regressive Moving Average (ARMA)
Autoregressive Moving-Average (ARMA) models form a class of
linear time series models.
ARMA is a combination of AR and MA:
Autoregressive Moving-Average (ARMA) =
Auto-Regression (AR) + Moving Average (MA)

Auto-Regression (AR):
$Y_t = b_0 + b_1 Y_{t-L} + b_2 Y_{t-2L} + b_3 Y_{t-3L} + \dots + b_k Y_{t-kL} + e_t$
Moving Average (MA):
$Y_t = b_0 + b_1 e_{t-L} + b_2 e_{t-2L} + b_3 e_{t-3L} + \dots + b_k e_{t-kL}$
The error term is calculated from the moving average.

Auto Regressive Moving Average (ARMA)

Data L'(t) L"(t) Lag2
10
11
9
15
9
10
11
11 10.75
12 11
10 10.875
16 11.75 1 0.766
10 11.125 0.125 -0.152
9 11.125 0.25 -0.305
10 11.125 -0.625 -0.152
10 11 -0.125 -0.152
13 11.25 0.125 0.307
10 11 -0.125 -0.152
17 11.875 0.875 0.919
9 11 -0.25 -0.305
8 10.75 -0.25 -0.458
10 10.875 -1 -0.152
10 10.875 -0.125 -0.152
13 11.25 0.5 0.307

Data Lag
16 0.766
10 -0.152
9 -0.305
10 -0.152
10 -0.152
13 0.307
10 -0.152
17 0.919
9 -0.305
8 -0.458
10 -0.152
10 -0.152
13 0.307

Forecasts for the next 21 time steps (detrended series):
AR forecast:
10.12, 15.56, 9.34, 8.56, 10.12, 10.12, 12.45,
10.21, 14.45, 9.60, 9.00, 10.21, 10.21, 12.02,
10.28, 13.58, 9.81, 9.34, 10.28, 10.28, 11.69
ARMA forecast:
10.25, 14.86, 9.59, 8.93, 10.25, 10.25, 12.23,
10.33, 13.92, 9.82, 9.30, 10.33, 10.33, 11.87,
10.39, 13.18, 9.99, 9.59, 10.39, 10.39, 11.58

Figures: detrended series compared with the AR forecast and the ARMA forecast.

ARMA forecast with the trend (rank × 0.5) added back:
21.75, 26.86, 22.09, 21.93, 23.75, 24.25, 26.73,
25.33, 29.42, 25.82, 25.80, 27.33, 27.83, 29.87,
28.89, 32.18, 29.49, 29.59, 30.89, 31.39, 33.08
Figure: original series followed by the trend-restored ARMA forecast.

Non-stationary Time Series
The models above are applicable to stationary time series only.
If parameters such as the autocorrelation vary with time, these
models cannot be used.

Auto Regressive Integrated Moving Average (ARIMA)
Most naturally-occurring time series in hydrology are not at all stationary
(at least when plotted in their original units). Instead they exhibit various
kinds of trends, cycles, and seasonal patterns.

• The best strategy may not be to try to directly predict the level of the series
at each period.
• Instead, it may be better to try to predict the change that occurs from one
period to the next (i.e., the quantity Y(t)-Y(t-1)).
• In other words, it may be helpful to look at the first difference of the series,
to see if a predictable pattern can be discerned there.
• For practical purposes, it is just as good to predict the next change as to
predict the next level of the series, since the predicted change can always be
added to the current level to yield a predicted level
Differencing

Seasonal Differencing
The seasonal difference of a time series is the series of changes from one
season to the next. For data with 4 seasons per year (for example, quarterly data), the
seasonal difference of Y at period t is Y(t)-Y(t-4).
The first difference of the seasonal difference of such a time series Y at
period t is equal to (Y(t) - Y(t-4)) - (Y(t-1) - Y(t-5)). Equivalently, it is equal to
(Y(t) - Y(t-1)) - (Y(t-4) - Y(t-5)).

Several approaches exist to identify, measure and remove the trend
and seasonal components of time series data.
One of the easiest and most common methods is differencing.
The first difference,
Y't = Yt - Yt-1
is one way to capture and remove the effect of the trend.
Seasonal Differencing
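First and seasonal differencing can be sketched with a single helper. The seasonal lag of 4 follows the example in the text; the series used here are made up purely for illustration.

```python
def difference(y, lag=1):
    """Return the lag-differenced series: y[t] - y[t - lag]."""
    return [y[t] - y[t - lag] for t in range(lag, len(y))]

# Illustrative series with an accelerating trend (values are invented).
trend_series = [10, 12, 15, 19]
first_diff = difference(trend_series)              # Y(t) - Y(t-1)

# Illustrative series repeating every 4 periods with an upward shift.
seasonal_series = [1, 2, 3, 4, 2, 3, 4, 5]
seasonal_diff = difference(seasonal_series, lag=4)  # Y(t) - Y(t-4)
both = difference(seasonal_diff, lag=1)             # first difference of the seasonal difference
print(first_diff, seasonal_diff, both)
```

Each differencing pass shortens the series by `lag` values, so a forecast of the differences has to be added back to the last known levels to recover a forecast of the original series.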

ARIMA models are, in theory, the most general class of models for
forecasting a time series which can be stationarized by
transformations such as differencing and logging.
An ARIMA model is classified as an ARIMA(p,d,q) model, where:
p is the number of autoregressive terms,
d is the number of nonseasonal differences, and
q is the number of lagged forecast errors in the prediction equation.
ARIMA(1,1,1)
ARIMA(1,0,1)
ARIMA(2,1,2)

Box-Jenkins methodology.
1. Model Selection
2. Parameter Estimations
3. Model Checking
In many cases it is an iterative process.
