DATA DRIVEN AND HYBRID TECHNIQUES TO
PREDICT THE RESERVOIR INFLOWS:
Supervisor:
Dr. Dimitri P. Solomantine
Mentor:
Dr. Gerald Corzo
Daniel Alejandro Vázquez Bado
Outline
o Problem definition
o Objectives
o Practical Value
o Innovation
o Methodology
o Results & Discussions
o Work Plan
Study Area
BACKGROUND (1) Some Characteristics
The drainage area is 820,000 km2, of which:
o 70% is in the upper basin, strongly regulated by more than 30 dams
o the remaining 30% is known as the incremental basin of Itaipu
The incremental basin area is 235,000 km2.
The main rivers in this area are the Ivaí, Piquiri and Iguazu rivers in Brazil and the Monday and Acaray rivers in Paraguay.
PROBLEM DEFINITION
Current State
o OPSH.DT (the Hydrological Division of Itaipu) reports that a
calibrated hydrological model exists; however, this model was
never made operational, probably because of unsatisfactory
performance.
o Hydrological forecasts are based mainly on fuzzy logic or on
experience/expert knowledge; however, because of the scale and
spatial variability (mainly of rainfall) of the catchment, it is
difficult to provide reliable hydrological forecasts.
OBJECTIVES OF THE RESEARCH (1)
Main Objective
o The main objective of this research is to build a superior
inflow forecast model for the incremental basin of Itaipu,
superior in terms of performance (accuracy, generalization,
etc.) for short- and medium-term predictions.
OBJECTIVES OF THE RESEARCH (2)
Specific Objectives
o To develop the following models in order to test the
hypothesis that a hybrid model gives superior performance:
Model A: a semi-distributed hydrological model for the
incremental basin of Itaipu.
Model B: a traditional artificial neural network for the
incremental basin of Itaipu.
Model C: a hybrid model built from an appropriate
combination of sub-components, exploring data
pre-processing methods for this purpose.
OBJECTIVES OF THE RESEARCH (3)
Specific Objectives
o Develop an internet-based platform (with the previous models
coupled and automated) to disseminate results and allow the
model to be tested by many users; it is also intended for
possible real-time application.
PRACTICAL VALUE & INNOVATIONS
o Important positive impact for OPSH.DT: an additional tool
to help improve the hydrological predictions.
o A GUI will be provided for the user/operator, and results
will be shown on the web (internet-based).
o This platform also allows the model to be rigorously tested
by many users, which will facilitate acceptance of the
proposed model.
RESEARCH METHODOLOGY:
[Flowchart]
o Meteo data and hydro data: data validation
o ANN model: optimize structure
o Hybrid model: pre-process inputs, combine ANN/wavelets,
optimize structure & noise filtering techniques
o Daily forecast and monthly forecast
o Show hydrological predictions for both models
o Evaluate models and select the best
APPROACH 1: ANN
Regular ANN network components
Input layer (inputs to be optimized):
-Discharge time series (t)
-Precip(t+1)
-Precip(t)
-Precip(t+1)
Hidden layer: ANN
Output layer:
-Discharge time series (t+1)
Figure 4.5 Schematization of a typical ANN structure
APPROACH 2: WAVELET DECOMPOSITIONS AS ANN INPUTS
Input layer (wavelet decompositions, inputs to be optimized):
-Discharge decomposition (t)
-Precip dec(t+1)
-Precip dec(t)
-Precip dec(t+1)
Hidden layer: ANN
Output layer:
-Discharge time series (t+1)
Figure 4.5 Schematization of the coupled Wavelet/ANN structure
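To make the idea of decomposed inputs concrete, the following is a minimal sketch of a multilevel discrete wavelet decomposition. It uses the Haar wavelet only because it fits in a few lines; this is a swapped-in simplification, not the wavelet family used for the results in this presentation. The function names haar_step and haar_decompose are hypothetical. In practice a library routine (for example pywt.wavedec in PyWavelets) would normally be used instead.

```python
import math


def haar_step(signal):
    """One level of the decimated Haar transform.

    Returns (approximation, detail). A trailing sample is dropped
    if the signal length is odd.
    """
    s = math.sqrt(2.0)
    pairs = range(0, len(signal) - 1, 2)
    approx = [(signal[i] + signal[i + 1]) / s for i in pairs]
    detail = [(signal[i] - signal[i + 1]) / s for i in pairs]
    return approx, detail


def haar_decompose(signal, level):
    """Multilevel decomposition, ordered [aN, dN, ..., d1]."""
    approx = list(signal)
    details = []
    for _ in range(level):
        approx, detail = haar_step(approx)
        details.insert(0, detail)  # coarsest detail ends up first
    return [approx] + details


# Hypothetical discharge values, decomposition level 2
discharge = [120.0, 130.0, 180.0, 260.0, 240.0, 200.0, 170.0, 150.0]
a2, d2, d1 = haar_decompose(discharge, level=2)
```

Each returned sub-series (approximation and details) would then be a candidate input to the network in place of the raw series.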
APPROACH 3: IMPROVED WAVELET ANN
[Figure: original signal and de-noised signal, with segment 1 and segment 2 marked]
Brief analysis:
o Segment 1: along this segment [140, 260] of the original signal,
discharge is increasing in response to rainfall somewhere, so
errors are more likely to occur during this period and to be
larger in magnitude. This is observed in the detail part of the
signal: small values are considered noise and are removed by the
threshold criterion. (Note: high values in the details do not
necessarily mean high error or high noise.)
Result: higher reduction of noise along the segment.
o Segment 2: during low flows and no flows, the noise is smaller,
which is also reflected in the details; the threshold criterion
finds no errors or noise to suppress.
Result: low or zero reduction of noise along the segment.
RESULTS (1) COMPARISON BETWEEN ANN & WANN (BAGMATI CASE ANALYSIS)
Table 4.1 Performance of the regular Artificial Neural Network
Metric | Training | Verification
NSC | 0.8749 | 0.8199
Cor | 0.9355 | 0.9159
RMSE | 80.7868 | 142.6434
PERS | 0.7321 | 0.6562
Table 4.2 Performance of the Wavelet Neural Network (db4 family, decomposition level 2)
Metric | Training | Verification
NSC | 0.96 | 0.96
Cor | 0.98 | 0.98
RMSE | 48.37 | 68.48
PERS | 0.92 | 0.9
Figure 4.6 WANN verification data set performance for different
decomposition levels and wavelet families.
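For readers who want to reproduce this kind of comparison, here is a minimal sketch of the four scores. The slides do not define the abbreviations, so the code assumes NSC is the Nash-Sutcliffe coefficient, Cor is the Pearson correlation coefficient, and PERS is the persistence index (model error relative to a naive forecast that repeats the last observed value). The function names and the lead parameter are hypothetical.

```python
import math


def nash_sutcliffe(obs, sim):
    """1 - SSE / variance of observations about their mean (assumed NSC)."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    spread = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / spread


def correlation(obs, sim):
    """Pearson correlation coefficient (assumed Cor)."""
    n = len(obs)
    mean_o, mean_s = sum(obs) / n, sum(sim) / n
    cov = sum((o - mean_o) * (s - mean_s) for o, s in zip(obs, sim))
    dev_o = math.sqrt(sum((o - mean_o) ** 2 for o in obs))
    dev_s = math.sqrt(sum((s - mean_s) ** 2 for s in sim))
    return cov / (dev_o * dev_s)


def rmse(obs, sim):
    """Root mean square error, in the units of the data."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


def persistence_index(obs, sim, lead=1):
    """1 - SSE(model) / SSE(naive forecast obs[t - lead]) (assumed PERS)."""
    steps = range(lead, len(obs))
    sse_model = sum((obs[t] - sim[t]) ** 2 for t in steps)
    sse_naive = sum((obs[t] - obs[t - lead]) ** 2 for t in steps)
    return 1.0 - sse_model / sse_naive
```

Under these definitions a value of 1 for NSC or PERS means a perfect fit, and a PERS of 0 means the model is no better than repeating the last observation.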
RESULTS (2): SENSITIVITY ANALYSIS OF ANN STRUCTURE
Figure 4.7 Sensitivity analysis of the number of hidden nodes
Figure 4.8 Sensitivity analysis of the number of epochs
Figure 4.8 Efficiency in terms of time of both models
WAVELETS IN OTHER BLACK BOX MODELS (1)
Figure 5.1 Improvement in performance for different DDMs when coupling
with neural network
WAVELETS IN OTHER BLACK BOX MODELS (2)
Figure 5.2 Improvement in performance for different DDMs when coupling
with neural network
OTHER ALTERNATIVES WITH WAVELETS
o Signal denoising
o Soft thresholding: all coefficients whose magnitude is below a
certain threshold are set to zero, and those whose magnitude is
greater than the threshold are shrunk toward zero by that amount.
This method recognizes that the coefficients contain both signal
and noise, and attempts to separate them.
o Hard thresholding: coefficients whose magnitude is below the
threshold are set to zero; those above it are retained unchanged.
o Some methods to be studied
o Threshold selection rule based on Stein's Unbiased Estimate of Risk
o Minimax: uses a fixed threshold chosen to yield minimax
performance for mean square error against an ideal procedure
o Use the denoised signal as input for the DDM
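The soft and hard thresholding rules described above can be sketched in a few lines. This is an illustration only: the function names are hypothetical, and the threshold is passed in as a plain number rather than chosen by one of the selection rules listed on the slide.

```python
import math


def soft_threshold(coeffs, thr):
    """Zero small coefficients and shrink the rest toward zero by thr."""
    out = []
    for c in coeffs:
        shrunk = abs(c) - thr
        # keep the original sign; anything at or below thr becomes zero
        out.append(math.copysign(shrunk, c) if shrunk > 0 else 0.0)
    return out


def hard_threshold(coeffs, thr):
    """Zero small coefficients and keep the rest unchanged."""
    return [c if abs(c) > thr else 0.0 for c in coeffs]


# Hypothetical detail coefficients and threshold
details = [3.0, -0.5, -2.0, 1.0]
soft = soft_threshold(details, 1.0)   # [2.0, 0.0, -1.0, 0.0]
hard = hard_threshold(details, 1.0)   # [3.0, 0.0, -2.0, 0.0]
```

Applying either rule to the detail coefficients and then inverting the wavelet transform gives the denoised series that would be passed to the data-driven model.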
DATA COLLECTION: CASE STUDY - INCREMENTAL BASIN OF ITAIPU
Station Name | City | River | Area (km2) | Type | Gauge zero (m) | Planar coordinate East | Planar coordinate North
Porto São José | São Pedro do Paraná | Río Paraná | 674 | P / F STH | 231.958 | 790,021.51 | 7,484,951.80
Novo Balsa Santa Maria | Palotina | Río Piquiri | 21 | P / F STH | 234.741 | 830,869.84 | 7,320,810.80
Porto Paraíso do Norte | Rondon | Río Ivai | 28,427 | P / F STH | 246.302 | 943,480.73 | 7,413,841.81
Porto Caiuá | Naviraí | Río Paraná | 717,814 | P / F STH | 226.851 | 836,342.48 | 7,423,106.03
Porto Ivinhema | Ivinhema | Río Ivinhema | 31,905 | P / F STH | 242.167 | 857,165.60 | 7,520,737.35
Flórida | Juti | Río Amambay | 7,252 | P / F STH | 254.186 | 749,764.61 | 7,457,688.68
Estrada do Iguatemi | Iguatemi | Río Iguatemi | 6,832 | P / F STH | 235.709 | 747,067.57 | 7,373,368.24
Novo Porto Taquara | Santa Isabel do Ivaí | Río Ivai | 34,432 | P / F STH | 229.955 | 877,189.34 | 7,429,826.65
Novo Porto 2 | Nova Aurora | Río Piquiri | 12,124 | P / F STH | 280.715 | 889,646.85 | 7,298,604.28
Balsa do Cantu | Altamira do Paraná | Río Cantú | 2,513 | P / F STH | 333 | 934,773.82 | 7,255,920.82
Ubiratã | Ubiratã | | | P STH | --- | 906,741.30 | 7,278,572.93
Palmital | Palmital | | | P STH | --- | 984,908.22 | 7,239,282.54
Ubá do Sul | Lidianópolis | Río Ivaí | 12,701 | P / F STH | 361.554 | 1,047,174.60 | 7,330,641.25
Tereza Cristina | Candido de Abreu | Río Ivaí | 3,572 | P / F STH | 473.502 | 1,092,592.18 | 7,241,116.07
Marquinho | Marquinho | | | P STH | --- | 978,091.57 | 7,214,274.97
Barbosa Ferraz | Barbosa Ferraz | Río Curumbatay | 3,294 | P / F STH | 303.233 | 1,014,277.34 | 7,334,612.33
Manoel Ribas | Manoel Ribas | | | P STH | --- | 1,040,917.16 | 7,278,492.04
Porto Guarani | Altamira do Paraná | Río Piquiri | 4,233 | P / F STH | 338.097 | 928,071.69 | 7,243,269.01
[Map of station locations, with scale bar in meters]
QUALITY REQUIREMENTS: QUANTIFICATION OF
DATA GAPS
Figure 6.1 & Figure 6.2 Temporal availability of data for all available
stations
QUALITY REQUIREMENTS: CONSISTENCY ANALYSIS (1)
Figure 6.3 Double mass curve consistency test for stations Santa Maria and
Estrada do Iguatemi (figure annotation: "Rejected")
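As a rough illustration of the test named in Figure 6.3, a double mass curve plots cumulative values at the station being checked against cumulative values of a reference series; a consistent record gives an approximately straight line, and a change in slope flags a possible inconsistency. The sketch below uses hypothetical function names and made-up numbers, and assumes the reference series has already been built (for example as the mean of neighbouring stations).

```python
from itertools import accumulate


def double_mass_curve(station, reference):
    """Return (cumulative reference, cumulative station) points."""
    return list(zip(accumulate(reference), accumulate(station)))


def segment_slope(points, start, end):
    """Slope of the curve between two point indices."""
    (x0, y0), (x1, y1) = points[start], points[end]
    return (y1 - y0) / (x1 - x0)


# Hypothetical annual totals: the station record doubles after year 3
station = [10.0, 10.0, 10.0, 20.0, 20.0, 20.0]
reference = [10.0, 10.0, 10.0, 10.0, 10.0, 10.0]
points = double_mass_curve(station, reference)
early = segment_slope(points, 0, 2)
late = segment_slope(points, 2, 5)
```

A large difference between the early and late slopes is the kind of break that would lead to a station being rejected or corrected.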
QUALITY REQUIREMENTS: CONSISTENCY ANALYSIS (2)
QUALITY REQUIREMENTS: CONSISTENCY ANALYSIS (3)
SELECTED STATIONS
Figure 6.4 Summary of the selected and rejected stations (map showing
the Rio Amambay, Rio Piquiri, Rio Ivai, Rio Iguatemi, Rio Parana and
Rio Ivinhema; X = rejected)
INFILLING METHODS:
o Inverse Distance Weighting (IDW)
o PSD Daily Gridded Precipitation Observations (Science
Division, Earth System Research Laboratory)
o PSD Daily Gridded Precipitation Observations, unbiased
o Power law transformation P* = a P^b
Figure 6.8 Infilling Methods performance
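Two of the listed methods are simple enough to sketch. The code below shows inverse distance weighting for estimating a missing value from neighbouring stations, and the power law transformation P* = a P^b. The function names, the distance exponent of 2, and all numbers are assumptions for illustration; the slides do not state which exponent or which values of a and b were used.

```python
import math


def idw_estimate(target_xy, neighbours, power=2.0):
    """Inverse-distance-weighted estimate at target_xy.

    neighbours is a list of ((x, y), value) pairs in the same
    planar coordinate system as target_xy.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for (x, y), value in neighbours:
        dist = math.hypot(x - target_xy[0], y - target_xy[1])
        if dist == 0.0:
            return value  # a coincident station determines the value
        weight = 1.0 / dist ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


def power_law(p, a, b):
    """Power law transformation P* = a * P**b."""
    return a * p ** b


# Hypothetical neighbours at 1 and 2 distance units from the gap
estimate = idw_estimate((0.0, 0.0), [((1.0, 0.0), 10.0), ((2.0, 0.0), 20.0)])
```

With an exponent of 2, the nearer station in this example gets four times the weight of the farther one.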
Figure 6.8 Time series generated by the infilling procedures used compared
with the ground station
NEXT STEPS
o Use the filled precipitation data, together with the flow
records of the available stations, as input for the proposed
forecast models (Itaipu case study)
o Daily forecast
o Monthly forecast
o Seasonal forecast
o Develop an internet-based model in order to disseminate the
results and let other users test it
LIMITATIONS:
o Although wavelet decomposition helps to increase model
performance, it also increases the computational time
needed to run the model.
o The rainfall network is considered poor, so data imputations
(infilling of gaps) are also inaccurate; alternatives such as
remote sensing can be tested and then evaluated.
o The river is highly regulated; as a result the system
becomes highly nonlinear and consequently harder to
forecast.
