INDIAN INSTITUTE OF TECHNOLOGY ROORKEE
Forecasting of reservoir outflow using machine
learning models
Under the supervision of
Prof. Abhishek
Assistant Professor
IIT Roorkee
Presented by:
Shikhar Verma
Reservoir and Water Resource Management
• Reservoirs serve various purposes,
including irrigation, flood control,
power generation, and regulation of
river flows.
• The significance of reservoirs lies in
their ability to store water and
regulate its flow.
• Reservoirs also play a critical role in
protecting communities from the
devastating impact of floods.
Future challenges
Why are data-driven models preferred today?
• Rivers often flow through multiple countries or administrative regions, each
with distinct policies and regulations, while dams within a river basin may be
operated by private companies, each with its own operating policies and
interests.
• Therefore, the operation of reservoirs is influenced by natural factors, operating
rules, and external demands, introducing significant uncertainty into predicting
reservoir outflows at any given time.
• Unlike physics-based models, data-driven models learn reservoir behaviour
directly from historical records, so they are better equipped to handle the
uncertainty associated with water resource management. This makes them useful
tools for optimizing water allocation and flood risk prevention in river basins.
Reservoir outflow forecasting using data-driven models
• A simplistic approach to forecasting reservoir outflow assumes the reservoir is
at 100% capacity, effectively canceling out the dam's regulating capacity, and
equates outflow to inflow. While oversimplifying river dynamics, this method
can offer a reasonable approximation during wet seasons, particularly for small
reservoirs nearing full capacity with minimal ability to alter natural river flow.
• Under normal conditions, a common approximation is to assume that the
reservoir outflow for a given day ("day d") will be the same as the previous day
("day d-1"). While this approach can provide acceptable approximations when
the flow doesn't vary significantly day to day, further improvement can be
achieved by employing multivariate solutions that assign different weights to
several known variables.
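The two simple approximations above can be sketched in a few lines. This is only an illustration: the function names and the daily values are hypothetical and are not taken from the case study.

```python
def inflow_baseline(inflow):
    """Full-reservoir assumption: forecast outflow on day d equals inflow on day d."""
    return list(inflow)


def persistence_baseline(outflow):
    """Forecast for day d is the observed outflow on day d-1.

    The first day has no predecessor, so the result is one element shorter
    than the input and lines up with outflow[1:].
    """
    return list(outflow[:-1])


# Hypothetical daily outflow series (m^3/s), for illustration only.
observed_outflow = [12.0, 12.5, 13.0, 12.0, 11.5]

forecast = persistence_baseline(observed_outflow)
targets = observed_outflow[1:]

# Mean absolute error of the persistence forecast on this toy series.
abs_errors = [abs(f - t) for f, t in zip(forecast, targets)]
mean_abs_error = sum(abs_errors) / len(abs_errors)
```

Any multivariate model is only worth its extra complexity if it beats these baselines on the same days.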
Different Machine learning techniques
Dataset Partitioning
Multivariate Linear Regression
The multivariate linear regression model can be represented as:
$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \dots + \beta_n X_n + \epsilon$
Where:
• $Y$ is the dependent variable (reservoir outflow).
• $X_1, X_2, X_3, \dots, X_n$ are the independent variables (reservoir inflow, rainfall,
temperature, reservoir level, etc.).
• $\beta_0$ is the intercept term.
• $\beta_1, \beta_2, \beta_3, \dots, \beta_n$ are the regression coefficients, each representing
the change in $Y$ for a one-unit change in the respective independent variable, holding
the other variables constant.
• $\epsilon$ is the error term, representing the difference between the predicted and actual
values of $Y$.
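As a minimal sketch of how the coefficients could be estimated, the code below solves the ordinary least squares normal equations in plain Python. The helper names, the choice of two predictors (inflow and reservoir level) and the synthetic data are assumptions made for illustration; a real study would normally rely on a numerical library.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix so that row operations act on both sides at once.
    m = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_mlr(features, target):
    """Estimate [beta_0, beta_1, ..., beta_n] by ordinary least squares."""
    # Prepend a constant 1 so that beta_0 acts as the intercept.
    design = [[1.0] + list(row) for row in features]
    p = len(design[0])
    # Normal equations: (X^T X) beta = X^T y.
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(design, target)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def predict_mlr(beta, row):
    """Evaluate beta_0 + beta_1 * x_1 + ... + beta_n * x_n for one sample."""
    return beta[0] + sum(b * x for b, x in zip(beta[1:], row))


# Synthetic rows of (inflow, reservoir level); the target is built as
# 2 + 0.5 * inflow + 0.1 * level so the fit can be checked by eye.
features = [[10.0, 50.0], [20.0, 55.0], [15.0, 70.0], [30.0, 60.0], [25.0, 40.0]]
target = [2.0 + 0.5 * x1 + 0.1 * x2 for x1, x2 in features]
beta = fit_mlr(features, target)
```

Because the synthetic target has no noise, the recovered coefficients should match the generating values up to rounding error.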
• MLR can outperform Artificial Neural Networks (ANN) in certain applications, especially
when the available sample is small. Including it therefore helps assess whether the
available dataset is large enough to justify using ANN-based models.
• The goal of multivariate linear regression is to estimate the values of the coefficients
that minimize the sum of squared differences between the observed and predicted
values of the dependent variable 𝑌.
• Once the model is trained using historical data, it can be used to predict the outflow of
the reservoir based on new values of the independent variables.
MLP Model
• A multi-layer perceptron (MLP) is a type of feedforward neural network with multiple
neurons arranged in layers.
• The network has at least three layers with an input layer, one or more hidden
layers and an output layer.
Activation Function:
• Non-linear activation functions such as ReLU, sigmoid, or tanh are applied to
the neurons in the hidden layers to allow the model to capture complex
relationships and patterns in the data.
Advantages of MLP:
• MLPs can learn complex patterns in data.
• They can approximate any continuous function on a bounded input range to
arbitrary accuracy, given enough neurons in the hidden layers.
• They can handle both linear and non-linear relationships between input and
output.
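The layer structure and the role of the activation function can be illustrated with a single forward pass. This is a sketch with one hidden layer, ReLU activation and a linear output; the weights and inputs are hypothetical and untrained.

```python
def relu(z):
    """Rectified linear unit: passes positive values, zeroes negative ones."""
    return max(0.0, z)


def identity(z):
    """Linear output activation, suitable for a regression target."""
    return z


def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(W @ inputs + b), neuron by neuron."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def mlp_forward(inputs, hidden_w, hidden_b, out_w, out_b):
    """Input layer -> one hidden layer (ReLU) -> single linear output neuron."""
    hidden = dense(inputs, hidden_w, hidden_b, relu)
    return dense(hidden, out_w, out_b, identity)[0]


# Hypothetical weights for 2 inputs, 2 hidden neurons and 1 output.
hidden_w = [[0.5, 0.25], [-1.0, 0.5]]
hidden_b = [0.0, -0.5]
out_w = [[1.0, 2.0]]
out_b = [0.1]

# Hidden pre-activations are 1.0 and -0.5; ReLU silences the second neuron.
y = mlp_forward([1.0, 2.0], hidden_w, hidden_b, out_w, out_b)
```

Training would adjust the weights and biases to reduce the forecast error; only the prediction step is shown here.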
NARX Model
NARX (Nonlinear AutoRegressive with eXogenous inputs) is a type of neural
network model that can be used for time series forecasting, including reservoir
outflow forecasting.
AutoRegressive (AR) Part: relationship between the current and past values of
the target variable (reservoir outflow).
Exogenous (X) Part: external factors (such as reservoir inflow, rainfall,
temperature).
Hidden Layers: capture the complex relationships between the input features
and the target variable.
Output Layer: predicted outflow of the reservoir
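The autoregressive and exogenous parts mainly determine how the input rows are assembled. The sketch below builds those rows for a chosen number of lags; the function name, the lag count and the toy series are assumptions, and the hidden and output layers that would consume these rows are not shown.

```python
def build_narx_samples(outflow, exogenous, lags):
    """Pair each target outflow[t] with lagged outflow and exogenous values.

    Each input row holds outflow[t-lags .. t-1] (autoregressive part)
    followed by exogenous[t-lags .. t-1] (exogenous part).
    """
    inputs, targets = [], []
    for t in range(lags, len(outflow)):
        row = list(outflow[t - lags:t]) + list(exogenous[t - lags:t])
        inputs.append(row)
        targets.append(outflow[t])
    return inputs, targets


# Toy daily series, for illustration only.
outflow = [5.0, 6.0, 7.0, 8.0, 9.0]
inflow = [1.0, 2.0, 3.0, 4.0, 5.0]

inputs, targets = build_narx_samples(outflow, inflow, lags=2)
```

With two lags, the first usable target is the third day, so five days of data yield three training samples.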
LSTM Model
• Long Short-Term Memory (LSTM) is a type of recurrent neural network (RNN).
• LSTM networks have memory cells that allow them to remember information
over long periods of time. Each LSTM cell has three gates: the input gate, the
forget gate, and the output gate. These gates control the flow of information
into and out of the memory cell, enabling the network to learn long-term
dependencies in the data.
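How the three gates act on the memory cell can be sketched with a single time step of a scalar LSTM cell (one input feature, one hidden unit). The parameter layout and values are hypothetical; this follows the standard LSTM update and is not a specific library's implementation.

```python
import math


def sigmoid(z):
    """Logistic function, used by the gates to produce values in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, params):
    """One time step of a scalar LSTM cell.

    params maps each of "input", "forget", "output" and "candidate" to a
    tuple (w, u, b): input weight, recurrent weight and bias.
    """
    def unit(name, squash):
        w, u, b = params[name]
        return squash(w * x + u * h_prev + b)

    i = unit("input", sigmoid)        # how much new information to write
    f = unit("forget", sigmoid)       # how much of the old cell state to keep
    o = unit("output", sigmoid)       # how much of the cell state to expose
    g = unit("candidate", math.tanh)  # proposed new cell content

    c = f * c_prev + i * g            # updated memory cell
    h = o * math.tanh(c)              # updated hidden state (cell output)
    return h, c


# With all parameters at zero, every gate equals 0.5 and the candidate is 0,
# so the cell simply halves its previous memory.
zero_params = {name: (0.0, 0.0, 0.0) for name in ("input", "forget", "output", "candidate")}
h, c = lstm_step(x=1.0, h_prev=0.0, c_prev=2.0, params=zero_params)
```

A forget gate close to 1 lets the cell state pass almost unchanged from one step to the next, which is what allows information to persist over long sequences.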
METRICS
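1. Pearson's coefficient of correlation (r):
• Pearson's coefficient measures the linear relationship between the predictions
made by a model and the observed data.
• It ranges between -1 and 1, where values close to 1 indicate a high degree of
positive linear relationship.
$$r = \frac{\sum_{i=1}^{N} (Q_{obs,i} - \bar{Q}_{obs})(Q_{for,i} - \bar{Q}_{for})}{\sqrt{\sum_{i=1}^{N} (Q_{obs,i} - \bar{Q}_{obs})^2 \, \sum_{i=1}^{N} (Q_{for,i} - \bar{Q}_{for})^2}}$$
where $Q_{for}$ is the forecasted value, $Q_{obs}$ is the observed value, an overbar
denotes the mean of a series, and $N$ is the total number of samples.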
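A minimal sketch of the computation, using the standard definition of Pearson's r; the function name and the toy series are illustrative.

```python
import math


def pearson_r(q_obs, q_for):
    """Pearson correlation between the observed and forecasted series."""
    n = len(q_obs)
    mean_obs = sum(q_obs) / n
    mean_for = sum(q_for) / n
    covariance = sum((o - mean_obs) * (f - mean_for) for o, f in zip(q_obs, q_for))
    spread_obs = sum((o - mean_obs) ** 2 for o in q_obs)
    spread_for = sum((f - mean_for) ** 2 for f in q_for)
    return covariance / math.sqrt(spread_obs * spread_for)


# Toy series: the forecast is exactly twice the observation.
q_obs = [1.0, 2.0, 3.0, 4.0]
q_for = [2.0, 4.0, 6.0, 8.0]
r = pearson_r(q_obs, q_for)
```

In this toy case r is 1 even though every forecast is twice the observed value, which shows that r is insensitive to systematic bias and needs to be read alongside the other metrics.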
2. Nash-Sutcliffe Efficiency coefficient (NSE):
• NSE is a dimensionless statistic that computes the relative magnitude of the
residual variance with respect to the variance of the observed data.
• It ranges from -∞ to 1.0, with 1.0 being the optimal value.
$$NSE = 1 - \frac{\sum_{i=1}^{N} (Q_{obs,i} - Q_{for,i})^2}{\sum_{i=1}^{N} (Q_{obs,i} - \bar{Q}_{obs})^2}$$
where $Q_{for}$ is the forecasted value, $Q_{obs}$ is the observed value, $\bar{Q}_{obs}$ is
the mean of the observed values, and $N$ is the total number of samples.
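A minimal sketch of NSE using its standard definition; the function name and the toy series are illustrative.

```python
def nse(q_obs, q_for):
    """Nash-Sutcliffe efficiency: 1 minus residual variance over observed variance."""
    mean_obs = sum(q_obs) / len(q_obs)
    residual = sum((o - f) ** 2 for o, f in zip(q_obs, q_for))
    variance = sum((o - mean_obs) ** 2 for o in q_obs)
    return 1.0 - residual / variance


q_obs = [1.0, 2.0, 3.0, 4.0, 5.0]

# A perfect forecast reproduces the observations exactly.
nse_perfect = nse(q_obs, q_obs)

# A forecast that always predicts the observed mean.
nse_mean_only = nse(q_obs, [3.0] * 5)
```

The second case gives an NSE of 0, so any negative value means the model performs worse than simply predicting the observed mean.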
3. Ratio of root mean square error to the
standard deviation of the observed values (RSR):
• RSR is a dimensionless statistic that measures the ratio of the root mean square
error (RMSE) to the standard deviation of the observed data.
• A value of 0 is optimal, indicating no deviation between predicted and
observed values.
$$RSR = \frac{\sqrt{\sum_{i=1}^{N} (Q_{obs,i} - Q_{for,i})^2}}{\sqrt{\sum_{i=1}^{N} (Q_{obs,i} - \bar{Q}_{obs})^2}}$$
where $Q_{for}$ is the forecasted value, $Q_{obs}$ is the observed value, $\bar{Q}_{obs}$ is
the mean of the observed values, and $N$ is the total number of samples.
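A minimal sketch of RSR using its standard definition; the function name and the toy series are illustrative. The factor 1/N appears in both the RMSE and the standard deviation, so it cancels and is omitted.

```python
import math


def rsr(q_obs, q_for):
    """RMSE divided by the standard deviation of the observations."""
    mean_obs = sum(q_obs) / len(q_obs)
    squared_error = sum((o - f) ** 2 for o, f in zip(q_obs, q_for))
    squared_deviation = sum((o - mean_obs) ** 2 for o in q_obs)
    return math.sqrt(squared_error) / math.sqrt(squared_deviation)


q_obs = [1.0, 2.0, 3.0, 4.0, 5.0]

# A perfect forecast has zero error.
rsr_perfect = rsr(q_obs, q_obs)

# Always predicting the observed mean makes RMSE equal to the standard deviation.
rsr_mean_only = rsr(q_obs, [3.0] * 5)
```

With these definitions RSR equals the square root of (1 - NSE), so the two metrics rank models identically and differ only in scale.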
4. Percent bias (PBIAS):
• PBIAS is an error index statistic that measures the average tendency of the
predicted values to either underestimate (positive PBIAS) or overestimate
(negative PBIAS) the observed series. A value of 0 is optimal.
$$PBIAS = 100 \cdot \frac{\sum_{i=1}^{N} (Q_{obs,i} - Q_{for,i})}{\sum_{i=1}^{N} Q_{obs,i}}$$
where $Q_{for}$ is the forecasted value, $Q_{obs}$ is the observed value and $N$ is
the total number of samples.
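A minimal sketch of PBIAS with the sign convention stated above (positive for underestimation); the function name and the toy series are illustrative.

```python
def pbias(q_obs, q_for):
    """Percent bias: positive when forecasts underestimate the observations."""
    total_difference = sum(o - f for o, f in zip(q_obs, q_for))
    return 100.0 * total_difference / sum(q_obs)


q_obs = [10.0, 20.0, 30.0]

# Forecasts 10% below the observations give a positive PBIAS.
pbias_under = pbias(q_obs, [9.0, 18.0, 27.0])

# Forecasts 10% above the observations give a negative PBIAS.
pbias_over = pbias(q_obs, [11.0, 22.0, 33.0])
```

Because positive and negative errors cancel in the sum, a PBIAS near 0 does not by itself imply an accurate forecast; it only indicates the absence of systematic bias.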
CASE STUDY 1
Zhen Xing Zhang (2023)
Objective:
To compare different machine learning techniques for reservoir outflow
forecasting, applied to the Miño River (northwestern Spain).
Area of study:
Miño-Sil River Basin:
• Location: The Miño-Sil River basin is situated in the northwestern part of the
Iberian Peninsula.
• Size: The basin has a total area of approximately 17,000 km².
Selected Reservoirs:
• Number: Eight reservoirs were selected from the Miño-Sil river system.
• Capacities: The selected reservoirs have capacities ranging from 10 to 655 hm³.
Data:
• The study used 19 years of daily data provided by the Miño-Sil River Basin
Authority for the reservoirs under investigation. The data span from October 1, 2000,
to September 30, 2019, and include the percentage of filled volume,
reservoir inflow, and reservoir outflow.
Methodology:
• Data Preparation:
Historical data on reservoir inflow, rainfall, and reservoir level, along with
corresponding outflow measurements, were collected for model training and
evaluation.
• Model Training and Testing:
The models were trained using the training set and evaluated using the
testing set to assess their predictive performance.
Model Evaluation:
• The predictive performance of each model was evaluated using the testing
dataset.
• The evaluation metrics r, NSE, RSR and PBIAS were calculated to assess the
accuracy and reliability of each model in predicting reservoir outflow.
Model Comparison and Selection:
• The performance of the MLR, MLP, NARX and LSTM models was compared
against the baseline using these evaluation metrics.
• The model with the best performance in terms of accuracy and reliability in
predicting reservoir outflow was selected for further analysis.
Results:
• According to RSR, NSE and r, all the models improve on the accuracy of the
baseline model in every dataset. The MLR approach outperformed the baseline
model on the whole dataset but lagged behind the ANN-based models. All the
ANN models showed similar accuracy on the RSR, NSE and r metrics and
generalized well to the test subset. The LSTM models were slightly better
than MLP, while NARX showed the best performance.
• According to PBIAS, the baseline shows no significant bias, since it is the
observed series itself with a delay. All the ML models tend to overestimate
the series, especially in the test subset. The LSTM models have the lowest
tendency to overestimate and the MLR models the highest.
• The Belesar, Castrelo and Santo Estevo reservoirs show the largest advantage
for the ML models. In contrast, in the Barcena reservoir only NARX
and LSTM were able to improve on the baseline approach.
• NARX models performed best for most reservoirs, except for the Barcena
reservoir, where the LSTM model outperformed NARX. This highlights the
importance of per-reservoir analysis.
Research gaps and future opportunities
• Include additional variables such as climate data, precipitation, land use
patterns, and anthropogenic influences to develop more comprehensive and
accurate predictive models.
• Explore the development of hybrid models that combine physics-based models
with machine learning techniques. By integrating the strengths of both
approaches, such hybrid models can potentially improve prediction accuracy
and robustness.
• Investigate the specific characteristics of reservoirs or hydrological conditions
under which each model performs best.
• Compare the models qualitatively on interpretability, computational
efficiency, and ease of implementation to provide additional insights.
References
1. Berghuijs, W. R., Aalbers, E. E., Larsen, J. R., Trancoso, R., and Woods, R. A.: Recent
changes in extreme floods across multiple continents, Environmental Research Letters,
12, 114035, https://doi.org/10.1088/1748-9326/aa8847, 2017.
2. Passerotti, G., Massazza, G., Pezzoli, A., Bigi, V., Zsótér, E., and Rosso, M.: Hydrological
model application in the Sirba river: Early warning system and GloFAS improvements,
Water (Switzerland), 12, 620, https://doi.org/10.3390/w12030620, 2020.
3. Arnell, N. W. and Gosling, S. N.: The impacts of climate change on river flood risk at the
global scale, Climatic Change, 134, 387–401,
https://doi.org/10.1007/s10584-014-1084-5, 2016.
4. Booth, D. B. and Bledsoe, B. P.: Streams and urbanization, in: The Water Environment of
Cities, edited by: Baker, L. A., Springer US, Boston, MA, 93–123,
https://doi.org/10.1007/978-0-387-84891-4_6, 2009.
5. B. (2018, June 15). Application of Machine Learning Algorithms for Timeseries
Forecasting. Medium.
https://medium.com/@b.bhaskaran/application-of-machine-learning-algorithms-for-timeseries-forecasting-c952e765ace
6. Liu, C., Guo, L., Ye, L., Zhang, S., Zhao, Y., and Song, T.: A review of advances in China’s
flash flood early-warning system, Natural Hazards, 92, 619–634,
https://doi.org/10.1007/s11069-018-3173-7, 2018.
7. Bradshaw, C. J. A., Sodhi, N. S., Peh, K. S. H., and Brook, B. W.: Global evidence that
deforestation amplifies flood risk and severity in the developing world, Global Change
Biology, 13, 2379–2395, https://doi.org/10.1111/j.1365-2486.2007.01446.x, 2007.
8. N. (2021, November 24). Andhra Pradesh flash floods worst in 20 years, says CWC. The
Times of India.
https://timesofindia.indiatimes.com/city/visakhapatnam/ap-flash-floods-worst-in-20-years-says-cwc/articleshow/87875499.cms
9. Adaramola, M.: Climate Change And The Future Of Sustainability: The Impact on
Renewable Resources, CRC Press, 1–336 pp., 2016.
10. Qie, G., Zhang, Z., Getahun, E., and Mamer, E. A.: Comparison of Machine Learning
Models Performance on Simulating Reservoir Outflow: A Case Study of Two Reservoirs in
Illinois, U.S.A.
Thanks…
