1. Discharge Forecasting of
Mahanadi River Basin Using
Machine Learning Algorithm
SUBMITTED BY:
• Subhadeep Sahu (2002030042)
• Sourav Ghose (2002030130)
• Suraj Gouda (1902031109)
• Allein Guria (1902030020)
• Swastik Suman Pattanaik (1802031096)
GUIDED BY:
Dr. Janhabi Meher
SUPERVISOR
DEPT. OF CIVIL ENGG.,
VSSUT, BURLA, ODISHA
(Water Resources Engineering)
2. Contents
• Introduction
• Objective and Scope of the study
• Literature Review & Critical Review
• Materials and Methodology
• Conclusion
• References
3. Introduction
Background
• The main aim of this study is to forecast discharge within the Mahanadi River Basin.
• Sustainable river basin management is critical for healthy ecosystems and communities.
• Understanding and predicting dynamic river features like streamflow, discharge, and hydrological patterns is vital
for effective water resource management.
• Traditional methods are often too simple to capture these complex processes.
4. Challenges:
• Predicting discharge in the Mahanadi River Basin is challenging, yet the basin is a vital water resource for various ecological and human activities.
• Accurate discharge forecasts are crucial for:
1. Sustainable development
2. Flood forecasting
3. Efficient water resource management.
Solutions:
• This study aims to combine GIS and machine learning to:
1. Analyze geographical and hydrological features such as area, discharge data, flow accumulation, flow direction, stream order, and stream-to-feature output.
2. Develop a machine learning model for discharge prediction.
5. Literature Review
Author and Date of Publication | Name of the Research Paper | Conclusion
Cecilia Svensson et al., 2013 | Flood Frequency Estimation Using a Joint Probability Approach | The joint probability approach captures the variability and uncertainties in flood estimation more comprehensively than traditional methods.
Martin Durocher et al., 2015 | Nonlinear Approach to Regional Flood Frequency Analysis Using Projection Pursuit Regression | PPR is effective in capturing nonlinearity in flood frequency analysis.
Sandeep Samantaray et al., 2020 | Estimation of Flood Frequency using Statistical Method: Mahanadi River Basin, India | The Gumbel max method provides better flow discharge values, but the best-fit method varies by gauge station.
Witold F. Krajewski et al., 2023 | Revisiting Turcotte's Approach: Flood Frequency Analysis | Turcotte's method provides estimates for rare flood occurrences but doesn't significantly enhance accuracy compared to the LP3 distribution.
6. Critical Review
Flood Frequency Estimation Using a Joint Probability Approach:
1. Flexibility in Input Variables: The new method enables a more flexible assessment by allowing all input variables to take values across their respective distributions.
2. Monte Carlo Simulation: This simulation considers dependencies between events and variables, providing a more realistic representation of flood characteristics.
3. Holistic Analysis: Both peak flows and flow volumes are considered in the frequency analysis.
Nonlinear Approach to Regional Flood Frequency Analysis Using Projection Pursuit Regression:
The paper proposes a novel nonlinear approach, Projection Pursuit Regression (PPR), for Regional Flood Frequency Analysis (RFFA).
PPR combines features of Generalized Additive Models (GAM) and Artificial Neural Networks (ANN), using smooth functions for complex pattern fitting.
Applied to southern Quebec hydrometric stations, PPR proves effective in flood quantile estimation.
However, the paper highlights the need for further research on PPR's applicability in multivariate RFFA and on the significance of intermediate predictors and smooth functions in the presence of multiple terms.
Future work involves adapting PPR for multivariate cases and exploring predictor interactions.
7. Critical Review
Estimation of Flood Frequency using Statistical Method: Mahanadi River Basin, India
The study compared four statistical methods for streamflow forecasting in the Mahanadi River basin, India. The Gumbel max method performed best in predicting flow discharge across all gauge stations. Goodness-of-fit tests identified the Generalized Extreme Value distribution as the best fit for three stations, while Log-Pearson Type III (LP III) was the best fit at one station. The study also provided distribution parameters and applied goodness-of-fit tests to the rainfall data series.
Revisiting Turcotte's Approach: Flood Frequency Analysis
The paper reexamines Turcotte's method for flood frequency estimation, focusing on Iowa. It employs a formal statistical approach and introduces a novel simulation framework to assess sampling uncertainty. The study finds that Turcotte's method tends to provide conservative estimates for low-probability quantiles compared to the Log-Pearson Type III distribution. In a related note, the authors combine Turcotte's approach with recommendations from Coles (2001) and Katz (2013), highlighting that linear extrapolation of the power-law parameters over time can yield very high quantile estimates, especially for stations where flood magnitudes appear to have increased over the past 50 years.
8. Objective
• To use ArcGIS to create a comprehensive map of the Mahanadi River Basin that includes all
essential spatial features.
• To incorporate the GIS-derived layers (fill, flow direction, flow accumulation, and stream-to-feature) into the machine learning model to increase accuracy.
• To use machine learning techniques to create a reliable discharge forecasting model.
• To evaluate the model's predictive power for river discharge in the Mahanadi Basin.
9. Methodology
GIS Mapping:
• ArcGIS is used to analyze and visualize the spatial characteristics of the Mahanadi River Basin, including the basin shape, DEM, and rain gauge stations.
• The discharge map is generated using the Raster Calculator followed by the IDW (Inverse Distance Weighting) tool under Geostatistical Analyst Tools.
• The Fill operation removes sinks and depressions from the elevation data, which improves the accuracy of terrain modelling and gives a smoother representation of the landscape.
• The Flow Direction tool determines the direction of water flow out of each raster cell. This is essential for understanding flow patterns in a watershed and supports hydrological modelling.
• The Flow Accumulation tool calculates the accumulated flow into each raster cell. Cells with high accumulated flow are then identified, which helps in watershed delineation and flood risk assessment.
• The Stream to Feature operation converts the rasterized stream network into vector features, which allows hydrological features to be integrated into GIS databases for further analysis.
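The flow direction and flow accumulation steps above can be illustrated with a minimal pure-Python sketch of the D8 scheme. This is not ArcGIS's implementation: the tiny `dem` grid is a made-up example, the function names are hypothetical, and the sketch assumes the DEM has already been filled so that every cell except the outlet has a lower neighbour. The direction codes follow the usual ArcGIS convention (1 = east, doubling clockwise to 128 = north-east).

```python
import math

# D8 direction codes keyed by (row offset, column offset), ArcGIS-style.
D8_CODES = {
    (0, 1): 1, (1, 1): 2, (1, 0): 4, (1, -1): 8,
    (0, -1): 16, (-1, -1): 32, (-1, 0): 64, (-1, 1): 128,
}

def flow_direction(dem):
    """Return a grid of D8 codes; 0 marks a cell with no lower neighbour."""
    rows, cols = len(dem), len(dem[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            best_slope = 0.0
            for (dr, dc), code in D8_CODES.items():
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    # Drop per unit distance; diagonals are sqrt(2) away.
                    slope = (dem[r][c] - dem[nr][nc]) / math.hypot(dr, dc)
                    if slope > best_slope:
                        best_slope = slope
                        out[r][c] = code
    return out

def flow_accumulation(dem, direction):
    """Count, for each cell, how many upstream cells drain into it."""
    rows, cols = len(dem), len(dem[0])
    offsets = {code: off for off, code in D8_CODES.items()}
    acc = [[0] * cols for _ in range(rows)]
    # Visit cells from highest to lowest so upstream totals are complete
    # before they are passed to the downstream neighbour.
    cells = sorted(
        ((dem[r][c], r, c) for r in range(rows) for c in range(cols)),
        reverse=True,
    )
    for _, r, c in cells:
        code = direction[r][c]
        if code:
            dr, dc = offsets[code]
            acc[r + dr][c + dc] += acc[r][c] + 1
    return acc

# Made-up 3x3 elevation grid sloping towards the bottom-right corner.
dem = [
    [9, 8, 7],
    [8, 6, 5],
    [7, 5, 3],
]
direction = flow_direction(dem)
accumulation = flow_accumulation(dem, direction)
print(direction)
print(accumulation)
```

Cells with a large accumulation value approximate the stream network; thresholding that grid is one common way to obtain the raster that a stream-to-feature step would then vectorize.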
10. Integration of GIS with Machine Learning Models:
• The data obtained from GIS will be integrated with machine learning models such as linear regression.
• Linear regression is used to forecast river discharge, using rainfall data as the feature and discharge as the
target variable.
• The model undergoes data preprocessing which involves filling missing values, standardizing features, and
splitting the dataset into training and testing sets.
• Finally, the model is evaluated using RMSE (Root Mean Squared Error) to measure its accuracy, and it is then tested with new data to predict flood frequency.
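In symbols, the model and error measure described above can be written as follows. The notation is an assumption introduced here for illustration: P_i is the rainfall feature, z_i its standardized value (using the training mean and standard deviation), Q_i the observed discharge, and n the number of test samples.

```latex
z_i = \frac{P_i - \mu_P}{\sigma_P}, \qquad
\hat{Q}_i = \beta_0 + \beta_1 z_i, \qquad
\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( Q_i - \hat{Q}_i \right)^2}
```

RMSE has the same units as discharge, so a value can be read directly against typical flow magnitudes at a station.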
11. Work Done
GIS Mapping:
• The initial steps involved the extraction of essential geospatial data, including the delineation of the basin and the creation of Digital Elevation Models (DEMs).
• These foundational datasets served as the basis for the subsequent GIS mapping of the Mahanadi River basin.
• The boundaries of the Mahanadi River basin were delineated using data from NASA's ASTER platform. To understand the topography of the river basin, Digital Elevation Models were then extracted from the elevation datasets.
• DEMs represent the three-dimensional surface of the Earth, providing information on variations in elevation across the basin.
• After the basin shapefile and DEM were extracted, the data for the study area were added to ArcGIS 10.8.
• Five station points (Kesinga, Kantamal, Tikarpada, Sundargarh, and Salebhata) were located on the map using their respective coordinates and the IDW function.
• Using ArcToolbox, the Fill, Flow Direction, Flow Accumulation, Stream to Feature, and discharge maps were generated.
12. [Figure: Mahanadi River Basin along with 5 rainfall stations]
[Figure: Discharge map at 5 stations in Mahanadi River Basin]
13. [Figures: Stream to Feature Map; Fill Map; Flow Direction Map; Flow Accumulation Map]
14. Work Done
Linear Regression Model:
• The rainfall data were collected for the districts covered by the Mahanadi River Basin, and discharge data were collected from 5 rainfall gauge stations for the year 2021.
• The data were arranged in chronological order using Microsoft Excel. The Excel file was then converted to CSV format, named ‘rainfall.csv’, and loaded using the pandas ‘read_csv()’ function.
• The Linear Regression Model was defined with a linear equation representing the relationship between the rainfall data and the discharge. The ‘Rainfall’ column was set as the feature and the ‘discharge’ column was designated as the target variable.
• The missing values in the feature and target columns were filled with their respective means using the ‘fillna()’ method.
• Using the ‘train_test_split’ function, the dataset was split into training and testing sets. The training set was used to train the model, while the testing set was used to evaluate its performance.
• The trained model was used to make predictions on the test set using the ‘model.predict()’ function.
• Root Mean Squared Error (RMSE) was used to measure the error between the predicted values and the actual values.
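The workflow above can be sketched end to end in plain Python. This is a dependency-free illustration rather than the project's actual script: the helper names are hypothetical, the rainfall and discharge values are made up for the example and are not the 2021 data, and the helpers only mirror what ‘fillna()’, ‘train_test_split’, and a fitted linear regression do in pandas and scikit-learn.

```python
import math
import random

def fill_with_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

def split_train_test(x, y, test_fraction=0.2, seed=0):
    """Shuffle indices reproducibly and hold out a fraction for testing."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    n_test = max(1, int(round(len(x) * test_fraction)))
    test, train = idx[:n_test], idx[n_test:]
    return ([x[i] for i in train], [x[i] for i in test],
            [y[i] for i in train], [y[i] for i in test])

def fit_line(x, y):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )

# Made-up illustrative values; None marks a missing record.
rainfall = [12.0, 30.5, None, 55.0, 8.2, 41.7, 63.3, None, 25.4, 70.1]
discharge = [110.0, 240.0, 310.0, None, 95.0, 330.0, 480.0, 150.0, 205.0, 540.0]

x = fill_with_mean(rainfall)
y = fill_with_mean(discharge)
x_train, x_test, y_train, y_test = split_train_test(x, y)
slope, intercept = fit_line(x_train, y_train)
predictions = [slope * v + intercept for v in x_test]
print("RMSE on held-out data:", rmse(y_test, predictions))
```

With a single feature, standardizing rainfall changes the fitted coefficients but not the predictions, so the sketch omits that step.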
15. Work To be Done
• To delineate the watershed boundary and apply the monthly average rainfall data of 31 years (1990-2020) over it using ArcGIS, along with the discharge data for the same period.
• These data will be interpolated using the IDW or Kriging tool to generate the monthly average rainfall values mentioned in the first point.
• The rainfall and discharge values will be supplied to machine learning algorithms as the independent and dependent variables respectively, and the resulting models will be used to predict future discharge.
• To apply and compare the accuracy of various machine learning models, such as linear regression, random forest regression, logistic regression, and decision tree, in order to select the model with the highest accuracy.
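The IDW interpolation planned above can be sketched in a few lines. This is a simplified illustration, not the ArcGIS tool: the `idw` function name, the planar coordinates, and the station values are hypothetical, and a power of 2 is assumed as a commonly used setting.

```python
import math

def idw(stations, target, power=2.0):
    """Inverse Distance Weighted estimate at `target`.

    stations: list of (x, y, value) tuples in planar coordinates.
    target:   (x, y) location where a value is wanted.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, value in stations:
        distance = math.hypot(target[0] - x, target[1] - y)
        if distance == 0.0:
            # The target coincides with a station: use its value directly.
            return value
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total

# Hypothetical stations (x, y, monthly rainfall); not project data.
stations = [(0.0, 0.0, 10.0), (2.0, 0.0, 30.0), (0.0, 2.0, 20.0)]
print(idw(stations, (1.0, 1.0)))
```

Nearer stations dominate the estimate as the power increases. Kriging, the alternative named above, additionally models the spatial correlation of the data and is not covered by this sketch.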
16. Conclusion and Future Work
Summarize the main findings of the study and their significance for
discharge forecasting in the Mahanadi river basin. Explore potential areas
for further research and improvement, paving the way for advancements
in river management practices.