Automated Drought Analysis with Python and Machine Learning
THESIS SUBMITTED TO
Symbiosis Institute of Geoinformatics
FOR PARTIAL FULFILLMENT OF THE M. Sc. DEGREE
By
Gurminder Bharani
(Batch 2014 - 16)
Symbiosis Institute of Geoinformatics
Symbiosis International University
5th Floor, Atur Centre, Gokhale Cross Road,
Model Colony, Pune – 411016.
CERTIFICATE
Certified that this thesis titled ‘Automated Drought Analysis with Python and Machine
Learning’ is a bonafide work done by Mr. Gurminder Bharani, at International Water
Management Institute (IWMI), Sri Lanka and Symbiosis Institute of Geoinformatics, under our
supervision.
Supervisor External
Dr. Giriraj Amarnath
IWMI
Supervisor Internal
Dr. T. P. Singh
Director,
Symbiosis Institute of
Geoinformatics
Index
I. Acknowledgement
II. List of Figures
III. List of Tables
IV. Abbreviation List
1. Preface
2. Introduction
3. Literature Review
4. Study Area
5. Methodology
6. Result
7. Discussion
8. Conclusion
9. References
10. Annexure
Acknowledgement
The last six months of working on my project have been a very productive journey.
Getting a glimpse of what the research world looks and feels like would not have been
possible had it not been for Dr Giriraj Amarnath, who hired me as an intern at IWMI.
I would like to extend my heartfelt gratitude to Mr Peejush Pani, who helped in developing
the tools through his constant guidance and his remote sensing modelling expertise.
The experience in this esteemed organisation has left an indelible mark on my learning.
It has been a great exposure and has served as a reality check through which I plan to
better myself and polish my skills in the days to come.
Further, I would like to thank the faculty of Symbiosis Institute of Geoinformatics, Pune,
namely Dr T. P. Singh, Dr Navendu Chowdhury and Col B. K. Pradhan, without whom my
knowledge of GIS and its applications in various domains would not have been clear.
I would like to thank my computer science teacher Mr Charudatta Ekbote, without whose
teachings none of this would have been possible.
List of Figures
Figure 1: Simple implementation of Decision Tree
Figure 2: Simple implementation of Random Forest
Figure 3: Area of Interest
Figure 4: UI of Monthly Sum for SPI
Figure 5: UI of SPI Calculation from 1 month to 12 months
Figure 6: UI of SPI Calculation from 13 months to 60 months
Figure 7: UI of unpacking all the calculated SPI to daily raster images
Figure 8: Comparison of Mean Rainfall with Mean SPI
Figure 9: 2007 TRMM data compared with all types of Bias Correction methods
Figure 10: 1998 TRMM data compared with all types of Bias Correction methods
Figure 11: 2007 PERSIANN data compared with all types of Bias Correction methods
Figure 12: 1998 PERSIANN data compared with all types of Bias Correction methods
Figure 13: 1998 PERSIANN data compared with all types of Bias Correction methods
Figure 14: 2007 PERSIANN data compared with all types of Bias Correction methods
Figure 15: IWMI tools in ArcCatalog 10.3
List of Tables
Table 1: SPI Values
Table 2: Comparison between SPI calculated by the WMO software and by the IWMI Python tool
Table 3: Comparison of random forest (10, 25, 50 and 80 estimators) and decision tree
with IDSI and SPI
Abbreviation List
MODIS Moderate Resolution Imaging Spectroradiometer
VCI Vegetation Condition Index
TCI Temperature Condition Index
SPI Standardized Precipitation Index
IMD India Meteorological Department
NDVI Normalized Difference Vegetation Index
OLI Operational Land Imager
DEM Digital Elevation Model
WMO World Meteorological Organization
IDSI Integrated Drought Severity Index
Preface
This project blends multiple indices derived from multiple satellites. MODIS is known for its
high temporal resolution, so two indices are derived from it: VCI (Vegetation Condition Index)
and TCI (Temperature Condition Index).
The SPI (Standardized Precipitation Index) is widely accepted as the prime indicator of
meteorological drought and is derived from IMD (India Meteorological Department)
precipitation data.
Landsat data has been used specifically to obtain a fine resolution of 30 meters in the final
classified product. NDVI (Normalized Difference Vegetation Index) is the indicator used to
identify the pixels that are eligible for drought classification.
Benefits of blending the datasets
Blending datasets has several benefits. A few are discussed below:
Temporal resolution:
The high temporal resolution of the MODIS dataset helps in understanding the long-term
behavior of the data.
Also, by comparing values over the long term we can determine the severity of a drought
relative to past events.
Spatial resolution:
High spatial resolution data gives a detailed outline of how the data is distributed spatially.
Freely available high spatial resolution data, however, generally lacks a long historical record,
which makes tasks such as identifying drought pixels relative to the past difficult. When we
blend these datasets we get the benefits of both temporal and spatial resolution to indicate
drought stress.
Determining short term and long term drought with 1 month SPI to 60 month SPI
SPI based on 1 month of precipitation data over 30 years indicates short-term meteorological
drought, because the precipitation accumulated in one month of each year is compared with the
rainfall accumulated in the same month in past years.
SPI based on 12 months of accumulated precipitation data indicates locations which are under
meteorological drought over one year. Similarly, 24-month SPI indicates locations suffering
drought over two years.
When we classify drought with machine learning using the 1-, 12-, 24- and 60-month SPI, we
can identify drought-stressed pixels of varying intensity.
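The accumulation periods described above can be illustrated with a rolling sum over a monthly rainfall series. This is a minimal sketch, not the thesis tool: the function name `accumulate` and the sample values are assumptions made for illustration.

```python
import numpy as np

def accumulate(monthly, window):
    """Rolling sum over `window` consecutive months.

    The first window-1 entries are NaN because a full window
    of past months is not yet available there.
    """
    monthly = np.asarray(monthly, dtype=float)
    out = np.full(monthly.shape, np.nan)
    csum = np.concatenate(([0.0], np.cumsum(monthly)))
    out[window - 1:] = csum[window:] - csum[:-window]
    return out

# Hypothetical monthly rainfall totals in millimetres
rain = [10, 0, 5, 20, 30, 0]
three_month = accumulate(rain, 3)  # input series for a 3-month SPI
```

The same call with `window=12`, `24` or `60` would give the series from which the longer-period SPI values are standardized.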
Introduction
Objective
1. Need for Automation
The main objective of automation is to produce rapid results. When a project is based on
high temporal datasets, processing them becomes repetitive. Once a process is defined it
can be automated, which reduces human interference and therefore leads to a less
error-prone product. Because the process is automated, results are also generated rapidly.
ArcMap, a GIS software, has limitations when it comes to project-specific customization.
For example, weekly drought maps have to be exported manually by default, which is time
consuming for a high temporal dataset; with automation we can reduce the time needed to
generate the result.
When analyzing data, many intermediate datasets are created, depending on the
methodology. For example, to plot the mean of the monthly rainfall sums of the
precipitation data provided by IMD from 1901 to 2015, a dataset covering 115 years, an
ArcMap user would traditionally create a batch for every month and perform zonal
statistics on the files in the batch, repeating this for every month of every year from
1901 to 2015.
The intermediate data generated here are the monthly sum files, which take up
unnecessary space on the computer.
Each sum file is around 136 kilobytes, so with 12 files per year over 115 years about 184
megabytes of space is wasted.
This figure is for the IMD dataset, which has a spatial resolution of 0.25 degrees and
individual files measured in kilobytes. For Landsat 8 OLI images, where each image is over
1 gigabyte, far more space would be wasted. With automation we can therefore use the
hardware resources of the computer optimally.
2. Need for machine learning classification
Machine learning enables us to create applications that replicate the human cognitive
function of classifying objects. Machine learning has many sub-streams, each with its own
advantages and disadvantages.
Once the algorithm is trained it can be used repeatedly to classify drought and generate a
weekly or daily product, depending on the input parameters.
Literature Review
Python success story
ForecastWatch.com
Introduction
ForecastWatch.com, a service of Intellovations, is in the business of rating the accuracy of weather
reports from companies such as Accuweather, MyForecast.com, and The Weather Channel. Over
36,000 weather forecasts are collected every day for over 800 U.S. cities, and later compared with
actual climatological data. These comparisons are used by meteorologists to improve their weather
forecasts, and to compare their forecasts with others. They are also used by consumers to better
understand the probable accuracy of a forecast.
The Architecture
ForecastWatch.com is built from four major architectural components: An input process for
acquiring forecasts, an input process for acquiring measured climatological data, the data
aggregation engine, and the web application framework.
There are two main input processes in the system: The forecast parser, and the actuals parser. The
forecast parser is responsible for requesting forecasts from the web for each of the forecast
providers ForecastWatch.com tracks. It parses the forecast from the page and inserts the forecast
data into a database until it can be compared to the actual data. The actuals parser takes actual data
from the National Climatic Data Center of the National Weather Service, which provides high,
low, precipitation, and significant weather events for over 800 United States cities and inserts the
data in to the database. This process also scores the forecasts with the actual weather data, and
places that information in the database.
Once the data has been collected and scored, it is processed by the aggregation engine, which
combines the scores into yearly and monthly blocks, sliced by provider, location, and the number
of days into the future for which the forecasts were predicting. In its first year, 2003, the system
only gathered forecasts for 20 U.S. cities, or about 250,000 individual forecasts, so most of the
data output was based on the raw scoring data. The aggregation engine was added once the system
was scaled up to 800 cities, increasing the data stream by almost 4000%. In the first half of 2004,
the system has already scored over 4 million forecasts, all collected, parsed, and displayed on the
web.
Implemented with Python
ForecastWatch.com is a 100% pure Python solution. Python is used in all its components, from
the back-end to the front-end, including also the more performance-critical portions of the system.
Python was chosen initially because it comes with many standard libraries useful in collecting,
parsing, and storing data from the web. Among those particularly useful in this application were
the regular expression library, the thread library, the object serialization library, and gzip data
compression library. Other libraries, such as an HTTP client capable of accepting cookies
(ClientCookie), and an HTML table parser (ClientTable) were available as third party modules.
These proved invaluable and were easy to use.
The threading library turned out to be very important in scaling ForecastWatch.com's coverage to
over 800 cities. Grabbing web pages is a very I/O bound process, and requesting a single page at
a time for roughly 5000 web pages a day would have been prohibitively time-consuming. Using
Python's threading library, the web page retrieval loop simply calls thread.start_new() for each
request, passing in the necessary class instance method that retrieves and processes the web page,
along with the parameters necessary to describe the city for the desired forecast. The request
classes use a Python built-in Event class instance to communicate with the main controlling thread
when processing is complete. Python made this application of threading incredibly easy.
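The thread-per-request pattern described above can be sketched with the modern `threading` module (the original system used the older `thread` module). This is only an illustration: `fetch_forecast` is a hypothetical stand-in for the page retrieval, and no real network request is made.

```python
import threading

def fetch_forecast(city, results, done):
    """Hypothetical worker: a real one would download and parse a web page."""
    results[city] = "forecast for " + city
    done.set()  # tell the controlling thread that this request is finished

def fetch_all(cities):
    results, events = {}, []
    for city in cities:
        done = threading.Event()
        events.append(done)
        worker = threading.Thread(target=fetch_forecast,
                                  args=(city, results, done))
        worker.start()
    for done in events:
        done.wait()  # block until every worker has signalled completion
    return results

forecasts = fetch_all(["Pune", "Colombo"])
```

Because the work is I/O bound, the threads spend most of their time waiting, which is why running them concurrently shortens the total retrieval time.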
Python is also used in the aggregation engine, which runs as a separate process to combine forecast
accuracy scores into monthly and yearly slices. The aggregation process uses queries
via MySQLdb to the MySQL database where the input modules have placed the forecast and
climatological data they have harvested. Colorized maps, showing forecast accuracy by
geographical area, are then generated for use on the web site and in printed reports.
Python Made It Possible
Python played a significant role in the success of ForecastWatch.com. The product currently
contains over 5,000 lines of Python, most of which are concerned with implementing the high-
level functionality of the application, while most of the details are taken care of by Python's
powerful standard libraries and the third party modules described above. Many more lines of code
would have been needed working in, for example, Java or PHP. The integration capabilities of
those languages are not as strong, and their threading support is harder to use.
About Python
Python is impressive as an object-oriented rapid application development language. One of
Python's key strengths lies in its ability to produce results quickly without sacrificing
maintainability of the resulting code. In ForecastWatch.com, Python was used for prototyping as
well, and those prototypes were able to evolve cleanly into the production code without requiring
a complete rewrite or switching toolsets. This saved substantial effort and made the development
process more flexible and effective.
Because of the clean design of the language, refactoring the Python code was also much easier
than in other languages; moving code around simply requires less effort.
Python's interpreted nature was also a benefit: Code ideas can easily be tested in the Python
interactive shell, and lack of a compilation phase makes for a shorter edit/test cycle.
All of these factors combine to make Python a terrific alternative to C++ and Java as a general
purpose programming language. ForecastWatch.com was made possible because of the ease of
programming complex tasks in Python, and the rapid development that Python allows.
Python Modules
1. Pandas:
Pandas is used for data analytics. It enables the programmer to traverse the data
and get the desired result. Pandas behaves similarly to Microsoft Excel; the only
difference is that pandas has no user interface.
2. Matplotlib:
Matplotlib enables the user to visualize the behavior of the data handled with pandas.
It is a vast library capable of plotting almost any type of graph. In this project
matplotlib is used for analyzing the spatial correlation between two datasets.
3. Arcpy:
Arcpy is a Python module made only for ArcGIS applications. This module cannot be used
outside the ArcGIS environment. Its basic purpose is to create customized applications
in ArcGIS Desktop. Every tool in ArcMap has a Python implementation, which can be seen
in the tool description. By understanding the behavior of the tools we can merge
multiple tools in ArcMap and get the desired output.
This reduces manual work, since merging the tools automates the process of generating
the results.
4. Numpy:
Numpy helps in performing operations on 2D or 3D numpy arrays. With the help of numpy
any raster-based model can be generated. Numpy has additional methods which enable the
user to transform raster datasets that are stored as 2D or 3D numpy arrays. Several
numpy modules make tasks such as taking the temporal mean of a long time series (for
example 100 years) while excluding a particular value very easy: masking that value in
the numpy raster is enough to achieve this.
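A temporal mean that excludes a particular value can be sketched with numpy masked arrays. The no-data value `-999.0` and the tiny two-layer stack are assumptions chosen for illustration, not values from the project.

```python
import numpy as np

NODATA = -999.0  # hypothetical "no data" marker

# Stack of rasters with shape (time, rows, cols)
stack = np.array([
    [[1.0, 2.0], [NODATA, 4.0]],
    [[3.0, NODATA], [NODATA, 8.0]],
])

masked = np.ma.masked_equal(stack, NODATA)   # hide every NODATA cell
temporal_mean = masked.mean(axis=0)          # mean over time, ignoring masked cells
result = temporal_mean.filled(np.nan)        # cells with no valid data become NaN
```

Pixels that are masked in every time step have no valid mean, so they are written out as NaN rather than as a misleading number.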
5. Openpyxl:
Openpyxl is the bridge between ArcMap and Microsoft Excel. The results generated in
ArcMap can be taken in the form of a pandas data frame and then stored in Excel.
After storing the dataset we can use multiple methods in openpyxl to visualize the data.
Many types of graphs and charts can be generated with the help of openpyxl, for example
line charts, scatter charts, pie charts and area charts.
6. Scipy:
Scipy includes many advanced statistical tools for data analysis. One of them, the
cumulative distribution function of the gamma distribution, is used for the calculation
of SPI. Statistical components such as linear regression, with results containing the
standard error and other important information, can be computed with scipy. The
interpolation module inside scipy helps the programmer to perform tasks such as
interpolating a point dataset to generate a surface. There are dedicated modules in
scipy for Fourier transforms, linear algebra, eigenvalues, multidimensional image
processing etc.
7. GRASS (GRASS, n.d.):
GRASS is freely available as a plugin in the Quantum GIS software and can be used to
analyze raster as well as vector data. GRASS is a plugin, not a module; it contains many
modules for the analysis of spatial data, such as:
i. db, the database modules.
ii. r, the raster modules.
iii. v, the vector modules.
8. Sklearn:
a. Decision Tree (scikit-learn, 1.10. Decision Trees, n.d.): Decision Trees (DTs) are a
non-parametric supervised learning method used for classification and regression.
The goal is to create a model that predicts the value of a target variable by learning
simple decision rules inferred from the data features.
For instance, in the example below, decision trees learn from data to approximate
a sine curve with a set of if-then-else decision rules. The deeper the tree, the more
complex the decision rules and the fitter the model.
Some advantages of decision trees are:
- Simple to understand and to interpret. Trees can be visualised.
- Requires little data preparation. Other techniques often require data
normalisation, dummy variables need to be created and blank values to be
removed. Note however that this module does not support missing values.
- The cost of using the tree (i.e., predicting data) is logarithmic in the number
of data points used to train the tree.
- Able to handle both numerical and categorical data. Other techniques are
usually specialised in analysing datasets that have only one type of variable.
See algorithms for more information.
- Able to handle multi-output problems.
- Uses a white box model. If a given situation is observable in a model, the
explanation for the condition is easily explained by boolean logic. By
contrast, in a black box model (e.g., in an artificial neural network), results
may be more difficult to interpret.
- Possible to validate a model using statistical tests. That makes it possible to
account for the reliability of the model.
- Performs well even if its assumptions are somewhat violated by the true
model from which the data were generated.
The disadvantages of decision trees include:
- Decision-tree learners can create over-complex trees that do not
generalise the data well. This is called overfitting. Mechanisms such as
pruning (not currently supported), setting the minimum number of
samples required at a leaf node or setting the maximum depth of the tree
are necessary to avoid this problem.
- Decision trees can be unstable because small variations in the data might
result in a completely different tree being generated. This problem is
mitigated by using decision trees within an ensemble.
- The problem of learning an optimal decision tree is known to be NP-complete
under several aspects of optimality and even for simple
concepts. Consequently, practical decision-tree learning algorithms are
based on heuristic algorithms such as the greedy algorithm where
locally optimal decisions are made at each node. Such algorithms cannot
guarantee to return the globally optimal decision tree. This can be
mitigated by training multiple trees in an ensemble learner, where the
features and samples are randomly sampled with replacement.
- There are concepts that are hard to learn because decision trees do not
express them easily, such as XOR, parity or multiplexer problems.
- Decision tree learners create biased trees if some classes dominate. It is
therefore recommended to balance the dataset prior to fitting with the
decision tree.
Figure 16: Simple implementation of Decision Tree
b. Random Forest (scikit-learn, 3.2.4.3.1. sklearn.ensemble.RandomForestClassifier,
n.d.):
A random forest is a meta estimator that fits a number of decision tree classifiers
on various sub-samples of the dataset and uses averaging to improve the predictive
accuracy and control over-fitting. The sub-sample size is always the same as the
original input sample size but the samples are drawn with replacement
if bootstrap=True (default).
An example of a random forest implementation is given in the following figure.
Figure 17: Simple implementation of Random Forest
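The code shown in the two figures is not recoverable from this copy, so the following is only a minimal sketch of how a decision tree and a random forest are typically fitted with scikit-learn. The synthetic features, the labelling rule and all parameter values are assumptions made for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

# Synthetic training data: two features per sample (hypothetical index values)
rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(200, 2))
y = (X[:, 0] < -1.0).astype(int)  # made-up rule: label 1 means "dry"

# A single tree, limited in depth to reduce overfitting
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# An ensemble of 25 trees, each trained on a bootstrap sample
forest = RandomForestClassifier(n_estimators=25, random_state=0).fit(X, y)

sample = np.array([[-2.0, 0.0], [1.0, 0.0]])
tree_pred = tree.predict(sample)
forest_pred = forest.predict(sample)
```

Both estimators share the same `fit` and `predict` interface, so switching between them only requires changing the class that is instantiated.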
9. Decision Tree Implementation:
(Hwahwan & Cha, 2008) implemented land classification with the machine learning technique
known as the decision tree.
The parameters taken for classification were:
i. DEM
ii. Aspect
iii. Slope
iv. ISO cluster
v. Population Density
vi. Distance to water
vii. Distance to Road
To train the decision tree, classified land data was taken from the government of South
Korea.
The classes were Forest, Urban, Water, Agriculture, Rangeland, Barren land and
Wetland. After training and classification of the dataset, an accuracy of 96% was achieved.
10. WMO SPI (Standardized precipitation Index):
(WMO, 2009) The Inter-Regional Workshop on Indices and Early Warning Systems for
Drought declared that SPI (Standardized Precipitation Index) should be used to characterize
meteorological drought.
SPI answers questions such as whether rainfall in a particular month is in deficit or surplus
compared to past years of data.
SPI can be computed for accumulation periods from 1 month to 60 months, and for more than
60 months if long-term rainfall data is available.
One-month SPI helps in identifying short-term drought events, since 1 month of rainfall
is compared with past records.
As we increase the number of months we can identify long-term drought; for example, with
the 48-month SPI we can identify locations which have been affected by drought over the
past 4 years. SPI roughly ranges from -3 to +3 and each range of values has a meaning.
Table 1: SPI Values
SPI value         Category
2.0 and above     Extremely wet
1.5 to 1.99       Very wet
1.0 to 1.49       Moderately wet
-0.99 to 0.99     Near normal
-1.0 to -1.49     Moderately dry
-1.5 to -1.99     Severely dry
-2.0 and less     Extremely dry
(M. Svoboda, 2012) defines the meaning of the SPI values above in the user guide.
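The table above translates directly into a lookup function. This is a minimal sketch; the function name `classify_spi` is an assumption, and values falling in the small gaps between the tabulated ranges (for example 1.495) are assigned by the lower bound of the next class.

```python
def classify_spi(spi):
    """Return the category for an SPI value, following the SPI value table."""
    if spi >= 2.0:
        return "Extremely wet"
    if spi >= 1.5:
        return "Very wet"
    if spi >= 1.0:
        return "Moderately wet"
    if spi > -1.0:
        return "Near normal"
    if spi > -1.5:
        return "Moderately dry"
    if spi > -2.0:
        return "Severely dry"
    return "Extremely dry"

labels = [classify_spi(v) for v in (2.3, 0.4, -1.2, -2.5)]
```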
11. Bias Correction CMCC:
CMCC:
$$P^{*}(d) = P(d) \cdot \frac{\mu_m(P_{obs}(d))}{\mu_m(P_{rem}(d))}$$
Where:
- $\mu_m(P_{obs}(d))$ is the monthly mean of the station data
- $\mu_m(P_{rem}(d))$ is the monthly mean of the remote sensing data
- $P(d)$ is the daily remote sensing data
- $P^{*}(d)$ is the bias corrected remote sensing data
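The CMCC correction above is a multiplicative scaling by the ratio of monthly means. A minimal numpy sketch, assuming the inputs are arrays of shape (days, pixels) for a single month; the function name and the choice to leave a pixel unchanged when its satellite mean is zero are assumptions, not part of the source.

```python
import numpy as np

def bias_correct_cmcc(sat_daily, station_daily):
    """Scale daily satellite rainfall by mean(station) / mean(satellite)."""
    sat = np.asarray(sat_daily, dtype=float)
    obs = np.asarray(station_daily, dtype=float)
    mean_sat = sat.mean(axis=0)   # mu_m(P_rem)
    mean_obs = obs.mean(axis=0)   # mu_m(P_obs)
    # Assumption: keep a factor of 1 where the satellite mean is zero
    safe_sat = np.where(mean_sat > 0, mean_sat, 1.0)
    factor = np.where(mean_sat > 0, mean_obs / safe_sat, 1.0)
    return sat * factor

# Two days, two pixels (hypothetical values in mm)
sat = [[2.0, 0.0], [4.0, 0.0]]
obs = [[3.0, 1.0], [6.0, 1.0]]
corrected = bias_correct_cmcc(sat, obs)
```

After the correction the monthly mean of the first pixel equals the station mean, which is the purpose of the scaling.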
12. Bias Correction MRC
$$SRE_e = (SRE_o - \mu_{SRE}) \cdot \tau_f + \mu_{SRE} \cdot \mu_f$$
$$\mu_f = \mu_{OBS} / \mu_{SRE}$$
$$\tau_f = \tau_{OBS} / \tau_{SRE}$$
Where:
- $\mu_{SRE}$ is the monthly mean of the remote sensing data
- $\mu_{OBS}$ is the monthly mean of the station data
- $\tau_{OBS}$ is the monthly standard deviation of the station data
- $\tau_{SRE}$ is the monthly standard deviation of the remote sensing data
- $SRE_o$ is the daily SRE (remote sensing) data
- $SRE_e$ is the bias corrected data
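The MRC correction rescales both the mean and the spread of the satellite series. The sketch below assumes the additive form of a mean-and-variance rescaling, in which the scaled anomaly is added to the rescaled mean, and it assumes non-zero satellite mean and standard deviation; the function name is hypothetical.

```python
import numpy as np

def bias_correct_mrc(sre_daily, obs_daily):
    """Rescale daily satellite rainfall to the station mean and spread."""
    sre = np.asarray(sre_daily, dtype=float)
    obs = np.asarray(obs_daily, dtype=float)
    mu_sre, mu_obs = sre.mean(axis=0), obs.mean(axis=0)
    tau_sre, tau_obs = sre.std(axis=0), obs.std(axis=0)
    mu_f = mu_obs / mu_sre      # ratio of monthly means
    tau_f = tau_obs / tau_sre   # ratio of monthly standard deviations
    return (sre - mu_sre) * tau_f + mu_sre * mu_f

# Hypothetical series where the station records exactly twice the satellite value
corrected_mrc = bias_correct_mrc([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

Because the anomaly is rescaled around the mean, this form can produce small negative values on dry days; how those are handled is not stated in the source.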
13. Bias Correction Rule Based (Modified CMCC):
This method was developed at IWMI and its behavior is related to the CMCC
methodology. After testing the rainfall dataset in Excel and studying how the
equation behaves depending on the mean of the data, we created the following rules.
- Rule 1: If the station data records precipitation and the satellite does not, copy
the IMD data into the bias corrected satellite data.
- Rule 2: If the difference between station and satellite data is two
millimeters, take the mean of these pixels.
- Rule 3: If the station mean is greater than the satellite mean, then where the
satellite daily data is greater than the station daily data the equation of bias
correction is:
$$P^{*}(d) = P(d) \cdot \frac{\mu_m(P_{rem}(d))}{\mu_m(P_{obs}(d))}$$
and where the satellite daily data is less than the station daily data it is:
$$P^{*}(d) = P(d) \cdot \frac{\mu_m(P_{obs}(d))}{\mu_m(P_{rem}(d))}$$
- Rule 4: If the station mean is less than the satellite mean, then where the
satellite daily data is less than the station daily data the equation of bias
correction is:
$$P^{*}(d) = P(d) \cdot \frac{\mu_m(P_{rem}(d))}{\mu_m(P_{obs}(d))}$$
and where the satellite daily data is greater than the station daily data it is:
$$P^{*}(d) = P(d) \cdot \frac{\mu_m(P_{obs}(d))}{\mu_m(P_{rem}(d))}$$
Where:
- $\mu_m(P_{obs}(d))$ is the monthly mean of the station data
- $\mu_m(P_{rem}(d))$ is the monthly mean of the remote sensing data
- $P(d)$ is the daily remote sensing data
- $P^{*}(d)$ is the bias corrected remote sensing data
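Rules 3 and 4 above share one pattern: a satellite value that exceeds the station value is multiplied by the ratio that is below 1, and a satellite value below the station value is multiplied by the ratio that is above 1. The sketch below uses that observation. Several points are assumptions rather than statements from the source: the function name, reading Rule 2 as "differs by at most 2 mm", applying Rule 1 last so that it takes precedence, and requiring both monthly means to be non-zero.

```python
import numpy as np

def bias_correct_rule_based(sat_daily, station_daily, tol=2.0):
    sat = np.asarray(sat_daily, dtype=float)
    obs = np.asarray(station_daily, dtype=float)
    ratio = obs.mean(axis=0) / sat.mean(axis=0)   # mu_m(P_obs) / mu_m(P_rem)
    scale_down = np.minimum(ratio, 1.0 / ratio)   # the factor that is <= 1
    scale_up = np.maximum(ratio, 1.0 / ratio)     # the factor that is >= 1
    # Rules 3 and 4: shrink overestimates, enlarge underestimates
    out = np.where(sat > obs, sat * scale_down, sat * scale_up)
    # Rule 2 (assumed reading): values within `tol` mm are averaged
    out = np.where(np.abs(obs - sat) <= tol, (obs + sat) / 2.0, out)
    # Rule 1: station records rain, satellite records none
    out = np.where((obs > 0) & (sat == 0), obs, out)
    return out

# One pixel over four days (hypothetical values in mm)
corrected_rb = bias_correct_rule_based([0.0, 10.0, 20.0, 5.0],
                                       [4.0, 11.0, 10.0, 15.0])
```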
14. Bias Correction IIT Gandhinagar:
(Shah & Mishra, 2014) bias corrected TRMM data with respect to IMD data. In
the paper they mention that TRMM always underestimates the amount of
precipitation compared to the station data provided by IMD. This difference was
most noticeable in the monsoon season, when the values were extreme.
They created two scale factors to bias correct the data, one for the extreme events
and the other for the non-monsoon months.
First they took the precipitation values above the ninetieth percentile of the IMD
data and created the first scale factor for extreme events.
This scale factor is simply the ratio of the IMD mean to the TRMM mean for the
pixels above the ninetieth percentile of IMD.
The scale factor is then multiplied with the raw TRMM data for the respective months.
Second, for pixels below the ninetieth percentile of IMD, the ratio of the IMD mean
to the TRMM mean is taken for the pixels under the ninetieth percentile of IMD. This
second factor is applied to the raw TRMM data. The percentile is taken only for the
monsoon months because the other months receive comparatively less rainfall.
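The two scale factors described above can be sketched for a single series of matching IMD and TRMM values. This is an illustration under assumptions: the function name is hypothetical, both groups are assumed to be non-empty with a non-zero TRMM mean, and each TRMM value is assigned to a group by the IMD value at the same position.

```python
import numpy as np

def bias_correct_percentile(trmm, imd, pct=90):
    trmm = np.asarray(trmm, dtype=float)
    imd = np.asarray(imd, dtype=float)
    threshold = np.percentile(imd, pct)
    extreme = imd > threshold                       # values above the 90th percentile
    factor_extreme = imd[extreme].mean() / trmm[extreme].mean()
    factor_normal = imd[~extreme].mean() / trmm[~extreme].mean()
    return np.where(extreme, trmm * factor_extreme, trmm * factor_normal)

# Hypothetical monsoon rainfall values in mm
imd_values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
trmm_values = [1, 1, 2, 2, 3, 3, 4, 4, 5, 50]
corrected_pct = bias_correct_percentile(trmm_values, imd_values)
```

Using a separate factor for the extreme values prevents a few very wet days from dominating the correction applied to ordinary days.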
15. IDSI (Integrated Drought Severity Index)
a. IDSI is an index developed at IWMI for monitoring drought in South Asia,
covering India, Pakistan, Sri Lanka, Nepal, Afghanistan and
Bangladesh. It uses VCI, TCI and the rainfall anomaly from the GPM dataset to
classify a pixel's drought severity.
A map of IDSI looks like the following.
Figure 18: IDSI 20-27 Jul 2002
Study Area
Introduction
Since this methodology is experimental and the dataset is large and varied, the processing
time with machine learning depends entirely on the size of the data.
To get faster results the study area has been kept small. The area was chosen for its large
spatial variability within Maharashtra, so that the interaction of the diverse indices from
multiple satellites could be studied.
Area of concern
The biggest area of concern is farmer suicides in Maharashtra; the major reason behind this
issue is a lack of management by the government. Remote sensing is one platform that the
government can adopt to release funds to the needy and stop this crisis.
The lack of management comes from not taking the right decisions at the right time, which
happens because conventional methodologies are still used to determine drought. Remote
sensing is the only method that gives fast and mostly accurate results, because we do not
need to wait for a verdict from a village surveyor to declare a village drought-affected:
satellite imagery can be processed immediately to provide crucial information to the
decision makers.
With the help of Python automation we can automate the entire processing chain without
human interference and get results in minutes.
Figure 19: Area of Interest
Methodology
Conversion tools:
1. IMD GRD data to ASCII: IMD distributes its in-situ precipitation data in GRD format by
default, together with a separate C program that converts the GRD file into ASCII. To
simplify processing for the user, a built-in Python script was written that performs the
same task as the C program.
2. IMD ASCII to Raster: The ASCII file is then converted to gridded raster data so that it
can be compared with satellite rainfall estimates.
3. Excel to Raster: Apart from IMD, the Bangladesh precipitation data that I found was in
the form of an Excel sheet containing in-situ recorded precipitation data. This tool enables
the user to generate a gridded raster map from the point station data with the IDW
interpolation technique.
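IDW (inverse distance weighting) estimates each grid cell as a weighted average of the station values, with weights that fall off with distance. The sketch below is a plain numpy illustration and not the ArcGIS tool used in the project; the function name, the power of 2 and the sample stations are assumptions.

```python
import numpy as np

def idw_grid(xs, ys, values, grid_x, grid_y, power=2.0):
    """Interpolate station values onto a regular grid with IDW."""
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys, dtype=float)
    values = np.asarray(values, dtype=float)
    gx, gy = np.meshgrid(grid_x, grid_y)
    # Distance from every grid cell to every station: (rows, cols, stations)
    dist = np.hypot(gx[..., None] - xs, gy[..., None] - ys)
    dist = np.maximum(dist, 1e-12)  # avoid division by zero on a station
    weights = 1.0 / dist ** power
    return (weights * values).sum(axis=-1) / weights.sum(axis=-1)

# Two hypothetical stations on a line, interpolated at three grid points
surface = idw_grid([0.0, 2.0], [0.0, 0.0], [10.0, 20.0],
                   grid_x=[0.0, 1.0, 2.0], grid_y=[0.0])
```

A grid cell that coincides with a station reproduces the station value, and the midpoint between two stations receives their average.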
Shift in the IMD and Remote sensing data:
1. After converting the IMD raw data into raster maps we observed a shift of 0.125
degrees between the IMD raster and the remote sensing data rasters.
2. The shift was clearly visible when compared with PERSIANN and TRMM.
3. Since the final product is going to be based on a 30 meter spatial resolution, a shift of
0.125 degrees would cause major problems.
4. Hence a tool was created to remove this shift.
Methodology
1. Resample the IMD data from 0.25 degrees to 0.125 degrees.
2. Take the zonal mean of these 0.125 degree cells with a fishnet created from the
remote sensing data, where the grid size of the fishnet is 0.25 degrees, matching the
extent of the remote sensing data.
3. Clip the remaining extra 0.125 degree cells.
4. Resample again to convert the 0.125 degree IMD data to the 0.25 degree grid of remote
sensing products such as PERSIANN and TRMM.
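The steps above can be sketched on a numpy array. This is a simplified illustration, not the ArcGIS tool: it assumes the shift is half a cell along both axes and uses nearest-neighbour resampling; the function name is hypothetical.

```python
import numpy as np

def shift_half_cell(coarse):
    """Re-grid a 0.25 degree raster onto a grid offset by 0.125 degrees."""
    coarse = np.asarray(coarse, dtype=float)
    # Step 1: resample 0.25 -> 0.125 degrees (each cell becomes 2 x 2 cells)
    fine = np.repeat(np.repeat(coarse, 2, axis=0), 2, axis=1)
    # Step 3: clip the half-cell border that has no partner in the shifted grid
    fine = fine[1:-1, 1:-1]
    # Steps 2 and 4: mean of each 2 x 2 block gives the shifted 0.25 degree cells
    rows, cols = fine.shape
    return fine.reshape(rows // 2, 2, cols // 2, 2).mean(axis=(1, 3))

shifted = shift_half_cell([[1.0, 2.0], [3.0, 4.0]])
```

Each output cell is the mean of the four original cells that meet at its centre, so an n by m input yields an (n-1) by (m-1) output.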
Bias Correction
A toolbox dedicated to bias correction was created with all the methods listed below.
1. Bias Correction CMCC interval
2. Bias Correction CMCC monthly
3. Bias Correction Rule Based
4. Bias Correction MRC
5. Bias Correction IIT Gandhinagar
The best bias corrected result was used for the SPI calculation.
1. SPI calculation
a. Automated data sorting
Figure 20: UI of Monthly Sum for SPI
Description of this tool.
Monthly Sum For SPI
Title Monthly Sum For SPI
Summary
This tool calculates the monthly sum from the daily rainfall data and saves the long-term files
in their respective month folders.
Calculating SPI with the SPI tool in this toolbox requires some data preparation. This tool
automates that preparation, so you will not need to sort the data manually before sending it
to the SPI tool.
Usage
There is no usage for this tool.
Syntax
MonthlySumForSPI (Daily_gridded_data_Folder, Extension, Output_Folder)
Parameter: Daily_gridded_data_Folder
Data Type: Folder
Dialog Reference: This folder should contain one subfolder per year, named after the year
and containing the 365 daily rainfall files.
Example: Folder name = TRMM_daily
Sub-folder name = 2001, 2002, 2003... 2015
There is no python reference for this parameter.

Parameter: Extension
Data Type: String
Dialog Reference: Select the format of the precipitation dataset that you are giving as
the input parameter for this tool.
There is no python reference for this parameter.

Parameter: Output_Folder
Data Type: Folder
Dialog Reference: Folder where all the monthly folders containing the output files will
be created.
There is no python reference for this parameter.
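The core of the monthly sum step can be sketched on an in-memory array. The actual tool reads and writes raster files in folders; this sketch leaves file handling out, and the function name and array layout are assumptions.

```python
import datetime
import numpy as np

def monthly_sums(daily, year):
    """Sum daily rasters into 12 monthly rasters.

    daily: array of shape (days_in_year, rows, cols), starting on 1 January.
    Returns an array of shape (12, rows, cols).
    """
    daily = np.asarray(daily, dtype=float)
    start = datetime.date(year, 1, 1)
    # Month number (1..12) of every daily layer
    months = np.array([(start + datetime.timedelta(days=i)).month
                       for i in range(daily.shape[0])])
    return np.stack([daily[months == m].sum(axis=0) for m in range(1, 13)])

# One pixel with 1 mm of rain on every day of 2001 (hypothetical)
sums = monthly_sums(np.ones((365, 1, 1)), 2001)
```

With 1 mm per day the monthly totals equal the number of days in each month, which makes the grouping easy to check.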
b. 1 to 12 tool UI
Figure 21: UI of SPI Calculation from 1 month to 12 months
Description of this tool:
SPI 1 to 12
Title SPI 1 to 12
Summary
This tool calculates the SPI using monthly rainfall data.
It is designed to calculate the Standardized Precipitation Index (SPI) with minimal human
interaction.
It computes α, β, Γ(α) and the cumulative probability function within the tool and gives the
final output as the SPI for all periods from 1 to 12 months.
NOTE: The SPI output from the tool has been validated with the World Meteorological
Organisation (WMO) software for SPI. A correlation of 0.99 was achieved between this tool and
the WMO software.
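The quantities named above (α, β and the gamma cumulative probability) correspond to the usual SPI procedure: fit a gamma distribution to the rainfall totals, evaluate its cumulative probability, and convert that probability to a standard normal deviate. The sketch below follows that procedure with Thom's approximation for α and β; it is an assumption about the general method, not a copy of the tool's code, and it needs at least two distinct positive rainfall values.

```python
import numpy as np
from scipy import stats

def spi(precip):
    """SPI for a series of rainfall totals of one calendar period."""
    precip = np.asarray(precip, dtype=float)
    positive = precip[precip > 0]
    q = 1.0 - positive.size / precip.size        # probability of zero rainfall
    # Thom's approximation of the gamma shape (alpha) and scale (beta)
    a = np.log(positive.mean()) - np.log(positive).mean()
    alpha = (1.0 + np.sqrt(1.0 + 4.0 * a / 3.0)) / (4.0 * a)
    beta = positive.mean() / alpha
    # Cumulative probability, mixing in the zero-rainfall probability
    cdf = q + (1.0 - q) * stats.gamma.cdf(precip, alpha, scale=beta)
    return stats.norm.ppf(cdf)                   # equivalent standard normal value

# Hypothetical totals for the same month over ten years (mm)
rain = np.array([12, 30, 45, 60, 80, 95, 120, 150, 200, 260], dtype=float)
spi_values = spi(rain)
```

Wetter years map to higher SPI values, and the resulting numbers can be read against the SPI value table.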
Usage
There is no usage for this tool.
Syntax
SPI1to12 (Input_Folder, Extension, Daily_gridded_data_Folder, Output_Folder)
Parameter: Input_Folder
Explanation: Dialog Reference. Folder containing all the monthly sub-folders, with the monthly rainfall files of all the years as computed by the "Monthly Sum For SPI" tool. There is no Python reference for this parameter.
Data Type: Folder
Parameter: Extension
Explanation: Dialog Reference. Select the format of the precipitation dataset that you are giving as the input parameter for this tool. There is no Python reference for this parameter.
Data Type: String
Parameter: Daily_gridded_data_Folder
Explanation: Dialog Reference. This folder should contain one subfolder per year, named after the year, each holding that year's 365 daily rainfall files. Example: folder name = TRMM_daily; sub-folder names = 2001, 2002, 2003 ... 2015. There is no Python reference for this parameter.
Data Type: Folder
Parameter: Output_Folder
Explanation: Dialog Reference. Folder where all the monthly folders containing the output files will be created. There is no Python reference for this parameter.
Data Type: Folder
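The quantities named in the summary of this tool (α, β, Γ(α) and the cumulative probability) fit together as in the sketch below. It assumes the standard SPI procedure: a gamma distribution fitted with the closed-form approximation of the maximum-likelihood estimates commonly used for SPI, a correction for months with zero rainfall, and a transform of the cumulative probability to a standard normal value. The source does not show the tool's code, so the function names `gamma_cdf` and `spi` and the sample rainfall values are hypothetical.

```python
import math
from statistics import NormalDist


def gamma_cdf(x, alpha, beta):
    """Gamma cumulative probability for shape alpha and scale beta.

    Uses the power series of the lower incomplete gamma function.
    """
    if x <= 0:
        return 0.0
    t = x / beta
    term = 1.0 / alpha
    total = term
    for n in range(1, 1000):
        term *= t / (alpha + n)
        total += term
        if term < total * 1e-15:
            break
    # math.lgamma(alpha) is ln Γ(α), so the last factor divides by Γ(α).
    return total * math.exp(alpha * math.log(t) - t - math.lgamma(alpha))


def spi(rainfall):
    """SPI of each value in a series of rainfall totals for one calendar month.

    Assumes at least two different non-zero values; otherwise the fit fails.
    """
    wet = [v for v in rainfall if v > 0]
    q = 1.0 - len(wet) / len(rainfall)  # probability of zero rainfall
    mean = sum(wet) / len(wet)
    a = math.log(mean) - sum(math.log(v) for v in wet) / len(wet)
    alpha = (1.0 + math.sqrt(1.0 + 4.0 * a / 3.0)) / (4.0 * a)  # shape
    beta = mean / alpha  # scale
    normal = NormalDist()
    result = []
    for v in rainfall:
        h = q + (1.0 - q) * gamma_cdf(v, alpha, beta)
        h = min(max(h, 1e-6), 1.0 - 1e-6)  # keep h inside the domain of inv_cdf
        result.append(normal.inv_cdf(h))
    return result


# Hypothetical July totals (mm) for ten years at one pixel.
july = [120.0, 80.0, 200.0, 150.0, 60.0, 95.0, 170.0, 110.0, 40.0, 130.0]
for rain, value in zip(july, spi(july)):
    print(rain, round(value, 2))
```

Wetter-than-usual years come out with a positive SPI and drier years with a negative one; the tool applies the same idea to every pixel of the monthly rasters.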
c. 13 to 60 tool UI
Figure 22: UI of SPI Calculation from 13 months to 60 months
Description of this tool:
SPI 13 to 60
Title: SPI 13 to 60
Summary
This tool calculates the SPI using monthly rainfall data.
It is designed to calculate the Standardized Precipitation Index (SPI) with minimal human interaction.
It computes α, β, Γ(α) and the cumulative probability function within the tool and gives the final output as the SPI for all time scales from 13 to 60 months.
NOTE: The SPI output from the tool has been validated against the World Meteorological Organisation (WMO) software for SPI. A correlation of 0.99 was achieved between this tool and the WMO software.
Usage
There is no usage for this tool.
Syntax
SPI13to60 (Input_Folder, Extension, Daily_gridded_data_Folder, Output_Folder)
Parameter: Input_Folder
Explanation: Dialog Reference. Folder containing all the monthly sub-folders, with the monthly rainfall files of all the years as computed by the "Monthly Sum For SPI" tool. There is no Python reference for this parameter.
Data Type: Folder
Parameter: Extension
Explanation: Dialog Reference. Select the format of the precipitation dataset that you are giving as the input parameter for this tool. There is no Python reference for this parameter.
Data Type: String
Parameter: Daily_gridded_data_Folder
Explanation: Dialog Reference. This folder should contain one subfolder per year, named after the year, each holding that year's 365 daily rainfall files. Example: folder name = TRMM_daily; sub-folder names = 2001, 2002, 2003 ... 2015. There is no Python reference for this parameter.
Data Type: Folder
Parameter: Output_Folder
Explanation: Dialog Reference. Folder where all the monthly folders containing the output files will be created. There is no Python reference for this parameter.
Data Type: Folder
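An n-month SPI is conventionally computed from rainfall accumulated over n consecutive months, so a long time scale such as 13 to 60 months starts from a rolling sum of the monthly totals. The sketch below shows only that accumulation step, on a plain list; the function name `rolling_sum` is hypothetical and the source does not state how the tool implements it.

```python
def rolling_sum(monthly, window):
    """Rainfall accumulated over `window` consecutive months, ending at each month.

    Returns None for the first window - 1 months, which lack a full window.
    """
    out = []
    running = 0.0
    for i, value in enumerate(monthly):
        running += value
        if i >= window:
            # Drop the month that has just left the window.
            running -= monthly[i - window]
        out.append(running if i >= window - 1 else None)
    return out


monthly = [10.0, 20.0, 30.0, 40.0, 50.0]
print(rolling_sum(monthly, 3))  # [None, None, 60.0, 90.0, 120.0]
```

With window set to 60, each value covers five years of rainfall, which is the accumulation a 60-month SPI would be fitted on.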
d. Unpack
Figure 23: UI of unpacking all the calculated SPI to daily raster images
Description of this tool:
This tool helps the user unpack the stacked SPI rasters; each output from this tool is a single layer containing monthly SPI.
e. Validation with WMO software
i. WMO has developed a command-line program to calculate the SPI. The output from the WMO program and the output from the Python SPI tool developed at IWMI had a correlation of 0.99.
Table 2: Comparison between SPI calculated by the WMO software and the IWMI-made Python tool
Python    WMO       Correlation
-99       -99       0.99999
1.28      1.25
-1.4      -1.392
-0.34     -0.342
-0.26     -0.265
-0.88     -0.877
1.47      1.441
0.17      0.157
2.43      2.388
-0.27     -0.27
-0.66     -0.662
-0.43     -0.431
-0.18     -0.179
0.96      0.936
-0.62     -0.616
-1.91     -1.894
0.49      0.468
0.06      0.041
-0.73     -0.725
-0.15     -0.161
-1.65     -1.641
0.73      0.705
-0.15     -0.155
0.45      0.437
1.42      1.384
0.28      0.262
0.02      0.013
-1.2      -1.196
1.51      1.486
-1.46     -1.443
0.17      0.149
1.15      1.127
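The agreement between the two columns of Table 2 can be checked with a Pearson correlation, as in the sketch below. The values are copied from the table; the -99 row is left out on the assumption that it is a NoData marker, and the helper name `pearson` is hypothetical. The result covers only the rows printed here, so it need not equal the figure reported for the full dataset.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Table 2 values, without the -99 row (assumed to be NoData).
python_spi = [1.28, -1.4, -0.34, -0.26, -0.88, 1.47, 0.17, 2.43, -0.27, -0.66,
              -0.43, -0.18, 0.96, -0.62, -1.91, 0.49, 0.06, -0.73, -0.15, -1.65,
              0.73, -0.15, 0.45, 1.42, 0.28, 0.02, -1.2, 1.51, -1.46, 0.17, 1.15]
wmo_spi = [1.25, -1.392, -0.342, -0.265, -0.877, 1.441, 0.157, 2.388, -0.27,
           -0.662, -0.431, -0.179, 0.936, -0.616, -1.894, 0.468, 0.041, -0.725,
           -0.161, -1.641, 0.705, -0.155, 0.437, 1.384, 0.262, 0.013, -1.196,
           1.486, -1.443, 0.149, 1.127]

r = pearson(python_spi, wmo_spi)
print(round(r, 5))
```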
f. Validation with monthly mean of IMD rainfall data
i. IMD data from 1901 to 2013 were used to calculate the one-month SPI. The Y axis on the left shows the mean monthly rainfall of Maharashtra, and the Y axis on the right shows the corresponding SPI. The correlation between the two datasets was over 0.94, and their patterns also match.
Figure 24: Comparison of Mean Rainfall with Mean SPI
2. VCI
Kogan & J. Sullivan (1993) defined the Vegetation Condition Index (VCI), which takes the maximum and minimum NDVI values in the time series and then calculates the index:
\[ \mathrm{VCI} = \frac{(\mathrm{NDVI} - \mathrm{NDVI}_{\min}) \times 100}{\mathrm{NDVI}_{\max} - \mathrm{NDVI}_{\min}} \]
Where:
NDVI, NDVImax and NDVImin are the values of smoothed weekly NDVI and the multiple-year NDVI maximum and minimum, respectively.
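As a minimal sketch of the VCI formula, the function below scales one pixel's NDVI time series between its own minimum and maximum. The function name `vci` and the sample values are hypothetical; a pixel whose NDVI never changes has no defined VCI, so the sketch returns None for it.

```python
def vci(ndvi_series):
    """VCI (0 to 100) for every value of one pixel's NDVI time series."""
    lowest, highest = min(ndvi_series), max(ndvi_series)
    if highest == lowest:
        # The denominator would be zero, so the index is undefined.
        return [None] * len(ndvi_series)
    return [100.0 * (value - lowest) / (highest - lowest) for value in ndvi_series]


# Hypothetical smoothed NDVI values for one pixel.
series = [0.2, 0.35, 0.5, 0.8]
print([round(value, 1) for value in vci(series)])
```

The lowest NDVI in the series maps to 0 and the highest to 100, so low VCI values flag vegetation in poor condition relative to the pixel's own history.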
3. TCI
The TCI (Liu, W.T., & F.N. Kogan, 1996) is similar to the VCI: the maximum and minimum are taken over the long time period.
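\[ \mathrm{TCI} = \frac{100\,(\mathrm{BT}_{\max} - \mathrm{BT})}{\mathrm{BT}_{\max} - \mathrm{BT}_{\min}} \]
Where:
BT, BTmax and BTmin are the smoothed weekly brightness temperature and the multiple-year maximum and minimum thermal brightness temperatures, respectively.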
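A worked example of the TCI formula, as a minimal sketch: the function name `tci` and the brightness temperatures (in kelvin) are hypothetical. The numerator is BTmax minus BT, so a hotter pixel receives a lower TCI.

```python
def tci(bt, bt_min, bt_max):
    """TCI (0 to 100) from a brightness temperature and its multi-year extremes."""
    if bt_max == bt_min:
        # The denominator would be zero, so the index is undefined.
        return None
    return 100.0 * (bt_max - bt) / (bt_max - bt_min)


# Hypothetical pixel whose multi-year range is 290 K to 310 K.
print(tci(305.0, 290.0, 310.0), tci(295.0, 290.0, 310.0))  # 25.0 75.0
```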
4. NDVI
NDVI was calculated from the Landsat 8 OLI dataset:
\[ \mathrm{NDVI} = \frac{\text{Band 5} - \text{Band 4}}{\text{Band 5} + \text{Band 4}} \]
Where:
Band 5 is the near-infrared band, with wavelengths of 0.85-0.88 micrometres, and Band 4 is the red band, with wavelengths of 0.64-0.67 micrometres.
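The NDVI ratio can be sketched per pixel as below. This is an illustration on nested lists with hypothetical reflectance values and a hypothetical function name `ndvi_grid`; it is not the thesis code. Pixels where both bands are zero are returned as None, because the ratio is undefined there.

```python
def ndvi_grid(nir, red):
    """NDVI for every pixel of two equally shaped grids (Band 5 and Band 4)."""
    result = []
    for nir_row, red_row in zip(nir, red):
        row = []
        for n, r in zip(nir_row, red_row):
            total = n + r
            row.append((n - r) / total if total != 0 else None)
        result.append(row)
    return result


# Hypothetical reflectances: vegetation, bare soil, and an empty pixel.
band5 = [[0.5, 0.3], [0.0, 0.2]]
band4 = [[0.1, 0.3], [0.0, 0.4]]
print(ndvi_grid(band5, band4))
```

Dense vegetation reflects strongly in the near infrared and weakly in the red, so it gives values close to 1, while pixels that reflect more red than near infrared give negative values.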
5. The SPI, VCI, TCI and NDVI indices are used as parameters in both machine learning approaches.
Results
1. Bias correction comparison
a. Two satellite precipitation estimators were used, TRMM and PERSIANN, both at a resolution of 0.25 degrees. All the types of bias correction discussed in the literature review were implemented, and correlation graphs were drawn between the original station data and the satellite estimates before and after bias correction. The annual average of the correlation between the bias-corrected data and the original data was then taken, to understand which bias correction consistently gives good results on most days. TRMM and PERSIANN had an average correlation of 0.6.
b. Since PERSIANN data were available from 1983 to 2015, which is thirty-three years of daily rainfall estimates, it is the better choice for calculating the SPI. After comparing all the bias correction results, Rule Based bias correction proved to be the best, with an average correlation of 0.7 to 0.8. The final bias correction method therefore uses Rule Based correction, with PERSIANN as the satellite rainfall estimate and IMD as the in-situ observed data. The results are shown in the following graphs.
c. The data compared in the graphs below are for the 15th of each listed month, i.e. April, May, July and August.
Figure 25: 2007 TRMM data compared with all types of Bias Correction methods
Figure 26: 1998 TRMM data compared with all types of Bias Correction methods
Figure 27: 2007 PERSIANN data compared with all types of Bias Correction methods
Figure 28: 1998 PERSIANN data compared with all types of Bias Correction methods
Figure 29: 1998 PERSIANN data compared with all types of Bias Correction methods
Figure 30: 2007 PERSIANN data compared with all types of Bias Correction methods
d. Decision Tree (DT)
i. The input parameters for classifying the dataset were VCI, TCI, SPI and NDVI.
e. Random Forest with different estimators and their results
i. Estimator 10
ii. Estimator 25
iii. Estimator 50
iv. Estimator 80
f. IDSI vs Random Forest and Decision Tree, and SPI vs Random Forest and Decision Tree
Table 3: Comparison between Random forest with 10, 80, 25, 50 estimators and decision tree with IDSI and SPI
           RF_10   RF_80   RF_25   RF_50   DT
VS IDSI    0.49    0.42    0.48    0.48    0.43
VS SPI     0.18    0.78    0.31    0.14    0.99
Where RF_10, RF_25, RF_50, RF_80 and DT are random forest with 10 estimators, random forest with 25 estimators, random forest with 50 estimators, random forest with 80 estimators and decision tree, respectively.
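The five model configurations compared above can be set up with scikit-learn, which the references cite, roughly as in the sketch below. Everything about the data is an assumption made for illustration: the pixels are random numbers, and the drought label follows an arbitrary rule (SPI below -1 and VCI below 40), not the thesis's training data. The scores it prints are accuracies on held-out synthetic pixels and are not comparable with the figures in Table 3.

```python
import random

from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

random.seed(0)

# Hypothetical pixels with the four input parameters [VCI, TCI, SPI, NDVI].
features, labels = [], []
for _ in range(400):
    vci = random.uniform(0.0, 100.0)
    tci = random.uniform(0.0, 100.0)
    spi = random.uniform(-3.0, 3.0)
    ndvi = random.uniform(-1.0, 1.0)
    features.append([vci, tci, spi, ndvi])
    # Arbitrary labelling rule, assumed only for this example.
    labels.append(1 if (spi < -1.0 and vci < 40.0) else 0)

train_x, test_x = features[:300], features[300:]
train_y, test_y = labels[:300], labels[300:]

# One decision tree and four random forests that differ only in n_estimators.
models = {"DT": DecisionTreeClassifier(random_state=0)}
for n in (10, 25, 50, 80):
    models["RF_%d" % n] = RandomForestClassifier(n_estimators=n, random_state=0)

scores = {}
for name, model in models.items():
    model.fit(train_x, train_y)
    scores[name] = model.score(test_x, test_y)  # mean accuracy on held-out pixels
print(scores)
```

Fixing random_state makes repeated runs give the same trees, which helps when the only thing being varied is the number of estimators.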
2. IWMI tool box
Figure 31: IWMI tools in ArcCatalog 10.3
a. Remaining tools description
Tool                          Short description
Export Maps                   Exports maps in all possible formats supported by ArcMap
Frequency computer            Computes the frequency of drought pixels to find the number of drought occurrences
Clip Batch                    Clips all the files in the given folder
Define Projection Batch       Defines the projection of all the files in the given folder
Interval Mean                 Finds the days-interval mean of the given annual dataset in a folder
Months Statistics             Calculates monthly statistics of daily gridded data
Raster Scatter diagram        Automated generation of a scatter plot of two given raster datasets
Resample Batch                Resamples all the files in the given folder
Set no data to Value File     Sets a value for the no data elements in the raster file
Set no data to Value Folder   Sets a value for the no data elements in the raster folder
Stack Sum                     Generates the sum of all the layers in the stacked raster dataset
Zonal Batch Raster            Computes the zonal raster on all the files in the given folder
Zonal Batch Table             Computes the zonal table on all the files in the given folder
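To illustrate the kind of per-pixel counting that the "Frequency computer" entry describes, the sketch below counts how many layers of an SPI stack fall at or below a drought threshold. The threshold of -1.0, the function name `drought_frequency` and the sample stack are assumptions for this example; the source does not say how the tool defines a drought pixel.

```python
def drought_frequency(spi_stack, threshold=-1.0):
    """Count, per pixel, the layers whose SPI is at or below `threshold`."""
    rows, cols = len(spi_stack[0]), len(spi_stack[0][0])
    counts = [[0] * cols for _ in range(rows)]
    for layer in spi_stack:
        for r in range(rows):
            for c in range(cols):
                if layer[r][c] <= threshold:
                    counts[r][c] += 1
    return counts


# Hypothetical stack of three 2 x 2 SPI layers.
stack = [
    [[-1.5, 0.2], [0.0, -2.0]],
    [[-1.2, -0.5], [0.3, -1.0]],
    [[0.4, 0.1], [-0.2, -1.8]],
]
print(drought_frequency(stack))  # [[2, 0], [0, 3]]
```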
Discussion
Using Python for automation has substantially reduced manual intervention and human decision-making, and has reduced the processing time of the datasets by up to 30 percent. Going through multiple papers on bias correction showed that many assumptions are made. Because in-situ data are recorded daily while satellite estimates are registered at a different time of the day, daily bias correction is nearly impossible. Moreover, the in-situ data are themselves heavily biased by human error, which adds a lot of noise that should be corrected but cannot be. The only way to bias correct daily real-time rainfall data is to use the Internet of Things (IoT): multiple in-situ sensors can be stationed at particular intervals, and while the satellite senses the data, the satellite and the ground sensors can communicate at the same moment through IoT and bias correct the data on the spot.
The long processing hours needed for a dataset the size of one district are the biggest limitation to implementing this machine learning approach in real life. On the other hand, the use of the Landsat 8 dataset gives a reasonably fine resolution of water-stressed areas.
Machine learning needs a greater understanding of how operating systems function; manipulating the threads for multiprocessing can generate the results ten times faster.
Conclusion
Random forest proved to be better than decision tree at classifying the pixels; it could readily use the high resolution of the Landsat 8 dataset to obtain a finer resolution of water-stressed pixels. Four parameters are just the beginning: more parameters will be generated for future classification of potential drought pixels. Rain-fed areas were not taken into consideration because of a lack of data; if such data are made available, the quality of classification can be increased substantially. More parameters with high correlation with each other give better results in random forest.
In future, to deal with the slow processing of huge amounts of data, the dask module of Python will be used to generate faster results with low Random Access Memory (RAM). dask enables the user to use block algorithms, which use less memory and more of the computer's processing power. Instead of using a single thread to calculate the entire raster, dask breaks the big raster dataset into user-defined blocks and then processes all these blocks in parallel.
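The block idea can be illustrated without dask. The sketch below uses the standard library's `concurrent.futures` as a stand-in, not dask's own API: it splits a raster into blocks of rows, hands each block to a worker, and stitches the results back together in order. The function names, the block size and the doubling operation are hypothetical placeholders for a real per-pixel calculation.

```python
from concurrent.futures import ThreadPoolExecutor


def split_rows(raster, block_rows):
    """Cut a 2-D list into consecutive blocks of at most block_rows rows."""
    return [raster[i:i + block_rows] for i in range(0, len(raster), block_rows)]


def scale_block(block):
    """Placeholder per-pixel calculation applied to one block."""
    return [[value * 2.0 for value in row] for row in block]


def process_in_blocks(raster, block_rows, workers=4):
    """Process each block in a worker and reassemble the rows in order."""
    blocks = split_rows(raster, block_rows)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(scale_block, blocks))  # map keeps the block order
    return [row for block in results for row in block]


raster = [[float(r * 10 + c) for c in range(4)] for r in range(10)]
out = process_in_blocks(raster, block_rows=3)
print(len(split_rows(raster, 3)), out[0], out[9])
```

In CPython, threads running pure-Python arithmetic do not execute simultaneously, so this sketch shows the structure of a block algorithm rather than a speed-up. Only one block needs to be held by each worker at a time, which is where the memory saving of block algorithms comes from.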
With this experience in the automated generation of maps, the next aim is to build an end-to-end application in which the algorithm talks directly to the satellites and generates the maps and statistical data without any human interference.
SPI was calculated up to 60 months to find locations that have been under drought for over five years. Due to time constraints, it could not be used in the random forest or the decision tree.
For further development I will collaborate with one of my classmates, Miss Marcia Chen, who has explored neural networks, to analyse the difference in efficiency between random forest and neural networks.
In my six months of internship I created nearly 70 tools, of which the 30 mentioned in this project have been approved by IWMI. In a few weeks all of the tools will be published with IWMI branding.
References
1. GRASS. (n.d.). Documentation. Retrieved from https://grass.osgeo.org: https://grass.osgeo.org/documentation
2. H. K., & C. Y. (2008). A Machine Learning Approach for Knowledge Base Construction. Journal of the Korean Geographical Society, 761-774.
3. Kogan, & J. Sullivan. (1993). Development of global drought-watch system using NOAA / AVHRR data. Advances in Space Research, 219-222.
4. Liu, W.T., & F.N. Kogan. (1996). Monitoring regional drought using the Vegetation Condition Index. International Journal of Remote Sensing, 2761-2782.
5. M. Svoboda, M. H. (2012). Standardized Precipitation Index User.
6. scikit-learn. (n.d.). 1.10. Decision Trees. Retrieved from http://scikit-learn.org: http://scikit-learn.org/stable/modules/tree.html
7. scikit-learn. (n.d.). 3.2.4.3.1. sklearn.ensemble.RandomForestClassifier. Retrieved from http://scikit-learn.org: http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html
8. Shah, R. D., & Mishra, V. (2014). Development of an Experimental Near-Real-Time Drought Monitor for India. Journal of Hydrometeorology, 327-345.
9. WMO. (2009, 12 15). Press Release No. 872. Retrieved from www.wmo.int: https://www.wmo.int/pages/mediacentre/press_releases/pr_872_en.html

More Related Content

What's hot

IRJET- Large & Complex Data Streams using Big Data
IRJET- Large & Complex Data Streams using Big DataIRJET- Large & Complex Data Streams using Big Data
IRJET- Large & Complex Data Streams using Big DataIRJET Journal
 
Automated Summarisation of Big Data, useR! 2018
Automated Summarisation of Big Data, useR! 2018  Automated Summarisation of Big Data, useR! 2018
Automated Summarisation of Big Data, useR! 2018 Amy Stringer
 
Analysis of crop yield prediction using data mining techniques
Analysis of crop yield prediction using data mining techniquesAnalysis of crop yield prediction using data mining techniques
Analysis of crop yield prediction using data mining techniqueseSAT Journals
 
EMPLOYING MULTI CORE ARCHITECTURE TO OPTIMIZE ON PERFORMANCE, FOR APPROACH IN...
EMPLOYING MULTI CORE ARCHITECTURE TO OPTIMIZE ON PERFORMANCE, FOR APPROACH IN...EMPLOYING MULTI CORE ARCHITECTURE TO OPTIMIZE ON PERFORMANCE, FOR APPROACH IN...
EMPLOYING MULTI CORE ARCHITECTURE TO OPTIMIZE ON PERFORMANCE, FOR APPROACH IN...IAEME Publication
 
IRJET- Agricultural Crop Yield Prediction using Deep Learning Approach
IRJET-  	  Agricultural Crop Yield Prediction using Deep Learning ApproachIRJET-  	  Agricultural Crop Yield Prediction using Deep Learning Approach
IRJET- Agricultural Crop Yield Prediction using Deep Learning ApproachIRJET Journal
 
IRJET- Rainfall Prediction by using Time-Series Data in Analysis of Artif...
IRJET-  	  Rainfall Prediction by using Time-Series Data in Analysis of Artif...IRJET-  	  Rainfall Prediction by using Time-Series Data in Analysis of Artif...
IRJET- Rainfall Prediction by using Time-Series Data in Analysis of Artif...IRJET Journal
 
Time Series Least Square Forecasting Analysis and Evaluation for Natural Gas ...
Time Series Least Square Forecasting Analysis and Evaluation for Natural Gas ...Time Series Least Square Forecasting Analysis and Evaluation for Natural Gas ...
Time Series Least Square Forecasting Analysis and Evaluation for Natural Gas ...rahulmonikasharma
 
Rainfall Forecasting : A Regression Case Study
Rainfall Forecasting : A Regression Case StudyRainfall Forecasting : A Regression Case Study
Rainfall Forecasting : A Regression Case StudyIRJET Journal
 
Spatial station
Spatial stationSpatial station
Spatial stationAtiqa khan
 
Energy Wasting Rate as a Metrics for Green Computing and Static Analysis
Energy Wasting Rate as a Metrics for Green Computing and Static AnalysisEnergy Wasting Rate as a Metrics for Green Computing and Static Analysis
Energy Wasting Rate as a Metrics for Green Computing and Static AnalysisJérôme Rocheteau
 
Artificial Neural Network based computing model for wind speed prediction: A ...
Artificial Neural Network based computing model for wind speed prediction: A ...Artificial Neural Network based computing model for wind speed prediction: A ...
Artificial Neural Network based computing model for wind speed prediction: A ...Kaja Bantha Navas Raja Mohamed
 

What's hot (17)

Application of web ontology to harvest estimation of rice in Thailand
Application of web ontology to harvest estimation of rice in ThailandApplication of web ontology to harvest estimation of rice in Thailand
Application of web ontology to harvest estimation of rice in Thailand
 
Application of web ontology to harvest estimation of rice in thailand
Application of web ontology to harvest estimation of rice in thailandApplication of web ontology to harvest estimation of rice in thailand
Application of web ontology to harvest estimation of rice in thailand
 
Data Dimensional Reduction by Order Prediction in Heterogeneous Environment
Data Dimensional Reduction by Order Prediction in Heterogeneous EnvironmentData Dimensional Reduction by Order Prediction in Heterogeneous Environment
Data Dimensional Reduction by Order Prediction in Heterogeneous Environment
 
IRJET- Large & Complex Data Streams using Big Data
IRJET- Large & Complex Data Streams using Big DataIRJET- Large & Complex Data Streams using Big Data
IRJET- Large & Complex Data Streams using Big Data
 
Automated Summarisation of Big Data, useR! 2018
Automated Summarisation of Big Data, useR! 2018  Automated Summarisation of Big Data, useR! 2018
Automated Summarisation of Big Data, useR! 2018
 
Analysis of crop yield prediction using data mining techniques
Analysis of crop yield prediction using data mining techniquesAnalysis of crop yield prediction using data mining techniques
Analysis of crop yield prediction using data mining techniques
 
Noura2
Noura2Noura2
Noura2
 
EMPLOYING MULTI CORE ARCHITECTURE TO OPTIMIZE ON PERFORMANCE, FOR APPROACH IN...
EMPLOYING MULTI CORE ARCHITECTURE TO OPTIMIZE ON PERFORMANCE, FOR APPROACH IN...EMPLOYING MULTI CORE ARCHITECTURE TO OPTIMIZE ON PERFORMANCE, FOR APPROACH IN...
EMPLOYING MULTI CORE ARCHITECTURE TO OPTIMIZE ON PERFORMANCE, FOR APPROACH IN...
 
IRJET- Agricultural Crop Yield Prediction using Deep Learning Approach
IRJET-  	  Agricultural Crop Yield Prediction using Deep Learning ApproachIRJET-  	  Agricultural Crop Yield Prediction using Deep Learning Approach
IRJET- Agricultural Crop Yield Prediction using Deep Learning Approach
 
IRJET- Rainfall Prediction by using Time-Series Data in Analysis of Artif...
IRJET-  	  Rainfall Prediction by using Time-Series Data in Analysis of Artif...IRJET-  	  Rainfall Prediction by using Time-Series Data in Analysis of Artif...
IRJET- Rainfall Prediction by using Time-Series Data in Analysis of Artif...
 
Time Series Least Square Forecasting Analysis and Evaluation for Natural Gas ...
Time Series Least Square Forecasting Analysis and Evaluation for Natural Gas ...Time Series Least Square Forecasting Analysis and Evaluation for Natural Gas ...
Time Series Least Square Forecasting Analysis and Evaluation for Natural Gas ...
 
Rainfall Forecasting : A Regression Case Study
Rainfall Forecasting : A Regression Case StudyRainfall Forecasting : A Regression Case Study
Rainfall Forecasting : A Regression Case Study
 
CLIM Program: Remote Sensing Workshop, High Performance Computing and Spatial...
CLIM Program: Remote Sensing Workshop, High Performance Computing and Spatial...CLIM Program: Remote Sensing Workshop, High Performance Computing and Spatial...
CLIM Program: Remote Sensing Workshop, High Performance Computing and Spatial...
 
Spatial station
Spatial stationSpatial station
Spatial station
 
Energy Wasting Rate as a Metrics for Green Computing and Static Analysis
Energy Wasting Rate as a Metrics for Green Computing and Static AnalysisEnergy Wasting Rate as a Metrics for Green Computing and Static Analysis
Energy Wasting Rate as a Metrics for Green Computing and Static Analysis
 
Artificial Neural Network based computing model for wind speed prediction: A ...
Artificial Neural Network based computing model for wind speed prediction: A ...Artificial Neural Network based computing model for wind speed prediction: A ...
Artificial Neural Network based computing model for wind speed prediction: A ...
 
ODVSML_Presentation
ODVSML_PresentationODVSML_Presentation
ODVSML_Presentation
 

Similar to Here are a few key points about ForecastWatch.com's architecture using Python:- They collect over 36,000 weather forecasts daily from various providers for 800+ US cities.- They use Python to parse/extract forecast data from provider websites and actual weather data from NOAA.- The forecast and actual weather data is inserted into a database for later comparison and scoring. - A Python data aggregation engine combines the scores into monthly/yearly blocks by provider, location, forecast lead time.- The scored data is made available through a web application, likely built with a Python framework like Django.- This allows meteorologists and consumers to evaluate and compare forecast accuracy over time.- Python's versatility enabled

Flood and rainfall predction final
Flood and rainfall predction finalFlood and rainfall predction final
Flood and rainfall predction finalCity University
 
IRJET- Crop Yield Prediction and Disease Detection using IoT Approach
IRJET- Crop Yield Prediction and Disease Detection using IoT ApproachIRJET- Crop Yield Prediction and Disease Detection using IoT Approach
IRJET- Crop Yield Prediction and Disease Detection using IoT ApproachIRJET Journal
 
Comparative Study of Machine Learning Algorithms for Rainfall Prediction
Comparative Study of Machine Learning Algorithms for Rainfall PredictionComparative Study of Machine Learning Algorithms for Rainfall Prediction
Comparative Study of Machine Learning Algorithms for Rainfall Predictionijtsrd
 
FLOOD FORECASTING USING MACHINE LEARNING ALGORITHM
FLOOD FORECASTING USING MACHINE LEARNING ALGORITHMFLOOD FORECASTING USING MACHINE LEARNING ALGORITHM
FLOOD FORECASTING USING MACHINE LEARNING ALGORITHMIRJET Journal
 
IRJET - Intelligent Weather Forecasting using Machine Learning Techniques
IRJET -  	  Intelligent Weather Forecasting using Machine Learning TechniquesIRJET -  	  Intelligent Weather Forecasting using Machine Learning Techniques
IRJET - Intelligent Weather Forecasting using Machine Learning TechniquesIRJET Journal
 
BJLP45-paper1_+Linear+Vs+Logistic.pdf
BJLP45-paper1_+Linear+Vs+Logistic.pdfBJLP45-paper1_+Linear+Vs+Logistic.pdf
BJLP45-paper1_+Linear+Vs+Logistic.pdfssuser8e260b1
 
Big Data to avoid weather related flight delays
Big Data to avoid weather related flight delaysBig Data to avoid weather related flight delays
Big Data to avoid weather related flight delaysAkshatGiri3
 
Integrated Water Resources Management Using Rainfall Forecasting With Artific...
Integrated Water Resources Management Using Rainfall Forecasting With Artific...Integrated Water Resources Management Using Rainfall Forecasting With Artific...
Integrated Water Resources Management Using Rainfall Forecasting With Artific...IRJET Journal
 
IRJET- Weather Prediction for Tourism Application using ARIMA
IRJET- Weather Prediction for Tourism Application using ARIMAIRJET- Weather Prediction for Tourism Application using ARIMA
IRJET- Weather Prediction for Tourism Application using ARIMAIRJET Journal
 
IRJET- Projecting Climate Impacts on Transportation by Diagnosing and Exa...
IRJET-  	  Projecting Climate Impacts on Transportation by Diagnosing and Exa...IRJET-  	  Projecting Climate Impacts on Transportation by Diagnosing and Exa...
IRJET- Projecting Climate Impacts on Transportation by Diagnosing and Exa...IRJET Journal
 
A Comprehensive review of Conversational Agent and its prediction algorithm
A Comprehensive review of Conversational Agent and its prediction algorithmA Comprehensive review of Conversational Agent and its prediction algorithm
A Comprehensive review of Conversational Agent and its prediction algorithmvivatechijri
 
23 2 may17 28apr 16137 (6575 new)(edit)
23 2 may17 28apr 16137 (6575 new)(edit)23 2 may17 28apr 16137 (6575 new)(edit)
23 2 may17 28apr 16137 (6575 new)(edit)IAESIJEECS
 
CLOUD BURST FORECAST USING EXPERT SYSTEMS
CLOUD BURST FORECAST USING EXPERT SYSTEMSCLOUD BURST FORECAST USING EXPERT SYSTEMS
CLOUD BURST FORECAST USING EXPERT SYSTEMSIRJET Journal
 
IRJET- IoT Based Crop Growth Detection and Irrigation System using Raspberry PI
IRJET- IoT Based Crop Growth Detection and Irrigation System using Raspberry PIIRJET- IoT Based Crop Growth Detection and Irrigation System using Raspberry PI
IRJET- IoT Based Crop Growth Detection and Irrigation System using Raspberry PIIRJET Journal
 
IRJET- IoT Based Crop Growth Detection and Irrigation System using Raspbe...
IRJET-  	  IoT Based Crop Growth Detection and Irrigation System using Raspbe...IRJET-  	  IoT Based Crop Growth Detection and Irrigation System using Raspbe...
IRJET- IoT Based Crop Growth Detection and Irrigation System using Raspbe...IRJET Journal
 
Intelligent flood disaster warning on the fly: developing IoT-based managemen...
Intelligent flood disaster warning on the fly: developing IoT-based managemen...Intelligent flood disaster warning on the fly: developing IoT-based managemen...
Intelligent flood disaster warning on the fly: developing IoT-based managemen...journalBEEI
 
Crop Prediction using IoT & Machine Learning Algorithm
Crop Prediction using IoT & Machine Learning AlgorithmCrop Prediction using IoT & Machine Learning Algorithm
Crop Prediction using IoT & Machine Learning AlgorithmIRJET Journal
 
Smart Irrigation System using Machine Learning and IoT
Smart Irrigation System using Machine Learning and IoTSmart Irrigation System using Machine Learning and IoT
Smart Irrigation System using Machine Learning and IoTIRJET Journal
 

Similar to Here are a few key points about ForecastWatch.com's architecture using Python:- They collect over 36,000 weather forecasts daily from various providers for 800+ US cities.- They use Python to parse/extract forecast data from provider websites and actual weather data from NOAA.- The forecast and actual weather data is inserted into a database for later comparison and scoring. - A Python data aggregation engine combines the scores into monthly/yearly blocks by provider, location, forecast lead time.- The scored data is made available through a web application, likely built with a Python framework like Django.- This allows meteorologists and consumers to evaluate and compare forecast accuracy over time.- Python's versatility enabled (20)

Flood and rainfall predction final
Flood and rainfall predction finalFlood and rainfall predction final
Flood and rainfall predction final
 
IRJET- Crop Yield Prediction and Disease Detection using IoT Approach
IRJET- Crop Yield Prediction and Disease Detection using IoT ApproachIRJET- Crop Yield Prediction and Disease Detection using IoT Approach
IRJET- Crop Yield Prediction and Disease Detection using IoT Approach
 
Comparative Study of Machine Learning Algorithms for Rainfall Prediction
Comparative Study of Machine Learning Algorithms for Rainfall PredictionComparative Study of Machine Learning Algorithms for Rainfall Prediction
Comparative Study of Machine Learning Algorithms for Rainfall Prediction
 
FLOOD FORECASTING USING MACHINE LEARNING ALGORITHM
FLOOD FORECASTING USING MACHINE LEARNING ALGORITHMFLOOD FORECASTING USING MACHINE LEARNING ALGORITHM
FLOOD FORECASTING USING MACHINE LEARNING ALGORITHM
 
IRJET - Intelligent Weather Forecasting using Machine Learning Techniques
IRJET -  	  Intelligent Weather Forecasting using Machine Learning TechniquesIRJET -  	  Intelligent Weather Forecasting using Machine Learning Techniques
IRJET - Intelligent Weather Forecasting using Machine Learning Techniques
 
BJLP45-paper1_+Linear+Vs+Logistic.pdf
BJLP45-paper1_+Linear+Vs+Logistic.pdfBJLP45-paper1_+Linear+Vs+Logistic.pdf
BJLP45-paper1_+Linear+Vs+Logistic.pdf
 
Big Data to avoid weather related flight delays
Big Data to avoid weather related flight delaysBig Data to avoid weather related flight delays
Big Data to avoid weather related flight delays
 
Integrated Water Resources Management Using Rainfall Forecasting With Artific...
Integrated Water Resources Management Using Rainfall Forecasting With Artific...Integrated Water Resources Management Using Rainfall Forecasting With Artific...
Integrated Water Resources Management Using Rainfall Forecasting With Artific...
 
IRJET- Weather Prediction for Tourism Application using ARIMA
IRJET- Weather Prediction for Tourism Application using ARIMAIRJET- Weather Prediction for Tourism Application using ARIMA
IRJET- Weather Prediction for Tourism Application using ARIMA
 
IRJET- Projecting Climate Impacts on Transportation by Diagnosing and Exa...
IRJET-  	  Projecting Climate Impacts on Transportation by Diagnosing and Exa...IRJET-  	  Projecting Climate Impacts on Transportation by Diagnosing and Exa...
IRJET- Projecting Climate Impacts on Transportation by Diagnosing and Exa...
 
BIG DATA TO AVOID WEATHER RELATED FLIGHT DELAYS PPT
BIG DATA TO AVOID WEATHER RELATED FLIGHT DELAYS PPTBIG DATA TO AVOID WEATHER RELATED FLIGHT DELAYS PPT
BIG DATA TO AVOID WEATHER RELATED FLIGHT DELAYS PPT
 
A Comprehensive review of Conversational Agent and its prediction algorithm
A Comprehensive review of Conversational Agent and its prediction algorithmA Comprehensive review of Conversational Agent and its prediction algorithm
A Comprehensive review of Conversational Agent and its prediction algorithm
 
23 2 may17 28apr 16137 (6575 new)(edit)
23 2 may17 28apr 16137 (6575 new)(edit)23 2 may17 28apr 16137 (6575 new)(edit)
23 2 may17 28apr 16137 (6575 new)(edit)
 
CLOUD BURST FORECAST USING EXPERT SYSTEMS
CLOUD BURST FORECAST USING EXPERT SYSTEMSCLOUD BURST FORECAST USING EXPERT SYSTEMS
CLOUD BURST FORECAST USING EXPERT SYSTEMS
 
IRJET- IoT Based Crop Growth Detection and Irrigation System using Raspberry PI
IRJET- IoT Based Crop Growth Detection and Irrigation System using Raspberry PIIRJET- IoT Based Crop Growth Detection and Irrigation System using Raspberry PI
IRJET- IoT Based Crop Growth Detection and Irrigation System using Raspberry PI
 
IRJET- IoT Based Crop Growth Detection and Irrigation System using Raspbe...
IRJET-  	  IoT Based Crop Growth Detection and Irrigation System using Raspbe...IRJET-  	  IoT Based Crop Growth Detection and Irrigation System using Raspbe...
IRJET- IoT Based Crop Growth Detection and Irrigation System using Raspbe...
 
Big Data For Flight Delay Report
Big Data For Flight Delay ReportBig Data For Flight Delay Report
Big Data For Flight Delay Report
 
Intelligent flood disaster warning on the fly: developing IoT-based managemen...
Intelligent flood disaster warning on the fly: developing IoT-based managemen...Intelligent flood disaster warning on the fly: developing IoT-based managemen...
Intelligent flood disaster warning on the fly: developing IoT-based managemen...
 
Crop Prediction using IoT & Machine Learning Algorithm
Crop Prediction using IoT & Machine Learning AlgorithmCrop Prediction using IoT & Machine Learning Algorithm
Crop Prediction using IoT & Machine Learning Algorithm
 
Smart Irrigation System using Machine Learning and IoT
Smart Irrigation System using Machine Learning and IoTSmart Irrigation System using Machine Learning and IoT
Smart Irrigation System using Machine Learning and IoT
 

Here are a few key points about ForecastWatch.com's architecture using Python:- They collect over 36,000 weather forecasts daily from various providers for 800+ US cities.- They use Python to parse/extract forecast data from provider websites and actual weather data from NOAA.- The forecast and actual weather data is inserted into a database for later comparison and scoring. - A Python data aggregation engine combines the scores into monthly/yearly blocks by provider, location, forecast lead time.- The scored data is made available through a web application, likely built with a Python framework like Django.- This allows meteorologists and consumers to evaluate and compare forecast accuracy over time.- Python's versatility enabled

  • 1. Automated Drought Analysis with Python and Machine Learning THESIS SUBMITTED TO Symbiosis Institute of Geoinformatics FOR PARTIAL FULFILLMENT OF THE M. Sc. DEGREE By Gurminder Bharani (Batch 2014 - 16) Symbiosis Institute of Geoinformatics Symbiosis International University 5th Floor, Atur Centre, Gokhale Cross Road, Model Colony, Pune – 411016. CERTIFICATE
  • 2. Page | 2 Certified that this thesis titled ‘Automated Drought Analysis with Python and Machine Learning’ is a bonafide work done by Mr. Gurminder Bharani, at International Water Management Institute (IWMI), Sri Lanka and Symbiosis Institute of Geoinformatics, under our supervision. Supervisor External Dr. Giriraj Amarnath IWMI Supervisor Internal Dr. T. P. Singh Director, Symbiosis Institute of Geoinformatics
  • 3. Page | 3 Index I. Acknowledgement 4 II. List of Figures 5 III. List of Tables 6 IV. Abbreviation List 7 1. Preface 8 2. Introduction 10 3. Literature Review 12 4. Study Area 24 5. Methodology 25 6. Result 37 7. Discussion 56 8. Conclusion 57 9. References 58 10. Annexure 59
  • 4. Page | 4 Acknowledgement The last six months working on my project have been a very productive journey. Getting a glimpse of what the research world looks and feels like would not have been possible had it not been for Dr Giriraj Amarnath, who hired me as an intern at IWMI. I would like to extend my heartfelt gratitude to Mr Peejush Pani, who helped in developing the tools through his constant guidance and his expertise in remote sensing modelling. The experience in this esteemed organisation has left an indelible mark on my learning. It has been a great exposure and has served as a reality check through which I plan to better myself and polish my skills in the days to come. Further, I would like to thank the faculty of Symbiosis Institute of Geoinformatics, Pune, namely Dr T. P. Singh, Dr Navendu Chowdhury and Col B. K. Pradhan, without whom my knowledge of GIS and its application in various domains would not have been clear. I would also like to thank my computer science teacher, Mr Charudatta Ekbote; without his teaching none of this would have been possible.
  • 5. Page | 5 List of Figures Figure 1: Simple implementation of Decision Tree Figure 2: Simple implementation of Random Forest Figure 3: Area of Interest Figure 4: UI of Monthly Sum for SPI Figure 5: UI of SPI Calculation from 1 month to 12 months Figure 6: UI of SPI Calculation from 13 months to 60 months Figure 7: UI of unpacking all the calculated SPI to daily raster images Figure 8: Comparison of Mean Rainfall with Mean SPI Figure 9: 2007 TRMM data compared with all types of Bias Correction methods Figure 10: 1998 TRMM data compared with all types of Bias Correction methods Figure 11: 2007 PERSIANN data compared with all types of Bias Correction methods Figure 12: 1998 PERSIANN data compared with all types of Bias Correction methods Figure 13: 1998 PERSIANN data compared with all types of Bias Correction methods Figure 14: 2007 PERSIANN data compared with all types of Bias Correction methods Figure 15: IWMI tools in ArcCatalog 10.3
  • 6. Page | 6 List of Tables Table 1: SPI Values Table 2: Comparison between SPI calculated by the WMO software and by the Python tool developed at IWMI Table 3: Comparison between random forest with 10, 80, 25 and 50 estimators and decision tree, with IDSI and SPI
  • 7. Page | 7 Abbreviation List
    MODIS: Moderate Resolution Imaging Spectroradiometer
    VCI: Vegetation Condition Index
    TCI: Temperature Condition Index
    SPI: Standardized Precipitation Index
    IMD: India Meteorological Department
    NDVI: Normalized Difference Vegetation Index
    OLI: Operational Land Imager
    DEM: Digital Elevation Model
    WMO: World Meteorological Organization
    IDSI: Integrated Drought Severity Index
  • 8. Page | 8 Preface This project is about blending multiple indices from multiple satellites. MODIS is known for its high temporal resolution, hence two indices are derived from it: VCI (Vegetation Condition Index) and TCI (Temperature Condition Index). The SPI (Standardized Precipitation Index) is widely accepted as the prime indicator of meteorological drought and is derived from IMD (India Meteorological Department) precipitation data. Landsat data is used specifically to obtain the fine resolution of 30 metres in the final classified product. NDVI (Normalized Difference Vegetation Index) is the indicator used to identify pixels that are eligible for drought classification. Benefits of blending the datasets There are several benefits to blending datasets. A few are discussed below. Temporal resolution: The high temporal resolution of the MODIS dataset helps in understanding the long-term behaviour of the data. By comparing values over the long term we can also determine the severity of an event relative to past events. Spatial resolution: High spatial resolution gives a detailed outline of how the data is distributed spatially. Freely available high spatial resolution data generally do not have a long historical record, which makes tasks such as identifying drought pixels relative to the past difficult. When we blend these datasets we get the benefits of both temporal and spatial resolution to indicate drought stress. Determining short-term and long-term drought with 1-month to 60-month SPI SPI based on 1 month of precipitation data over 30 years indicates short-term meteorological drought, because the precipitation accumulated in one month of each year is compared with the past accumulated rainfall of the same month.
  • 9. Page | 9 Here, SPI based on 12 months of accumulated precipitation data indicates locations which have been under meteorological drought over one year. Similarly, 24-month SPI indicates locations suffering drought over two years. When we classify drought with machine learning using 1-, 12-, 24- and 60-month SPI, we can identify drought-stressed pixels of varying intensity.
  • 10. Page | 10 Introduction Objective 1. Need for Automation The main objective of automation is to produce rapid results. When a project is based on high temporal datasets, processing them becomes repetitive. Once the process is defined it can be automated, which reduces human intervention and therefore leads to a less error-prone product. ArcMap, a GIS software package, also has limitations when it comes to project-specific customization. For example, exporting weekly drought maps has to be done manually by default, which is time-consuming for a high temporal dataset; automation reduces the time needed to generate the result. Depending on the methodology, analysing the data also creates many intermediate datasets. For example, to plot the mean monthly rainfall sum from the precipitation data provided by IMD for 1901 to 2015, a total of 115 years, an ArcMap user would traditionally create a batch for every month and run zonal statistics on the files in the batch, repeating this for every month of every year from 1901 to 2015. The intermediate data generated here are the monthly sum files, which take up unnecessary space on the computer. Each sum file is around 136 kilobytes, so over 115 years we waste about 184 megabytes of space. The IMD dataset has a spatial resolution of 0.25 degrees and each file is only kilobytes in size; for Landsat 8 OLI images, where each image is over 1 gigabyte, the amount of space wasted would be far greater. With the help of automation we can therefore use the hardware resources of the computer optimally. 2. Need for machine learning classification Machine learning enables us to create applications which replicate human cognitive functions to classify objects. Machine learning has many sub-streams, each having its own advantages and disadvantages.
  • 11. Page | 11 Once the algorithm is trained it can be used multiple times to classify drought to generate weekly or daily product depending on the input parameters.
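The storage estimate quoted in the automation discussion can be checked with a few lines of arithmetic. This is only a sketch: the 136 KB figure is the approximate per-file size given in the text, and the conversion assumes binary megabytes (1 MB = 1024 KB).

```python
# Rough check of the space taken by intermediate monthly-sum files.
YEARS = 2015 - 1901 + 1   # 1901 to 2015 inclusive
MONTHS_PER_YEAR = 12
FILE_SIZE_KB = 136        # approximate size of one monthly-sum file, from the text

n_files = YEARS * MONTHS_PER_YEAR
total_kb = n_files * FILE_SIZE_KB
total_mb = total_kb / 1024  # assumption: binary megabytes

print(n_files, "files,", round(total_mb, 1), "MB")
```

The result lands close to the figure quoted in the text; the small difference depends on the exact file size and on how a megabyte is defined.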
  • 12. Page | 12 Literature Review Python success story ForecastWatch.com Introduction ForecastWatch.com, a service of Intellovations, is in the business of rating the accuracy of weather reports from companies such as Accuweather, MyForecast.com, and The Weather Channel. Over 36,000 weather forecasts are collected every day for over 800 U.S. cities, and later compared with actual climatological data. These comparisons are used by meteorologists to improve their weather forecasts, and to compare their forecasts with others. They are also used by consumers to better understand the probable accuracy of a forecast. The Architecture ForecastWatch.com is built from four major architectural components: An input process for acquiring forecasts, an input process for acquiring measured climatological data, the data aggregation engine, and the web application framework. There are two main input processes in the system: The forecast parser, and the actuals parser. The forecast parser is responsible for requesting forecasts from the web for each of the forecast providers ForecastWatch.com tracks. It parses the forecast from the page and inserts the forecast data into a database until it can be compared to the actual data. The actuals parser takes actual data from the National Climatic Data Center of the National Weather Service, which provides high, low, precipitation, and significant weather events for over 800 United States cities and inserts the data in to the database. This process also scores the forecasts with the actual weather data, and places that information in the database. Once the data has been collected and scored, it is processed by the aggregation engine, which combines the scores into yearly and monthly blocks, sliced by provider, location, and the number of days into the future for which the forecasts were predicting. In its first year, 2003, the system only gathered forecasts for 20 U.S. 
cities, or about 250,000 individual forecasts, so most of the data output was based on the raw scoring data. The aggregation engine was added once the system was scaled up to 800 cities, increasing the data stream by almost 4000%. In the first half of 2004,
  • 13. Page | 13 the system has already scored over 4 million forecasts, all collected, parsed, and displayed on the web. Implemented with Python ForecastWatch.com is a 100% pure Python solution. Python is used in all its components, from the back-end to the front-end, including also the more performance-critical portions of the system. Python was chosen initially because it comes with many standard libraries useful in collecting, parsing, and storing data from the web. Among those particularly useful in this application were the regular expression library, the thread library, the object serialization library, and gzip data compression library. Other libraries, such as an HTTP client capable of accepting cookies (ClientCookie), and an HTML table parser (ClientTable) were available as third party modules. These proved invaluable and were easy to use. The threading library turned out to be very important in scaling ForecastWatch.com's coverage to over 800 cities. Grabbing web pages is a very I/O bound process, and requesting a single page at a time for roughly 5000 web pages a day would have been prohibitively time-consuming. Using Python's threading library, the web page retrieval loop simply calls thread.start_new() for each request, passing in the necessary class instance method that retrieves and processes the web page, along with the parameters necessary to describe the city for the desired forecast. The request classes use a Python built-in Event class instance to communicate with the main controlling thread when processing is complete. Python made this application of threading incredibly easy. Python is also used in the aggregation engine, which runs as a separate process to combine forecast accuracy scores into monthly and yearly slices. The aggregation process uses queries via MySQLdb to the MySQL database where the input modules have placed the forecast and climatological data they have harvested.
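The threaded retrieval loop described above could look roughly like the sketch below. It is an illustration only, not ForecastWatch.com's code: it uses the modern `threading` module instead of the Python 2 `thread.start_new()` call named in the text, the function names and city list are hypothetical, and the HTTP request is replaced by a placeholder record.

```python
import threading

def fetch_forecast(city, results, lock, done):
    """Stand-in for one I/O-bound page request (no real HTTP call is made)."""
    forecast = {"city": city, "high": None, "low": None}  # placeholder payload
    with lock:
        results.append(forecast)
    done.set()  # tell the controlling thread that this request has finished

def collect(cities):
    """Start one thread per city and wait on each thread's Event."""
    results, lock = [], threading.Lock()
    events, threads = [], []
    for city in cities:
        done = threading.Event()
        worker = threading.Thread(target=fetch_forecast,
                                  args=(city, results, lock, done))
        worker.start()
        events.append(done)
        threads.append(worker)
    for done in events:
        done.wait()
    for worker in threads:
        worker.join()
    return results

forecasts = collect(["Columbus", "Chicago", "Denver"])  # hypothetical city list
```

Because each request spends most of its time waiting on the network, running the requests in threads lets the waits overlap, which is the benefit the success story attributes to threading.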
Colorized maps, showing forecast accuracy by geographical area, are then generated for use on the web site and in printed reports. Python Made It Possible Python played a significant role in the success of ForecastWatch.com. The product currently contains over 5,000 lines of Python, most of which are concerned with implementing the high-level functionality of the application, while most of the details are taken care of by Python's
  • 14. Page | 14 powerful standard libraries and the third party modules described above. Many more lines of code would have been needed working in, for example, Java or PHP. The integration capabilities of those languages are not as strong, and their threading support is harder to use. About Python Python is impressive as an object-oriented rapid application development language. One of Python's key strengths lies in its ability to produce results quickly without sacrificing maintainability of the resulting code. In ForecastWatch.com, Python was used for prototyping as well, and those prototypes were able to evolve cleanly into the production code without requiring a complete rewrite or switching toolsets. This saved substantial effort and made the development process more flexible and effective. Because of the clean design of the language, refactoring the Python code was also much easier than in other languages; moving code around simply requires less effort. Python's interpreted nature was also a benefit: Code ideas can easily be tested in the Python interactive shell, and lack of a compilation phase makes for a shorter edit/test cycle. All of these factors combine to make Python a terrific alternative to C++ and Java as a general purpose programming language. ForecastWatch.com was made possible because of the ease of programming complex tasks in Python, and the rapid development that Python allows. Python Modules 1. Pandas: Pandas is used for data analytics. It enables the programmer to traverse the data and extract the desired result. Pandas behaves much like Microsoft Excel; the main difference is that pandas has no graphical user interface. 2. Matplotlib: Matplotlib enables the user to visualize the behaviour of the data handled with pandas. It is a large library capable of drawing many types of graph. In this project matplotlib is used for analysing the spatial correlation between two datasets.
  • 15. Page | 15 3. Arcpy: Arcpy is a Python module made for ArcGIS. It cannot be used outside the ArcGIS environment. Its basic purpose is to create customized applications in ArcGIS Desktop. Every tool in ArcMap has a Python implementation, which can be seen in the tool description. By understanding the behaviour of the tools we can chain multiple ArcMap tools together to get the desired output. This reduces manual work, because chaining the tools automates the process. 4. Numpy: Numpy performs operations on arrays, including the 2D and 3D arrays used here. With the help of numpy any raster-based model can be built. Numpy has methods to transform raster datasets stored as 2D or 3D arrays. Tasks such as taking the temporal mean of a long time series (for example 100 years) while excluding a particular value become very easy: masking that value in the numpy raster array is enough. 5. Openpyxl: Openpyxl is the bridge between ArcMap and Microsoft Excel. Results generated in ArcMap can be taken as a pandas data frame and then stored in Excel. After storing the dataset we can use openpyxl to visualize the data with charts such as line, scatter, pie and area charts. 6. Scipy: Scipy includes statistical tools for data analysis. One of them, the gamma cumulative distribution function, is used for the calculation of SPI. Statistics such as linear regression, with results including the standard error and other important information, can be computed with scipy. The interpolation module in scipy helps the programmer to interpolate a point dataset to generate a surface. 
There are dedicated modules in scipy for Fourier transforms, linear algebra, eigenvalue problems, multidimensional image processing, etc. 7. (GRASS, n.d.):
  • 16. Page | 16 GRASS is a freely available plugin in the Quantum GIS software which can be used with raster as well as vector data for analysis. GRASS is a plugin, not a module; it contains many Python modules for the analysis of spatial data, such as: i. db, the database module. ii. r, the raster module. iii. v, the vector module. 8. Sklearn: a. Decision Tree (scikit-learn, 1.10. Decision Trees, n.d.): Decision Trees (DTs) are a non-parametric supervised learning method used for classification and regression. The goal is to create a model that predicts the value of a target variable by learning simple decision rules inferred from the data features. For instance, in the example below, decision trees learn from data to approximate a sine curve with a set of if-then-else decision rules. The deeper the tree, the more complex the decision rules and the fitter the model. Some advantages of decision trees are:
    - Simple to understand and to interpret. Trees can be visualised.
    - Requires little data preparation. Other techniques often require data normalisation, dummy variables need to be created and blank values to be removed. Note however that this module does not support missing values.
    - The cost of using the tree (i.e., predicting data) is logarithmic in the number of data points used to train the tree.
    - Able to handle both numerical and categorical data. Other techniques are usually specialised in analysing datasets that have only one type of variable. See algorithms for more information.
    - Able to handle multi-output problems.
    - Uses a white box model. If a given situation is observable in a model, the explanation for the condition is easily explained by boolean logic. By contrast, in a black box model (e.g., in an artificial neural network), results may be more difficult to interpret.
    - Possible to validate a model using statistical tests. That makes it possible to account for the reliability of the model.
  • 17. Page | 17
    - Performs well even if its assumptions are somewhat violated by the true model from which the data were generated.
    The disadvantages of decision trees include:
    - Decision-tree learners can create over-complex trees that do not generalise the data well. This is called overfitting. Mechanisms such as pruning (not currently supported), setting the minimum number of samples required at a leaf node or setting the maximum depth of the tree are necessary to avoid this problem.
    - Decision trees can be unstable because small variations in the data might result in a completely different tree being generated. This problem is mitigated by using decision trees within an ensemble.
    - The problem of learning an optimal decision tree is known to be NP-complete under several aspects of optimality and even for simple concepts. Consequently, practical decision-tree learning algorithms are based on heuristic algorithms such as the greedy algorithm where locally optimal decisions are made at each node. Such algorithms cannot guarantee to return the globally optimal decision tree. This can be mitigated by training multiple trees in an ensemble learner, where the features and samples are randomly sampled with replacement.
    - There are concepts that are hard to learn because decision trees do not express them easily, such as XOR, parity or multiplexer problems.
    - Decision tree learners create biased trees if some classes dominate. It is therefore recommended to balance the dataset prior to fitting with the decision tree.
  • 18. Page | 18 Figure 16: Simple implementation of Decision Tree b. Random Forest (scikit-learn, 3.2.4.3.1. sklearn.ensemble.RandomForestClassifier, n.d.): A random forest is a meta estimator that fits a number of decision tree classifiers on various sub-samples of the dataset and uses averaging to improve the predictive accuracy and control over-fitting. The sub-sample size is always the same as the original input sample size but the samples are drawn with replacement if bootstrap=True (default). An example of a random forest implementation is given in the following figure.
  • 19. Page | 19 Figure 17: Simple implementation of Random Forest 9. Decision Tree Implementation: (Hwahwan & Cha, 2008) implemented land classification with the decision tree machine learning technique. The parameters taken for classification were: i. DEM ii. Aspect iii. Slope iv. ISO cluster v. Population density vi. Distance to water vii. Distance to road. To train the decision tree, classified land data was taken from the government of South Korea. The classes were Forest, Urban, Water, Agriculture, Rangeland, Barren land and Wetland. After training and classification of the dataset an accuracy of 96% was achieved.
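Since the two "simple implementation" figures survive here only as captions, the following minimal sketch shows how a decision tree and a random forest are typically fitted with scikit-learn. It is not the code from the figures: the feature names (SPI, VCI, TCI), the synthetic values and the labelling rule are all invented for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Synthetic per-pixel features (illustrative only): SPI, VCI and TCI.
n = 200
spi = rng.uniform(-3, 3, n)
vci = rng.uniform(0, 100, n)
tci = rng.uniform(0, 100, n)
X = np.column_stack([spi, vci, tci])

# Toy labelling rule invented for this sketch: 1 = drought, 0 = no drought.
y = ((spi < -1.0) & (vci < 40.0)).astype(int)

# A single fully grown tree and an ensemble of 25 trees.
tree = DecisionTreeClassifier(random_state=0).fit(X, y)
forest = RandomForestClassifier(n_estimators=25, random_state=0).fit(X, y)

# Classify two new, made-up pixels with each model.
new_pixels = np.array([[-2.0, 20.0, 30.0], [1.0, 80.0, 70.0]])
tree_pred = tree.predict(new_pixels)
forest_pred = forest.predict(new_pixels)
```

An unpruned tree reproduces its training labels exactly, which is the overfitting risk noted in the scikit-learn text; the forest averages many such trees trained on bootstrap samples to reduce that variance.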
  • 20. Page | 20 10. WMO SPI (Standardized Precipitation Index): (WMO, 2009) The Inter-Regional Workshop on Indices and Early Warning Systems for Drought declared that SPI (Standardized Precipitation Index) should be used to characterize meteorological drought. SPI answers questions such as whether rainfall in a particular month is in deficit or surplus compared to past years of data. SPI can be computed for accumulation periods from 1 month to 60 months, and beyond 60 months if long-term rainfall data is available. One-month SPI helps in identifying short-term drought events, since one month of rainfall is compared with past records. As we increase the accumulation period we can identify long-term drought; for example, with 24-month SPI we can identify locations which have been affected by drought for the past 2 years. SPI roughly ranges from -3 to +3 and each range of values has a meaning.
    Table 2: SPI Values
    2.0 and above: Extremely wet
    1.5 to 1.99: Very wet
    1.0 to 1.49: Moderately wet
    -0.99 to 0.99: Near normal
    -1.0 to -1.49: Moderately dry
    -1.5 to -1.99: Severely dry
    -2.0 and less: Extremely dry
    (M. Svoboda, 2012) defines the meaning of the SPI values above in the user guide.
    11. Bias Correction CMCC:
    $P^{*}(d) = P(d) \cdot \dfrac{\mu_m(P_{obs}(d))}{\mu_m(P_{rem}(d))}$
    Where:
    - $\mu_m(P_{obs}(d))$ is the monthly mean of the station data
    - $\mu_m(P_{rem}(d))$ is the monthly mean of the remote sensing data
    - $P(d)$ is the daily remote sensing data
    - $P^{*}(d)$ is the bias corrected remote sensing data
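The CMCC correction above scales every daily satellite value by the ratio of the monthly station mean to the monthly satellite mean. A minimal sketch for one pixel and one month follows; the function name, the sample values and the handling of a zero satellite mean are assumptions made for illustration.

```python
from statistics import mean

def cmcc_correct(p_sat_daily, p_obs_daily):
    """Scale daily satellite rainfall by mean(station) / mean(satellite)."""
    mu_obs = mean(p_obs_daily)   # monthly mean of the station data
    mu_rem = mean(p_sat_daily)   # monthly mean of the remote sensing data
    if mu_rem == 0:
        # Assumption: with no satellite rain in the month there is nothing to scale.
        return list(p_sat_daily)
    factor = mu_obs / mu_rem
    return [p * factor for p in p_sat_daily]

# Made-up daily values (mm) for one pixel over a four-day "month".
satellite = [0.0, 4.0, 10.0, 6.0]
station = [0.0, 6.0, 15.0, 9.0]
corrected = cmcc_correct(satellite, station)
```

After correction the monthly mean of the satellite series equals the station mean, while days with no satellite rainfall stay at zero, which is the gap the rule-based variant tries to address.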
  • 21. Page | 21 12. Bias Correction MRC
    $SRE_e = (SRE_o - \mu_{SRE}) \cdot \tau_f + \mu_{SRE} \cdot \mu_f$
    $\mu_f = \mu_{OBS} / \mu_{SRE}$
    $\tau_f = \tau_{OBS} / \tau_{SRE}$
    Where:
    - $\mu_{SRE}$ is the monthly mean of the remote sensing data
    - $\mu_{OBS}$ is the monthly mean of the station data
    - $\tau_{OBS}$ is the monthly standard deviation of the station data
    - $\tau_{SRE}$ is the monthly standard deviation of the remote sensing data
    - $SRE_o$ is the daily SRE data
    - $SRE_e$ is the bias corrected data
    13. Bias Correction Rule Based (Modified CMCC): This method was developed at IWMI and its behaviour is related to the CMCC methodology. The rules were created after testing the rainfall dataset in Excel and studying how the equation behaves depending on the mean of the data.
    - Rule 1: If the station records precipitation and the satellite does not, copy the IMD data into the bias corrected satellite data.
    - Rule 2: If the difference between station and satellite data is two millimetres, take the mean of these pixels.
    - Rule 3: If the station mean is greater than the satellite mean, then where the satellite daily value is greater than the station daily value the bias correction equation is:
    $P^{*}(d) = P(d) \cdot \dfrac{\mu_m(P_{rem}(d))}{\mu_m(P_{obs}(d))}$
    and where the satellite daily value is less than the station daily value it is:
    $P^{*}(d) = P(d) \cdot \dfrac{\mu_m(P_{obs}(d))}{\mu_m(P_{rem}(d))}$
  • 22. Page | 22
    - Rule 4: If the station mean is less than the satellite mean, then where the satellite daily value is less than the station daily value the bias correction equation is:
    $P^{*}(d) = P(d) \cdot \dfrac{\mu_m(P_{rem}(d))}{\mu_m(P_{obs}(d))}$
    and where the satellite daily value is greater than the station daily value it is:
    $P^{*}(d) = P(d) \cdot \dfrac{\mu_m(P_{obs}(d))}{\mu_m(P_{rem}(d))}$
    Where:
    - $\mu_m(P_{obs}(d))$ is the monthly mean of the station data
    - $\mu_m(P_{rem}(d))$ is the monthly mean of the remote sensing data
    - $P(d)$ is the daily remote sensing data
    - $P^{*}(d)$ is the bias corrected remote sensing data
    14. Bias Correction IIT Gandhinagar: (Shah & Mishra, 2014) bias corrected TRMM data with respect to IMD data. In the paper they mention that TRMM always underestimates the amount of precipitation compared to the station data provided by IMD. This difference was most noticeable in the monsoon season, when the values were extreme. They created two scale factors to bias correct the data, one for the extreme events and the other for the non-monsoon months. First they took precipitation values over the ninetieth percentile of IMD data and created the first scale factor for extreme events. The scale factor is simply the ratio of the IMD and TRMM means corresponding to pixels over the ninetieth percentile of IMD. This scale factor is then multiplied with the raw TRMM data for the respective months.
  • 23. Page | 23 Second, for pixels below the ninetieth percentile of IMD, the ratio of the IMD and TRMM means corresponding to those pixels is taken. This second factor is applied to the raw TRMM data. The percentile is taken only for the monsoon months because other months receive comparatively less rainfall. 15. IDSI (Integrated Drought Severity Index) a. IDSI is an index developed at IWMI for monitoring drought in South Asia, covering India, Pakistan, Sri Lanka, Nepal, Afghanistan and Bangladesh. It uses VCI, TCI and the rainfall anomaly from the GPM dataset to classify a pixel's drought severity. A map of IDSI looks like the following. Figure 18: IDSI 20-27 Jul 2002
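The two-scale-factor correction attributed to Shah & Mishra (2014) in item 14 can be sketched as follows. This is a reading of the prose description, not the authors' code: the function name, the use of `statistics.quantiles` for the ninetieth percentile, the fallback factor of 1.0 for an empty or rain-free group, and the sample values are all assumptions.

```python
from statistics import mean, quantiles

def two_factor_correct(trmm, imd):
    """Apply separate mean-ratio scale factors above and below the IMD 90th percentile."""
    threshold = quantiles(imd, n=10)[-1]  # ninetieth percentile of the station data
    high = [i for i, v in enumerate(imd) if v > threshold]
    low = [i for i, v in enumerate(imd) if v <= threshold]

    def factor(indices):
        # Ratio of IMD mean to TRMM mean over the selected samples.
        if not indices:
            return 1.0  # assumption: leave values unchanged if the group is empty
        sat_mean = mean(trmm[i] for i in indices)
        return mean(imd[i] for i in indices) / sat_mean if sat_mean else 1.0

    f_high, f_low = factor(high), factor(low)
    high_set = set(high)
    return [v * (f_high if i in high_set else f_low) for i, v in enumerate(trmm)]

# Made-up sample: TRMM underestimates by half normally and by three quarters in extremes.
imd = [float(v) for v in range(1, 21)]
trmm = [v * (0.25 if v > 18.9 else 0.5) for v in imd]
corrected = two_factor_correct(trmm, imd)
```

Using a separate factor for the extreme values keeps heavy monsoon rainfall from being under-corrected by a ratio that is dominated by ordinary days.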
  • 24. Page | 24 Study Area Introduction Since this methodology is experimental and the dataset is very large and varied, the processing time with machine learning depends entirely on how big the data is. To get faster results the study area has been kept small. The area was chosen for its large spatial variability within Maharashtra, so that the interaction of the diverse indices from multiple satellites could be studied. Area of concern The biggest area of concern is farmer suicides in Maharashtra; a major reason behind this issue is a lack of management from the government. Remote sensing is one platform that the government can adopt to release funds to the needy and stop this crisis. The lack of management comes from not taking the right decisions at the right time, which happens because conventional methodologies are still used to determine drought. Remote sensing gives fast and mostly accurate results because we do not need to wait for a verdict from a village surveyor to declare a village drought-affected; from satellite imagery we can process the data immediately and provide crucial information to the decision makers so that they can take the right decisions. With the help of Python we can automate the entire processing without human intervention and get results in minutes.
  • 25. Page | 25 Figure 19: Area of Interest
  • 26. Page | 26 Methodology Conversion tools: 1. IMD GRD data to ASCII: IMD by default distributes the in-situ precipitation data in GRD format. A separate C program is distributed to convert the GRD file into ASCII. To simplify processing for the user, a built-in Python script was written that performs the same task as the C program. 2. IMD ASCII to Raster: The ASCII file is then converted to gridded raster data so that it can be compared with satellite rainfall estimates. 3. Excel to Raster: Apart from IMD, the Bangladesh precipitation data that I found was in the form of an Excel sheet containing in-situ recorded precipitation. This tool enables the user to generate a gridded raster map from the point station data with the IDW interpolation technique. Shift in the IMD and Remote sensing data: 1. After converting the IMD raw data into raster maps we observed a shift of 0.125 degrees between the IMD raster and the remote sensing rasters. 2. The shift was clearly visible when compared with PERSIANN and TRMM. 3. Since the final product is going to be based on 30 metre spatial resolution, a shift of 0.125 degrees would cause a major problem. 4. Hence a tool was created to remove this shift. Methodology 1. Resample the IMD data from 0.25 degrees to 0.125 degrees. 2. Take the zonal mean of these 0.125-degree cells with a fishnet created from the remote sensing data, where the grid size of the fishnet is 0.25 degrees and matches the extent of the remote sensing data. 3. Clip the extra remaining 0.125-degree cells. 4. Resample again to bring the 0.125-degree IMD data to the 0.25-degree grid of remote sensing products such as PERSIANN and TRMM. Bias Correction
  • 27. Page | 27 A toolbox dedicated to bias correction was created with all the methods listed below. 1. Bias Correction CMCC interval 2. Bias Correction CMCC monthly 3. Bias Correction Rule Based 4. Bias Correction MRC 5. Bias Correction IIT Gandhinagar The best-corrected result was taken for SPI calculation. 1. SPI calculation a. Automated data sorting Figure 20: UI of Monthly Sum for SPI Description of this tool: Monthly Sum For SPI Title: Monthly Sum For SPI
  • 28. Page | 28 Summary: This tool calculates the monthly sum from the daily rainfall data and saves the long-term files in their respective month folders. Calculating SPI with the SPI tool in this toolbox requires some data preparation; this tool automates that preparation, so you will not need to sort the data manually before sending it to the SPI tool. Usage: There is no usage for this tool. Syntax: MonthlySumForSPI (Daily_gridded_data_Folder, Extension, Output_Folder)
    Parameter: Daily_gridded_data_Folder. Data type: Folder.
    Explanation: This folder should contain subfolders named by year, each containing 365 daily rainfall files. Example: Folder name = TRMM_daily, Sub-folder names = 2001, 2002, 2003... 2015
  • 29. Page | 29
    Parameter: Extension. Data type: String.
    Explanation: Select the format of the precipitation dataset that you are giving as input to this tool.
    Parameter: Output_Folder. Data type: Folder.
    Explanation: Folder where all the monthly folders containing the output files will be created.
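The sorting step performed by the Monthly Sum For SPI tool can be sketched without ArcGIS as a grouping of daily values by calendar month and year. This is a simplified illustration for a single pixel, not the tool's source: the function name and the synthetic input are assumptions, and real input would come from the daily raster files.

```python
from collections import defaultdict
from datetime import date, timedelta

def monthly_sums(daily_rainfall):
    """Group (date, millimetres) pairs into {month: {year: monthly total}}."""
    totals = defaultdict(lambda: defaultdict(float))
    for day, mm in daily_rainfall:
        totals[day.month][day.year] += mm
    return totals

# Hypothetical input: 1 mm of rain on every day of 2001.
start = date(2001, 1, 1)
daily = [(start + timedelta(days=i), 1.0) for i in range(365)]
sums = monthly_sums(daily)
```

Keying the result by month first mirrors the tool's layout of one folder per month holding every year's total, which is the arrangement the SPI tools expect as input.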
  • 30. Page | 30 b. 1 to 12 tool UI Figure 21: UI of SPI Calculation from 1 month to 12 months Description of this tool: SPI 1 to 12 Title: SPI 1 to 12 Summary: This tool calculates the SPI using monthly rainfall data. It is designed to calculate the Standardized Precipitation Index (SPI) with minimal human interaction. It computes α, β, Γ(α) and the gamma cumulative probability within the tool and gives the SPI for every accumulation period from 1 to 12 months as the final output.
  • 31. Page | 31
NOTE: The SPI output from the tool has been validated against the World Meteorological Organisation (WMO) software for SPI. A correlation of 0.99 was achieved between this tool and the WMO software.
Usage
There is no usage for this tool.
Syntax
SPI1to12 (Input_Folder, Extension, Daily_gridded_data_Folder, Output_Folder)
Parameter | Explanation | Data Type
Input_Folder | Dialog Reference: Folder containing all the monthly sub-folders, each holding the monthly rainfall files for all years as computed by the "Monthly Sum For SPI" tool. There is no python reference for this parameter. | Folder
Extension | Dialog Reference: Select the format of the precipitation dataset that you are giving as the input parameter for this tool. There is no python reference for this parameter. | String
  • 32. Page | 32
Daily_gridded_data_Folder | Dialog Reference: This folder should contain subfolders named by year, each containing 365 daily rainfall files. Example: Folder name = TRMM_daily; Sub-folder name = 2001, 2002, 2003... 2015. There is no python reference for this parameter. | Folder
Output_Folder | Dialog Reference: Folder where all the monthly folders containing the output files will be created. There is no python reference for this parameter. | Folder
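The summary above mentions α, β, Γ(α) and a cumulative probability, which matches the usual gamma-distribution route to SPI. The following is a minimal per-pixel sketch of that route, not the IWMI tool's code. It assumes Thom's approximation for the gamma parameters, a mixed distribution for zero-rainfall months, and SciPy's `gammainc` (the regularised incomplete gamma function, which already includes the division by Γ(α)). The function name `spi_from_series` is hypothetical.

```python
import numpy as np
from scipy.special import gammainc
from scipy.stats import norm


def spi_from_series(rain):
    """Return SPI values for a 1-D series of rainfall totals for one time scale.

    Assumes at least two distinct positive values in the series.
    """
    rain = np.asarray(rain, dtype=float)
    wet = rain[rain > 0]
    q = 1.0 - wet.size / rain.size           # probability of a zero total

    # Thom's approximation to the maximum-likelihood gamma parameters.
    a = np.log(wet.mean()) - np.log(wet).mean()
    alpha = (1.0 + np.sqrt(1.0 + 4.0 * a / 3.0)) / (4.0 * a)
    beta = wet.mean() / alpha

    # Gamma cumulative probability of each total (0 for dry periods).
    g = gammainc(alpha, rain / beta)
    h = q + (1.0 - q) * g                     # mixed CDF including dry periods
    h = np.clip(h, 1e-6, 1.0 - 1e-6)          # keep the normal quantile finite
    return norm.ppf(h)                        # equiprobable standard normal value


# Synthetic example (assumed data): 30 years of totals for one month.
rng = np.random.default_rng(0)
rain = rng.gamma(shape=2.0, scale=50.0, size=30)
spi = spi_from_series(rain)
```

For a longer time scale, the same function would be applied to rolling sums of the monthly totals rather than to single-month totals.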
  • 33. Page | 33
c. 13 to 60 tool UI
Figure 22: UI of SPI Calculation from 13 months to 60 months
Description of this tool: SPI 13 to 60
Title
SPI 13 to 60
Summary
This tool calculates the SPI using monthly rainfall data. It is designed to calculate the Standardized Precipitation Index (SPI) with minimal human interaction. It computes α, β, Γ(α) and the cumulative probability within the tool, and gives the SPI for every time scale from 13 to 60 months as the final output.
  • 34. Page | 34
NOTE: The SPI output from the tool has been validated against the World Meteorological Organisation (WMO) software for SPI. A correlation of 0.99 was achieved between this tool and the WMO software.
Usage
There is no usage for this tool.
Syntax
SPI13to60 (Input_Folder, Extension, Daily_gridded_data_Folder, Output_Folder)
Parameter | Explanation | Data Type
Input_Folder | Dialog Reference: Folder containing all the monthly sub-folders, each holding the monthly rainfall files for all years as computed by the "Monthly Sum For SPI" tool. There is no python reference for this parameter. | Folder
Extension | Dialog Reference: Select the format of the precipitation dataset that you are giving as the input parameter for this tool. There is no python reference for this parameter. | String
  • 35. Page | 35
Daily_gridded_data_Folder | Dialog Reference: This folder should contain subfolders named by year, each containing 365 daily rainfall files. Example: Folder name = TRMM_daily; Sub-folder name = 2001, 2002, 2003... 2015. There is no python reference for this parameter. | Folder
Output_Folder | Dialog Reference: Folder where all the monthly folders containing the output files will be created. There is no python reference for this parameter. | Folder
  • 36. Page | 36
d. Unpack
Figure 23: UI of unpacking all the calculated SPI to daily raster images
Description of this tool: This tool helps the user to unpack all the stacked SPI rasters. Each output from this tool is a single-layer raster containing monthly SPI.
e. Validation with WMO software
i. WMO has developed a command line program to calculate the SPI. The output of the WMO program and that of the Python SPI tool developed at IWMI had a correlation of 0.99.
Table 2: Comparison between SPI calculated by the WMO program and the Python tool made at IWMI
Python | WMO    | Correlation
-99    | -99    | 0.99999
1.28   | 1.25   |
-1.4   | -1.392 |
-0.34  | -0.342 |
-0.26  | -0.265 |
  • 37. Page | 37
Python | WMO    | Correlation
-0.88  | -0.877 |
1.47   | 1.441  |
0.17   | 0.157  |
2.43   | 2.388  |
-0.27  | -0.27  |
-0.66  | -0.662 |
-0.43  | -0.431 |
-0.18  | -0.179 |
0.96   | 0.936  |
-0.62  | -0.616 |
-1.91  | -1.894 |
0.49   | 0.468  |
0.06   | 0.041  |
-0.73  | -0.725 |
-0.15  | -0.161 |
-1.65  | -1.641 |
0.73   | 0.705  |
-0.15  | -0.155 |
0.45   | 0.437  |
1.42   | 1.384  |
0.28   | 0.262  |
0.02   | 0.013  |
-1.2   | -1.196 |
1.51   | 1.486  |
-1.46  | -1.443 |
0.17   | 0.149  |
1.15   | 1.127  |
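A comparison of this kind can be reproduced with a few lines of NumPy. The sketch below uses the paired values listed in Table 2 and treats -99 as a no-data marker to be excluded, which is an assumption; the thesis does not say how the reported correlation handled that row.

```python
import numpy as np

# Paired SPI values from Table 2 (Python tool vs WMO program).
python_spi = np.array([
    -99, 1.28, -1.4, -0.34, -0.26, -0.88, 1.47, 0.17, 2.43, -0.27, -0.66,
    -0.43, -0.18, 0.96, -0.62, -1.91, 0.49, 0.06, -0.73, -0.15, -1.65, 0.73,
    -0.15, 0.45, 1.42, 0.28, 0.02, -1.2, 1.51, -1.46, 0.17, 1.15,
])
wmo_spi = np.array([
    -99, 1.25, -1.392, -0.342, -0.265, -0.877, 1.441, 0.157, 2.388, -0.27,
    -0.662, -0.431, -0.179, 0.936, -0.616, -1.894, 0.468, 0.041, -0.725,
    -0.161, -1.641, 0.705, -0.155, 0.437, 1.384, 0.262, 0.013, -1.196, 1.486,
    -1.443, 0.149, 1.127,
])

# Assumption: -99 marks no-data and is dropped before comparing.
valid = (python_spi != -99) & (wmo_spi != -99)

# Pearson correlation coefficient between the two sets of SPI values.
r = np.corrcoef(python_spi[valid], wmo_spi[valid])[0, 1]
print(f"Pearson r over {valid.sum()} valid pairs: {r:.5f}")
```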
  • 38. Page | 38
f. Validation with monthly mean of IMD rainfall data
i. IMD data from 1901 to 2013 was used to calculate the one-month SPI. The left Y axis shows the mean monthly rainfall of Maharashtra, and the right Y axis shows the corresponding SPI. The correlation between the two datasets was over 0.94, and their patterns also match.
Figure 24: Comparison of Mean Rainfall with Mean SPI
2. VCI
(Kogan & J. Sullivan, 1993) defined a vegetation index that takes the maximum and minimum NDVI values in the time series and then calculates the index:
$$VCI = \frac{(NDVI - NDVI_{min}) \times 100}{NDVI_{max} - NDVI_{min}}$$
Where: NDVI, NDVI_max and NDVI_min are the smoothed weekly NDVI and the multiple-year NDVI maximum and minimum, respectively.
3. TCI
(Liu, W.T., & F.N. Kogan, 1996) Similar to VCI, the maximum and minimum are taken over the long time period:
  • 39. Page | 39
$$TCI = \frac{100\,(BT_{max} - BT)}{BT_{max} - BT_{min}}$$
Where: BT, BT_max and BT_min are the smoothed weekly thermal brightness temperature and its multiple-year maximum and minimum, respectively.
4. NDVI
NDVI was calculated using the Landsat 8 OLI dataset:
$$NDVI = \frac{\text{Band 5} - \text{Band 4}}{\text{Band 5} + \text{Band 4}}$$
Where: Band 5 is the near infrared band, with a wavelength of 0.85-0.88 micrometres, and Band 4 is the red band, with a wavelength of 0.64-0.67 micrometres.
5. SPI, VCI, TCI and NDVI: these indices are used as input parameters in both machine learning approaches.
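The three raster indices defined above translate directly into array arithmetic. The sketch below is illustrative only: the function names are hypothetical, NDVI is written in its standard normalised-difference form (NIR minus red over NIR plus red), and the inputs are assumed to be NumPy arrays or scalars with no zero denominators.

```python
import numpy as np


def ndvi(nir, red):
    """Normalised difference of near infrared (Band 5) and red (Band 4)."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    return (nir - red) / (nir + red)


def vci(ndvi_now, ndvi_min, ndvi_max):
    """Vegetation Condition Index: 0 at the multi-year minimum, 100 at the maximum."""
    return 100.0 * (ndvi_now - ndvi_min) / (ndvi_max - ndvi_min)


def tci(bt_now, bt_min, bt_max):
    """Temperature Condition Index: 100 at the coolest, 0 at the hottest on record."""
    return 100.0 * (bt_max - bt_now) / (bt_max - bt_min)


# Made-up reflectance and brightness temperature values for a single pixel.
pixel_ndvi = ndvi(nir=0.5, red=0.1)
pixel_vci = vci(ndvi_now=0.4, ndvi_min=0.2, ndvi_max=0.8)
pixel_tci = tci(bt_now=300.0, bt_min=290.0, bt_max=310.0)
```

Because VCI rises with greener vegetation while TCI falls with higher temperature, low values of either index point towards stress.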
  • 40. Page | 40
Results
1. Bias correction comparison
a. Two satellite precipitation estimators were used, TRMM and PERSIANN, both with a resolution of 0.25 degrees. All the types of bias correction discussed in the literature review were implemented, and the correlation between the original station data and the satellite estimates was plotted before and after bias correction. The annual average of the correlation between the bias-corrected data and the original data was then taken to understand which bias correction consistently gives good results on most days. TRMM and PERSIANN had an average correlation of 0.6.
b. Since PERSIANN data was available from 1983 to 2015, which is thirty-three years of daily rainfall estimates, it is the better choice for calculating SPI. After comparing the results of all the bias corrections, Rule Based bias correction proved to be the best, with an average correlation of 0.7 to 0.8. The final bias correction method used is therefore Rule Based, with PERSIANN as the satellite rainfall estimate and IMD as the in-situ observed data. The results are shown in the following graphs.
c. The data compared in the graphs below for April, May, July and August are from the 15th of each of those months.
  • 41. Page | 41 Figure 25: 2007 TRMM data compared with all types of Bias Correction methods
  • 42. Page | 42 Figure 26: 1998 TRMM data compared with all types of Bias Correction methods
  • 43. Page | 43 Figure 27: 2007 PERSIANN data compared with all types of Bias Correction methods
  • 44. Page | 44 Figure 28: 1998 PERSIANN data compared with all types of Bias Correction methods
  • 45. Page | 45 Figure 29: 1998 PERSIANN data compared with all types of Bias Correction methods
  • 46. Page | 46 Figure 30: 2007 PERSIANN data compared with all types of Bias Correction methods
  • 47. Page | 47
d. Decision tree (DT)
i. The input parameters for classification of the dataset were VCI, TCI, SPI and NDVI.
  • 48. Page | 48 e. Random Forest with different estimators and their results i. Estimator 10
  • 49. Page | 49 ii. Estimator 25
  • 50. Page | 50 iii. Estimator 50
  • 51. Page | 51 iv. Estimator 80
  • 52. Page | 52
f. IDSI vs Random Forest and Decision Tree, and SPI vs Random Forest and Decision Tree
Table 3: Comparison of random forest with 10, 80, 25 and 50 estimators and decision tree against IDSI and SPI
        | RF_10 | RF_80 | RF_25 | RF_50 | DT
VS IDSI | 0.49  | 0.42  | 0.48  | 0.48  | 0.43
VS SPI  | 0.18  | 0.78  | 0.31  | 0.14  | 0.99
Where RF_10, RF_25, RF_50, RF_80 and DT are random forest with 10 estimators, random forest with 25 estimators, random forest with 50 estimators, random forest with 80 estimators and decision tree, respectively.
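The experiment design behind these results, one decision tree and four random forests of increasing size trained on the same four indices, can be sketched with scikit-learn as follows. The data is synthetic and the labelling rule is invented purely for illustration. The scores are plain hold-out accuracy and are not comparable to the values in Table 3, whose metric is not specified here.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic pixels: four columns standing in for VCI, TCI, SPI and NDVI.
rng = np.random.default_rng(42)
X = rng.random((600, 4))
# Hypothetical labelling rule (not from the thesis): 1 = drought pixel.
y = (X.sum(axis=1) < 1.8).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# One decision tree plus random forests with the estimator counts used above.
models = {"DT": DecisionTreeClassifier(random_state=0)}
for n in (10, 25, 50, 80):
    models[f"RF_{n}"] = RandomForestClassifier(n_estimators=n, random_state=0)

# Fit every model on the same split and record its hold-out accuracy.
scores = {
    name: model.fit(X_train, y_train).score(X_test, y_test)
    for name, model in models.items()
}
for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```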
  • 53. Page | 53
2. IWMI tool box
Figure 31: IWMI tools in ArcCatalog 10.3
3.
  • 54. Page | 54
a. Remaining tools description
Tool | Short description
Export Maps | Exports the map in all possible formats supported by ArcMap
Frequency computer | Computes the frequency of drought pixels to find the number of drought occurrences
Clip Batch | Clips all the files in the given folder
Define Projection Batch | Defines the projection of all the files in the given folder
Interval Mean | Finds the days-interval mean of the given annual dataset in a folder
Months Statistics | Calculates monthly statistics of daily gridded data
Raster Scatter diagram | Automated generation of a scatter plot of two given raster datasets
Resample Batch | Resamples all the files in the given folder
Set no data to Value File | Sets a value to the no-data elements in the raster file
Set no data to Value Folder | Sets a value to the no-data elements in the raster folder
Stack Sum | Generates the sum of all the layers in the stacked raster dataset
Zonal Batch Raster | Computes the zonal raster on all the files in the given folder
Zonal Batch Table | Computes the zonal table on all the files in the given folder
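As an illustration of what a tool such as "Frequency computer" might do, the sketch below counts, for each pixel, how many layers of an SPI stack fall at or below a drought threshold. The threshold of -1.0, the function name and the array layout are assumptions made for this example and do not describe the IWMI implementation.

```python
import numpy as np


def drought_frequency(spi_stack, threshold=-1.0):
    """Count, per pixel, the layers whose SPI is at or below the threshold.

    spi_stack: array of shape (layers, rows, cols).
    threshold: assumed drought cut-off, not taken from the thesis.
    """
    spi_stack = np.asarray(spi_stack, dtype=float)
    return np.sum(spi_stack <= threshold, axis=0)


# Three made-up SPI layers on a 2 x 2 grid.
stack = np.array([
    [[-1.5, 0.2], [-0.3, -2.0]],
    [[-1.1, 0.4], [-1.0, 0.9]],
    [[0.5, 0.1], [-1.2, -1.7]],
])
frequency = drought_frequency(stack)
```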
  • 55. Page | 55
Discussion
Using Python for automation has substantially reduced manual intervention and human decision-making, and has reduced the processing time of the dataset by up to 30 percent.
After reviewing multiple papers on bias correction, many assumptions had to be made. In-situ data is recorded daily, while satellite estimates are registered at a different time of the day, which makes daily bias correction nearly impossible. Moreover, the in-situ data collected is heavily biased due to human error, which introduces a lot of noise that should be corrected but cannot be. The only way to bias correct daily real-time rainfall data is to use the Internet of Things (IoT): multiple in-situ sensors can be stationed at particular intervals, and while the satellite remotely senses the data, the satellite and the in-situ ground sensors can communicate at the same moment through IoT and bias correct the data there itself.
The long processing hours needed for a dataset the size of one district are the biggest limitation to implementing this machine learning approach in real life. On the other hand, the use of the Landsat 8 dataset shows water-stressed areas at a reasonably fine resolution. Machine learning needs a greater understanding of how operating systems function; manipulating the threads for multiprocessing can generate the results ten times faster.
  • 56. Page | 56
Conclusion
Random forest proved to be better than decision tree in terms of classification of the pixels; random forest could easily use the high resolution of the Landsat 8 dataset to get a finer resolution of water-stressed pixels. Four parameters are just the beginning; multiple parameters will be generated for future classification of potential drought pixels. Rain-fed information was not taken into consideration due to lack of data availability; if similar data is made available, the quality of classification can be increased substantially. More parameters with high correlation with each other give better results in random forest.
In future, to deal with the slow processing of huge amounts of data, the dask module of Python will be used to generate faster results with low Random Access Memory (RAM). dask enables the user to use block algorithms, which use less memory and more of the processing power of the computer. Instead of using a single thread to calculate the entire raster, dask breaks the big raster dataset down into user-defined intervals and then processes all these divided raster datasets in parallel.
With the experience gained in automated generation of maps, the aim is to build an end-to-end application in which the algorithm talks directly to the satellites and generates the map and statistical data without any human interference.
SPI was calculated up to 60 months to find locations that have been under drought for over five years; due to time constraints it could not be implemented in the random forest or the decision tree.
For further development, collaboration will be done with one of my classmates, Miss Marcia Chen, as she has explored neural networks, to analyse the difference and efficiency between random forest and neural network.
In my six months of internship I have created nearly 70 tools, of which the 30 mentioned in this project have been approved by IWMI. In some weeks all of the tools will be published with IWMI branding.
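The block-algorithm idea described in the conclusion can be illustrated without dask itself. The sketch below is a stand-in written with NumPy and the standard library's `ThreadPoolExecutor`, not dask code: it splits a raster into square windows and applies a function to each window on a pool of threads. The names `split_blocks` and `process_in_blocks` are hypothetical, and the approach as written is only correct for element-wise operations, where each output cell depends on the matching input cell alone.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def split_blocks(shape, block):
    """Yield (row_slice, col_slice) windows that tile a 2-D array."""
    rows, cols = shape
    for r in range(0, rows, block):
        for c in range(0, cols, block):
            yield slice(r, min(r + block, rows)), slice(c, min(c + block, cols))


def process_in_blocks(raster, func, block=256, workers=4):
    """Apply an element-wise function to a raster one block at a time."""
    out = np.empty(raster.shape, dtype=float)

    def work(window):
        # Each thread writes to its own non-overlapping window of the output.
        out[window] = func(raster[window])

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(work, split_blocks(raster.shape, block)))
    return out


# Made-up raster whose size is not a multiple of the block size.
raster = np.arange(1000 * 700, dtype=float).reshape(1000, 700)
result = process_in_blocks(raster, lambda b: b * 2.0, block=256)
```

A library such as dask additionally schedules the blocks lazily and can keep only part of the raster in memory at once, which this simple sketch does not attempt.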
  • 57. Page | 57
References
1. GRASS. (n.d.). https://grass.osgeo.org. Retrieved from Documentation: https://grass.osgeo.org/documentation
2. H. K., & C. Y. (2008). A Machine Learning Approach for Knowledge Base Construction. Journal of the Korean Geographical Society, 761-774.
3. Kogan, & J. Sullivan. (1993). Development of global drought-watch system using NOAA / AVHRR data. Advances in Space Research, 219-222.
4. Liu, W.T., & F.N. Kogan. (1996). Monitoring regional drought using the Vegetation Condition Index. International Journal of Remote Sensing, 2761-2782.
5. M. Svoboda, M. H. (2012). Standardized Precipitation Index User.
6. scikit-learn. (n.d.). 1.10. Decision Trees. Retrieved from http://scikit-learn.org: http://scikit-learn.org/stable/modules/tree.html
7. scikit-learn. (n.d.). 3.2.4.3.1. sklearn.ensemble.RandomForestClassifier. Retrieved from http://scikit-learn.org: http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html
8. Shah, R. D., & Mishra, V. (2014). Development of an Experimental Near-Real-Time Drought Monitor for India. Journal of Hydrometeorology, 327-345.
9. WMO. (2009, 12 15). Press Release No. 872. Retrieved from www.wmo.int: https://www.wmo.int/pages/mediacentre/press_releases/pr_872_en.html