SlideShare a Scribd company logo
1 of 3
Download to read offline
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 07 Issue: 03 | Mar 2020 www.irjet.net p-ISSN: 2395-0072
© 2020, IRJET | Impact Factor value: 7.34 | ISO 9001:2008 Certified Journal | Page 5346
RAINFALL FORECASTING USING WEKA DATA MINING TOOL
Abinaya P L1, Janani N2
1,2Department of Information Technology, Kumaraguru College of Technology, Tamil Nadu, India
--------------------------------------------------------------------------***-----------------------------------------------------------------------
ABSTRACT - Rain water, additionally called
precipitation, is a characteristic element of the world's
climate framework. India is a rural nation which to a
great extent relies upon monsoon for irrigation purpose.
Rainstorm have coordinated effect on agriculturists' life
and harvest creation. The Indian economy is heavily
dependent on agriculture and the job of the Indian
agriculturist to a great extent relies upon the Monsoon
downpours. Rainfall forecasting is essential and important
for development of the nation. Weather factors including
temperature, dew point temperature, humidity and
pressure have been used to forecasts the rainfall. The
dataset was collected from Kaggle and is classified using
Naïve Bayes, K nearest Neighbour, Decision tree and
Support vector machine. The Naïve Bayes has given
80.56% accuracy, K nearest Neighbour provides 93.96%
accuracy, Decision tree gives 94.10% accuracy while
Support vector machine has given 93.66% accuracy.
Keywords: Rainfall Prediction, Decision tree, Naive
Bayes, K Nearest Neighbour, SVM
1. INTRODUCTION
Rainfall forecasting is essential since it facilitates to spot
potential floods in future and to set up for higher water
management which decides future irrigation desires.
Weather forecasts may be categorised as: current
forecasts that is forecasts up to few hours, Short term
forecasts that is downfall forecasts is one to three days
forecast, Forecasts for four to ten days are Medium vary
forecasts and future forecasts are for over ten days.
Short vary and medium vary downfall forecasts are
necessary for flood forecasting and water resource
management. Classification has been applied in several
fields like detection frauds by banks and firms, by
numerous service suppliers to predict their performance
in future, to classify patients on the premise of their
symptoms. In this paper we have used Naïve Bayes, K
nearest Neighbour, Decision tree and Support vector
machine to forecast rainfall. Classification trees are
created using the concept of Gini Index, Bayes theorem is
used in Naïve Bayes for prediction, Euclidean distance is
used to compute Nearest neighbours and Support vector
machine use Kernel trick.
2. NAÏVE BAYES
Naive Bayes is a basic, yet viable and regularly utilized,
machine learning classifier. The Naive Bayes algorithm is
a basic probabilistic classifier. Naive Bayes classifier
requires a number of parameters linear in the number of
features in a learning problem and it is highly scalable.
Classification may be Gaussian, Kernel, multi nominal.
Here Bayes theorem is used which assumes all attributes
to be independent where the class variables are given.
Assuming distribution for weather information, Bayesian
classifiers use Bayes theorem to search out posterior
chances of occurrence of input data instance in all
classes. This assumption is known as class conditional
independence. The algorithm starts with computing
prior probability, P(Ci) for each class. Then for an input
data instance q, compute posterior probabilities for each
class
( | ) ∏ ( | )
Where Fj are attributes for input sample q. Finally
compute
P(Ci|q) = P(q|Ci) * P(Ci) / ∑ ( )
3. DECISION TREE ALGORITHM
A decision tree could be a structure that has a root node,
branches, and leaf nodes. It is used for both classification
and regression problem. Every internal node denotes a
test on the associate attribute, every branch denotes the
end result of a test, and every leaf node holds a category
label. The upmost node within the tree is the root node.
Tree pruning is performed so as to get rid of anomalies
due to noise or outliers. The cropped trees are smaller
and fewer complicated. The Decision tree algorithm
produces correct and explicable models with
comparatively very little user intervention. This
algorithm is often used for each binary and multiclass
classification issues. The algorithm is quick, each at build
time and apply time. The build method for decision tree
is parallelized. Decision tree rating is particularly quick.
The tree structure, created within the model build, is
employed for a series of easy tests. Every check relies on
one predictor. The algorithm is non-parametric and can
efficiently manage vast, muddled datasets without
forcing a convoluted parametric structure. At the point
when the example measure is sufficiently huge, study
about data can be partitioned into training and
validation datasets. Utilizing the training dataset to
assemble a decision tree model and a validation dataset
to settle on the suitable tree measure expected to
accomplish the optimal final display. Decision trees are
now recently referred as Classification and Regression
Trees (CART). The best attribute for splitting and
constructing decision tree is found by using Gini index as
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 07 Issue: 03 | Mar 2020 www.irjet.net p-ISSN: 2395-0072
© 2020, IRJET | Impact Factor value: 7.34 | ISO 9001:2008 Certified Journal | Page 5347
a selection measure. In CART the impurities in the
training dataset S is measured using Gini index.
Gini(S)=1-∑
Where m denotes the class labels. The one with least
impurity is considered to be the best option. This
process is done for all the attributes. The best splitting
attribute is one which has maximum ( )
( ) = Gini(S) - GiniA(S)
Where,
GiniA(S) = |S1| / |S| Gini (S1) + |S2| / |S| Gini (S2)
4. K NEAREST NEIGHBOUR
K nearest neighbours are referred as lazy learners where
learning depends on relationship. The algorithm can't be
utilized until the example for which neighbours are to be
computed is accessible. KNN are processed for given
instance t for which class name is to be anticipated. The
features are wanted to be numeric in nature as
neighbours are processed utilizing the distance metric.
Euclidean distance is performed here for analysing,
( ) √∑ ( )
Where Y and Z represents n dimensional observations
from training and test dataset respectively for which
class is to be predicted. The decision for the instance is
taken by majority voting. The input sample is considered
based on the class which has maximum neighbours.
Based on the size of the training dataset the value of K
that is the number of neighbours is determined.
5. SUPPORT VECTOR MACHINE
Support Vector Machine is a regulated machine learning
calculation which can be utilized for both classification
or regression difficulties. In any case, it is for the most
part utilized in grouping issues. In this calculation, we
plot every datum thing as a point in n-dimensional space
where n represents the number of features, we have
with the estimation of each component being the
estimation of a specific facilitate. At that point, we
perform classification by finding the hyper-plane that
separate the two classes exceptionally well. Support
Vectors are basically the co-ordinates of individual
perception. Support Vector Machine is a border which
best isolates the two classes that is hyper-plane/line. The
least complex approach to isolate two gatherings of
information is with a straight line, level plane or a N-
dimensional hyperplane. In any case, there are
circumstances where a nonlinear locale can isolate the
gatherings all the more productively. SVM handles this
by utilizing a kernel function (nonlinear) to map the
information into an alternate space where a hyperplane
can't be utilized to do the detachment. It implies a non-
linear function is found out by a linear learning machine
in a high-dimensional component space while the limit of
the framework is controlled by a parameter that does
not rely upon the dimensionality of the space. This is
called kernel trick which implies the kernel function
change the data into a higher dimensional element space
to make it conceivable to play out the linear separation.
In actuality, we probably won't have the capacity to drive
a straight line between the classes which makes Support
vector machines somewhat progressively confused
however it's possible to characterize the most extreme
edge hyperplane with Gaussian kernel.
6.PROCESS AND OUTCOMES
A dataset is collected which provides the data from May
2016 to March 2018 of the specific city that is Jaipur in
India. The dataset consists of 679 instances and 13
attributes.
Running the algorithms utilizing Naïve Bayes we break
down the classifier yield with such a large number of
statistic-based yield by utilizing 10 cross validation to
make a prediction of each example of the dataset. After
running the algorithm, we accomplished a classification
precision of 80.55% and the mean absolute error is
0.0446, time taken for building model is 0.01 seconds.
Fig.1.Screenshot view for Naïve Bayes Algorithm
J48 Tree has been utilized in this paper to choose the
objective esteem dependent on different attributes of
dataset and to characterize their accuracy. The output is
based on 10 cross validation which produced an
accuracy of 94.10% and the mean absolute error is
0.0227. The time taken to build the model is 0.01
seconds.
Fig.2.Screenshot view for J48 Algorithm
The K nearest neighbour is processed utilizing Euclidean
distance. Here 10 cross validation is used to make
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 07 Issue: 03 | Mar 2020 www.irjet.net p-ISSN: 2395-0072
© 2020, IRJET | Impact Factor value: 7.34 | ISO 9001:2008 Certified Journal | Page 5348
prediction for every instance in the dataset. The
algorithm has shown an output of 93.96% of accuracy.
The mean absolute error is 0.0181 and the time taken to
build the model is 0 seconds.
Fig.3.Screenshot view for K Nearest Neighbour
Algorithm
By running the Support vector machine algorithm, we
broke down the classifier yield with various statistics
dependent on yield by utilizing 10 cross validation. The
accuracy obtained for this algorithm is 93.66% with a
mean absolute error of 0.1598 and it took 0.69 seconds
to build the model.
Fig.4.Screenshot view for Support Vector Machine
Algorithm
7.CONCLUSION AND FUTURE WORK
The main aim of this paper is to forecast rainfall using
WEKA data mining tool. Classification which is a part of
data mining is extremely helpful in discovering obscure
patterns like forecasting the future trends. Here in this
paper we have used four algorithms namely Naïve Bayes,
Decision tree, K Nearest Neighbour and Support Vector
Machine and the outputs were compared based on the
accuracy achieved. The result has shown that the
decision tree algorithm has given the highest accuracy of
94.10% when compared to other three algorithms. So,
we can conclude that the Decision tree algorithm is the
best classification algorithm for forecasting rainfall on
the basis of given features in the dataset.
REFERENCES:
1. Dhamodharan S, Liver Disease Prediction Using
Bayesian Classification, Special Issues, 4th National
Conference on Advance Computing, Application
Technologies, May 2014
2. SolankiA.V., Data Mining Techniques using WEKA
Classification for Sickle Cell Disease, International
Journal of Computer Science and Information
Technology,5(4): 58575860,2014.
3. Joshi J, Rinal D, Patel J, Diagnosis And Prognosis of
Breast Cancer Using Classification Rules, International
Journal of Engineering Research and General
Science,2(6):315-323, October 2014.
4. David S. K., Saeb A. T., Al Rubeaan K., Comparative
Analysis of Data Mining Tools and Classification
Techniques using WEKA in Medical Bioinformatics,
Computer Engineering and Intelligent Systems,
4(13):28-38,2013.
5. Vijayarani, S., Sudha, S., Comparative Analysis of
Classification Function Techniques for Heart Disease
Prediction, International Journal of Innovative Research
in Computer and Communication Engineering, 1(3): 735-
741, 2013.
6.https://www.sciencedirect.com/science/article/pii/S0
925231202005775
7.https://docs.oracle.com/cd/B28359_01/datamine.111
/b28129/algo_decisiontree.htm#CACCJCEJ
8. Tina R. Patil, Mrs. S. S. Sherekar, Performance Analysis
of Naive Bayes and J48 Classification Algorithm for Data
Classification, International Journal Of Computer Science
And Applications, Apr 2013
9. Nicholas I.Sapankeyych, Ravi Sankar, Time series
prediction using support vector machines: A survey,
IEEE Computational Intelligence Magazine, April 2009
10.https://machinelearningmastery.com/use-
classification-machine-learning-algorithms-weka/

More Related Content

What's hot

Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)
IJERD Editor
 

What's hot (16)

IRJET - Comparison of Vedic, Wallac Tree and Array Multipliers
IRJET -  	  Comparison of Vedic, Wallac Tree and Array MultipliersIRJET -  	  Comparison of Vedic, Wallac Tree and Array Multipliers
IRJET - Comparison of Vedic, Wallac Tree and Array Multipliers
 
Efficient Intrusion Detection using Weighted K-means Clustering and Naïve Bay...
Efficient Intrusion Detection using Weighted K-means Clustering and Naïve Bay...Efficient Intrusion Detection using Weighted K-means Clustering and Naïve Bay...
Efficient Intrusion Detection using Weighted K-means Clustering and Naïve Bay...
 
Machine Learning Project - Neural Network
Machine Learning Project - Neural Network Machine Learning Project - Neural Network
Machine Learning Project - Neural Network
 
A Novel Optimization of Cloud Instances with Inventory Theory Applied on Real...
A Novel Optimization of Cloud Instances with Inventory Theory Applied on Real...A Novel Optimization of Cloud Instances with Inventory Theory Applied on Real...
A Novel Optimization of Cloud Instances with Inventory Theory Applied on Real...
 
IRJET- Study of Prediction Algorithms on Aviation Accident Dataset using Rapi...
IRJET- Study of Prediction Algorithms on Aviation Accident Dataset using Rapi...IRJET- Study of Prediction Algorithms on Aviation Accident Dataset using Rapi...
IRJET- Study of Prediction Algorithms on Aviation Accident Dataset using Rapi...
 
House price prediction
House price predictionHouse price prediction
House price prediction
 
IRJET- Privacy Preservation using Apache Spark
IRJET- Privacy Preservation using Apache SparkIRJET- Privacy Preservation using Apache Spark
IRJET- Privacy Preservation using Apache Spark
 
Enhancing Classification Accuracy of K-Nearest Neighbors Algorithm using Gain...
Enhancing Classification Accuracy of K-Nearest Neighbors Algorithm using Gain...Enhancing Classification Accuracy of K-Nearest Neighbors Algorithm using Gain...
Enhancing Classification Accuracy of K-Nearest Neighbors Algorithm using Gain...
 
THE IMPLICATION OF STATISTICAL ANALYSIS AND FEATURE ENGINEERING FOR MODEL BUI...
THE IMPLICATION OF STATISTICAL ANALYSIS AND FEATURE ENGINEERING FOR MODEL BUI...THE IMPLICATION OF STATISTICAL ANALYSIS AND FEATURE ENGINEERING FOR MODEL BUI...
THE IMPLICATION OF STATISTICAL ANALYSIS AND FEATURE ENGINEERING FOR MODEL BUI...
 
CART Classification and Regression Trees Experienced User Guide
CART Classification and Regression Trees Experienced User GuideCART Classification and Regression Trees Experienced User Guide
CART Classification and Regression Trees Experienced User Guide
 
Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)
 
Ew36913917
Ew36913917Ew36913917
Ew36913917
 
IRJET- Study and Evaluation of Classification Algorithms in Data Mining
IRJET- Study and Evaluation of Classification Algorithms in Data MiningIRJET- Study and Evaluation of Classification Algorithms in Data Mining
IRJET- Study and Evaluation of Classification Algorithms in Data Mining
 
Experimental study of Data clustering using k- Means and modified algorithms
Experimental study of Data clustering using k- Means and modified algorithmsExperimental study of Data clustering using k- Means and modified algorithms
Experimental study of Data clustering using k- Means and modified algorithms
 
Feature Reduction Techniques
Feature Reduction TechniquesFeature Reduction Techniques
Feature Reduction Techniques
 
Data Analysis and Prediction System for Meteorological Data
Data Analysis and Prediction System for Meteorological DataData Analysis and Prediction System for Meteorological Data
Data Analysis and Prediction System for Meteorological Data
 

Similar to IRJET - Rainfall Forecasting using Weka Data Mining Tool

Similar to IRJET - Rainfall Forecasting using Weka Data Mining Tool (20)

IRJET- Performance Evaluation of Various Classification Algorithms
IRJET- Performance Evaluation of Various Classification AlgorithmsIRJET- Performance Evaluation of Various Classification Algorithms
IRJET- Performance Evaluation of Various Classification Algorithms
 
IRJET- Supervised Learning Classification Algorithms Comparison
IRJET- Supervised Learning Classification Algorithms ComparisonIRJET- Supervised Learning Classification Algorithms Comparison
IRJET- Supervised Learning Classification Algorithms Comparison
 
IRJET - Smart Vet Locator for Hybrid Pets
IRJET -  	  Smart Vet Locator for Hybrid PetsIRJET -  	  Smart Vet Locator for Hybrid Pets
IRJET - Smart Vet Locator for Hybrid Pets
 
CASE STUDY: ADMISSION PREDICTION IN ENGINEERING AND TECHNOLOGY COLLEGES
CASE STUDY: ADMISSION PREDICTION IN ENGINEERING AND TECHNOLOGY COLLEGESCASE STUDY: ADMISSION PREDICTION IN ENGINEERING AND TECHNOLOGY COLLEGES
CASE STUDY: ADMISSION PREDICTION IN ENGINEERING AND TECHNOLOGY COLLEGES
 
AIRLINE FARE PRICE PREDICTION
AIRLINE FARE PRICE PREDICTIONAIRLINE FARE PRICE PREDICTION
AIRLINE FARE PRICE PREDICTION
 
Water Quality Index Calculation of River Ganga using Decision Tree Algorithm
Water Quality Index Calculation of River Ganga using Decision Tree AlgorithmWater Quality Index Calculation of River Ganga using Decision Tree Algorithm
Water Quality Index Calculation of River Ganga using Decision Tree Algorithm
 
Real Estate Investment Advising Using Machine Learning
Real Estate Investment Advising Using Machine LearningReal Estate Investment Advising Using Machine Learning
Real Estate Investment Advising Using Machine Learning
 
IRJET- A Detailed Study on Classification Techniques for Data Mining
IRJET- A Detailed Study on Classification Techniques for Data MiningIRJET- A Detailed Study on Classification Techniques for Data Mining
IRJET- A Detailed Study on Classification Techniques for Data Mining
 
A Comparative Study for Anomaly Detection in Data Mining
A Comparative Study for Anomaly Detection in Data MiningA Comparative Study for Anomaly Detection in Data Mining
A Comparative Study for Anomaly Detection in Data Mining
 
Classification and Prediction Based Data Mining Algorithm in Weka Tool
Classification and Prediction Based Data Mining Algorithm in Weka ToolClassification and Prediction Based Data Mining Algorithm in Weka Tool
Classification and Prediction Based Data Mining Algorithm in Weka Tool
 
IRJET - Analysis of Crop Yield Prediction by using Machine Learning Algorithms
IRJET - Analysis of Crop Yield Prediction by using Machine Learning AlgorithmsIRJET - Analysis of Crop Yield Prediction by using Machine Learning Algorithms
IRJET - Analysis of Crop Yield Prediction by using Machine Learning Algorithms
 
Email Spam Detection Using Machine Learning
Email Spam Detection Using Machine LearningEmail Spam Detection Using Machine Learning
Email Spam Detection Using Machine Learning
 
IRJET - An Overview of Machine Learning Algorithms for Data Science
IRJET - An Overview of Machine Learning Algorithms for Data ScienceIRJET - An Overview of Machine Learning Algorithms for Data Science
IRJET - An Overview of Machine Learning Algorithms for Data Science
 
IRJET - An User Friendly Interface for Data Preprocessing and Visualizati...
IRJET -  	  An User Friendly Interface for Data Preprocessing and Visualizati...IRJET -  	  An User Friendly Interface for Data Preprocessing and Visualizati...
IRJET - An User Friendly Interface for Data Preprocessing and Visualizati...
 
Recuriter Recommendation System
Recuriter Recommendation SystemRecuriter Recommendation System
Recuriter Recommendation System
 
IRJET- Finding the Original Writer of an Anonymous Text using Naïve Bayes Cla...
IRJET- Finding the Original Writer of an Anonymous Text using Naïve Bayes Cla...IRJET- Finding the Original Writer of an Anonymous Text using Naïve Bayes Cla...
IRJET- Finding the Original Writer of an Anonymous Text using Naïve Bayes Cla...
 
IRJET- Stabilization of Black Cotton Soil using Rice Husk Ash and Lime
IRJET- Stabilization of Black Cotton Soil using Rice Husk Ash and LimeIRJET- Stabilization of Black Cotton Soil using Rice Husk Ash and Lime
IRJET- Stabilization of Black Cotton Soil using Rice Husk Ash and Lime
 
IRJET- Student Placement Prediction using Machine Learning
IRJET- Student Placement Prediction using Machine LearningIRJET- Student Placement Prediction using Machine Learning
IRJET- Student Placement Prediction using Machine Learning
 
A Comparative Study on Identical Face Classification using Machine Learning
A Comparative Study on Identical Face Classification using Machine LearningA Comparative Study on Identical Face Classification using Machine Learning
A Comparative Study on Identical Face Classification using Machine Learning
 
IRJET - Automated Fraud Detection Framework in Examination Halls
 IRJET - Automated Fraud Detection Framework in Examination Halls IRJET - Automated Fraud Detection Framework in Examination Halls
IRJET - Automated Fraud Detection Framework in Examination Halls
 

More from IRJET Journal

More from IRJET Journal (20)

TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
 
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURESTUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
 
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
 
Effect of Camber and Angles of Attack on Airfoil Characteristics
Effect of Camber and Angles of Attack on Airfoil CharacteristicsEffect of Camber and Angles of Attack on Airfoil Characteristics
Effect of Camber and Angles of Attack on Airfoil Characteristics
 
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
 
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
 
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
 
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
 
A REVIEW ON MACHINE LEARNING IN ADAS
A REVIEW ON MACHINE LEARNING IN ADASA REVIEW ON MACHINE LEARNING IN ADAS
A REVIEW ON MACHINE LEARNING IN ADAS
 
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
 
P.E.B. Framed Structure Design and Analysis Using STAAD Pro
P.E.B. Framed Structure Design and Analysis Using STAAD ProP.E.B. Framed Structure Design and Analysis Using STAAD Pro
P.E.B. Framed Structure Design and Analysis Using STAAD Pro
 
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
 
Survey Paper on Cloud-Based Secured Healthcare System
Survey Paper on Cloud-Based Secured Healthcare SystemSurvey Paper on Cloud-Based Secured Healthcare System
Survey Paper on Cloud-Based Secured Healthcare System
 
Review on studies and research on widening of existing concrete bridges
Review on studies and research on widening of existing concrete bridgesReview on studies and research on widening of existing concrete bridges
Review on studies and research on widening of existing concrete bridges
 
React based fullstack edtech web application
React based fullstack edtech web applicationReact based fullstack edtech web application
React based fullstack edtech web application
 
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
 
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
 
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
 
Multistoried and Multi Bay Steel Building Frame by using Seismic Design
Multistoried and Multi Bay Steel Building Frame by using Seismic DesignMultistoried and Multi Bay Steel Building Frame by using Seismic Design
Multistoried and Multi Bay Steel Building Frame by using Seismic Design
 
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
 

Recently uploaded

Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort ServiceCall Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Christo Ananth
 
AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdf
ankushspencer015
 
result management system report for college project
result management system report for college projectresult management system report for college project
result management system report for college project
Tonystark477637
 

Recently uploaded (20)

University management System project report..pdf
University management System project report..pdfUniversity management System project report..pdf
University management System project report..pdf
 
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort ServiceCall Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations
 
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
 
Vivazz, Mieres Social Housing Design Spain
Vivazz, Mieres Social Housing Design SpainVivazz, Mieres Social Housing Design Spain
Vivazz, Mieres Social Housing Design Spain
 
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptxBSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
 
Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)
 
Thermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - VThermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - V
 
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
 
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
 
Online banking management system project.pdf
Online banking management system project.pdfOnline banking management system project.pdf
Online banking management system project.pdf
 
UNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular ConduitsUNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular Conduits
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
 
Call for Papers - International Journal of Intelligent Systems and Applicatio...
Call for Papers - International Journal of Intelligent Systems and Applicatio...Call for Papers - International Journal of Intelligent Systems and Applicatio...
Call for Papers - International Journal of Intelligent Systems and Applicatio...
 
AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdf
 
result management system report for college project
result management system report for college projectresult management system report for college project
result management system report for college project
 
Roadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and RoutesRoadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and Routes
 
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
 

IRJET - Rainfall Forecasting using Weka Data Mining Tool

  • 1. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 07 Issue: 03 | Mar 2020 www.irjet.net p-ISSN: 2395-0072 © 2020, IRJET | Impact Factor value: 7.34 | ISO 9001:2008 Certified Journal | Page 5346 RAINFALL FORECASTING USING WEKA DATA MINING TOOL Abinaya P L1, Janani N2 1,2Department of Information Technology, Kumaraguru College of Technology, Tamil Nadu, India --------------------------------------------------------------------------***----------------------------------------------------------------------- ABSTRACT - Rain water, additionally called precipitation, is a characteristic element of the world's climate framework. India is a rural nation which to a great extent relies upon monsoon for irrigation purpose. Rainstorm have coordinated effect on agriculturists' life and harvest creation. The Indian economy is heavily dependent on agriculture and the job of the Indian agriculturist to a great extent relies upon the Monsoon downpours. Rainfall forecasting is essential and important for development of the nation. Weather factors including temperature, dew point temperature, humidity and pressure have been used to forecasts the rainfall. The dataset was collected from Kaggle and is classified using Naïve Bayes, K nearest Neighbour, Decision tree and Support vector machine. The Naïve Bayes has given 80.56% accuracy, K nearest Neighbour provides 93.96% accuracy, Decision tree gives 94.10% accuracy while Support vector machine has given 93.66% accuracy. Keywords: Rainfall Prediction, Decision tree, Naive Bayes, K Nearest Neighbour, SVM 1. INTRODUCTION Rainfall forecasting is essential since it facilitates to spot potential floods in future and to set up for higher water management which decides future irrigation desires. Weather forecasts may be categorised as: current forecasts that is forecasts up to few hours, Short term forecasts that is downfall forecasts is one to three days forecast, Forecasts for four to ten days are Medium vary forecasts and future forecasts are for over ten days. Short vary and medium vary downfall forecasts are necessary for flood forecasting and water resource management. Classification has been applied in several fields like detection frauds by banks and firms, by numerous service suppliers to predict their performance in future, to classify patients on the premise of their symptoms. In this paper we have used Naïve Bayes, K nearest Neighbour, Decision tree and Support vector machine to forecast rainfall. Classification trees are created using the concept of Gini Index, Bayes theorem is used in Naïve Bayes for prediction, Euclidean distance is used to compute Nearest neighbours and Support vector machine use Kernel trick. 2. NAÏVE BAYES Naive Bayes is a basic, yet viable and regularly utilized, machine learning classifier. The Naive Bayes algorithm is a basic probabilistic classifier. Naive Bayes classifier requires a number of parameters linear in the number of features in a learning problem and it is highly scalable. Classification may be Gaussian, Kernel, multi nominal. Here Bayes theorem is used which assumes all attributes to be independent where the class variables are given. Assuming distribution for weather information, Bayesian classifiers use Bayes theorem to search out posterior chances of occurrence of input data instance in all classes. This assumption is known as class conditional independence. The algorithm starts with computing prior probability, P(Ci) for each class. Then for an input data instance q, compute posterior probabilities for each class ( | ) ∏ ( | ) Where Fj are attributes for input sample q. Finally compute P(Ci|q) = P(q|Ci) * P(Ci) / ∑ ( ) 3. DECISION TREE ALGORITHM A decision tree could be a structure that has a root node, branches, and leaf nodes. It is used for both classification and regression problem. Every internal node denotes a test on the associate attribute, every branch denotes the end result of a test, and every leaf node holds a category label. The upmost node within the tree is the root node. Tree pruning is performed so as to get rid of anomalies due to noise or outliers. The cropped trees are smaller and fewer complicated. The Decision tree algorithm produces correct and explicable models with comparatively very little user intervention. This algorithm is often used for each binary and multiclass classification issues. The algorithm is quick, each at build time and apply time. The build method for decision tree is parallelized. Decision tree rating is particularly quick. The tree structure, created within the model build, is employed for a series of easy tests. Every check relies on one predictor. The algorithm is non-parametric and can efficiently manage vast, muddled datasets without forcing a convoluted parametric structure. At the point when the example measure is sufficiently huge, study about data can be partitioned into training and validation datasets. Utilizing the training dataset to assemble a decision tree model and a validation dataset to settle on the suitable tree measure expected to accomplish the optimal final display. Decision trees are now recently referred as Classification and Regression Trees (CART). The best attribute for splitting and constructing decision tree is found by using Gini index as
  • 2. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 07 Issue: 03 | Mar 2020 www.irjet.net p-ISSN: 2395-0072 © 2020, IRJET | Impact Factor value: 7.34 | ISO 9001:2008 Certified Journal | Page 5347 a selection measure. In CART the impurities in the training dataset S is measured using Gini index. Gini(S)=1-∑ Where m denotes the class labels. The one with least impurity is considered to be the best option. This process is done for all the attributes. The best splitting attribute is one which has maximum ( ) ( ) = Gini(S) - GiniA(S) Where, GiniA(S) = |S1| / |S| Gini (S1) + |S2| / |S| Gini (S2) 4. K NEAREST NEIGHBOUR K nearest neighbours are referred as lazy learners where learning depends on relationship. The algorithm can't be utilized until the example for which neighbours are to be computed is accessible. KNN are processed for given instance t for which class name is to be anticipated. The features are wanted to be numeric in nature as neighbours are processed utilizing the distance metric. Euclidean distance is performed here for analysing, ( ) √∑ ( ) Where Y and Z represents n dimensional observations from training and test dataset respectively for which class is to be predicted. The decision for the instance is taken by majority voting. The input sample is considered based on the class which has maximum neighbours. Based on the size of the training dataset the value of K that is the number of neighbours is determined. 5. SUPPORT VECTOR MACHINE Support Vector Machine is a regulated machine learning calculation which can be utilized for both classification or regression difficulties. In any case, it is for the most part utilized in grouping issues. In this calculation, we plot every datum thing as a point in n-dimensional space where n represents the number of features, we have with the estimation of each component being the estimation of a specific facilitate. At that point, we perform classification by finding the hyper-plane that separate the two classes exceptionally well. Support Vectors are basically the co-ordinates of individual perception. Support Vector Machine is a border which best isolates the two classes that is hyper-plane/line. The least complex approach to isolate two gatherings of information is with a straight line, level plane or a N- dimensional hyperplane. In any case, there are circumstances where a nonlinear locale can isolate the gatherings all the more productively. SVM handles this by utilizing a kernel function (nonlinear) to map the information into an alternate space where a hyperplane can't be utilized to do the detachment. It implies a non- linear function is found out by a linear learning machine in a high-dimensional component space while the limit of the framework is controlled by a parameter that does not rely upon the dimensionality of the space. This is called kernel trick which implies the kernel function change the data into a higher dimensional element space to make it conceivable to play out the linear separation. In actuality, we probably won't have the capacity to drive a straight line between the classes which makes Support vector machines somewhat progressively confused however it's possible to characterize the most extreme edge hyperplane with Gaussian kernel. 6.PROCESS AND OUTCOMES A dataset is collected which provides the data from May 2016 to March 2018 of the specific city that is Jaipur in India. The dataset consists of 679 instances and 13 attributes. Running the algorithms utilizing Naïve Bayes we break down the classifier yield with such a large number of statistic-based yield by utilizing 10 cross validation to make a prediction of each example of the dataset. After running the algorithm, we accomplished a classification precision of 80.55% and the mean absolute error is 0.0446, time taken for building model is 0.01 seconds. Fig.1.Screenshot view for Naïve Bayes Algorithm J48 Tree has been utilized in this paper to choose the objective esteem dependent on different attributes of dataset and to characterize their accuracy. The output is based on 10 cross validation which produced an accuracy of 94.10% and the mean absolute error is 0.0227. The time taken to build the model is 0.01 seconds. Fig.2.Screenshot view for J48 Algorithm The K nearest neighbour is processed utilizing Euclidean distance. Here 10 cross validation is used to make
  • 3. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 07 Issue: 03 | Mar 2020 www.irjet.net p-ISSN: 2395-0072 © 2020, IRJET | Impact Factor value: 7.34 | ISO 9001:2008 Certified Journal | Page 5348 prediction for every instance in the dataset. The algorithm has shown an output of 93.96% of accuracy. The mean absolute error is 0.0181 and the time taken to build the model is 0 seconds. Fig.3.Screenshot view for K Nearest Neighbour Algorithm By running the Support vector machine algorithm, we broke down the classifier yield with various statistics dependent on yield by utilizing 10 cross validation. The accuracy obtained for this algorithm is 93.66% with a mean absolute error of 0.1598 and it took 0.69 seconds to build the model. Fig.4.Screenshot view for Support Vector Machine Algorithm 7.CONCLUSION AND FUTURE WORK The main aim of this paper is to forecast rainfall using WEKA data mining tool. Classification which is a part of data mining is extremely helpful in discovering obscure patterns like forecasting the future trends. Here in this paper we have used four algorithms namely Naïve Bayes, Decision tree, K Nearest Neighbour and Support Vector Machine and the outputs were compared based on the accuracy achieved. The result has shown that the decision tree algorithm has given the highest accuracy of 94.10% when compared to other three algorithms. So, we can conclude that the Decision tree algorithm is the best classification algorithm for forecasting rainfall on the basis of given features in the dataset. REFERENCES: 1. Dhamodharan S, Liver Disease Prediction Using Bayesian Classification, Special Issues, 4th National Conference on Advance Computing, Application Technologies, May 2014 2. SolankiA.V., Data Mining Techniques using WEKA Classification for Sickle Cell Disease, International Journal of Computer Science and Information Technology,5(4): 58575860,2014. 3. Joshi J, Rinal D, Patel J, Diagnosis And Prognosis of Breast Cancer Using Classification Rules, International Journal of Engineering Research and General Science,2(6):315-323, October 2014. 4. David S. K., Saeb A. T., Al Rubeaan K., Comparative Analysis of Data Mining Tools and Classification Techniques using WEKA in Medical Bioinformatics, Computer Engineering and Intelligent Systems, 4(13):28-38,2013. 5. Vijayarani, S., Sudha, S., Comparative Analysis of Classification Function Techniques for Heart Disease Prediction, International Journal of Innovative Research in Computer and Communication Engineering, 1(3): 735- 741, 2013. 6.https://www.sciencedirect.com/science/article/pii/S0 925231202005775 7.https://docs.oracle.com/cd/B28359_01/datamine.111 /b28129/algo_decisiontree.htm#CACCJCEJ 8. Tina R. Patil, Mrs. S. S. Sherekar, Performance Analysis of Naive Bayes and J48 Classification Algorithm for Data Classification, International Journal Of Computer Science And Applications, Apr 2013 9. Nicholas I.Sapankeyych, Ravi Sankar, Time series prediction using support vector machines: A survey, IEEE Computational Intelligence Magazine, April 2009 10.https://machinelearningmastery.com/use- classification-machine-learning-algorithms-weka/