Rainfall is considered as one of the major components of the hydrological process; it takes
significant part in evaluating drought and flooding events. Therefore, it is important to have an
accurate model for rainfall prediction. Recently, several data-driven modeling approaches have
been investigated to perform such forecasting tasks as multilayer perceptron neural networks
(MLP-NN). In fact, the rainfall time series modeling (SARIMA) involvesimportant temporal
dimensions. In order to evaluate the incomes of both models, statistical parameters were used to
make the comparison between the two models. These parameters include the Root Mean Square
Error RMSE, Mean Absolute Error MAE, Coefficient Of Correlation CC and BIAS. Two-Third
of the data was used for training the model and One-third for testing.
Neural networks have gained a great deal of importance in the area of soft computing and are widely used in making predictions. The work presented in this paper is about the development of Artificial Neural Network (ANN) based models for the prediction of sugarcane yield in India. The ANN models have been experimented using different partitions of training patterns and different combinations of ANN parameters.
Experiments have also been conducted for different number of neurons in hidden layer and the algorithms for ANN training. For this work, data has been obtained from the website of Directorate of Economics and Statistics, Ministry of Agriculture, Government of India. In this work, the experiments have been conducted for 2160 different ANN models. The least Root Mean Square Error (RMSE) value that could be achieved on
test data was 4.03%. This has been achieved when the data was partitioned in such a way that there were 10% records in the test data, 10 neurons in hidden layer, learning rate was 0.001, the error goal was set to 0.01 and traincgb algorithm in MATLAB was used for ANN training.
CENTROG FEATURE TECHNIQUE FOR VEHICLE TYPE RECOGNITION AT DAY AND NIGHT TIMESijaia
This work proposes a feature-based technique to recognize vehicle types within day and night times. Support vector machine (SVM) classifier is applied on image histogram and CENsus Transformed
histogRam Oriented Gradient (CENTROG) features in order to classify vehicle types during the day and night. Thermal images were used for the night time experiments. Although thermal images suffer from low image resolution, lack of colour and poor texture information, they offer the advantage of being unaffected by high intensity light sources such as vehicle headlights which tend to render normal images unsuitable
for night time image capturing and subsequent analysis. Since contour is useful in shape based categorisation and the most distinctive feature within thermal images, CENTROG is used to capture this feature information and is used within the experiments. The experimental results so obtained were
compared with those obtained by employing the CENsus TRansformed hISTogram (CENTRIST). Experimental results revealed that CENTROG offers better recognition accuracies for both day and night times vehicle types recognition.
In current decades, Land Use (LU) and Land Cover (LC) classification is the most
challenging research area in the field of remote sensing. This research helps in
understanding the environmental changes for ensuring the sustainable development.
In this research, LU and LC classification assessed for Visakhapatnam city. After
collecting the satellite images, Hybrid Directional Lifting (HDL) technique was used
to remove the saturation and blooming effects in the input images. The pre-processed
satellite images were used for segmentation by applying Fuzzy C means (FCM)
clustering. Then, Local Binary Pattern (LBP) and Gray-level co-occurrence matrix
(GLCM) features were utilized to extract the features from the segmented satellite
images. After obtaining the feature information, a multi-class classifier: Adaptive
Neuro-Fuzzy Inference System (ANFIS) was used to classify the LU and LC classes;
water-body, vegetation, settlement, and barren land. The experimental outcome
showed that the proposed system effectively distinguishes the LU and LC classes by
means of sensitivity, specificity, and classification accuracy. The proposed system
enhances the classification accuracy up to 7% compared to the existing systems
Predictive Data Mining with Normalized Adaptive Training Method for Neural Ne...IJERDJOURNAL
Abstract:- Predictive data mining is an upcoming and fast-growing field and offers a competitive edge for the benefit of organization. In recent decades, researchers have developed new techniques and intelligent algorithms for predictive data mining. In this research paper, we have proposed a novel training algorithm for optimizing neural networks for prediction purpose and to utilize it for the development of prediction models. Models developed in MATLAB Neural Network Toolbox have been tested for insurance datasets taken from a live data warehouse. A comparative study of the proposed algorithm with other popular first and second order algorithms has been presented to judge the predictive accuracy of the suggested technique. Various graphs have been presented to analyse the convergence behaviour of different algorithms towards point of minimum error.
DATA MINING ATTRIBUTE SELECTION APPROACH FOR DROUGHT MODELLING: A CASE STUDY ...IJDKP
ABSTRACT
The objectives of this paper were to 1) develop an empirical method for selecting relevant attributes for modelling drought and 2) select the most relevant attribute for drought modelling and predictions in the Greater Horn of Africa (GHA). Twenty four attributes from different domain areas were used for this experimental analysis. Two attribute selection algorithms were used for the current study: Principal Component Analysis (PCA) and correlation-based attribute selection (CAS). Using the PCA and CAS algorithms, the 24 attributes were ranked by their merit value. Accordingly, 15 attributes were selected for modelling drought in GHA. The average merit values for the selected attributes ranged from 0.5 to 0.9. The methodology developed here helps to avoid the uncertainty of domain experts’ attribute selection
challenges, which are unsystematic and dominated by somewhat arbitrary trial. Future research may evaluate the developed methodology using relevant classification techniques and quantify the actual information gain from the developed approach.
Neural networks have gained a great deal of importance in the area of soft computing and are widely used in making predictions. The work presented in this paper is about the development of Artificial Neural Network (ANN) based models for the prediction of sugarcane yield in India. The ANN models have been experimented using different partitions of training patterns and different combinations of ANN parameters.
Experiments have also been conducted for different number of neurons in hidden layer and the algorithms for ANN training. For this work, data has been obtained from the website of Directorate of Economics and Statistics, Ministry of Agriculture, Government of India. In this work, the experiments have been conducted for 2160 different ANN models. The least Root Mean Square Error (RMSE) value that could be achieved on
test data was 4.03%. This has been achieved when the data was partitioned in such a way that there were 10% records in the test data, 10 neurons in hidden layer, learning rate was 0.001, the error goal was set to 0.01 and traincgb algorithm in MATLAB was used for ANN training.
CENTROG FEATURE TECHNIQUE FOR VEHICLE TYPE RECOGNITION AT DAY AND NIGHT TIMESijaia
This work proposes a feature-based technique to recognize vehicle types within day and night times. Support vector machine (SVM) classifier is applied on image histogram and CENsus Transformed
histogRam Oriented Gradient (CENTROG) features in order to classify vehicle types during the day and night. Thermal images were used for the night time experiments. Although thermal images suffer from low image resolution, lack of colour and poor texture information, they offer the advantage of being unaffected by high intensity light sources such as vehicle headlights which tend to render normal images unsuitable
for night time image capturing and subsequent analysis. Since contour is useful in shape based categorisation and the most distinctive feature within thermal images, CENTROG is used to capture this feature information and is used within the experiments. The experimental results so obtained were
compared with those obtained by employing the CENsus TRansformed hISTogram (CENTRIST). Experimental results revealed that CENTROG offers better recognition accuracies for both day and night times vehicle types recognition.
In current decades, Land Use (LU) and Land Cover (LC) classification is the most
challenging research area in the field of remote sensing. This research helps in
understanding the environmental changes for ensuring the sustainable development.
In this research, LU and LC classification assessed for Visakhapatnam city. After
collecting the satellite images, Hybrid Directional Lifting (HDL) technique was used
to remove the saturation and blooming effects in the input images. The pre-processed
satellite images were used for segmentation by applying Fuzzy C means (FCM)
clustering. Then, Local Binary Pattern (LBP) and Gray-level co-occurrence matrix
(GLCM) features were utilized to extract the features from the segmented satellite
images. After obtaining the feature information, a multi-class classifier: Adaptive
Neuro-Fuzzy Inference System (ANFIS) was used to classify the LU and LC classes;
water-body, vegetation, settlement, and barren land. The experimental outcome
showed that the proposed system effectively distinguishes the LU and LC classes by
means of sensitivity, specificity, and classification accuracy. The proposed system
enhances the classification accuracy up to 7% compared to the existing systems
Predictive Data Mining with Normalized Adaptive Training Method for Neural Ne...IJERDJOURNAL
Abstract:- Predictive data mining is an upcoming and fast-growing field and offers a competitive edge for the benefit of organization. In recent decades, researchers have developed new techniques and intelligent algorithms for predictive data mining. In this research paper, we have proposed a novel training algorithm for optimizing neural networks for prediction purpose and to utilize it for the development of prediction models. Models developed in MATLAB Neural Network Toolbox have been tested for insurance datasets taken from a live data warehouse. A comparative study of the proposed algorithm with other popular first and second order algorithms has been presented to judge the predictive accuracy of the suggested technique. Various graphs have been presented to analyse the convergence behaviour of different algorithms towards point of minimum error.
DATA MINING ATTRIBUTE SELECTION APPROACH FOR DROUGHT MODELLING: A CASE STUDY ...IJDKP
ABSTRACT
The objectives of this paper were to 1) develop an empirical method for selecting relevant attributes for modelling drought and 2) select the most relevant attribute for drought modelling and predictions in the Greater Horn of Africa (GHA). Twenty four attributes from different domain areas were used for this experimental analysis. Two attribute selection algorithms were used for the current study: Principal Component Analysis (PCA) and correlation-based attribute selection (CAS). Using the PCA and CAS algorithms, the 24 attributes were ranked by their merit value. Accordingly, 15 attributes were selected for modelling drought in GHA. The average merit values for the selected attributes ranged from 0.5 to 0.9. The methodology developed here helps to avoid the uncertainty of domain experts’ attribute selection
challenges, which are unsystematic and dominated by somewhat arbitrary trial. Future research may evaluate the developed methodology using relevant classification techniques and quantify the actual information gain from the developed approach.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
An Application of Genetic Programming for Power System Planning and OperationIDES Editor
This work incorporates the identification of model
in functional form using curve fitting and genetic programming
technique which can forecast present and future load
requirement. Approximating an unknown function with
sample data is an important practical problem. In order to
forecast an unknown function using a finite set of sample
data, a function is constructed to fit sample data points. This
process is called curve fitting. There are several methods of
curve fitting. Interpolation is a special case of curve fitting
where an exact fit of the existing data points is expected.
Once a model is generated, acceptability of the model must be
tested. There are several measures to test the goodness of a
model. Sum of absolute difference, mean absolute error, mean
absolute percentage error, sum of squares due to error (SSE),
mean squared error and root mean squared errors can be used
to evaluate models. Minimizing the squares of vertical distance
of the points in a curve (SSE) is one of the most widely used
method .Two of the methods has been presented namely Curve
fitting technique & Genetic Programming and they have been
compared based on (SSE)sum of squares due to error.
International Journal of Engineering Research and Development (IJERD)IJERD Editor
journal publishing, how to publish research paper, Call For research paper, international journal, publishing a paper, IJERD, journal of science and technology, how to get a research paper published, publishing a paper, publishing of journal, publishing of research paper, reserach and review articles, IJERD Journal, How to publish your research paper, publish research paper, open access engineering journal, Engineering journal, Mathemetics journal, Physics journal, Chemistry journal, Computer Engineering, Computer Science journal, how to submit your paper, peer reviw journal, indexed journal, reserach and review articles, engineering journal, www.ijerd.com, research journals,
yahoo journals, bing journals, International Journal of Engineering Research and Development, google journals, hard copy of journal
Feature selection in high-dimensional datasets is
considered to be a complex and time-consuming problem. To
enhance the accuracy of classification and reduce the execution
time, Parallel Evolutionary Algorithms (PEAs) can be used. In
this paper, we make a review for the most recent works which
handle the use of PEAs for feature selection in large datasets.
We have classified the algorithms in these papers into four main
classes (Genetic Algorithms (GA), Particle Swarm Optimization
(PSO), Scattered Search (SS), and Ant Colony Optimization
(ACO)). The accuracy is adopted as a measure to compare the
efficiency of these PEAs. It is noticeable that the Parallel Genetic
Algorithms (PGAs) are the most suitable algorithms for feature
selection in large datasets; since they achieve the highest accuracy.
On the other hand, we found that the Parallel ACO is timeconsuming
and less accurate comparing with other PEA.
A BINARY BAT INSPIRED ALGORITHM FOR THE CLASSIFICATION OF BREAST CANCER DATAIJSCAI Journal
Advancement in information and technology has made a major impact on medical science where the
researchers come up with new ideas for improving the classification rate of various diseases. Breast cancer
is one such disease killing large number of people around the world. Diagnosing the disease at its earliest
instance makes a huge impact on its treatment. The authors propose a Binary Bat Algorithm (BBA) based
Feedforward Neural Network (FNN) hybrid model, where the advantages of BBA and efficiency of FNN is
exploited for the classification of three benchmark breast cancer datasets into malignant and benign cases.
Here BBA is used to generate a V-shaped hyperbolic tangent function for training the network and a fitness
function is used for error minimization. FNNBBA based classification produces 92.61% accuracy for
training data and 89.95% for testing data.
— The healthcare industry is considered one of the
largest industry in the world. The healthcare industry is same as
the medical industries having the largest amount of health related
and medical related data. This data helps to discover useful
trends and patters that can be used in diagnosis and decision
making. Clustering techniques like K-means, D-streams,
COBWEB, EM have been used for healthcare purposes like heart
disease diagnosis, cancer detection etc. This paper focuses on the
use of K-means and D-stream algorithm in healthcare. This
algorithms were used in healthcare to determine whether a
person is fit or unfit and this fitness decision was taken based on
his/her historical and current data. Both the clustering
algorithms were analyzed by applying them on patients current
biomedical historical databases, this analysis depends on the
attributes like peripheral blood oxygenation, diastolic arterial
blood pressure, systolic arterial blood pressure, heart rate,
heredity, obesity, and this fitness decision was taken based on
his/her historical and current data. Both the clustering
algorithms were analyzed by applying them on patients current
biomedical historical databases, this analysis depends on the
attributes like peripheral blood oxygenation, diastolic arterial
blood pressure, systolic arterial blood pressure, heart rate,
heredity, obesity, cigarette smoking. By analyzing both the
algorithm it was found that the Density-based clustering
algorithm i.e. the D-stream algorithm proves to give more
accurate results than K-means when used for cluster formation of
historical biomedical data. D-stream algorithm overcomes
drawbacks of K-means algorithm
A Novel Approach for Clustering Big Data based on MapReduce IJECEIAES
Clustering is one of the most important applications of data mining. It has attracted attention of researchers in statistics and machine learning. It is used in many applications like information retrieval, image processing and social network analytics etc. It helps the user to understand the similarity and dissimilarity between objects. Cluster analysis makes the users understand complex and large data sets more clearly. There are different types of clustering algorithms analyzed by various researchers. Kmeans is the most popular partitioning based algorithm as it provides good results because of accurate calculation on numerical data. But Kmeans give good results for numerical data only. Big data is combination of numerical and categorical data. Kprototype algorithm is used to deal with numerical as well as categorical data. Kprototype combines the distance calculated from numeric and categorical data. With the growth of data due to social networking websites, business transactions, scientific calculation etc., there is vast collection of structured, semi-structured and unstructured data. So, there is need of optimization of Kprototype so that these varieties of data can be analyzed efficiently.In this work, Kprototype algorithm is implemented on MapReduce in this paper. Experiments have proved that Kprototype implemented on Mapreduce gives better performance gain on multiple nodes as compared to single node. CPU execution time and speedup are used as evaluation metrics for comparison.Intellegent splitter is proposed in this paper which splits mixed big data into numerical and categorical data. Comparison with traditional algorithms proves that proposed algorithm works better for large scale of data.
ON THE PREDICTION ACCURACIES OF THREE MOST KNOWN REGULARIZERS : RIDGE REGRESS...ijaia
The work in this paper shows intensive empirical experiments using 13 datasets to understand the regularization effectiveness of ridge regression, the lasso estimate, and elastic net regularization methods. The study offers a deep understanding of how the datasets affect the goodness of the prediction accuracy of each regularization method for a given problem given the diversity in the datasets used. The results have shown that datasets play crucial rules on the performance of the regularization method and that the
predication accuracy depends heavily on the nature of the sampled datasets.
June 2020: Most Downloaded Article in Soft Computing ijsc
Soft computing is likely to play an important role in science and engineering in the future. The successful applications of soft computing and the rapid growth suggest that the impact of soft computing will be felt increasingly in coming years. Soft Computing encourages the integration of soft computing techniques and tools into both everyday and advanced applications. This Open access peer-reviewed journal serves as a platform that fosters new applications for all scientists and engineers engaged in research and development in this fast growing field.
A hybrid approach for analysis of dynamic changes in spatial dataijdms
Any geographic location undergoes changes over a period of time. These changes can be observed by
naked eye, only if they are huge in number spread over a small area. However, when the changes are small
and spread over a large area, it is very difficult to observe or extract the changes. Presently, there are few
methods available for tackling these types of problems, such as GRID, DBSCAN etc. However, these
existing mechanisms are not adequate for finding an accurate changes or observation which is essential
with respect to most important geometrical changes such as deforestations and land grabbing etc.,. This
paper proposes new mechanism to solve the above problem. In this proposed method, spatial image
changes are compared over a period of time taken by the satellite. Partitioning the satellite image in to
grids, employed in the proposed hybrid method, provides finer details of the image which are responsible
for improving the precision of clustering compared to whole image manipulation, used in DBSCAN, at a
time .The simplicity of DBSCAN explored while processing portioned grid portion.
During the World War I ,II over fifty countries in the world today have been inherited a legacy of antipersonnel
landmines and unexploded ordnance (UXO) which represents a major threat to lives, and
hinders reconstruction and development efforts. Landmines have specific properties that make it harder to
detect .Therefore; these properties lead landmine detectors to become more complex. Many examples can
be found to address the increasing complexity of Landmines detection; unfortunately, these new techniques
are high of cost and need experts to deal with it. Many developing countries face financial difficulties to get
advanced technologies for detection landmines such as Robotic systems, this due to their high cost, use and
maintenance difficulties which makes them unaffordable to these countries. The safety of operators,
transportability, ease of maintenance and operation are the most factors that must take into consideration
to improve the applicability and effectiveness of landmines tracking systems.The aim of the study is to
proposed architecture of Intelligent Wireless Landmines Tracking System (IWLTS) with new decision
model based on fuzzy logic.To find an affordable, light and easy to use alternative which meet users’ needs
to protect and warn them from the risk of landmines during practice their lives, we suggested the design
and development of Fuzzy Inference Model for IWLTS using Smart Phone.Fuzzy model require three step
which are definitions of Linguistic Variable and fuzzy sets, determine fuzzy rules and the process of Fuzzy
Inference.Designed Fuzzy Inference Model gives both: Landmine risk value in percentage and alert to
avoid that risk.
The presentation has discussed comparatively among three SEM instruments which are (1) SAS CALIS procedure, (2) R's lavaan package, and (3) Mplus version 8.0 on MIDUS II dataset.
RAINFALL PREDICTION USING DATA MINING TECHNIQUES - A SURVEYcscpconf
Rainfall is considered as one of the major components of the hydrological process; it takes significant part in evaluating drought and flooding events. Therefore, it is important to have anaccurate model for rainfall prediction. Recently, several data-driven modeling approaches havebeen investigated to perform such forecasting tasks as multilayer perceptron neural networks
(MLP-NN). In fact, the rainfall time series modeling (SARIMA) involvesimportant temporal dimensions. In order to evaluate the incomes of both models, statistical parameters were used to
make the comparison between the two models. These parameters include the Root Mean Square Error RMSE, Mean Absolute Error MAE, Coefficient Of Correlation CC and BIAS. Two-Third of the data was used for training the model and One-third for testing.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
An Application of Genetic Programming for Power System Planning and OperationIDES Editor
This work incorporates the identification of model
in functional form using curve fitting and genetic programming
technique which can forecast present and future load
requirement. Approximating an unknown function with
sample data is an important practical problem. In order to
forecast an unknown function using a finite set of sample
data, a function is constructed to fit sample data points. This
process is called curve fitting. There are several methods of
curve fitting. Interpolation is a special case of curve fitting
where an exact fit of the existing data points is expected.
Once a model is generated, acceptability of the model must be
tested. There are several measures to test the goodness of a
model. Sum of absolute difference, mean absolute error, mean
absolute percentage error, sum of squares due to error (SSE),
mean squared error and root mean squared errors can be used
to evaluate models. Minimizing the squares of vertical distance
of the points in a curve (SSE) is one of the most widely used
method .Two of the methods has been presented namely Curve
fitting technique & Genetic Programming and they have been
compared based on (SSE)sum of squares due to error.
International Journal of Engineering Research and Development (IJERD)IJERD Editor
journal publishing, how to publish research paper, Call For research paper, international journal, publishing a paper, IJERD, journal of science and technology, how to get a research paper published, publishing a paper, publishing of journal, publishing of research paper, reserach and review articles, IJERD Journal, How to publish your research paper, publish research paper, open access engineering journal, Engineering journal, Mathemetics journal, Physics journal, Chemistry journal, Computer Engineering, Computer Science journal, how to submit your paper, peer reviw journal, indexed journal, reserach and review articles, engineering journal, www.ijerd.com, research journals,
yahoo journals, bing journals, International Journal of Engineering Research and Development, google journals, hard copy of journal
Feature selection in high-dimensional datasets is
considered to be a complex and time-consuming problem. To
enhance the accuracy of classification and reduce the execution
time, Parallel Evolutionary Algorithms (PEAs) can be used. In
this paper, we make a review for the most recent works which
handle the use of PEAs for feature selection in large datasets.
We have classified the algorithms in these papers into four main
classes (Genetic Algorithms (GA), Particle Swarm Optimization
(PSO), Scattered Search (SS), and Ant Colony Optimization
(ACO)). The accuracy is adopted as a measure to compare the
efficiency of these PEAs. It is noticeable that the Parallel Genetic
Algorithms (PGAs) are the most suitable algorithms for feature
selection in large datasets; since they achieve the highest accuracy.
On the other hand, we found that the Parallel ACO is timeconsuming
and less accurate comparing with other PEA.
A BINARY BAT INSPIRED ALGORITHM FOR THE CLASSIFICATION OF BREAST CANCER DATAIJSCAI Journal
Advancement in information and technology has made a major impact on medical science where the
researchers come up with new ideas for improving the classification rate of various diseases. Breast cancer
is one such disease killing large number of people around the world. Diagnosing the disease at its earliest
instance makes a huge impact on its treatment. The authors propose a Binary Bat Algorithm (BBA) based
Feedforward Neural Network (FNN) hybrid model, where the advantages of BBA and efficiency of FNN is
exploited for the classification of three benchmark breast cancer datasets into malignant and benign cases.
Here BBA is used to generate a V-shaped hyperbolic tangent function for training the network and a fitness
function is used for error minimization. FNNBBA based classification produces 92.61% accuracy for
training data and 89.95% for testing data.
— The healthcare industry is considered one of the
largest industry in the world. The healthcare industry is same as
the medical industries having the largest amount of health related
and medical related data. This data helps to discover useful
trends and patters that can be used in diagnosis and decision
making. Clustering techniques like K-means, D-streams,
COBWEB, EM have been used for healthcare purposes like heart
disease diagnosis, cancer detection etc. This paper focuses on the
use of K-means and D-stream algorithm in healthcare. This
algorithms were used in healthcare to determine whether a
person is fit or unfit and this fitness decision was taken based on
his/her historical and current data. Both the clustering
algorithms were analyzed by applying them on patients current
biomedical historical databases, this analysis depends on the
attributes like peripheral blood oxygenation, diastolic arterial
blood pressure, systolic arterial blood pressure, heart rate,
heredity, obesity, and this fitness decision was taken based on
his/her historical and current data. Both the clustering
algorithms were analyzed by applying them on patients current
biomedical historical databases, this analysis depends on the
attributes like peripheral blood oxygenation, diastolic arterial
blood pressure, systolic arterial blood pressure, heart rate,
heredity, obesity, cigarette smoking. By analyzing both the
algorithm it was found that the Density-based clustering
algorithm i.e. the D-stream algorithm proves to give more
accurate results than K-means when used for cluster formation of
historical biomedical data. D-stream algorithm overcomes
drawbacks of K-means algorithm
A Novel Approach for Clustering Big Data based on MapReduce IJECEIAES
Clustering is one of the most important applications of data mining. It has attracted attention of researchers in statistics and machine learning. It is used in many applications like information retrieval, image processing and social network analytics etc. It helps the user to understand the similarity and dissimilarity between objects. Cluster analysis makes the users understand complex and large data sets more clearly. There are different types of clustering algorithms analyzed by various researchers. Kmeans is the most popular partitioning based algorithm as it provides good results because of accurate calculation on numerical data. But Kmeans give good results for numerical data only. Big data is combination of numerical and categorical data. Kprototype algorithm is used to deal with numerical as well as categorical data. Kprototype combines the distance calculated from numeric and categorical data. With the growth of data due to social networking websites, business transactions, scientific calculation etc., there is vast collection of structured, semi-structured and unstructured data. So, there is need of optimization of Kprototype so that these varieties of data can be analyzed efficiently.In this work, Kprototype algorithm is implemented on MapReduce in this paper. Experiments have proved that Kprototype implemented on Mapreduce gives better performance gain on multiple nodes as compared to single node. CPU execution time and speedup are used as evaluation metrics for comparison.Intellegent splitter is proposed in this paper which splits mixed big data into numerical and categorical data. Comparison with traditional algorithms proves that proposed algorithm works better for large scale of data.
ON THE PREDICTION ACCURACIES OF THREE MOST KNOWN REGULARIZERS : RIDGE REGRESS...ijaia
The work in this paper shows intensive empirical experiments using 13 datasets to understand the regularization effectiveness of ridge regression, the lasso estimate, and elastic net regularization methods. The study offers a deep understanding of how the datasets affect the goodness of the prediction accuracy of each regularization method for a given problem given the diversity in the datasets used. The results have shown that datasets play crucial rules on the performance of the regularization method and that the
predication accuracy depends heavily on the nature of the sampled datasets.
June 2020: Most Downloaded Article in Soft Computing ijsc
Soft computing is likely to play an important role in science and engineering in the future. The successful applications of soft computing and the rapid growth suggest that the impact of soft computing will be felt increasingly in coming years. Soft Computing encourages the integration of soft computing techniques and tools into both everyday and advanced applications. This Open access peer-reviewed journal serves as a platform that fosters new applications for all scientists and engineers engaged in research and development in this fast growing field.
A hybrid approach for analysis of dynamic changes in spatial dataijdms
Any geographic location undergoes changes over a period of time. These changes can be observed by
naked eye, only if they are huge in number spread over a small area. However, when the changes are small
and spread over a large area, it is very difficult to observe or extract the changes. Presently, there are few
methods available for tackling these types of problems, such as GRID, DBSCAN etc. However, these
existing mechanisms are not adequate for finding an accurate changes or observation which is essential
with respect to most important geometrical changes such as deforestations and land grabbing etc.,. This
paper proposes new mechanism to solve the above problem. In this proposed method, spatial image
changes are compared over a period of time taken by the satellite. Partitioning the satellite image in to
grids, employed in the proposed hybrid method, provides finer details of the image which are responsible
for improving the precision of clustering compared to whole image manipulation, used in DBSCAN, at a
time .The simplicity of DBSCAN explored while processing portioned grid portion.
During the World War I ,II over fifty countries in the world today have been inherited a legacy of antipersonnel
landmines and unexploded ordnance (UXO) which represents a major threat to lives, and
hinders reconstruction and development efforts. Landmines have specific properties that make it harder to
detect .Therefore; these properties lead landmine detectors to become more complex. Many examples can
be found to address the increasing complexity of Landmines detection; unfortunately, these new techniques
are high of cost and need experts to deal with it. Many developing countries face financial difficulties to get
advanced technologies for detection landmines such as Robotic systems, this due to their high cost, use and
maintenance difficulties which makes them unaffordable to these countries. The safety of operators,
transportability, ease of maintenance and operation are the most factors that must take into consideration
to improve the applicability and effectiveness of landmines tracking systems.The aim of the study is to
proposed architecture of Intelligent Wireless Landmines Tracking System (IWLTS) with new decision
model based on fuzzy logic.To find an affordable, light and easy to use alternative which meet users’ needs
to protect and warn them from the risk of landmines during practice their lives, we suggested the design
and development of Fuzzy Inference Model for IWLTS using Smart Phone.Fuzzy model require three step
which are definitions of Linguistic Variable and fuzzy sets, determine fuzzy rules and the process of Fuzzy
Inference.Designed Fuzzy Inference Model gives both: Landmine risk value in percentage and alert to
avoid that risk.
The presentation has discussed comparatively among three SEM instruments which are (1) SAS CALIS procedure, (2) R's lavaan package, and (3) Mplus version 8.0 on MIDUS II dataset.
RAINFALL PREDICTION USING DATA MINING TECHNIQUES - A SURVEYcscpconf
Rainfall is considered as one of the major components of the hydrological process; it takes significant part in evaluating drought and flooding events. Therefore, it is important to have anaccurate model for rainfall prediction. Recently, several data-driven modeling approaches havebeen investigated to perform such forecasting tasks as multilayer perceptron neural networks
(MLP-NN). In fact, the rainfall time series modeling (SARIMA) involvesimportant temporal dimensions. In order to evaluate the incomes of both models, statistical parameters were used to
make the comparison between the two models. These parameters include the Root Mean Square Error RMSE, Mean Absolute Error MAE, Coefficient Of Correlation CC and BIAS. Two-Third of the data was used for training the model and One-third for testing.
Multi-task learning using non-linear autoregressive models and recurrent neur...IJECEIAES
Tide level forecasting plays an important role in environmental management and development. Current tide level forecasting methods are usually implemented for solving single task problems, that is, a model built based on the tide level data at an individual location is only used to forecast tide level of the same location but is not used for tide forecasting at another location. This study proposes a new method for tide level prediction at multiple locations simultaneously. The method combines nonlinear autoregressive moving average with exogenous inputs (NARMAX) model and recurrent neural networks (RNNs), and incorporates them into a multi-task learning (MTL) framework. Experiments are designed and performed to compare single task learning (STL) and MTL with and without using non-linear autoregressive models. Three different RNN variants, namely, long short- term memory (LSTM), gated recurrent unit (GRU) and bidirectional LSTM (BiLSTM) are employed together with non-linear autoregressive models. A case study on tide level forecasting at many different geographical locations (5 to 11 locations) is conducted. Experimental results demonstrate that the proposed architectures outperform the classical single-task prediction methods.
ANN Modeling of Monthly and Weekly Behaviour of the Runoff of Kali River Catc...IOSR Journals
Model is a system, by whose operation; the characteristics of other similar systems can be ascertained. Experimental observation made on a model bear a definite relationship with prototype. So, the model analysis or modeling is actually an experimental method of finding solution of complex flow problems like surface water modeling, sub-surface water modeling etc. Many flow situations are not amenable to theoretical analysis. Modeling is a valuable means of obtaining better understanding of particular situation. Inspired by the functioning of the brain and biological nervous system, Artificial Neural Networks (ANNs) has been applied to various hydrological problems in last two decades. In this study, two ANN models using feed forward – back propagation network are developed to correlate a relationship between rainfall and runoff on monthly and weekly basis for Kali river catchment up to Supa dam in Uttara Kannada District of Karnataka State, India. The developed two models are compared and evaluated using standard statistical parameters to know strength and weaknesses. This performance can be further refined by incorporating more input parameters of catchment properties like soil moisture index; land use and land cover details etc.
EFFICACY OF NEURAL NETWORK IN RAINFALL-RUNOFF MODELLING OF BAGMATI RIVER BASINIAEME Publication
In this paper, rainfall-runoff model of Bagmati river basin has been developed
using the ANN Technique. Three-layered fced forward network structure with back
propagation algorithm was used to train the ANN model. Different combinations of
rainfall and runoff were considered as input to the network and trained by BP
algorithm with different error tolerance, learning parameter, number of cycles and
number of hidden layers. The sensitivity of the prediction accuracy to the number of
hidden layer neurons in a back error propagation algorithm was used for the study.
The monthly rainfall and runoff data from 2000 to 2009 of Bagmati river basin has
been considered for the development of ANN model. Performance evaluation of the
model has been done by using statistical parameters. Three sets of data have been
used to make several combination of year keeping in view the highest peaks of
hydrographs. First set of data used was from 2000 to 2006 for the calibration and
from 2007 to 2009 for validation. The second set of data was from 2004 to 2009 for
calibration and from 2000 to 2003 for validation. The Third set of data was from 2000
to 2009 for calibration and from 2007 to 2009 for validation. It was found that the
third set of data gave better result than other two sets of data. The study demonstrates
the applicability of ANN approach in developing effective non-linear models of
Rainfall-Runoff process without the need to explicitly representing the internal
hydraulic structure of the watershed
IMPROVED NEURAL NETWORK PREDICTION PERFORMANCES OF ELECTRICITY DEMAND: MODIFY...csandit
Accurate prediction of electricity demand can bring extensive benefits to any country as the
forecast values help the relevant authorities to take decisions regarding electricity generation,
transmission and distribution much appropriately. The literature reveals that, when compared
to conventional time series techniques, the improved artificial intelligent approaches provide
better prediction accuracies. However, the accuracy of predictions using intelligent approaches
like neural networks are strongly influenced by the correct selection of inputs and the number of
neuro-forecasters used for prediction. This research shows how a cluster analysis performed to
group similar day types, could contribute towards selecting a better set of neuro-forecasters in
neural networks. Daily total electricity demands for five years were considered for the analysis
and each date was assigned to one of the thirteen day-types, in a Sri Lankan context. As a
stochastic trend could be seen over the years, prior to performing the k-means clustering, the
trend was removed by taking the first difference of the series. Three different clusters were
found using Silhouette plots, and thus three neuro-forecasters were used for predictions. This
paper illustrates the proposed modified neural network procedure using electricity demand
data.
A Time Series ANN Approach for Weather Forecastingijctcm
Weather forecasting is most challenging problem around the world. There are various reason because of its experimented values in meteorology, but it is also a typical unbiased time series forecasting problem in scientific research. A lots of methods proposed by various scientists. The motive behind research is to predict more accurate. This paper contribute the same using artificial neural network (ANN) and simulated in MATLAB to predict two important weather parameters i.e. maximum and minimum temperature. The model has been trained using past 60 years of real data collected from(1901-1960) and tested over 40 years to forecast maximum and minimum temperature. The results based on mean square error function (MSE) confirm, this model which is based on multilayer perceptron has the potential to successful application to weather forecasting
Applications of Artificial Neural Networks in Civil EngineeringPramey Zode
An artificial brain-like network based on certain mathematical algorithms developed using a numerical computing environment is called as an ‘Artificial Neural Network (ANN)’. Many civil engineering problems which need understanding of physical processes are found to be time consuming and inaccurate to evaluate using conventional approaches. In this regard, many ANNs have been seen as a reliable and practical alternative to solve such problems. Literature review reveals that ANNs have already being used in solving numerous civil engineering problems. This study explains some cases where ANNs have been used and its future scope is also discussed.
Comparative Study of Machine Learning Algorithms for Rainfall Predictionijtsrd
Majority of Indian framers depend on rainfall for agriculture. Thus, in an agricultural country like India, rainfall prediction becomes very important. Rainfall causes natural disasters like flood and drought, which are encountered by people across the globe every year. Rainfall prediction over drought regions has a great importance for countries like India whose economy is largely dependent on agriculture. A sufficient data length can play an important role in a proper estimation drought, leading to a better appraisal for drought risk reduction. Due to dynamic nature of atmosphere statistical techniques fail to provide good accuracy for rainfall prediction. So, we are going to use Machine Learning algorithms like Multiple Linear Regression, Random Forest Regressor and AdaBoost Regressor, where different models are going to be trained using training data set and tested using testing data set. The dataset which we have collected has the rainfall data from 1901 2015, where across the various drought affected states. Nonlinearity of rainfall data makes Machine Learning algorithms a better technique. Comparison of different approaches and algorithms will increase an accuracy rate of predicting rainfall over drought regions. We are going to use Python to code for algorithms. Intention of this project is to say, which algorithm can be used to predict rainfall, in order to increase the countries socioeconomic status. Mylapalle Yeshwanth | Palla Ratna Sai Kumar | Dr. G. Mathivanan M.E., Ph.D ""Comparative Study of Machine Learning Algorithms for Rainfall Prediction"" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-3 | Issue-3 , April 2019, URL: https://www.ijtsrd.com/papers/ijtsrd22961.pdf
Paper URL: https://www.ijtsrd.com/computer-science/data-miining/22961/comparative-study-of-machine-learning-algorithms-for-rainfall-prediction/mylapalle-yeshwanth
In this paper, we propose a models of process chain and knowledge-based of meteorological reanalysisdatasets that help scientists, working in the field of climate and in particular of the rainfall evolution, tosolve uncertainty of spatial resources (data, process) to monitor the rainfall evolution. Indeed, rainfallevolution mobilizes all research, various methods of meteorological reanalysis datasets processing areproposed. Meteorological reanalysis datasets available, at present, are voluminous and heterogeneous interms of source, spatial and temporal resolutions
MODELING PROCESS CHAIN OF METEOROLOGICAL REANALYSIS PRECIPITATION DATA USING ...Zac Darcy
In this paper, we propose a models of process chain and knowledge-based of meteorological reanalysis datasets that help scientists, working in the field of climate and in particular of the rainfall evolution, to solve uncertainty of spatial resources (data, process) to monitor the rainfall evolution. Indeed, rainfall evolution mobilizes all research, various methods of meteorological reanalysis datasets processing are proposed. Meteorological reanalysis datasets available, at present, are voluminous and heterogeneous in terms of source, spatial and temporal resolutions. The use of these meteorological reanalysis datasets may solve uncertainty of data. In addition, phenomena such as rainfall evolution require the analysis of time series of meteorological reanalysis datasets and the development of automated and reusable processing chains for monitoring rainfall evolution. We propose to formalize these processing chains from modeling an abstract and concrete models based on existing standards in terms of interoperability. These processing chains modelled will be capitalized, and diffusible in operational environments. Our modeling approach uses Work-Context concepts. These concepts need organization of human resources, data, and process in order to establish a knowledge-based connecting the two latter. This knowledge based will be used to solve uncertainty of meteorological reanalysis datasets resources for monitoring rainfall evolution.
Modeling Process Chain of Meteorological Reanalysis Precipitation Data Using ...Zac Darcy
In this paper, we propose a models of process chain and knowledge-based of meteorological reanalysis
datasets that help scientists, working in the field of climate and in particular of the rainfall evolution, to
solve uncertainty of spatial resources (data, process) to monitor the rainfall evolution. Indeed, rainfall
evolution mobilizes all research, various methods of meteorological reanalysis datasets processing are
proposed. Meteorological reanalysis datasets available, at present, are voluminous and heterogeneous in
terms of source, spatial and temporal resolutions. The use of these meteorol
MODELING PROCESS CHAIN OF METEOROLOGICAL REANALYSIS PRECIPITATION DATA USING ...ijitmcjournal
In this paper, we propose a models of process chain and knowledge-based of meteorological reanalysis datasets that help scientists, working in the field of climate and in particular of the rainfall evolution, to solve uncertainty of spatial resources (data, process) to monitor the rainfall evolution. Indeed, rainfall evolution mobilizes all research, various methods of meteorological reanalysis datasets processing are proposed. Meteorological reanalysis datasets available, at present, are voluminous and heterogeneous in terms of source, spatial and temporal resolutions. The use of these meteorological reanalysis datasets may solve uncertainty of data. In addition, phenomena such as rainfall evolution require the analysis of time series of meteorological reanalysis datasets and the development of automated and reusable processing chains for monitoring rainfall evolution. We propose to formalize these processing chains from modeling an abstract and concrete models based on existing standards in terms of interoperability. These processing chains modelled will be capitalized, and diffusible in operational environments. Our modeling approach uses Work-Context concepts. These concepts need organization of human resources, data, and process in order to establish a knowledge-based connecting the two latter. This knowledge based will be used to solve uncertainty of meteorological reanalysis datasets resources for monitoring rainfall evolution.
DevOps and Testing slides at DASA ConnectKari Kakkonen
My and Rik Marselis slides at 30.5.2024 DASA Connect conference. We discuss about what is testing, then what is agile testing and finally what is Testing in DevOps. Finally we had lovely workshop with the participants trying to find out different ways to think about quality and testing in different parts of the DevOps infinity loop.
JMeter webinar - integration with InfluxDB and GrafanaRTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Tobias Schneck
As AI technology is pushing into IT I was wondering myself, as an “infrastructure container kubernetes guy”, how get this fancy AI technology get managed from an infrastructure operational view? Is it possible to apply our lovely cloud native principals as well? What benefit’s both technologies could bring to each other?
Let me take this questions and provide you a short journey through existing deployment models and use cases for AI software. On practical examples, we discuss what cloud/on-premise strategy we may need for applying it to our own infrastructure to get it to work from an enterprise perspective. I want to give an overview about infrastructure requirements and technologies, what could be beneficial or limiting your AI use cases in an enterprise environment. An interactive Demo will give you some insides, what approaches I got already working for real.
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Search and Society: Reimagining Information Access for Radical FuturesBhaskar Mitra
The field of Information retrieval (IR) is currently undergoing a transformative shift, at least partly due to the emerging applications of generative AI to information access. In this talk, we will deliberate on the sociotechnical implications of generative AI for information access. We will argue that there is both a critical necessity and an exciting opportunity for the IR community to re-center our research agendas on societal needs while dismantling the artificial separation between the work on fairness, accountability, transparency, and ethics in IR and the rest of IR research. Instead of adopting a reactionary strategy of trying to mitigate potential social harms from emerging technologies, the community should aim to proactively set the research agenda for the kinds of systems we should build inspired by diverse explicitly stated sociotechnical imaginaries. The sociotechnical imaginaries that underpin the design and development of information access technologies needs to be explicitly articulated, and we need to develop theories of change in context of these diverse perspectives. Our guiding future imaginaries must be informed by other academic fields, such as democratic theory and critical theory, and should be co-developed with social science scholars, legal scholars, civil rights and social justice activists, and artists, among others.
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Ramesh Iyer
In today's fast-changing business world, Companies that adapt and embrace new ideas often need help to keep up with the competition. However, fostering a culture of innovation takes much work. It takes vision, leadership and willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on the notifications, alerts, and approval requests using Slack for Bonterra Impact Management. The solutions covered in this webinar can also be deployed for Microsoft Teams.
Interested in deploying notification automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
"Impact of front-end architecture on development cost", Viktor TurskyiFwdays
I have heard many times that architecture is not important for the front-end. Also, many times I have seen how developers implement features on the front-end just following the standard rules for a framework and think that this is enough to successfully launch the project, and then the project fails. How to prevent this and what approach to choose? I have launched dozens of complex projects and during the talk we will analyze which approaches have worked for me and which have not.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
State of ICS and IoT Cyber Threat Landscape Report 2024 previewPrayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio, cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors, and newer malware including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
2. 24
Computer Science & Information Technology (CS & IT)
are no predetermined notions about what will constitute an "interesting" outcome. Data mining is
the search for new, valuable, and nontrivial information in large volumes of data. Best results
are achieved by balancing the knowledge of human experts in describing problems and goals with
the search capabilities of computers. In practice, the two primary goals of data mining tend to be
classification and prediction. Prediction [1] [2]involves using some variables or fields in the
dataset to predict unknown or future values of other variables of interest. Classification [3] refers
to the task of analyzing a set of pre-classified data objects to learn a model (or a function) that can
be used to classify unseen data object into one of several predefined classes. Description, on the
other hand, focuses on finding patterns describing the data that can be interpreted by humans. For
a given a collection of records (training set) each record contains a set of attributes, out of which
one of the attribute is the class attribute or class variable. Other attributes are often called
independent or predictor attributes (or variables). The set of examples used to learn the
classification model is called the training data set. We need to find a model for class attribute as a
function of the values of other attributes. Further previously unseen records should be assigned a
class as accurately as possible. A test set is used to determine the accuracy of the model. Usually,
the given data set is divided into training and test sets, training set used to build the model and test
set used to validate it.
2. LITERATURE SURVEY
The rainfall forecasting problem has been traditionally tackled using linear techniques, such as
auto regressive (AR),Autoregressive–moving-average model with exogenous inputs (ARMAX),
and KalmanFilter(KF), but also using nonlinear regression [5][6]Box and Jenkins, (1970)[36].
Most of the forecasting methods consider one day ahead forecast. For the rainfall a longer term
forecast such as ten days ahead or a month ahead is more of interest, though it is more difficult
than the one day ahead problem. In fact, there are several considerable drawbacks to the use of
KF in rainfall forecasting application. These include (1) the necessity of accurate stochastic
modeling, which may not be possible in the case for rainfall; (2) the requirement for a priori
information of the system measurement and develop covariance matrices for each new pattern,
which could be challenging to accurately determine and (3) the weak observability of some of
temporal pattern states that may lead to unstable estimates for the forecasted value [4].
In this context, motivation for utilizing non-linear modeling approach based on the Artificial
Intelligence (AI) techniques has received considerable attention from the hydrologists in the last
two decades [7]. A study was conducted on detection of nonlinear response and damage
detection on signal processing, and concluded that artificial neural networks (ANN) can be used
for modeling and forecasting nonlinear time series. Recently ,numerous ANN-based rainfallrunoff models have been proposed to forecast stream flow [8][9][10] and reservoir inflow. In
addition, neural networks and fuzzy logic have been used as effective modeling tools in different
environmental processes such as waste water treatment, water treatment and air pollution. Several
water quality prediction models have been developed utilizing ANN and Adaptive neuro fuzzy
inference system (ANFIS)methods[11][12][19].Rainfall-runoff models utilizing ANN model
showed significant level accuracy if compared with traditional regression models, used an
artificial neural networks to predict the performance of a membrane bioreactor. They were able to
estimate concentrations of chemical oxygen demand, phosphate, ammonia and nitrate.
Altunkaynak et al. (2005)[30] used fuzzy logic modeling to forecast dissolved oxygen
concentration and compared the accuracy of fuzzylogic modeling and autoregressive integrated
moving average (ARIMA)[13] models in predicting water consumption in a city. They found that
relative error rates for fuzzy logic and ARIMA were 2.5 and 2.8, respectively.
An Artificial Neural Network (ANN) is a flexible mathematical structure, which is capable of
identifying complex non-linear relationship between input and output data set. ANN models have
3. Computer Science & Information Technology (CS & IT)
25
been found useful and efficient, particularly in problems for which the characteristics of the
processes are difficult to describe using physical equations[11]. Artificial neural networks (ANN)
are a class of models, inspired by biological nervous systems, denoted by neurons, and working in
parallel. The elements are connected by synoptic weights, which are allowed to adapt through a
learning process. Now a days, neural networks are applied in hydrological [13], pattern
recognition, vision, speech recognition, classification, and control systems.
Hall et al. (1993, 1999)[30] and Hsu et al. (1995) [31] have applied artificial neural network for
rainfall- runoff modeling. Goswamiet al. [35] have used ANNs with three layers, namely, input
layer, hidden layer and output layer for experimental forecasts of all India Summer Monsoon
Rainfall. French et.al.(1992) [31] discussed on rainfall forecasting using neural networks. Here an
attempt to represent the rainfall process in terms of a single–hidden layer feed forward Neural
Network is made (Kulshrestha et al , 2006)[31][33].
Hung, N.Q. et.al (2009)[33], studied artificial neural networks models for rainfall forecasting.
Somvanshi, V.K. et.al. (2006)[33]studied the modeling and prediction of rainfall using artificial
neural networks and Box-Jenkins methodology. Other applications of ANN in hydrology are
forecasting daily water demands and flow forecasting.
3. SCOPE OF THE PROBLEM
The following are the main objectives of the study considered in this problem.
Building of SARIMA models to forecast monthly rainfall in Coastal Andhra, Telangana and
Rayalaseema regions in Andhra Pradesh state in India using Box-Jenkins methodology.
Building feed-forward neural networks models to forecast monthly rainfall in Coastal Andhra,
Telangana and Rayalaseema regions in Andhra Pradesh state in India.
A comparative study is carried out to investigate the forecasting capability of feed-forward neural
networks model and Box-Jenkins methods, which are among those forecasting models most
successfully applied in practice. This study investigates application of neural networks models
and the results of which will be compared with those obtained by Box-Jenkins method.
The FFNN model performance is evaluated using percentage better comparison with SARIMA
models.
4. DATASET
Monthly rainfall(in MM) data for Coastal Andhra, Telangana and Rayalaseema regions in Andhra
Pradesh state during the years 1871-2011 is collected from Climatology & Hydrometeorology
Division, Indian Institute of Tropical Meteorology (IITM), Pune, India. This data consists of
1680 monthly observations, in which 140 years of data during 1871-2005 is used for model fitting
and remaining 6 years of data during 2006-2011 is used as out-of-sample set to measure the
predictability of the selected model using mean absolute error and root mean squared error
statistics.
5. FORECASTING METHODS
Neuron model- The multilayer perception neural network is built up of simple components. In
the beginning, we will describe a single input neuron which will then be extended to multiple
inputs. Next, we will stack these neurons together to produce layers [4]. Finally, the layers are
cascaded together to form the network.
4. 26
Computer Science & Information Technology (CS & IT)
Single-input neuron: A single-input neuron is shown in Figure. 1. The scalar input p is
multiplied by the scalar weight W to form Wp, one of the terms that is sent to the summer. The
other input1, is multiplied by a bias band then passed to the summer. The summer output n often
referred to as the net input, goes into a transfer function f which produces the scalar neuron output
a (sometimes "activation function" is used rather than transfer function and offset rather than
bias).
Figure 1. Single neuron
In Figure 1, both w and b are adjustable scalar parameters of the neuron. Typically the transfer
function is chosen by the designer and then the parameters w and b will be adjusted by some
learning rule so that the neuron input/output relationship meet some specific goal. The transfer
function in Figure 1 may be a linear or nonlinear function of n. A particular transfer function is
chosen to satisfy some specification of the problem that the neuron is attempting to solve. One of
the most commonly used functions is the log-sigmoid transfer function.
Multiple-input neuron: Typically, a neuron has more than one input. A neuron with R inputs is
shown in Figure 2. The individual inputs p1, p2,……, pg are each
Figure 2. Multi neuron
Weighted by corresponding elements W1,1 W1,2,….W1,R of the weight matrix W. The neuron has a
bias b, which is summed with the weight inputs to form the net input n:
n =W1,1P1+W1,2P2+... +W1,RPR +b
(3)
This expression can be written in matrix form as:
5. Computer Science & Information Technology (CS & IT)
n =Wp+b
27
(4)
Where the matrix W is the single neuron case has only one row. Now the neuron output can be
written as:
a =f(Wp+b) (5)
A particular convention in assigning the indices of the elements of the weight matrix has been
adopted [4].The first index indicates the particular neuron destination for the weight. The second
index indicates the source of the signal fed to the neuron. Thus, the indices in W1,2 say that this
weight represents the connection to the first (and only) neuron from the second source [4]. A
multiple-input neuron using abbreviated notation is shown in Figure 2.As shown in Figure 2, the
input vector p is represented by the solid vertical bar at left. The dimensions of p are displayed
below the variable as Rx1, indicating that the input is a single vector of R elements. These inputs
go to the weight matrix W, which has R columns but only one row in this single neuron case. A
constant 1 enters the neuron as input and is multiplied by a scalar bias b. The net input to the
transfer function f is n, which is the sum of the bias b and the product Wp. The neuron's output is
a scalar in this case. If there exit more than one neuron, thenetwork output would be a vector.
6. MULTILAYER PERCEPTRONS
A multilayer feed forward neural network is an interconnection of perceptrons in which data and
calculations flow in a single direction, from the input data to the output. The number of layers in
a neural network is the number of layers of perceptrons. MLP- neural networks consist of units
arranged in layers [14][24][28]. Each layer is composed of nodes and in the fully connected
network considered here each node connects to every node in subsequent layers. Each MLP is
composed of a minimum of three layers consisting of an input layer, one or more hidden layers
and an output layer. The input layer distributes the inputs to subsequent layers. Input nodes have
linear activation functions and no thresholds. Each hidden unit node and each output node have
thresholds associated with them in addition to the weights. The hidden unit nodes have nonlinear
activation functions and the outputs have linear activation functions. Hence, each signal feeding
into a node in a subsequent layer has the original input multiplied by a weight with a threshold
added and then is passed through an activation function that may be linear or nonlinear(hidden
units). Neural networks allows flexibility in modeling real world complex relationships and able
to estimate the posterior probability, which provides the basis for establishing classification rule
and performing statistical analysis [13]
Time Series forecasting
The Box-Jenkins method [36] is one of the most widely used time series forecasting methods in
practice. It is also one of the most popular models in traditional time series forecasting and is
often used as a benchmark model for comparison with any other forecasting method.
It is often difficult to identify a forecasting model because the underlying laws may not be clearly
understood. In addition, hydrological time series may display signs of seasonality and
nonlinearity which traditional linear forecasting techniques are ill equipped to handle, often
producing unsatisfactory results. Researchers confronted with problems of this nature
increasingly resort to techniques that are heuristic and non-linear. Such techniques include the use
of neural networks models.
6. 28
Computer Science & Information Technology (CS & IT)
ARIMA- ARIMA and SARIMA models are extensions of ARMA class in order to
include more realistic dynamics, in particular, respectively, nonstationary in mean and
seasonal behaviors.
In practice, many economic time series are nonstationary in mean and they can be modeled only
by removing the nonstationary source of variation. Often this is done by differencing the series.
Suppose Xt isnonstationary in mean, the idea is to build an ARMA model on the series wt,
definable as the result of the operation of differencing the series d times (in general d = 1): wt =
∆d Xt. Hence, ARIMA models (where I stays for integrated) are the ARMA models defined on the
d-th difference of the original process:
Ф(B)∆d Xt. = θ(B) ܽt
Where Ф(B) ∆d is called generalized autoregressive operator and ∆d Xt is a quantity made
stationary through the differentiation and can be modeled with an ARMA.
Consider a few examples:
•
•
ARIMA ( 0,1,1 ) is ∆d Xt = ܽt – θ1 ܽt-1 → the first difference of Xt is modeled as
MA (1).
ARIMA ( 1,1,0 ) is ( 1 – ф1B) ∆Xt = ܽt → the first difference of Xt is modeled as
AR (1).
Note that in this case:
( 1 – ф1B) ( 1 – B) Xt =ܽt
( 1 – B - ф1B + ф1 B2 ) Xt =ܽt
ሾ1 − ሺ 1 + фଵ ሻ + ܤфଵ ܤଶ ሿ= ܽt
The last equation shows that ARIMA (1, 1, 0) is like an AR(2) where фଶ = - ф1 and фଶ + ф1 = 1.
This reveals that, as we knew in advance, the stationary constraint does not hold.
Seasonal ARIMA - Often time series possess a seasonal component that repeats every
observations. For monthly observations s = 12 (12 in 1 year), for quarterly observations s = 4 ( 4
in 1 year ). In order to deal with seasonality, ARIMA processes have been generalized: SARIMA
models have been formulated.
Ф(B)∆d Xt. = θ(B) ∝௧
Where ∝௧ is such that
௦
௦
sф ( ܤሻ ∆௦ ∝௧ = sѲ ( ܽ ) ܤt
hence
ф ( B )s ф ( ܤ௦ ሻ∆ ∆d Xt = θ(B) sѲ ( ܤ௦ ) ∝௧
௦
and we write Xt~ ARIMA ( p, d, q ) × ( P, D, Q ) s . The idea is that SARIMA are ARIMA
(p,d,q) models whose residuals ∝௧ are ARIMA ( P, D, Q ). With ARIMA ( P, D, Q ) we intend
ARIMA models whose operators are defined on ܤ௦ and successive powers.
Concepts of admissible regions SARIMA are analog to the admissible regions for ARIMA
processes; they are just expressed in terms of ܤ௦ powers.
Now, consider some examples (specialcases):
7. Computer Science & Information Technology (CS & IT)
29
1. Xt = ܽt – sѲଵ ܽt – 12 is ARIMA ( 0, 0, 0 ) × ( 0, 0, 1 ) 12 . There is only one seasonal
MA component, special by s = 12, Q = 1. So, the ACF is characterized by finite
extension and takes value only at lag k = 12. The PACF is infinite extended with
exponential decay, visible at multiple of 12 lags, which is alternate or monotonic
according to the sign of Ѳଵ .
2. Xt = sѲଵ Xt – 12 + ܽt is ARIMA ( 0, 0, 0 ) × ( 1,0, 0 ) 12 . The seasonal AR component
is specified by s = 12, P = 1. So, the ACF is characterized by infinite extension. The
PACF is with finite extended and takes value only at lag k = 12.
7. CONCLUSION
For rainfall prediction, artificial neural network was applied to predict the summary rainfall data
in Thailand. According to the experiments, predictions of the summary rainfall data using back
propagation neural network were acceptably accuracy. In the future works, some additional inputs
were employed for rainfall prediction such as Sea Surface Temperature (SST) areas around
Andhra Pradesh and Southern part of India.
REFERENCES
[1]
[2]
[3]
[4]
[5]
[6]
[7]
[8]
[9]
[10]
[11]
[12]
[13]
[14]
Franck Le Duff , Cristian Muntean, Marc Cuggiaa, Philippe Mabob,Predicting Survival Causes After
Out of Hospital Cardiac Arrest using Data Mining Method, MEDINFO 2004 M. Fieschi et al. (Eds)
Amsterdam: IOS Press
Gisele L. Pappa, Anthony J. Baines and Alex A. Freitas, Predicting post-synaptic activity in proteins
with data mining, bioinformatics, vol 21, Pp. ii19-ii25
L. S. Kumar, y. H. Lee, j. X. Yeo and j. T. Ong, tropical rain classification and estimation of rain
from z-r (reflectivity-rain rate) relationships, progress in electromagnetics research b, vol. 32,
107{127, 2011}
A. El-Shafie, A. Noureldin, M. R. Taha, and A. Hussain, Dynamic versus static neural network model
for rainfall forecasting at Klang River Basin, Malaysia Hydrol. Earth Syst. Sci. Discuss., 8, 6489–
6532, 2011,doi:10.5194/hessd-8-6489-2011 C.L. Wu, K.W. Chau, Prediction of rainfall time series
using modular soft computing methods, Engineering Applications of Artificial Intelligence,2012
Elsevier Ltd. All rights .http://dx.doi.org/10.1016/j.engappai.2012.05.023
S. Alvisi, G. Mascellani, M. Franchini and A. Bárdossy, Water level forecasting through fuzzy logic
and artificial neural network approaches, Copernicus GmbH, 2006
Yilmaz, Abdullah Gokhan; Imteaz, MonzurAlam, ,Development of a hydrologic model using
artificial intelligence for Upper Euphrates Basin in Turkey
Pritpal Singh, Bhogeswar Borah, Indian summer monsoon rainfall prediction using artificial neural
network, springer 2013
Abhishek, Kumar,, Kumar, A., Ranjan, R. ,Kumar, S,A rainfall prediction model using artificial
neural network, Control and System Graduate Research Colloquium (ICSGRC), 2012 IEEE
Sunyoung Lee, Sungzoon Cho, Patrick M. Wong, Rainfall Prediction Using Artificial Neural
Networks , Journal of Geographic Information and Decision Analysis, vol. 2, no. 2, pp. 233 - 242,
1998
VidiBhuwana, Rainfall Runoff Modeling by Using Adaptive-Network-Based Fuzzy Inference System
(ANFIS) - Case Study Ciliwung River
Edvinaldrian, and yudhasetiawandjamil,application of multivariate anfis for daily rainfall Prediction:
influences of training data size, makara, sains, volume 12, no. 1, april 2008
RiaFaulina, , Suhartono2, Hybrid ARIMA-ANFIS for Rainfall Prediction in Indonesia, International
Journal of Science and Research (IJSR), India Online ISSN: 2319-7064 Volume 2 Issue 2, February
2013
Guhathakurta, P (2005) “Long-range monsoon rainfall prediction of 2005 for the districts and subdivision Kerala with artificial neural network”, Current Science, 90, 773-779
Rajeevan, M (2001) “Prediction of Indian summer monsoon: Status,problems and prospects”,
Current Science.
8. 30
Computer Science & Information Technology (CS & IT)
[15] D. Mark, “Geographical information science; critical issues in an emerging cross – disciplinary
Research domain”, URISA Journal,vol. 12, February 1999.
[16] Tamil Nadu Meteorology Department, Chennai.
[17] European – Commission, “Forest Fires in Europe 2007, Technicalreport, Report No. 8, 2008.
[18] L.A. Zadeh, “Fuzzy sets”, Information And control vol. 8, 1965.
[19] Nguyen T. Danh, Huyah N. Phien and AshimD.Gupta 1999,“Neural network models for river flow
forecasting, “Journal ofWater SA, Vol.25.
[20] S. Lee, S. Cho and P.M. Wong 1998, “Rainfall prediction usingArtificial Neural Networks,” Journal
of Geographic Information andDecision Analysis, Vol. 2, No. 2.
[21] Hastenrath, S(1988) “Prediction of India Monsoon Rainfall: FurtherExploration”, Journal of Climate.
[22] El-Shafie A, Reda TM, Noureldin A (2007). A Neuro-Fuzzy Model forInflowForecasting of the Nile
River at Aswan High Dam. WaterResour. Manag., 21(3): 533-556.
[23] French MN, Krajewski WF, Cuykendal RR (1992). Rainfall Forecastingin Space and Time Using a
Neural Network. J. Hydrol. Amsterdam,137: 1–37.
[24] Halff AH, Halff HM, Azmoodeh M (1993). Predicting Runoff from RainfallUsing Neural Networks.
Proc. Engrg. Hydrol. ASCE, New York, pp.760–765.
[25] Maier HR, Dandy GC (1996). The Use of Artificial Neural Networks ForThe Prediction of Water
Quality Parameters. Water Resour. Res.,32(4): 1013-1022.
[26] Maria C, Valverde R, HaroldoFraga de Campos V, Nelson Jesus F(2005). Artificial Neural Network
Technique For Rainfall ForecastingApplied to The Sao Paulo Region. J. Hydrol. 301:146–162.
[27] Sahai AK, Somann MK, Satyan V (2000). All India Summer MonsoonRainfall Prediction Using an
Artificial Neural Netw. Clim. Dyn., 16(4):291- 302
[28] A. Altunkaynak, Z. Şen, Steady state flow with hydraulic conductivity change around large diameter
wells, hydrological processes, Wiley publisher
[29] Tony Hall, Precipitation Forecasting Using A Neural Network, Weather And Forecasting Volume 14
[30] Hsu, h.-m, m.w. Moncrieff, w.-w, tung, and c. Liu, 2006a: temporal variability of warm-season
precipitation over north america: statistical analysis of radar measutement. J. Atmos. Sci
[31] French, Mark N., Witold F. Krajewski, and Robert R. Cuykendall. “Rainfall forecasting in space and
time using a neural network.” Journal of hydrology137.1 (1992): 1-31.
[32] Neelamshihani., K. Kumbhar, ManojKulshreshtha, Modeling of extrusion process using response
surface methodology and artificial neural networks, journal of engineering science and technology,
vol. 1, no. 1 (2006) 31-40
[33] N. Q. Hung, M. S. Babel, S. Weesakul, and N. K. Tripathi,An artificial neural network model for
rainfall forecasting inBangkok, Thailand, Hydrol. Earth Syst. Sci., 13, 1413–1425, 2009
[34] V. K. Somvanshi, et al., “Modeling and prediction of rainfall using artificial neural network and
ARIMA techniques.” J. Ind. Geophys. Union, vol. 10, no. 2, pp. 141-151, 2006
[35] B. N. Goswami1, V. Venugopal, D. Sengupta, M. S. Madhusoodanan, Prince K. Xavier, Increasing
Trend of Extreme Rain Events Over India in a Warming Environment. Science 1 December 2006:
Vol. 314 no. 5804 pp. 1442-1445 DOI: 10.1126/science.1132027
[36] Box, G.E.P. and Jenkins, G.M. (1970), Time Series Analysis: Forecasting and Control, San
Francisco:Holden-Da y.