Accurate prediction of electricity demand can bring extensive benefits to any country, as the forecast values help the relevant authorities to take appropriate decisions regarding electricity generation, transmission and distribution. The literature reveals that, compared to conventional time series techniques, improved artificial intelligence approaches provide better prediction accuracy. However, the accuracy of predictions using intelligent approaches such as neural networks is strongly influenced by the correct selection of inputs and the number of neuro-forecasters used for prediction. Deshani, Hansen, Attygalle, & Karunarathne (2014) suggested that a cluster analysis could be performed to group similar day types, which contributes towards selecting a better set of neuro-forecasters in neural networks. Their cluster analysis was based on daily total electricity demands, as their target was to predict daily total demands using neural networks. However, predicting half-hourly demand seems more appropriate because of the considerable changes in electricity demand observed within a particular day. Accordingly, clusters are identified here using the half-hourly data within the daily load distribution curves. Thus, this paper is an improvement on Deshani et al. (2014), illustrating how the half-hourly demand distribution within a day is incorporated when selecting the inputs for the neuro-forecasters.
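To make the half-hourly representation concrete, the following sketch (ours, not from the paper; all demand values are invented) treats each day as a 48-point load curve and measures how close two day shapes are, which is the kind of distance a clustering step would operate on:

```python
# Sketch (not the paper's code): representing each day by its half-hourly
# load profile so that whole-day shapes, not just daily totals, drive the
# clustering. All demand values below are made-up illustrative numbers.

def profile_distance(day_a, day_b):
    """Euclidean distance between two 48-point half-hourly load curves."""
    assert len(day_a) == len(day_b) == 48
    return sum((a - b) ** 2 for a, b in zip(day_a, day_b)) ** 0.5

# Two synthetic weekday-like curves and one weekend-like curve (MW),
# indexed by half-hour slot h = 0..47.
weekday1 = [300 + (200 if 18 <= h < 44 else 0) for h in range(48)]
weekday2 = [310 + (190 if 18 <= h < 44 else 0) for h in range(48)]
weekend  = [280 + (60 if 20 <= h < 40 else 0) for h in range(48)]

# Days with similar intra-day shapes end up close together, which is
# what lets a clustering step separate day types by curve shape.
print(profile_distance(weekday1, weekday2) < profile_distance(weekday1, weekend))  # True
```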
IMPROVED NEURAL NETWORK PREDICTION PERFORMANCES OF ELECTRICITY DEMAND: MODIFY... (csandit)
Accurate prediction of electricity demand can bring extensive benefits to any country, as the forecast values help the relevant authorities to take appropriate decisions regarding electricity generation, transmission and distribution. The literature reveals that, compared to conventional time series techniques, improved artificial intelligence approaches provide better prediction accuracy. However, the accuracy of predictions using intelligent approaches such as neural networks is strongly influenced by the correct selection of inputs and the number of neuro-forecasters used for prediction. This research shows how a cluster analysis performed to group similar day types can contribute towards selecting a better set of neuro-forecasters in neural networks. Daily total electricity demands for five years were considered for the analysis, and each date was assigned to one of thirteen day types in the Sri Lankan context. As a stochastic trend could be seen over the years, the trend was removed by taking the first difference of the series prior to performing the k-means clustering. Three distinct clusters were found using silhouette plots, and thus three neuro-forecasters were used for predictions. This paper illustrates the proposed modified neural network procedure using electricity demand data.
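As a rough illustration of the preprocessing described above, the sketch below (our own, with made-up demand figures) first-differences a daily-total series to remove a stochastic trend and then runs a minimal 1-D k-means with k = 3; the actual study chose k via silhouette plots:

```python
# Illustrative sketch only (not the paper's code): remove a stochastic trend
# by first differencing, then run a tiny 1-D k-means on the differenced
# daily totals. Data values and k = 3 mirror the abstract but are invented.

def first_difference(series):
    """y_t = x_t - x_{t-1}; removes a stochastic (unit-root) trend."""
    return [b - a for a, b in zip(series, series[1:])]

def kmeans_1d(points, k, iters=20):
    """Minimal Lloyd's algorithm on scalars; returns sorted centroids."""
    centroids = sorted(points)[:: max(1, len(points) // k)][:k]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            groups[nearest].append(p)
        centroids = [sum(g) / len(g) if g else c
                     for g, c in zip(groups, centroids)]
    return sorted(centroids)

# Synthetic daily totals with a few jumps; differencing exposes three
# regimes of day-to-day change (big drops, small moves, big rises).
demand = [100, 102, 105, 103, 130, 131, 99, 101, 128, 132, 104, 106]
diffs = first_difference(demand)
print(kmeans_1d(diffs, 3))
```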
Analytical Modelling of Power Efficient Reliable Operation of Data Fusion in ... (IJECEIAES)
Despite the inclusion of Wireless Sensor Networks (WSN) in the majority of research propositions for smart city planning, the field is still shrouded by some significant issues. A closer look at the problems in WSN shows that the energy parameter is the origin of the majority of the other problems in resource-constrained sensors, and that it significantly reduces the reliability of standard sensing operation in adverse environments. Therefore, this manuscript presents a novel analytical model meant to establish a good balance between energy efficiency over multi-path data forwarding and reliable operation with improved network performance. The complete process is emphasized during the data fusion stage to ensure data quality as well. A simulation study has been carried out using a benchmarked test-bed of MEMSIC nodes, finding that the proposed system offers a good energy conservation process during data fusion and also ensures reliable operation in comparison to the existing system.
CLUSTERING ALGORITHM FOR A HEALTHCARE DATASET USING SILHOUETTE SCORE VALUE (ijcsit)
The huge amount of healthcare data, coupled with the need for data analysis tools, has made data mining an interesting research area. Data mining tools and techniques help to discover and understand hidden patterns in a dataset, which may not be possible through visualization of the data alone. Selecting an appropriate clustering method and the optimal number of clusters for healthcare data can often be confusing and difficult. At present, a large number of clustering algorithms are available for clustering healthcare data, but it is very difficult for people with little knowledge of data mining to choose suitable ones. This paper aims to analyze clustering techniques on a healthcare dataset in order to determine suitable algorithms which can produce optimal cluster groupings. The performances of two clustering algorithms (K-means and DBSCAN) were compared using silhouette score values. Firstly, we analyzed the K-means algorithm using different numbers of clusters (K) and different distance metrics. Secondly, we analyzed the DBSCAN algorithm using different minimum numbers of points required to form a cluster (minPts) and different distance metrics. The experimental results indicate that both the K-means and DBSCAN algorithms have strong intra-cluster cohesion and inter-cluster separation. Based on the analysis, the K-means algorithm performed better than DBSCAN in terms of clustering accuracy and execution time.
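The silhouette comparison used above can be sketched as follows; this is a from-scratch 1-D illustration with invented points, whereas a real study would typically call a library routine such as scikit-learn's silhouette_score:

```python
# Hedged sketch: the silhouette coefficient used to compare clusterings,
# implemented from scratch for 1-D points with absolute-difference distance.

def silhouette(points, labels):
    """Mean silhouette over all points: (b - a) / max(a, b), where a is the
    mean distance to the point's own cluster and b is the smallest mean
    distance to any other cluster."""
    clusters = {}
    for i, l in enumerate(labels):
        clusters.setdefault(l, []).append(i)
    scores = []
    for i, l in enumerate(labels):
        same = [j for j in clusters[l] if j != i]
        if not same:  # singleton cluster scores 0 by convention
            scores.append(0.0)
            continue
        a = sum(abs(points[i] - points[j]) for j in same) / len(same)
        b = min(sum(abs(points[i] - points[j]) for j in idx) / len(idx)
                for l2, idx in clusters.items() if l2 != l)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Well-separated groups score near 1; a deliberately bad labelling scores lower.
pts = [1.0, 1.1, 1.2, 9.0, 9.1, 9.2]
good = [0, 0, 0, 1, 1, 1]
bad = [0, 1, 0, 1, 0, 1]
print(round(silhouette(pts, good), 3), round(silhouette(pts, bad), 3))
```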
TUPLE VALUE BASED MULTIPLICATIVE DATA PERTURBATION APPROACH TO PRESERVE PRIVA... (IJDKP)
Huge volumes of data from domain-specific applications such as medical, financial, library, telephone and shopping records, as well as individual data, are regularly generated. Sharing these data has proved to be beneficial for data mining applications. On one hand, such data is an important asset for business decision making when analyzed. On the other hand, data privacy concerns may prevent data owners from sharing information for data analysis. In order to share data while preserving privacy, the data owner must come up with a solution which achieves the dual goals of privacy preservation and accuracy of the data mining tasks of clustering and classification. An efficient and effective approach is proposed that aims to protect the privacy of sensitive information and to obtain data clustering with minimum information loss.
Data Analytics on Solar Energy Using Hadoop (IJMERJOURNAL)
ABSTRACT: Missing data is one of the major issues in data mining and pattern recognition. The knowledge contained in attributes with missing data values is important in improving the regression-correlation process of an organization. The learning process on each instance is necessary, as it may contain some exceptional knowledge. There are various methods to handle missing data in regression correlation. We analyze photovoltaic cells and the sunlight striking different geographical locations to identify defective or disconnected photovoltaic panels. We mainly aim to showcase the energy produced at different geographical locations, to find the defective panels, and also to analyze the datasets of energy produced along with the current weather in a particular area to know the status of the photovoltaic panels. In this project, we used the Hadoop MapReduce framework to analyze the solar energy datasets.
Predicting of Hosting Animal Centre Outcome Based on Supervised Machine Learn... (sushantparte)
Research Project - The objective of this project is to predict the outcome of animals placed in shelters given features such as the animal's age, breed, and colour. There are 5 possible outcomes for each animal, with euthanasia being the worst. The shelter hopes to be able to determine which animals are likely to be euthanized, as well as to find trends in which features increase the chance of adoption. This provides a chance for shelters to try to aid animals with a low chance of adoption. The overall goal is to decrease the yearly number of animals euthanized.
Improving IF Algorithm for Data Aggregation Techniques in Wireless Sensor Net... (IJECEIAES)
In a Wireless Sensor Network (WSN), data from different sensor nodes is collected at an aggregator node, typically by simple procedures such as averaging, owing to the nodes' inadequate computational power and energy resources. However, such aggregation is known to be highly susceptible to node-compromising attacks, and these approaches are especially vulnerable because WSNs usually lack tamper-resistant hardware. Thus, assessing the trustworthiness of data and the reputation of sensor nodes is critical for wireless sensor networks. As the performance of low-power processors dramatically increases, future aggregator nodes will be capable of running more sophisticated data aggregation algorithms, making WSNs less vulnerable. Iterative filtering algorithms hold great promise for this purpose: by assigning an appropriate weight to the information delivered by each source, such algorithms simultaneously aggregate data from multiple sources and provide trust assessments of those sources. Although significantly more robust against collusion attacks than simple averaging techniques, they are still vulnerable to a newly introduced sophisticated attack. This paper surveys the existing literature on iterative filtering techniques and provides a detailed comparison. Finally, an improved iterative filtering technique is proposed, based on the survey and the drawbacks found in the literature.
SCAF – AN EFFECTIVE APPROACH TO CLASSIFY SUBSPACE CLUSTERING ALGORITHMS (ijdkp)
Subspace clustering discovers the clusters embedded in multiple, overlapping subspaces of high-dimensional data. Many significant subspace clustering algorithms exist, each having different characteristics caused by the use of different techniques, assumptions and heuristics. A comprehensive classification scheme is essential, one which considers all such characteristics to divide subspace clustering approaches into various families. The algorithms belonging to the same family will satisfy common characteristics. Such a categorization will help future developers to better understand the quality criteria to be used and the similar algorithms against which to compare their proposed clustering algorithms. In this paper, we first propose the concept of SCAF (Subspace Clustering Algorithms' Family). The characteristics of a SCAF are based on classes such as cluster orientation and overlap of dimensions. As an illustration, we further provide a comprehensive, systematic description and comparison of a few significant algorithms belonging to the "Axis parallel, overlapping, density based" SCAF.
Novel holistic architecture for analytical operation on sensory data relayed... (IJECEIAES)
With the increasing adoption of sensor-based applications, there is an exponential rise in sensory data that eventually takes the shape of big data. However, the practicality of executing high-end analytical operations over resource-constrained big data has never been studied closely. After reviewing existing approaches, it emerges that there are no cost-effective schemes for big data analytics over large-scale sensory data processing that can be used directly as a service. Therefore, the proposed system introduces a holistic architecture where streamed data, after knowledge extraction, can be offered in the form of services. Implemented in MATLAB, the proposed study uses a very simplistic approach considering the energy constraints of the sensor nodes, finding that the proposed system offers better accuracy, reduced mining duration (i.e. faster response time), and reduced memory dependencies, which proves that it offers a cost-effective analytical solution in contrast to existing systems.
A MIXTURE MODEL OF HUBNESS AND PCA FOR DETECTION OF PROJECTED OUTLIERS (Zac Darcy)
With the advancement of time and technology, outlier mining methodologies help to sift through large amounts of interesting data patterns and winnow out the malicious data entering any field of concern. It has become indispensable not only to build a robust and generalised model for anomaly detection but also to equip the same model with extra features such as utmost accuracy and precision. Although the K-means algorithm is one of the most popular, unsupervised and easiest clustering algorithms, it can be dovetailed with PCA, hubness and a robust Gaussian mixture model to build a very generalised and robust anomaly detection system. A major loophole of the K-means algorithm is its tendency to settle in local minima, resulting in ambiguous clusters. In this paper, an attempt has been made to combine the K-means algorithm with the PCA technique, which results in the formation of more closely centred clusters that work more accurately with K-means.
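A minimal sketch of the PCA step the abstract combines with K-means, using standard power iteration on the covariance matrix; the data and variable names are ours, not the authors':

```python
# Hedged sketch (not the paper's model): project 2-D points onto their top
# principal component, the 1-D representation on which a K-means-style
# grouping could then run. Pure Python; the data below is synthetic.

def top_component(data, iters=100):
    """Leading eigenvector of the covariance matrix via power iteration."""
    n = len(data)
    means = [sum(col) / n for col in zip(*data)]
    centered = [[x - m for x, m in zip(row, means)] for row in data]
    d = len(means)
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return means, v

def project(data, means, v):
    """1-D score of each point along the leading component."""
    return [sum((x - m) * c for x, m, c in zip(row, means, v)) for row in data]

# Points spread mostly along the x-axis: the top component is close to (1, 0),
# so the projected scores recover the dominant spread of the cloud.
data = [[0.0, 0.1], [2.0, -0.1], [4.0, 0.0], [6.0, 0.1], [8.0, -0.1]]
means, v = top_component(data)
scores = project(data, means, v)
print([round(s, 2) for s in scores])  # approximately [-4.0, -2.0, 0.0, 2.0, 4.0]
```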
Comparative study of decision tree algorithm and naive bayes classifier for s... (eSAT Journals)
Abstract: The modeling and analysis of epidemic disease outbreaks in huge realistic populations is a data-intensive task that requires immense computational resources, and effective computational support becomes useful for studying disease outbreaks and facilitating decision making. Epidemiology is one of the traditional approaches used for studying and analyzing outbreaks of epidemic diseases. Although useful for obtaining the numbers of sick, infected and recovered individuals in a population, this traditional approach does not capture the complexity of human interactions: the model is limited to person-to-person interaction when tracking disease surveillance, and it also has performance issues with large realistic data. In this paper we propose a combination of computational epidemiology and modern data mining techniques, with a comparative analysis, for swine flu prediction. The K-means clustering algorithm is used to group swine flu suspects in a particular area. The decision tree algorithm and the naive Bayes classifier are applied to the same inputs to find the actual count of suspects and to predict the possible spread of swine flu from a suspected area to nearby areas. The performances of these techniques are compared based on accuracy; in our case the naive Bayes classifier performs better than the decision tree algorithm when finding the accurate count of suspects. Keywords: Computational Epidemiology, Aerosol-borne Disease, Clustering, Predictive analysis.
— The healthcare industry is considered one of the largest industries in the world. Like the medical industry, it holds a huge amount of health-related and medical data. This data helps to discover useful trends and patterns that can be used in diagnosis and decision making. Clustering techniques like K-means, D-Stream, COBWEB and EM have been used for healthcare purposes such as heart disease diagnosis and cancer detection. This paper focuses on the use of the K-means and D-Stream algorithms in healthcare. These algorithms were used to determine whether a person is fit or unfit, a decision taken based on his/her historical and current data. Both clustering algorithms were analyzed by applying them to patients' current and historical biomedical databases; the analysis depends on attributes such as peripheral blood oxygenation, diastolic arterial blood pressure, systolic arterial blood pressure, heart rate, heredity, obesity and cigarette smoking. By analyzing both algorithms it was found that the density-based clustering algorithm, i.e. the D-Stream algorithm, gives more accurate results than K-means when used for cluster formation on historical biomedical data, and that D-Stream overcomes several drawbacks of the K-means algorithm.
An efficient feature selection algorithm for health care data analysis (journalBEEI)
Diabetes is a silent killer which will slowly kill a person if it goes undetected. The existing systems, which use the F-score method and K-means clustering to check whether a person has diabetes, are not 100% accurate, and anything which isn't 100% is not acceptable in the medical field, as it could cost the lives of many people. Our proposed system aims at using some of the best features of the existing algorithms to predict diabetes, combining these features into a novel algorithm that aims to be 100% accurate in its prediction. With the surge in technological advancements, we can use data mining to predict when a person will be diagnosed with diabetes. Specifically, we analyze the best features of the chi-square algorithm and the advanced clustering algorithm (ACA). This research work is done using the Pima Indian Diabetes dataset provided by the National Institute of Diabetes and Digestive and Kidney Diseases. Using classification theorems and methods, we consider different factors such as age, BMI and blood pressure, together with the importance given to these attributes overall, single these attributes out, and use them for the prediction of diabetes.
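The chi-square feature scoring mentioned above can be illustrated as follows; the statistic and the toy data are a from-scratch sketch, not the authors' code, and real work would usually call a library routine such as scipy's chi-square functions:

```python
# Hedged sketch: a Pearson chi-square score for one categorical feature
# against a binary label, the kind of statistic a chi-square feature
# selection step relies on. Data values are invented.

def chi_square(feature, label):
    """Chi-square statistic over the observed vs expected contingency table."""
    values, classes = sorted(set(feature)), sorted(set(label))
    n = len(feature)
    stat = 0.0
    for v in values:
        for c in classes:
            observed = sum(1 for f, l in zip(feature, label) if f == v and l == c)
            expected = (feature.count(v) * label.count(c)) / n
            stat += (observed - expected) ** 2 / expected
    return stat

# A feature perfectly aligned with the label scores high;
# an uninformative feature scores zero.
label         = [1, 1, 1, 1, 0, 0, 0, 0]
informative   = [1, 1, 1, 1, 0, 0, 0, 0]
uninformative = [1, 0, 1, 0, 1, 0, 1, 0]
print(chi_square(informative, label), chi_square(uninformative, label))  # 8.0 0.0
```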
Certain Investigation on Dynamic Clustering in Dynamic Datamining (ijdmtaiir)
Clustering is the process of grouping a set of objects into classes of similar objects. Dynamic clustering is a new research area concerned with datasets that have dynamic aspects: it requires updates of the clusters whenever new data records are added to the dataset, and may result in a change of clustering over time. When there are continuous updates and a huge amount of dynamic data, rescanning the database is not possible in static data mining, but it is possible in the dynamic data mining process. Dynamic data mining occurs when the derived information is present for the purpose of analysis and the environment is dynamic, i.e. many updates occur. Since this has now been established by most researchers, attention is moving to solving the associated problems, and this research concentrates on the problem of mining dynamic databases. This paper gives an investigation of existing work related to dynamic clustering and incremental data clustering.
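The incremental updating that dynamic clustering depends on can be sketched with the standard running-mean centroid update; the cluster state below is hypothetical, not from any of the surveyed papers:

```python
# Hedged illustration: the online centroid update that lets a new record
# adjust a cluster without rescanning the whole dataset.

def update_centroid(centroid, count, new_point):
    """Running-mean update: new_mean = old_mean + (x - old_mean) / (n + 1)."""
    count += 1
    centroid = [c + (x - c) / count for c, x in zip(centroid, new_point)]
    return centroid, count

# A cluster holding 4 points with mean (2.0, 2.0) absorbs (7.0, 2.0);
# the centroid shifts toward the new record without revisiting old ones.
centroid, n = [2.0, 2.0], 4
centroid, n = update_centroid(centroid, n, [7.0, 2.0])
print(centroid, n)  # [3.0, 2.0] 5
```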
Anonymization of data using mapreduce on cloud (eSAT Journals)
Abstract: In the computing world, cloud services are provided by service providers, and users want to share private data stored on cloud servers for purposes such as data mining and data analysis. This can raise privacy concerns. Privacy preservation can be achieved by anonymizing data sets through generalization to satisfy privacy requirements, using the widely used k-anonymity technique. At present, the data of cloud applications is increasing in scale day by day in line with the Big Data trend, so it is very difficult to accept, manage, maintain and process such large-scale data within the required time. Thus, privacy preservation over privacy-sensitive, large-scale data is a very difficult task for existing anonymization techniques, because they cannot manage such scaled data sets. This approach addresses the anonymization problem on large-scale cloud data sets using a two-phase top-down specialization approach and the MapReduce framework. Innovative MapReduce jobs are carefully designed in both phases of this technique to achieve the specialization computation on scalable data sets. The scalability and efficiency of Top-Down Specialization (TDS) are significantly increased over the existing approach. Keywords: Top Down Specialization, MapReduce, Data Anonymization, Cloud Computing, Privacy Preservation
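The k-anonymity requirement described above can be sketched as a membership check plus one example generalization step; the field names, records and zip-truncation rule are all invented for illustration:

```python
# Minimal sketch of the k-anonymity idea: records are generalized on
# quasi-identifiers until every quasi-identifier combination occurs at
# least k times, so no individual is distinguishable within its group.

from collections import Counter

def is_k_anonymous(records, quasi_ids, k):
    """True if every quasi-identifier combination appears >= k times."""
    combos = Counter(tuple(r[q] for q in quasi_ids) for r in records)
    return all(count >= k for count in combos.values())

def generalize_zip(records, digits=3):
    """Example generalization: keep only the first `digits` of the zip code."""
    return [{**r, "zip": r["zip"][:digits] + "*" * (len(r["zip"]) - digits)}
            for r in records]

records = [
    {"age": 34, "zip": "90210", "disease": "flu"},
    {"age": 34, "zip": "90213", "disease": "cold"},
    {"age": 34, "zip": "90214", "disease": "flu"},
]
print(is_k_anonymous(records, ["age", "zip"], 2))                  # False
print(is_k_anonymous(generalize_zip(records), ["age", "zip"], 2))  # True
```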
Geo-spatial information for managing ambiguity (IAESIJEECS)
An innate challenge arising in any dataset containing information about space as well as time is uncertainty due to different sources of imprecision. Incorporating the effect of this uncertainty is fundamental when assessing the reliability (confidence) of any query result derived from the underlying data. To deal with uncertainty, solutions have been proposed independently in the geo-science and the data-science research communities. This interdisciplinary tutorial bridges the gap between the two communities by giving a comprehensive overview of the different challenges involved in managing uncertain geo-spatial data, by surveying solutions from both research communities, and by identifying similarities, synergies and open research problems.
A REVIEW ON PREDICTIVE ANALYTICS IN DATA MINING (ijccmsjournal)
The main process of data mining is to collect, extract and store valuable information, and nowadays many enterprises do this actively. Within advanced analytics, predictive analytics is the branch mainly used to make predictions about unknown future events. Predictive analytics uses various techniques from machine learning, statistics, data mining, modeling and artificial intelligence to analyze current data and make predictions about the future. The two main objectives of predictive analytics are regression and classification. It is composed of various analytical and statistical techniques used for developing models which predict future occurrences, probabilities or events. Predictive analytics deals with both continuous and discontinuous changes. It provides a predictive score for each individual (healthcare patient, product SKU, customer, component, machine, or other organizational unit) to determine or influence organizational processes that span huge numbers of individuals, as in fraud detection, manufacturing, credit risk assessment, marketing, and government operations including law enforcement.
Optimal artificial neural network configurations for hourly solar irradiation... (IJECEIAES)
Solar energy is widely used to generate clean electric energy. However, due to its intermittent nature, this resource is only inserted in a limited way within electrical networks. To increase the share of solar energy in the energy balance and allow better management of its production, it is necessary to know precisely the available solar potential at a fine time step, so as to take into account all these stochastic variations. In this paper, a comparison between different artificial neural network (ANN) configurations is elaborated to estimate the hourly solar irradiation, and the optimal numbers of neurons and layers are investigated. To this end, a feedforward neural network, a cascade-forward neural network and a fitting neural network have been applied. In this context, we have used different meteorological parameters to estimate the hourly global solar irradiation in the region of Laghouat, Algeria. The validation process shows that the cascade-forward neural network with two inputs gives an R2 value of 97.24% and a normalized root mean square error (NRMSE) of 0.1678, compared to three inputs, which give an R2 value of 95.54% and an NRMSE of 0.2252. The comparison with different existing methods in the literature shows the goodness of the proposed models.
RAINFALL PREDICTION USING DATA MINING TECHNIQUES - A SURVEY (csandit)
Rainfall is considered one of the major components of the hydrological process; it plays a significant part in evaluating drought and flooding events. Therefore, it is important to have an accurate model for rainfall prediction. Recently, several data-driven modeling approaches have been investigated to perform such forecasting tasks, such as multilayer perceptron neural networks (MLP-NN). In addition, rainfall time series modeling (SARIMA) involves important temporal dimensions. In order to evaluate the outcomes of both models, statistical parameters were used to make the comparison between the two models. These parameters include the Root Mean Square Error (RMSE), Mean Absolute Error (MAE), Coefficient of Correlation (CC) and BIAS. Two-thirds of the data was used for training the models and one-third for testing.
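The evaluation protocol above (two-thirds training, one-third testing, compared via RMSE, MAE, CC and BIAS) can be sketched in plain Python; the observed and predicted rainfall values below are invented:

```python
# Hedged sketch of the comparison statistics named in the survey, plus the
# two-thirds / one-third data split, written from scratch on toy data.

def split_two_thirds(series):
    """First two-thirds for training, the remaining third for testing."""
    cut = (2 * len(series)) // 3
    return series[:cut], series[cut:]

def rmse(obs, pred):
    return (sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs)) ** 0.5

def mae(obs, pred):
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)

def bias(obs, pred):
    """Mean error: positive values mean the model over-predicts on average."""
    return sum(p - o for o, p in zip(obs, pred)) / len(obs)

def corr(obs, pred):
    """Pearson coefficient of correlation (CC)."""
    n = len(obs)
    mo, mp = sum(obs) / n, sum(pred) / n
    num = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    den = (sum((o - mo) ** 2 for o in obs)
           * sum((p - mp) ** 2 for p in pred)) ** 0.5
    return num / den

observed  = [12.0, 0.0, 3.5, 20.1, 8.2, 0.4]
predicted = [10.5, 0.2, 4.0, 18.9, 9.0, 1.1]
print(rmse(observed, predicted), mae(observed, predicted),
      corr(observed, predicted), bias(observed, predicted))
```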
Improving IF Algorithm for Data Aggregation Techniques in Wireless Sensor Net...IJECEIAES
In a Wireless Sensor Network (WSN), data from different sensor nodes is collected at an aggregator node, typically by simple procedures such as averaging, because of the nodes' inadequate computational power and energy resources. However, such aggregation is known to be highly susceptible to node-compromise attacks, and these approaches are especially prone to attack because WSNs are usually not equipped with tamper-resistant hardware. Thus, ascertaining the integrity of data and the reputation of sensor nodes is critical for wireless sensor networks. As the performance of very low power processors dramatically increases, future aggregator nodes will be capable of executing more sophisticated data aggregation algorithms, making WSNs less vulnerable. Iterative filtering algorithms hold great promise for such a purpose: by assigning corresponding weights to the information provided by each source, such iterative algorithms simultaneously aggregate data from several sources and provide trust assessments of these sources. Although significantly more robust against collusion attacks than simple averaging techniques, they are still vulnerable to a more sophisticated attack introduced recently. The existing literature is surveyed in this paper to study iterative filtering techniques, and a detailed comparison is provided. At the end of this paper a new, improved iterative filtering technique is proposed, based on the literature survey and the drawbacks found in the literature.
SCAF – AN EFFECTIVE APPROACH TO CLASSIFY SUBSPACE CLUSTERING ALGORITHMSijdkp
Subspace clustering discovers the clusters embedded in multiple, overlapping subspaces of high dimensional data. Many significant subspace clustering algorithms exist, each having different characteristics caused by the use of different techniques, assumptions, heuristics, etc. A comprehensive classification scheme is essential which will consider all such characteristics to divide subspace clustering approaches into various families. The algorithms belonging to the same family will satisfy common characteristics. Such a categorization will help future developers to better understand the quality criteria to be used and which similar algorithms to compare their proposed clustering algorithms against. In this paper, we first propose the concept of SCAF (Subspace Clustering Algorithms' Family). Characteristics of SCAF are based on classes such as cluster orientation, overlap of dimensions, etc. As an illustration, we further provide a comprehensive, systematic description and comparison of a few significant algorithms belonging to the "Axis parallel, overlapping, density based" SCAF.
Novel holistic architecture for analytical operation on sensory data relayed...IJECEIAES
With the increasing adoption of sensor-based applications, there is an exponential rise in sensory data, which eventually takes the shape of big data. However, the practicality of executing high-end analytical operations over resource-constrained big data has never been studied closely. After reviewing existing approaches, it emerges that there are no cost-effective schemes for big data analytics over large-scale sensory data processing that can be directly used as a service. Therefore, the proposed system introduces a holistic architecture where streamed data, after knowledge extraction, can be offered in the form of services. Implemented in MATLAB, the proposed study uses a very simplistic approach considering the energy constraints of the sensor nodes, and finds that the proposed system offers better accuracy, reduced mining duration (i.e. faster response time), and reduced memory dependencies, proving that it offers a cost-effective analytical solution in contrast to the existing system.
A MIXTURE MODEL OF HUBNESS AND PCA FOR DETECTION OF PROJECTED OUTLIERSZac Darcy
With the advancement of time and technology, outlier mining methodologies help to sift through large amounts of interesting data patterns and winnow out the malicious data entering any field of concern. It has become indispensable not only to build a robust and generalised model for anomaly detection but also to equip the same model with extra features like utmost accuracy and precision. Although the K-means algorithm is one of the most popular, unsupervised, unique and easiest clustering algorithms, it can be dovetailed with PCA, hubness, and the robust model formed from a Gaussian mixture to build a very generalised and robust anomaly detection system. A major loophole of the K-means algorithm is its tendency to settle into local minima, resulting in clusters that lead to ambiguity. In this paper, an attempt has been made to combine the K-means algorithm with the PCA technique, which results in the formation of more closely centred clusters that work more accurately with the K-means algorithm.
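The PCA-then-K-means combination this abstract describes can be sketched with scikit-learn on synthetic data (the dimensions, cluster counts and random data below are illustrative assumptions, not the paper's setup):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Two well-separated synthetic clusters in 10 dimensions
X = np.vstack([rng.normal(0, 1, (100, 10)),
               rng.normal(5, 1, (100, 10))])

# Project onto the leading principal components first, so K-means
# operates on compact, decorrelated features instead of raw records.
X_reduced = PCA(n_components=2).fit_transform(X)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_reduced)
print(np.bincount(labels))
```

Reducing dimensionality before clustering is what produces the "more closely centred clusters" the abstract claims: noise directions with little variance are discarded before distances are computed.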
Comparative study of decision tree algorithm and naive bayes classifier for s...eSAT Journals
Abstract The modeling and analysis of epidemic disease outbreaks in huge realistic populations is a data-intensive task that requires immense computational resources. Such computational support becomes useful to study disease outbreaks and to facilitate decision making. Epidemiology is one of the traditional approaches used for studying and analyzing outbreaks of epidemic diseases. Although useful for obtaining the numbers of sick, infected and recovered individuals in a population, this traditional approach does not capture the complexity of human interactions. The model is limited to person-to-person interaction in order to track the surveillance of disease, and it also has performance issues with large realistic data. In this paper we propose the combination of computational epidemiology and modern data mining techniques, with a comparative analysis, for Swine Flu prediction. The K-means clustering algorithm is used to group Swine Flu suspects in a particular area. The decision tree algorithm and the Naive Bayes classifier are applied to the same inputs to find the actual count of suspects and predict the possible spread of Swine Flu to areas near the suspected area. The performances of these techniques are compared based on accuracy. In our case the Naive Bayes classifier performs better than the decision tree algorithm in finding the accurate count of suspects. Keywords: Computational Epidemiology, Aerosol-borne Disease, Clustering, Predictive analysis.
— The healthcare industry is considered one of the largest industries in the world and, like the medical industry, holds a vast amount of health-related and medical data. This data helps to discover useful trends and patterns that can be used in diagnosis and decision making. Clustering techniques like K-means, D-Stream, COBWEB and EM have been used for healthcare purposes like heart disease diagnosis, cancer detection, etc. This paper focuses on the use of the K-means and D-Stream algorithms in healthcare. These algorithms were used to determine whether a person is fit or unfit, with the fitness decision taken based on his/her historical and current data. Both clustering algorithms were analyzed by applying them to patients' current and historical biomedical databases; the analysis depends on attributes like peripheral blood oxygenation, diastolic arterial blood pressure, systolic arterial blood pressure, heart rate, heredity, obesity, and cigarette smoking. By analyzing both algorithms it was found that the density-based clustering algorithm, i.e. the D-Stream algorithm, gives more accurate results than K-means when used for cluster formation on historical biomedical data. The D-Stream algorithm overcomes drawbacks of the K-means algorithm.
An efficient feature selection algorithm for health care data analysisjournalBEEI
Diabetes is a silent killer, which will slowly kill a person if it goes undetected. The existing systems for checking whether a person has diabetes, which use the F-score method and K-means clustering, are not 100% accurate, and anything that isn't 100% is not acceptable in the medical field, as it could cost the lives of many people. Our proposed system aims at using some of the best features of the existing algorithms to predict diabetes; based on these features, this research work combines them into a novel algorithm, which aims to be 100% accurate in its prediction. With the surge in technological advancements, we can use data mining to predict when a person will be diagnosed with diabetes. Specifically, we analyze the best features of the chi-square algorithm and the advanced clustering algorithm (ACA). This research work is done using the Pima Indian Diabetes dataset provided by the National Institute of Diabetes and Digestive and Kidney Diseases. Using classification theorems and methods we can consider different factors like age, BMI and blood pressure and the importance given to these attributes overall, single these attributes out, and use them for the prediction of diabetes.
Certain Investigation on Dynamic Clustering in Dynamic Dataminingijdmtaiir
Clustering is the process of grouping a set of objects into classes of similar objects. Dynamic clustering is a new research area concerned with datasets that have dynamic aspects. It requires updates of the clusters whenever new data records are added to the dataset and may result in a change of clustering over time. When there are continuous updates and a huge amount of dynamic data, rescanning the database is not feasible in static data mining, but it is possible in the dynamic data mining process. Dynamic data mining occurs when the derived information is present for the purpose of analysis and the environment is dynamic, i.e. many updates occur. Since this has now been established by most researchers, attention is moving towards solving the resulting problems, and research concentrates on the problem of mining dynamic databases. This paper gives an investigation of the existing work done in papers related to dynamic clustering and incremental data clustering.
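The incremental updates this line of work relies on can be illustrated with a running-mean cluster centroid that absorbs one record at a time, so the database never has to be rescanned (a minimal sketch, not any specific surveyed algorithm):

```python
class IncrementalCentroid:
    """Running mean of the points assigned to a cluster, updated one
    record at a time so no rescan of earlier data is needed."""
    def __init__(self, dim):
        self.n = 0
        self.centroid = [0.0] * dim

    def add(self, point):
        self.n += 1
        # Incremental-mean update: new_mean = old_mean + (x - old_mean) / n
        self.centroid = [c + (x - c) / self.n
                         for c, x in zip(self.centroid, point)]

c = IncrementalCentroid(2)
for p in [(1.0, 1.0), (3.0, 3.0), (5.0, 2.0)]:
    c.add(p)
print(c.centroid)   # [3.0, 2.0]
```

Each new record costs O(dim) work regardless of how many records came before, which is exactly the property a dynamic dataset with continuous updates demands.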
Anonymization of data using mapreduce on cloudeSAT Journals
Abstract In the computing world, cloud services are provided by service providers. Users want to share private data stored on cloud servers for different reasons like data mining and data analysis, which can raise privacy concerns. Privacy preservation can be satisfied by anonymizing data sets through generalization using the k-anonymity technique, a widely used type of privacy preserving technique. At present the data of cloud applications is increasing in scale day by day in line with the Big Data trend, so it is very difficult to accept, manage, maintain and process such large-scale data within the required time stamps. Thus privacy preservation on privacy-sensitive, large-scale data is a very difficult task for existing anonymization techniques, because they cannot manage such scaled data sets. This approach addresses the anonymization problem on large-scale cloud data sets using a two-phase top-down specialization approach and the MapReduce framework. Innovative MapReduce jobs are carefully designed in both phases of this technique to achieve the specialization computation on scalable data sets. The scalability and efficiency of Top Down Specialization (TDS) are significantly increased over the existing approach. Keywords: Top Down Specialization, MapReduce, Data Anonymization, Cloud Computing, Privacy Preservation
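The k-anonymity condition the abstract relies on is easy to check once quasi-identifiers have been generalized: every combination of quasi-identifier values must be shared by at least k records. A minimal sketch (the records and attribute names below are hypothetical):

```python
from collections import Counter

def is_k_anonymous(records, quasi_ids, k):
    """True if every combination of quasi-identifier values is shared
    by at least k records -- the core k-anonymity condition."""
    groups = Counter(tuple(r[q] for q in quasi_ids) for r in records)
    return all(count >= k for count in groups.values())

# Hypothetical records after generalizing age into ranges and
# truncating zip codes (the kind of step TDS performs at scale)
records = [
    {"age": "20-30", "zip": "560*", "disease": "flu"},
    {"age": "20-30", "zip": "560*", "disease": "cold"},
    {"age": "30-40", "zip": "570*", "disease": "flu"},
]
print(is_k_anonymous(records, ["age", "zip"], 2))   # False: one group of size 1
```

Top-down specialization repeatedly refines generalized values and re-runs exactly this kind of group-size check; the MapReduce version in the paper distributes the group counting across workers.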
Geo-spatial information for managing ambiguityIAESIJEECS
An inherent challenge arising in any dataset containing information about space as well as time is uncertainty due to various sources of imprecision. Incorporating the effect of this uncertainty is fundamental when assessing the reliability (confidence) of any query result derived from the underlying data. To deal with uncertainty, solutions have been proposed independently in the geo-science and the data-science research communities. This interdisciplinary tutorial bridges the gap between the two communities by giving a comprehensive overview of the different challenges involved in managing uncertain geo-spatial data, by surveying solutions from both research communities, and by identifying similarities, synergies and open research issues.
A REVIEW ON PREDICTIVE ANALYTICS IN DATA MININGijccmsjournal
The main process of data mining is to collect, extract and store valuable information, and nowadays it is done actively by many enterprises. Within advanced analytics, predictive analytics is the branch mainly used to make predictions about unknown future events. Predictive analytics uses various techniques from machine learning, statistics, data mining, modeling, and artificial intelligence to analyze current data and make predictions about the future. The two main objectives of predictive analytics are regression and classification. It is composed of various analytical and statistical techniques used for developing models which predict future occurrences, probabilities or events. Predictive analytics deals with both continuous changes and discontinuous changes. It provides a predictive score for each individual (healthcare patient, product SKU, customer, component, machine, or other organizational unit, etc.) to determine or influence organizational processes that pertain across huge numbers of individuals, as in fraud detection, manufacturing, credit risk assessment, marketing, and government operations including law enforcement.
PREDICTION OF MALIGNANCY IN SUSPECTED THYROID TUMOUR PATIENTS BY THREE DIFFER...cscpconf
In the present study, the abilities of three data mining classification methods, namely artificial neural networks with the feed-forward back propagation algorithm, the J48 decision tree method and logistic regression analysis, are compared on a real medical dataset. The prediction of malignancy in suspected thyroid tumour patients is the objective of the study. The accuracy of the predictions (the minimum error rate), the time consumed in the modelling process, and the interpretability and simplicity of the results for clinical experts are the factors considered to choose the best method.
Clustering heterogeneous categorical data using enhanced mini batch K-means ...IJECEIAES
Clustering methods in data mining aim to group a set of patterns based on their similarity. In a data survey, heterogeneous information is collected on various types of data scales like nominal, ordinal, binary, and Likert scales. A lack of treatment of heterogeneous data and information leads to loss of information and poor decision-making. Although many similarity measures have been established, solutions for heterogeneous data in clustering are still lacking. The recent entropy distance measure seems to provide good results for heterogeneous categorical data; however, it requires many experiments and evaluations. This article presents a proposed framework for heterogeneous categorical data using mini batch k-means with an entropy measure (MBKEM), to investigate the effectiveness of the similarity measure in clustering heterogeneous categorical data. Secondary data from a public survey was used. The findings demonstrate that the proposed framework improves the quality of the clustering. MBKEM outperformed other clustering algorithms with accuracy at 0.88, v-measure (VM) at 0.82, adjusted rand index (ARI) at 0.87, and Fowlkes-Mallows index (FMI) at 0.94. The average minimum elapsed time for cluster generation was observed to be 0.26 s. In the future, the proposed solution would be beneficial for improving the quality of clustering for heterogeneous categorical data problems in many domains.
Mining Frequent Patterns and Associations from the Smart meters using Bayesia...Eswar Publications
In today's world, migration of people from rural to urban areas is quite common. Healthcare services are one of the most challenging requirements for people in poor health. Advancements in technology have led to smart homes, which contain various sensor or smart-meter devices that automate the operation of other electronic devices. Additionally, these smart meters can capture the daily activities of patients and monitor their health conditions by mining the frequent patterns and association rules generated from the smart meters. In this work we propose a model that is able to monitor the activities of patients at home and send the daily activities to the corresponding doctor. We can extract frequent patterns and association rules from the log data, predict the health conditions of the patients, and give suggestions according to the prediction. Our work is divided into three stages. First, we record the daily activities of the patient over a specific time period at three regular intervals. Second, we apply frequent pattern growth to extract association rules from the log file. Finally, we apply k-means clustering to the input and a Bayesian network model to predict the health behavior of the patient, and precautions are given accordingly.
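The frequent-pattern step can be illustrated with a brute-force support counter. FP-growth itself builds a prefix tree to avoid enumerating candidates; this exhaustive version is a stand-in that yields the same counts on small logs (the appliance events below are hypothetical):

```python
from itertools import combinations
from collections import Counter

def frequent_itemsets(transactions, min_support, max_size=2):
    """Count item sets appearing in at least `min_support` transactions;
    a brute-force stand-in for FP-growth, adequate for small logs."""
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        for size in range(1, max_size + 1):
            for combo in combinations(items, size):
                counts[combo] += 1
    return {s: c for s, c in counts.items() if c >= min_support}

# Hypothetical daily-activity log entries derived from smart-meter events
log = [["kettle", "toaster"], ["kettle", "toaster", "tv"], ["kettle", "tv"]]
print(frequent_itemsets(log, min_support=2))
```

Association rules are then read off the surviving itemsets: if {kettle, toaster} is frequent and its support is close to that of {kettle} alone, the rule "kettle ⇒ toaster" has high confidence.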
Spatial correlation based clustering algorithm for random and uniform topolog...eSAT Publishing House
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology
International Journal of Engineering and Science Invention (IJESI)inventionjournals
International Journal of Engineering and Science Invention (IJESI) is an international journal intended for professionals and researchers in all fields of computer science and electronics. IJESI publishes research articles and reviews within the whole field Engineering Science and Technology, new teaching methods, assessment, validation and the impact of new technologies and it will continue to provide information on the latest trends and developments in this ever-expanding subject. The publications of papers are selected through double peer reviewed to ensure originality, relevance, and readability. The articles published in our journal can be accessed online.
Clustering and Classification in Support of Climatology to mine Weather Data ...MangaiK4
Abstract - Knowledge of the climate data of a region is essential for business, society, agriculture, pollution and energy applications. Climate is not fixed; fluctuations in the climate can be seen from year to year. Data mining applications help meteorological scientists to make accurate weather forecasts and decisions, and also provide more performance and reliability than other methods. The data mining techniques applied to weather data are efficient compared with the mathematical models used. Various data mining techniques are applied to climate data to support weather forecasting, climate scientists, agriculture, vegetation, water resources and tourism. The aim of this paper is to provide a review report on various data mining techniques applied to weather data sets in support of weather prediction and climate analysis.
Rule Optimization of Fuzzy Inference System Sugeno using Evolution Strategy f...IJECEIAES
The need for accurate load forecasts will increase in the future because of the dramatic changes occurring in electricity consumption. A Sugeno fuzzy inference system (FIS) can be used for short-term load forecasting. However, challenges in electrical load forecasting include the data used and the data trend, so it is difficult to develop appropriate fuzzy rules for a Sugeno FIS. This paper proposes an Evolution Strategy method to determine appropriate rules for a Sugeno FIS that have minimum forecasting error. The Root Mean Square Error (RMSE) is used to evaluate the goodness of the forecasting result. The numerical experiments show the effectiveness of the proposed optimized Sugeno FIS on several test-case problems. The optimized Sugeno FIS produces a lower RMSE than those achieved by other well-known methods in the literature.
Data repository for sensor network a data mining approachijdms
The development of sensor data repositories will aid researchers in creating benchmark datasets. These benchmark datasets will provide a platform for all researchers to access the data and to test and compare the accuracy of their algorithms. However, the storage and management of sensor data is itself a challenging task due to various reasons such as noisy, redundant, missing, and faulty data. Therefore it is very important to create a data repository which contains precise and accurate data and in which the storage and management of data are effective. Hence, in this paper we propose to use: the combination of quantitative association rules and decision trees for the classification of faulty data and normal data; multiple linear regression models for the estimation of missing data; a symbolic table approach for the storage and management of sensor data; and the development of a graphical user interface for the visualization of sensor data.
Short term residential load forecasting using long short-term memory recurre...IJECEIAES
Load forecasting plays an essential role in power system planning. The efficiency and reliability of the whole power system can be increased with proper planning and organization. Residential load forecasting is indispensable due to its increasing role in the smart grid environment. Nowadays, smart meters can be deployed at the residential level for collecting historical consumption data of residents. Although the employment of smart meters ensures large data availability, the inconsistency of load data makes it challenging to forecast accurately, so traditional forecasting techniques may not suffice. Therefore, a deep learning forecasting network based on long short-term memory (LSTM) is proposed in this paper. The powerful nonlinear mapping capabilities of RNNs on time series make the approach effective, along with the higher learning capability of LSTM on long sequences. The proposed method is tested and validated on available real-world data sets. A comparison of LSTM is then made with two traditionally available techniques, exponential smoothing and the auto-regressive integrated moving average (ARIMA) model. Real data from 12 houses over three months is used to evaluate and validate the performance of the load forecasts produced by the three techniques. The LSTM model achieved the best results due to its higher capability of memorizing large data in time-series-based predictions.
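Of the baselines this abstract compares against, simple exponential smoothing is the easiest to sketch; a minimal version with hypothetical load readings (not the paper's 12-house data set):

```python
def exponential_smoothing(series, alpha=0.5):
    """Simple exponential smoothing: each smoothed value is a weighted
    average of the latest observation and the previous smoothed value.
    The last smoothed value serves as the one-step-ahead forecast."""
    smoothed = [series[0]]
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed

# Hypothetical household load readings (kW)
load = [2.0, 4.0, 3.0, 5.0]
fit = exponential_smoothing(load, alpha=0.5)
print(fit[-1])   # one-step-ahead forecast for the next reading
```

A larger alpha tracks recent readings more aggressively; a smaller one damps the noise that makes raw residential load so inconsistent, which is exactly the trade-off the LSTM is meant to learn automatically.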
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Similar to An exploratory analysis on half hourly electricity load patterns leading to higher performances in neural network predictions (20)
Advanced Computing: An International Journal (ACIJ) is a peer-reviewed, open access journal that publishes articles which contribute new results in all areas of advanced computing. The journal focuses on all technical and practical aspects of high performance computing, green computing, pervasive computing, cloud computing etc. The goal of this journal is to bring together researchers and practitioners from academia and industry to focus on understanding advances in computing and establishing new collaborations in these areas.
Authors are solicited to contribute to the journal by submitting articles that illustrate research results, projects, surveying works and industrial experiences that describe significant advances in the areas of computing.
Call for Papers - Advanced Computing An International Journal (ACIJ) (2).pdfacijjournal
Submit your Research Papers!!!
Advanced Computing: An International Journal ( ACIJ )
ISSN: 2229 -6727 [Online] ; 2229 - 726X [Print]
Webpage URL: http://airccse.org/journal/acij/acij.html
Submission URL: http://coneco2009.com/submissions/imagination/home.html
Submission Deadline : April 08, 2023
Here's where you can reach us: acijjournal@yahoo.com or acij@aircconline.com
7thInternational Conference on Data Mining & Knowledge Management (DaKM 2022)acijjournal
The 7th International Conference on Data Mining & Knowledge Management (DaKM 2022) provides a forum for researchers who address this issue to present their work in a peer-reviewed forum.
4thInternational Conference on Machine Learning & Applications (CMLA 2022)acijjournal
The 4th International Conference on Machine Learning & Applications (CMLA 2022) will provide an excellent international forum for sharing knowledge and results in the theory, methodology and applications of Machine Learning. The aim of the conference is to provide a platform for researchers and practitioners from both academia and industry to meet and share cutting-edge developments in the field.
3rdInternational Conference on Natural Language Processingand Applications (N...acijjournal
The 3rd International Conference on Natural Language Processing and Applications (NLPA 2022) will provide an excellent international forum for sharing knowledge and results in the theory, methodology and applications of Natural Language Computing. The conference looks for significant contributions to all major fields of Natural Language Processing in theoretical and practical aspects.
4thInternational Conference on Machine Learning & Applications (CMLA 2022)acijjournal
4thInternational Conference on Machine Learning & Applications (CMLA 2022)will provide an excellent international forum for sharing knowledge and results in theory, methodology and applications of on Machine Learning & Applications. The aim of the conference is to provide a platform to the researchers and practitioners from both academia as well as industry to meet and share cutting-edge development in the field.
Graduate School Cyber Portfolio: The Innovative Menu For Sustainable Developmentacijjournal
In today’s milieu, new demands and trends emerge in the field of Education giving teachers of Higher Education Institutions (HEI’s) no choice but to be innovative to cope with the fast changing technology. To be naturally innovative, a graduate school teacher needs to be technologically and pedagogically competent. One of the ways to be on this level is by creating his cyber portfolio to support students’ eportfolio for lifelong learning. Cyber portfolio is an innovative menu for teachers who seek out strategies to integrate technology in their lessons. This paper presents a straightforward preparation on how to innovate a cyber portfolio that has its practical and breakthrough solution against expensive and inflexible vended software which often saddle many universities. Additionally, this cyber portfolio is free and it addresses the 21st century skills of graduate students blended with higher order thinking skills, multiple intelligence, technology and multimedia.
Genetic Algorithms and Programming - An Evolutionary Methodologyacijjournal
Genetic programming (GP) is an automated method for creating a working computer program from a high-level problem statement of a problem. Genetic programming starts from a high-level statement of “what needs to be done” and automatically creates a computer program to solve the problem. In artificial intelligence, genetic programming (GP) is an evolutionary algorithm-based methodology inspired by biological evolution to find computer programs that perform a user defined task. It is a specialization of genetic algorithms (GA) where each individual is a computer program. It is a machine learning technique used to optimize a population of computer programs according to a fitness span determined by a program's ability to perform a given computational task. This paper presents a idea of the various principles of genetic programming which includes, relative effectiveness of mutation, crossover, breeding computer programs and fitness test in genetic programming. The literature of traditional genetic algorithms contains related studies, but through GP, it saves time by freeing the human from having to design complex algorithms. Not only designing the algorithms but creating ones that give optimal solutions than traditional counterparts in noteworthy ways.
Data Transformation Technique for Protecting Private Information in Privacy P...acijjournal
Data mining is the process of extracting patterns from data. Data mining is seen as an increasingly important tool by modern business to transform data into an informational advantage. Data
Mining can be utilized in any organization that needs to find patterns or relationships in their data. A group of techniques that find relationships that have not previously been discovered. In many situations, the extracted patterns are highly private and it should not be disclosed. In order to maintain the secrecy of data,
there is in need of several techniques and algorithms for modifying the original data in order to limit the extraction of confidential patterns. There have been two types of privacy in data mining. The first type of privacy is that the data is altered so that the mining result will preserve certain privacy. The second type of privacy is that the data is manipulated so that the mining result is not affected or minimally affected. The aim of privacy preserving data mining researchers is to develop data mining techniques that could be
applied on data bases without violating the privacy of individuals. Many techniques for privacy preserving data mining have come up over the last decade. Some of them are statistical, cryptographic, randomization methods, k-anonymity model, l-diversity and etc. In this work, we propose a new perturbative masking technique known as data transformation technique can be used for protecting the sensitive information. An
experimental result shows that the proposed technique gives the better result compared with the existing technique.
Advanced Computing: An International Journal (ACIJ) acijjournal
Advanced Computing: An International Journal (ACIJ) is a bi monthly open access peer-reviewed journal that publishes articles which contribute new results in all areas of the advanced computing. The journal focuses on all technical and practical aspects of high performance computing, green computing, pervasive computing, cloud computing etc. The goal of this journal is to bring together researchers and practitioners from academia and industry to focus on understanding advances in computing and establishing new collaborations in these areas.
E-Maintenance: Impact Over Industrial Processes, Its Dimensions & Principlesacijjournal
During the course of the industrial 4.0 era, companies have been exponentially developed and have
digitized almost the whole business system to stick to their performance targets and to keep or to even
enlarge their market share. Maintenance function has obviously followed the trend as it’s considered one
of the most important processes in every enterprise as it impacts a group of the most critical performance
indicators such as: cost, reliability, availability, safety and productivity. E-maintenance emerged in early
2000 and now is a common term in maintenance literature representing the digitalized side of maintenance
whereby assets are monitored and controlled over the internet. According to literature, e-maintenance has
a remarkable impact on maintenance KPIs and aims at ambitious objectives like zero-downtime.
10th International Conference on Software Engineering and Applications (SEAPP...acijjournal
10th International Conference on Software Engineering and Applications (SEAPP 2021) will provide an excellent international forum for sharing knowledge and results in theory, methodology and applications of Software Engineering and Applications. The goal of this Conference is to bring together researchers and practitioners from academia and industry to focus on understanding Modern software engineering concepts and establishing new collaborations in these areas.
10th International conference on Parallel, Distributed Computing and Applicat...acijjournal
10th International conference on Parallel, Distributed Computing and Applications (IPDCA 2021) will provide an excellent international forum for sharing knowledge and results in theory, methodology and applications of Parallel, Distributed Computing. Original papers are invited on Algorithms and Applications, computer Networks, Cyber trust and security, Wireless networks and mobile Computing and Bioinformatics. The aim of the conference is to provide a platform to the researchers and practitioners from both academia as well as industry to meet and share cutting-edge development in the field.
DETECTION OF FORGERY AND FABRICATION IN PASSPORTS AND VISAS USING CRYPTOGRAPH...acijjournal
In this paper, we present a novel solution to detect forgery and fabrication in passports and visas using
cryptography and QR codes. The solution requires that the passport and visa issuing authorities obtain a
cryptographic key pair and publish their public key on their website. Further they are required to encrypt
the passport or visa information with their private key, encode the ciphertext in a QR code and print it on
the passport or visa they issue to the applicant.
The issuing authorities are also required to create a mobile or desktop QR code scanning app and place it
for download on their website or Google Play Store and iPhone App Store. Any individual or immigration
authority that needs to check the passport or visa for forgery and fabrication can scan its QR code, which
will decrypt the ciphertext encoded in the QR code using the public key stored in the app memory and
displays the passport or visa information on the app screen. The details on the app screen can be
compared with the actual details printed on the passport or visa. Any mismatch between the two is a clear
indication of forgery or fabrication.
Discussed the need for a universal desktop and mobile app that can be used by immigration authorities and
consulates all over the world to enable fast checking of passports and visas at ports of entry for forgery
and fabrication.
Detection of Forgery and Fabrication in Passports and Visas Using Cryptograph...acijjournal
In this paper, wepresenta novel solution to detect forgery and fabrication in passports and visas using cryptography and QR codes. The solution requires that the passport and visa issuing authorities obtain a cryptographic key pair and publish their public key on their website. Further they are required to encrypt the passport or visa information with their private key, encode the ciphertext in a QR code and print it on the passport or visa they issue to the applicant.
The issuing authorities are also required to create a mobile or desktop QR code scanning app and place it for download on their website or Google Play Store and iPhone App Store. Any individual or immigration authority that needs to check the passport or visa for forgery and fabrication can scan its QR code, which will decrypt the ciphertext encoded in the QR code using the public key stored in the app memory and displays the passport or visa information on the app screen. The details on the app screen can be compared with the actual details printed on the passport or visa. Any mismatch between the two is a clear indication of forgery or fabrication.
Discussed the need for a universal desktop and mobile app that can be used by immigration authorities and consulates all over the world to enable fast checking of passports and visas at ports of entry for forgery and fabrication.
A tale of scale & speed: How the US Navy is enabling software delivery from l...sonjaschweigert1
Rapid and secure feature delivery is a goal across every application team and every branch of the DoD. The Navy’s DevSecOps platform, Party Barge, has achieved:
- Reduction in onboarding time from 5 weeks to 1 day
- Improved developer experience and productivity through actionable findings and reduction of false positives
- Maintenance of superior security standards and inherent policy enforcement with Authorization to Operate (ATO)
Development teams can ship efficiently and ensure applications are cyber ready for Navy Authorizing Officials (AOs). In this webinar, Sigma Defense and Anchore will give attendees a look behind the scenes and demo secure pipeline automation and security artifacts that speed up application ATO and time to production.
We will cover:
- How to remove silos in DevSecOps
- How to build efficient development pipeline roles and component templates
- How to deliver security artifacts that matter for ATO’s (SBOMs, vulnerability reports, and policy evidence)
- How to streamline operations with automated policy checks on container images
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
Securing your Kubernetes cluster_ a step-by-step guide to success !KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
The Art of the Pitch: WordPress Relationships and SalesLaura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if sometime changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
Threats to mobile devices are more prevalent and increasing in scope and complexity. Users of mobile devices desire to take full advantage of the features
available on those devices, but many of the features provide convenience and capability but sacrifice security. This best practices guide outlines steps the users can take to better protect personal devices and information.
Dr. Sean Tan, Head of Data Science, Changi Airport Group
Discover how Changi Airport Group (CAG) leverages graph technologies and generative AI to revolutionize their search capabilities. This session delves into the unique search needs of CAG’s diverse passengers and customers, showcasing how graph data structures enhance the accuracy and relevance of AI-generated search results, mitigating the risk of “hallucinations” and improving the overall customer journey.
Elevating Tactical DDD Patterns Through Object CalisthenicsDorra BARTAGUIZ
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Essentials of Automations: The Art of Triggers and Actions in FMESafe Software
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
UiPath Test Automation using UiPath Test Suite series, part 5DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 5. In this session, we will cover CI/CD with devops.
Topics covered:
CI/CD with in UiPath
End-to-end overview of CI/CD pipeline with Azure devops
Speaker:
Lyndsey Byblow, Test Suite Sales Engineer @ UiPath, Inc.
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024Neo4j
Neha Bajwa, Vice President of Product Marketing, Neo4j
Join us as we explore breakthrough innovations enabled by interconnected data and AI. Discover firsthand how organizations use relationships in data to uncover contextual insights and solve our most pressing challenges – from optimizing supply chains, detecting fraud, and improving customer experiences to accelerating drug discoveries.
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
International Journal of Artificial Intelligence & Applications (IJAIA), Vol. 5, No. 3, May 2014
DOI: 10.5121/ijaia.2014.5303
AN EXPLORATORY ANALYSIS ON HALF-HOURLY ELECTRICITY LOAD PATTERNS LEADING TO HIGHER PERFORMANCES IN NEURAL NETWORK PREDICTIONS

K.A.D. Deshani¹, M.D.T. Attygalle², L.L. Hansen³, A. Karunaratne⁴
¹ ² ⁴ Department of Statistics, University of Colombo, Colombo 03, Sri Lanka
³ School of Computing, Engineering and Mathematics, University of Western Sydney, Australia
ABSTRACT
Accurate prediction of electricity demand can bring extensive benefits to any country, as the forecasted values help the relevant authorities to take appropriate decisions regarding electricity generation, transmission and distribution. The literature reveals that, when compared to conventional time series techniques, improved artificial intelligence approaches provide better prediction accuracies. However, the accuracy of predictions using intelligent approaches such as neural networks is strongly influenced by the correct selection of inputs and the number of neuro-forecasters used for prediction. Deshani, Hansen, Attygalle, & Karunarathne (2014) suggested that a cluster analysis could be performed to group similar day types, which contributes towards selecting a better set of neuro-forecasters in neural networks. Their cluster analysis was based on daily total electricity demands, as their target was to predict daily total demands using neural networks. However, predicting half-hourly demand seems more appropriate, given the considerable changes in electricity demand observed within a particular day. Accordingly, clusters are identified here using half-hourly data within the daily load distribution curves. Thus, this paper improves on Deshani et al. (2014) by illustrating how the half-hourly demand distribution within a day is incorporated when selecting the inputs for the neuro-forecasters.
KEYWORDS
Clustering, Silhouette plots, Improved performance, Load curve prediction
1. INTRODUCTION
Predicting future electricity demand is an essential task for a country, as a huge amount of money could be saved by utilizing the available electricity generation options appropriately. In this scenario, increasing the accuracy of short-term predictions is crucial, as decisions regarding the required load have to be taken within a short period of time. The literature on short-term load forecasting techniques consists of both conventional time series models and artificial intelligence approaches from many fields, mostly engineering. For developing a dynamic forecasting system, intelligent approaches yield better results than conventional time series techniques, as they can be adapted to suit novel conditions and can handle more complex patterns in data. However, the accuracy of predictions using intelligent approaches like neural networks is strongly influenced by the correct selection of inputs and the number of neuro-forecasters used for prediction (Farahat & Talaat, 2012; Barzamini, Hajati, Gheisari, & Motamadinejad, 2012; Nagi, Yap, Tiong & Ahmed, 2008). Deshani, Hansen, Attygalle, & Karunarathne (2014)
suggested how a cluster analysis could be performed to group similar day types, which contributes towards selecting a better set of neuro-forecasters in neural networks. Their proposed cluster analysis was based on the total daily electricity demand, and each day was assigned to one of the three clusters suggested by the analysis. However, electricity demand varies in accordance with consumers' activities with respect to the time of the day and the day of the week. As a result of these variations, the hourly load requirement is never constant throughout a particular day.

This paper presents a cluster analysis performed to identify intra-day clusters and to group similar day types within those clusters with respect to half-hourly electricity demand. Even though many external causes influence electricity demand, such as meteorological conditions (temperature, rainfall, humidity, wind speed and cloud cover) as well as economic and demographic factors, this paper considers only a single input, namely the day type. The main focus is to illustrate how data mining techniques can be complemented by cluster analysis in giving efficient predictions.

A dataset consisting of half-hourly electricity demands in Sri Lanka was considered for the period of 01st January 2008 to 31st December 2012.
2. LITERATURE REVIEW
Literature related to load forecasting reveals that higher prediction accuracies can be obtained using intelligent techniques compared to conventional statistical techniques (Farahat & Talaat, 2012; Barzamini, Hajati, Gheisari, & Motamadinejad, 2012; Nagi, Yap, Tiong & Ahmed, 2008). Many researchers point out the importance of using intelligent techniques in situations where quick weather changes cause accurate predictions to fail (Seetha & Saravanan, 2007; Senjyu, Takara, Uezato, & Funabashi, 2002; Barzamini et al., 2012). Some of the popular intelligent techniques used in the literature are neural networks, fuzzy inference systems, genetic algorithms and expert systems.

Many studies have used the effect of different day types to enhance load predictions considering their own country's situation. The literature reveals that Soares and Medeiros (2008) incorporated the maximum number of day types into their model: Sunday through Saturday, holidays, working days after a holiday, working days before a holiday, working days between a holiday and a weekend, Saturdays after a holiday, days working only during the mornings, days working only during the afternoons, and special holidays. Another study considers the seven days of the week and bank holidays as day types; a principal component analysis was performed accordingly, and a segmentation scheme based on the first principal direction was used to cluster similar months (Cho, Goude, Brossat, & Yao, 2013). Unlike these approaches, Barzamini et al. (2012) divided the days of the week into four categories based on unique load lags and incorporated them into the model. Considering the above, this research considers thirteen day types, which can be regarded as distinct in the Sri Lankan context.
Even though thirteen day types are considered, including all of them in the model would complicate the prediction process. As such, the day types are clustered into groups of similar day types in order to avoid complexities in the computational operations and to reduce forecasting error when training the neural networks (Barzamini et al., 2012; Seetha & Saravanan, 2007). These studies discuss how accurate predictions are made when the inputs are wisely chosen to be fed into a neural network having different neuro-load forecasters to train similarly featured loads. The literature also shows that in some research, similar days were clustered based on the experience of experts at electricity supply companies rather than by performing any statistical analysis (Cho et al., 2013). Moreover, to understand energy consumption patterns in industrial parks, a cascade application of a Self-Organizing Map and a k-means clustering algorithm was performed by
Hernandez, Baladron, Aguiar, Carro & Esguevillas (2012). Since no previous study has considered performing a cluster analysis in this combined manner, this study focuses on a statistical analysis based on k-means clustering to complement the neural network approach.
3. METHODOLOGY
3.1. K-Means clustering
K-means clustering is a partitioning method: it partitions data into k mutually exclusive clusters. Unlike hierarchical clustering, k-means clustering operates on the actual observations (rather than on the larger set of pairwise dissimilarity measures) and creates a single level of clusters. These distinctions mean that k-means clustering is often more suitable than hierarchical clustering for large amounts of data.

Each cluster in the partition is defined by its member objects and by its centroid, or center. The centroid of each cluster is the point to which the sum of distances from all objects in that cluster is minimized. The kmeans function computes cluster centroids differently for each distance measure, so as to minimize the sum with respect to the specified measure.
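The alternating assignment/update iteration described above can be made concrete with a short sketch. The following is a minimal illustration of the standard Lloyd's iteration under the squared Euclidean measure, not the actual MATLAB implementation used in this paper; taking the first k points as initial centroids is a deliberate simplification (practical implementations use random starts, as discussed under "Avoiding Local Minima" below).

```python
def kmeans(points, k, iters=100):
    """Minimal Lloyd's-style k-means with the squared Euclidean measure.

    Initialisation simply takes the first k points; real implementations
    use random or k-means++ starts, usually with several replicates.
    """
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            clusters[j].append(p)
        # Update step: each centroid becomes the mean of its members, which is
        # the point minimizing the sum of squared distances within the cluster.
        new = [[sum(col) / len(cl) for col in zip(*cl)] if cl else centroids[j]
               for j, cl in enumerate(clusters)]
        if new == centroids:  # converged: the partition is stable
            break
        centroids = new
    return centroids, clusters
```

For instance, six points forming two well-separated groups converge to two centroids at the group means after a couple of iterations.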
Distance measure: In this paper, the 'kmeans' function in the Matlab software has been used with appropriate distance measures based on the clustering data.

Distance Measure   Description
sqEuclidean        Squared Euclidean distance. Each centroid is the mean of the points in that cluster.
cityblock          Sum of absolute differences. Each centroid is the component-wise median of the points in that cluster.
correlation        One minus the sample correlation between points (treated as sequences of values). Each centroid is the component-wise mean of the points in that cluster, after centering and normalizing those points to zero mean and unit standard deviation.
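For concreteness, the three measures in the table can be written out directly. This is an illustrative transcription assuming observations are equal-length numeric sequences; it is not the MATLAB code used in the study.

```python
from statistics import mean

def sq_euclidean(x, y):
    """Squared Euclidean distance: sum of squared coordinate differences."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def cityblock(x, y):
    """City block (L1) distance: sum of absolute differences."""
    return sum(abs(a - b) for a, b in zip(x, y))

def correlation_distance(x, y):
    """One minus the sample correlation between x and y, each treated as a
    sequence of values (centered to zero mean before comparison)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return 1 - cov / (sx * sy)
```

Note that two perfectly correlated profiles such as (1, 2, 3) and (2, 4, 6) have correlation distance 0 even though their Euclidean distance is large, which is why the appropriate measure depends on the clustering data.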
Determining the number of clusters: To get an idea of how well-separated the resulting clusters are, a silhouette plot can be used. The silhouette plot displays a measure of how close each point in one cluster is to the points in the neighboring clusters. This measure ranges from +1, indicating points that are very distant from neighboring clusters, through 0, indicating points that are not distinctly in one cluster or another, to -1, indicating points that are probably assigned to the wrong cluster.
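The silhouette value of a single point can be computed as follows. This sketch assumes Euclidean distances and is only meant to make the +1/0/-1 interpretation above concrete; the paper's plots come from MATLAB's silhouette tooling.

```python
def euclidean(x, y):
    """Euclidean distance between two equal-length sequences."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) ** 0.5

def silhouette_value(point, own_cluster, other_clusters):
    """Silhouette of one point: (b - a) / max(a, b), where a is the mean
    distance to the other members of its own cluster and b is the smallest
    mean distance to the members of any other cluster."""
    others = [p for p in own_cluster if p != point]
    a = sum(euclidean(point, p) for p in others) / len(others) if others else 0.0
    b = min(sum(euclidean(point, q) for q in c) / len(c) for c in other_clusters)
    return (b - a) / max(a, b) if max(a, b) > 0 else 0.0
```

A point close to its own cluster and far from every other cluster scores near +1; a point lying between two clusters scores near 0.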
Avoiding Local Minima: Like many other types of numerical minimizations, the solution that kmeans reaches often depends on the starting points. It is possible for kmeans to reach a local minimum, where reassigning any single point to a new cluster would increase the total sum of point-to-centroid distances, even though a better solution exists. However, using 'replicates', one can overcome this problem by taking the solution with the lowest total sum of distances over all replicates as the final answer (The MathWorks).
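The replicate strategy can be sketched as follows. This is a simplified illustration of the idea behind MATLAB's 'Replicates' option, not its actual implementation; the seeds and the fixed iteration cap are illustrative choices.

```python
import random

def sqdist(x, y):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def one_run(points, k, seed, iters=20):
    """A single k-means run from one random start."""
    centroids = [list(p) for p in random.Random(seed).sample(points, k)]
    assign = []
    for _ in range(iters):
        assign = [min(range(k), key=lambda j: sqdist(p, centroids[j]))
                  for p in points]
        for j in range(k):
            members = [p for i, p in enumerate(points) if assign[i] == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assign

def kmeans_replicates(points, k, replicates=5):
    """Rerun k-means from several random starts and keep the solution with
    the lowest total sum of point-to-centroid distances."""
    def total(centroids, assign):
        return sum(sqdist(p, centroids[assign[i]]) for i, p in enumerate(points))
    runs = [one_run(points, k, seed) for seed in range(replicates)]
    return min(runs, key=lambda run: total(*run))
```

Even if some starts land in a poor local minimum, the replicate with the lowest total distance is the one that is kept.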
4. ANALYSIS AND INTERPRETATION
4.1. Identifying Intra-day Clusters
A daily electricity load curve represents the electricity load as a function of time. Figure 1
displays fluctuations of half-hourly electricity demand of Sri Lanka from January 2008 to
December 2012.
Figure 1. Fluctuations of electricity load curves over the years
Figure 1 clearly shows that the daily demand curves have a similar pattern over the years, with a gradual increase in load from year to year. It is also noticeable that there are two sudden increments, one in the morning and the other at night, in each daily load curve. The peak demand, starting around 6:30PM and ending around 9:30PM, is identified as the most crucial aspect to address, as the generation cost during that time is very high compared to off-peak hours. A colour map scaling plot (Figure 2) shows that three main clusters can be identified.
Figure 2. Colour map scaling of electricity load curves over the years
In order to statistically validate the number of clusters within a day, the 48 half-hours were clustered using the k-means algorithm. Even though there seems to be a trend in the data, it is assumed that the trend does not affect the clustering process, as it is the same for all half-hour periods. The maximum average Silhouette Value of 0.8053 resulted from the three-cluster solution, and therefore three clusters were selected as the most appropriate number of clusters. To avoid the iterations ending up in local minima, each clustering procedure was replicated 5 times and the solution with the lowest total sum of distances was chosen.
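The silhouette calculation behind this choice of k can be illustrated with a small stdlib-only Python sketch (plain Euclidean distance is assumed here; MATLAB's silhouette supports several distance measures):

```python
def silhouette_values(points, labels):
    # s(i) = (b_i - a_i) / max(a_i, b_i): a_i is the mean distance from
    # point i to the other members of its own cluster, and b_i is the
    # smallest mean distance from i to the members of any other cluster.
    def dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5
    members = {}
    for idx, lab in enumerate(labels):
        members.setdefault(lab, []).append(idx)
    svals = []
    for i, lab in enumerate(labels):
        own = [j for j in members[lab] if j != i]
        if not own:                      # singleton cluster: define s = 0
            svals.append(0.0)
            continue
        a = sum(dist(points[i], points[j]) for j in own) / len(own)
        b = min(sum(dist(points[i], points[j]) for j in members[m]) / len(members[m])
                for m in members if m != lab)
        svals.append((b - a) / max(a, b))
    return svals

# Two well-separated groups of loads (toy 1-D example, not the paper's data):
pts = [(0.0,), (1.0,), (10.0,), (11.0,)]
good = silhouette_values(pts, [0, 0, 1, 1])   # sensible 2-cluster split
bad = silhouette_values(pts, [0, 1, 0, 1])    # deliberately wrong split
print(sum(good) / len(good) > sum(bad) / len(bad))  # True
```

In the same spirit, the paper's procedure computes the average silhouette for each candidate number of clusters and keeps the k with the highest average.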
Figure 3. Silhouette Plots to determine the correct number of clusters
Based on the cluster analysis results, Table 1 displays how the 48 half-hours were grouped to the
three clusters.
Table 1. Time periods identified for the three clusters
The three clusters can be interpreted in the following way. Cluster 1 represents the period of minimum demand, during which most people are asleep: 11:00PM - 05:00AM. The same cluster also contains 7:30AM - 8:00AM, possibly the time when most people are travelling and are in vehicles. Cluster 2, with the maximum demand, is the peak period of 06:30PM - 09:00PM. This is the time when most families are at home engaged in various activities, and hence electricity usage is at its maximum. Finally, Cluster 3 includes the rest of the hours of the day.
As the three clusters seem to reflect electricity usage patterns in the Sri Lankan context, predicting half-hourly demands seems more appropriate. Since Deshani et al. (2014) found that there is a day effect on daily electricity demand, the electricity usage patterns for different days within each of the chosen clusters are also analysed.
4.2. Clustering Similar Day Types within Intra-day Clusters
In the dataset, each day had been assigned to one of fourteen categories: Sunday, Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Poyaday, PBM Holiday (Public, Bank, Mercantile), PB Holiday (Public, Bank), Working day before a holiday, Working day after a holiday, Working day between a holiday and a weekend, and Saturday after a holiday (Deshani et al., 2014). We use the same day classification in order to analyse the day effects within each of the identified clusters.
Intra-day Cluster 1:
• 11:00PM - 05:00AM
Figure 4. Fluctuation of electricity demand during the 11:00PM-05:00AM time period
Recall that cluster 1 comprises the times between 11:00PM - 05:00AM plus the half-hour 7:30AM - 8:00AM. Figure 4 displays how electricity demand varies within the 11:00PM - 05:00AM time period. The load patterns during this period do not cluster based on the speciality of the day or month. As stated before, this seems reasonable, as during this time people are not engaged in many activities.
• 07:30AM
(a) (b)
Figure 4. Fluctuation of electricity demand in the 07:30AM time period
(a) Original series (b) First differenced series
Figure 4 is drawn considering the half-hour 7:30AM - 8:00AM. A unit root test identified a trend in the series (Figure 4 (a)), which was removed by taking the first difference (Figure 4 (b)). The de-trended data was clustered and two sub-clusters (maximum average Silhouette Value of 0.7012) could be identified.
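The de-trending step, first differencing, can be sketched as follows (the numbers are illustrative, not the paper's data):

```python
def first_difference(series):
    # Remove a linear trend by differencing consecutive observations:
    # d[t] = y[t+1] - y[t]. A constant drift becomes a constant offset.
    return [b - a for a, b in zip(series, series[1:])]

# A series with an upward trend plus a repeating +5/-1 pattern:
y = [100, 105, 104, 109, 108, 113]
print(first_difference(y))  # [5, -1, 5, -1, 5]
```

After differencing, the repeating pattern remains visible while the level shift from the trend no longer dominates the distance calculations used in clustering.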
Table 2. Distribution of the days between the two clusters for 07:30AM
It can be seen that all the weekdays, Sundays and holidays were clustered into cluster A, while Saturdays and most working days before holidays were classified into cluster B.
Day Type    Cluster A    Cluster B
Monday 96.3% 3.7%
Tuesday 88.7% 11.3%
Wednesday 84.4% 15.6%
Thursday 85.9% 14.1%
Friday 75.5% 24.5%
Saturday 1.8% 98.2%
Sunday 98.0% 2.0%
Poyaday 79.0% 21.0%
PBM Holiday 79.5% 20.5%
PB Holiday 47.8% 52.2%
Working day before a holiday 32.3% 67.7%
Working day after a holiday 98.4% 1.6%
Working day between a holiday and weekend 59.3% 40.7%
Saturday after holiday 23.5% 76.5%
Intra-day Cluster 2: 06:30PM – 09:00PM
Figure 5. Fluctuation of electricity demand during the 06:30PM-09:00PM time period
According to Figure 5, the load patterns within cluster 2's 06:30PM - 09:00PM time period show a periodic pattern and a slight trend. The k-means clustering used the 'correlation' distance measure, which resulted in two sub-clusters with a maximum average Silhouette Value of 0.8313. There was no need to remove the trend prior to clustering, as this distance measure treats each point as a sequence of values and uses normalized values in its calculations. The day types did not show any significant clustering, and hence months were considered to identify a clustering effect.
Table 3. Distribution of the days between the two clusters for 06:30PM – 09:00PM
The peak-time curves from January to August fall mainly into cluster G, while those from October to December fall into cluster H. September appears different from the other months, possibly because part of September belongs to one cluster and the remainder to the other.
Month       Cluster G    Cluster H
January 76.8% 23.2%
February 96.5% 3.5%
March 97.4% 2.6%
April 94.7% 5.3%
May 98.7% 1.3%
June 100.0% 0.0%
July 98.7% 1.3%
August 98.7% 1.3%
September 60.0% 40.0%
October 1.3% 98.7%
November 0.7% 99.3%
December 5.8% 94.2%
Intra-day Cluster 3:
• 05:30AM – 07:00AM
Figure 6. Fluctuation of electricity demand during the 05:30AM-07:00AM time period
Figure 6 displays how half-hourly electricity demand varies during the 05:30AM-07:00AM time period of cluster 3. Two sub-clusters (maximum average Silhouette Value of 0.7889) were prominent when clustering with the 'correlation' distance measure. Even though a clustering effect could be identified based on the type of day alone, a better clustering effect could be seen when considering both the type of day and the month. It is to be noted that certain day types were combined in order to avoid complexity.
Table 4. Distribution of the days between the two clusters for 05:30AM-07:00AM
Month       Day Type    Cluster X    Cluster Y
January Week day 58.3% 41.7%
Week end 0.0% 100.0%
Holidays 0.0% 100.0%
February Week day 74.7% 25.3%
Week end 0.0% 100.0%
Holidays 6.7% 93.3%
March Week day 95.2% 4.8%
Week end 0.0% 100.0%
Holidays 0.0% 100.0%
April Week day 63.8% 36.2%
Week end 7.9% 92.1%
Holidays 22.2% 77.8%
May Week day 99.0% 1.0%
Week end 36.6% 63.4%
Holidays 80.0% 20.0%
June Week day 97.1% 2.9%
Week end 9.8% 90.2%
Holidays 100.0% 0.0%
July Week day 99.1% 0.9%
Week end 2.3% 97.7%
Holidays 20.0% 80.0%
August Week day 17.1% 82.9%
Week end 0.0% 100.0%
Holidays 37.5% 62.5%
September Week day 87.5% 12.5%
Week end 10.3% 89.7%
Holidays 42.9% 57.1%
October Week day 98.0% 2.0%
Week end 16.3% 83.7%
Holidays 50.0% 50.0%
November Week day 93.0% 7.0%
Week end 2.5% 97.5%
Holidays 0.0% 100.0%
December Week day 16.5% 83.5%
Week end 0.0% 100.0%
Holidays 0.0% 100.0%
When considering cluster 3's early morning time period, all weekends over the year were clustered into cluster Y. Moreover, holidays in many months were also clustered into Y. In addition, the weekdays of August and December, which are school holiday periods in Sri Lanka, cluster into Y. In contrast, the holidays in May and June cluster into X alongside normal working days, as electricity demand is comparatively higher during the Wesak and Poson festival season.
• 08:00AM – 06:00PM
Figure 7. Fluctuation of electricity demand during the 08:00AM – 06:00PM time period
The electricity demand during 8:00AM - 6:00PM is shown in Figure 7. A slight trend can be seen during this period, which is the normal working time of the day. As in the earlier time periods, two sub-clusters could be identified for this period as well.
Table 5. Distribution of the days between the two clusters for 08:00AM – 06:00PM
Day Type    Cluster P    Cluster Q
Monday 1.9% 98.1%
Tuesday 3.8% 96.2%
Wednesday 3.3% 96.7%
Thursday 4.7% 95.3%
Friday 5.2% 94.8%
Saturday 52.4% 47.6%
Sunday 98.8% 1.2%
Poyaday 95.2% 4.8%
PBM Holiday 100.0% 0.0%
PB Holiday 34.8% 65.2%
Working day before a holiday 9.7% 90.3%
Working day after a holiday 9.7% 90.3%
Working day between a holiday and weekend 14.8% 85.2%
Saturday after holiday 82.4% 17.6%
It can be clearly seen that all working days cluster into Q, whereas Sundays, Poyadays, PBM Holidays and Saturdays after holidays cluster into P.
• 09:00PM – 10:30PM
Figure 8. Fluctuation of electricity demand during the 09:00PM – 10:30PM time period
Figure 8 displays how electricity demand gradually increases within cluster 3's 09:00PM - 10:30PM time period. During this period, two sub-clusters were identified as the most appropriate grouping.
Table 6. Distribution of the days between the two clusters for 09:00PM – 10:30PM
Day Type    Cluster M    Cluster N
Monday 37.0% 63.0%
Tuesday 43.9% 56.1%
Wednesday 42.9% 57.1%
Thursday 50.7% 49.3%
Friday 81.6% 18.4%
Saturday 93.3% 6.7%
Sunday 6.1% 93.9%
Poyaday 22.6% 77.4%
PBM Holiday 25.6% 74.4%
PB Holiday 47.8% 52.2%
Working day before a holiday 87.1% 12.9%
Working day after a holiday 45.2% 54.8%
Working day between a holiday and weekend 81.5% 18.5%
Saturday after holiday 76.5% 23.5%
Even though many day types did not exhibit a clear clustering pattern, Fridays, Saturdays, working days before a holiday, working days between a holiday and a weekend, and Saturdays after holidays clustered into cluster M, while Sundays, Poyadays and PB holidays clustered into cluster N.
5. CONCLUSION
The electricity demand varies in accordance with consumers' activities with respect to the time of the day and the day of the week. When predicting half-hourly electricity demand over the short term, these patterns strongly influence the prediction process. Therefore, a cluster analysis was carried out to identify intra-day clusters and to group similar day types within those clusters. The results of the cluster analysis were used to select a better set of neuro-forecasters for neural network predictions.
The k-means clustering algorithm was used to cluster the data, and three intra-day clusters were identified. Cluster 1 represents the time period where most people are asleep (11:00PM - 05:00AM), together with the time when most people may possibly be travelling and are in vehicles (07:30AM-08:00AM). Cluster 2 is the peak demand period of 06:30PM - 09:00PM, when most families are at home engaged in various activities. Finally, cluster 3 includes the rest of the hours of the day. Based on the day type or the month, the three main intra-day clusters were further grouped, and the most appropriate number of clusters for each time period was taken as the number of neuro-forecasters used for prediction in that period. The number of neuro-forecasters and the training set selection criteria are shown in Table 7. Using this input selection process, the prediction performances of the neural networks can be improved.
Table 7. Selection of neuro-forecasters based on the cluster analysis
Cluster  Time Period/s        Number of Neuro-forecasters  Training set selected based on
1        11:00PM - 05:00AM    1                            All data can be used
1        07:30AM              2                            Day type
2        06:30PM - 09:00PM    2                            Month
3        05:30AM - 07:00AM    2                            Month and day type
3        08:00AM - 06:00PM    2                            Day type
3        09:00PM - 10:30PM    2                            Day type
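The routing implied by Table 7 can be sketched as a small Python dispatcher. This is a hypothetical illustration: the cluster labels, boundary handling and return strings are my own, and a full implementation would additionally use the day type and month to look up the sub-cluster (e.g. cluster A vs B) in Tables 2-6:

```python
def half_hour(hh, mm):
    # Index of the half-hour slot within the day (0..47).
    return hh * 2 + (1 if mm >= 30 else 0)

def select_forecaster(hh, mm, day_type, month):
    # day_type and month would index into Tables 2-6 to pick the sub-cluster;
    # only the period-level routing from Table 7 is sketched here.
    t = half_hour(hh, mm)
    if t >= half_hour(23, 0) or t < half_hour(5, 0):
        return "cluster1-all"                   # 11:00PM - 05:00AM: one forecaster
    if t == half_hour(7, 30):
        return "cluster1-by-day-type"           # 07:30AM: split on day type
    if half_hour(18, 30) <= t < half_hour(21, 0):
        return "cluster2-by-month"              # 06:30PM - 09:00PM: split on month
    if half_hour(5, 30) <= t < half_hour(7, 0):
        return "cluster3-month-and-day"         # 05:30AM - 07:00AM
    if half_hour(8, 0) <= t < half_hour(18, 0):
        return "cluster3-by-day-type"           # 08:00AM - 06:00PM
    if half_hour(21, 0) <= t < half_hour(22, 30):
        return "cluster3-evening-by-day-type"   # 09:00PM - 10:30PM
    return "boundary-not-listed"                # half-hours not listed in Table 7

print(select_forecaster(19, 0, "Monday", "June"))   # cluster2-by-month
print(select_forecaster(2, 30, "Sunday", "March"))  # cluster1-all
```

Each returned label stands for a neuro-forecaster trained only on the corresponding subset of the historical data.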
If the time duration considered could be expanded, the results can be improved and will be more accurate, as there will be more observations for the sub-categories.
6. ACKNOWLEDGEMENTS
I would like to thank my supervisors Dr. M.D.T. Attygalle, Dr. Liwan Liyanage Hansen and Ms. A. Karunaratne for the invaluable guidance, advice and support given during my research. I would also like to thank the Vice Chancellor of the University of Colombo, Dr. W.K. Hirimburegama, and the Dean of the Faculty, Prof. K.R.R. Mahanama, for financially supporting this research through the University of Colombo Research Grant. I am grateful to the Ceylon Electricity Board of Sri Lanka for providing the data set and the background information about electricity demand. Last but not least, I thank my loving husband, my little son, my mother, my father and my sister for their invaluable help and patience during the period of my research.
REFERENCES
[1] Barzamini, R., Hajati, F., Gheisari, S., & Motamadinejad, M. B. (2012). Short Term Load Forecasting using Multi-layer Perception and Fuzzy Inference Systems for Islamic Countries. Journal of Applied Sciences, pp. 40-47.
[2] Deshani, K.A.D., Hansen, L. L., Attygalle, M.D.T., & Karunarathne, A. (2014). Improved Neural Network Prediction Performances of Electricity Demand: Modifying Inputs through Clustering. Second International Conference on Computational Science and Engineering (pp. 137-147). India: AIRCC.
[3] Farahat, M. A., & Talaat, M. (2012). Short-Term Load Forecasting Using Curve Fitting Prediction Optimized by Genetic Algorithms. International Journal of Energy Engineering, pp. 23-28.
[4] Hernandez, L., Baladron, C., Aguiar, J. M., Carro, B., & Esguevillas, A. S. (2012). Classification and Clustering of Electricity Demand Patterns in Industrial Parks. Energies, pp. 5215-5227.
[5] Nagi, J., Yap, K. S., Tiong, S. K., & Ahmed, S. K. (2008). Electrical Power Load Forecasting using Hybrid Self-Organizing Maps and Support Vector Machines. International Power Engineering and Optimization Conference (pp. 51-56). Selangor.
[6] Seetha, H., & Saravanan, R. (2007). Short Term Electricity Load Prediction Using Fuzzy BP. Journal of Computing and Information Technology, pp. 267-282.
[7] Soares, L. J., & Medeiros, M. C. (2008). Modeling and forecasting short-term electricity load: A comparison of methods with an application to Brazilian data. International Journal of Forecasting, pp. 630-644.
[8] The MathWorks, Inc. Statistics Toolbox.
Authors
Ms. K.A.D. Deshani has been a Lecturer (Probationary) in the Department of Statistics, University of Colombo, since March 2011. Before she was absorbed into the permanent cadre, she worked in the same department as an Assistant Lecturer, having obtained a B.Sc. Special Degree in Statistics with Computer Science in 2008. She has a keen interest
in developing computer systems to incorporate the dynamic nature to statistical
interpretations. Her research interests are in the areas of Operational Research and Data
Mining. Currently, she is a member of the research team of the project titled "Developing an Economical Strategy for the Future Electricity Generation Procedure in Sri Lanka", which received a University of Colombo research grant in 2011 and is carried out in collaboration with the University of Western Sydney, Australia, and the Ceylon Electricity Board, Sri Lanka. In June 2012 she registered for an M.Phil under the said grant. With her interest in research, she was a contributing speaker at the International Statistics Conference
on the publication titled “Analysis of Efficiency of a Multi-Queue against a Single Queue with Many
Servers: A Study on Advertisement Counter Queues at a Leading Newspaper Company” and was published
in the Proceedings of the International Statistics Conference 2011, Colombo Sri Lanka. In 2013 a research
paper titled “A Study of the Dynamic Behaviour of Daily Load Curve for Short Term Predictions” was
published in the proceedings of the International Symposium for Next Generation Infrastructure (ISNGI)
Australia.
------------------------------------------------------
Dr. Attygalle has been Head of the Department of Statistics from 2010 to present. As a
Senior Lecturer attached to the Department of Statistics from 2006 to present, she has
been routinely involved in Teaching, Research and many other administrative roles such
as being the Coordinator of the MSc in Applied Statistics, BSc Special degree and Joint
Special degree programmes conducted by the Department of Statistics. She holds
professional memberships of the Sri Lanka Association for the Advancement of Science
(SLASS) and the Institute of Applied Statistics -Sri Lanka.
Dr. Attygalle obtained her PhD in Statistics from the Lancaster University, UK, and a MSc in Statistics
from the Warwick University, UK, in 2006 and 1996 respectively. Prior to this she completed a Diploma in
Applied Statistics, from the University of Colombo in 1992. She obtained her first degree majoring in
Statistics, Applied Mathematics and Pure Mathematics also from the University of Colombo, graduating
with a first class in 1987. As professional qualifications she has obtained the Staff and Educational
Development Association (SEDA)-UK accreditation as a teacher in higher education in 2005 and the
Certificate in Teaching in Higher Education (CTHE) by the Staff Development Centre of the University of
Colombo also in 2005. Her key research areas are Statistical Modelling, Model Diagnostics, Data
Visualization and Sports Statistics. As a senior lecturer she has supervised many undergraduate and
postgraduate research projects. Currently she is co-supervising two MPhil/PhD research students. She had
been instrumental in developing industry links with many private and government organisations over the
years and thus has carried out many consultancy projects and other training programs through the
Department of Statistics. She had also won one of the University of Colombo research grants in 2011.
-------------------------------------------------------
Dr Liwan Liyanage joined University of Western Sydney in the year 1989 with university
level teaching experience at University of Colombo, University of Wollongong and King
Saud University Riyadh, totalling 12 years. Her qualifications, a B.Sc. (First Class), a Graduate Diploma in Applied Statistics, a Masters Degree in Theoretical Statistics and a Ph.D. in the area of Applied Probability, give her breadth of coverage across the statistics
disciplines. At UWS she has been instrumental in developing many degree programs in
particular the integrated degree B. Maths and IT, using data mining as the integrating tool. She became a Senior Lecturer in 1995 and head of the B. Maths & IT program in 1999. Her PhD was in random walk models, diffusion and related applications, namely queuing theory and game theory. Thus her initial research was in applied probability: random walk models with difference equations, master equation models with partial differential equations, and queuing models. This led to differential equations representing diffusion and double diffusion. Her research bridges probabilistic models and the differential equation models of diffusion. Her passion for integrating disciplines and research methods has led her to her current research areas, which include innovative work in "Operational Statistics", a new area developed in collaboration with UC Berkeley, as well as Optimisation Techniques and Data Mining. Application areas include biosecurity, public health, climate change, and electricity production and demand. Of her 10 PhD/Masters students, 8 have completed their research successfully. She has established ongoing national and international linkages and research collaborations, with 30+ publications and a book chapter. Her paper on Operational Statistics was the 2nd most downloaded paper in April 2006 in Science Direct's TOP 25 articles.
-------------------------------------------------------
Mrs. A. Karunarathne is a former head of the Department of Statistics, and also served as head of the former Department of Statistics and Computer Science. She has been in service for more than 40 years and was the key person in starting the Special Degrees in Statistics and in initiating the Internship program in the Department. As a senior lecturer she conducts lectures mainly in the fields of Operational Research and Stochastic Processes.
She obtained a Diploma di Sp. (Operational Research) from the University of Rome; her first degree was a B.Sc. (Mathematics) from the University of Colombo. She has contributed to the continuous development and transmission of statistical knowledge through many diverse avenues, a key example of which is her involvement in the publication of a book on the basics of statistics titled "Moolika Sankayanaya", written for university entrants and A/L science students in September 1997. As a senior lecturer she has supervised many undergraduate and postgraduate research projects. Her key research areas are
Stochastic Processes, Simulation, Queuing Models and Performance Modelling of Communication
Networks.