In this paper, we discuss five data mining functionalities in IoT that affect performance: data anomaly detection, data clustering, data classification, feature selection, and time series prediction. For each functionality, important algorithms are reviewed along with their advantages and limitations, together with newer algorithms that remain open research directions. We also present a knowledge view of data mining in IoT.
Minkowski Distance based Feature Selection Algorithm for Effective Intrusion ... (IJMER)
Intrusion Detection Systems (IDS) play a major role in providing effective security to various types of networks. An IDS for networks needs an appropriate rule set for classifying network benchmark data into normal or attack patterns. Generally, each dataset is characterized by a large set of features, but not all of these features are relevant or contribute fully to identifying an attack, since different attacks need different feature subsets to achieve better detection accuracy. In this paper an improved feature selection algorithm is proposed to identify the most appropriate subset of features for detecting certain attacks. The proposed method is based on Minkowski distance feature ranking and an improved exhaustive search that selects a better combination of features. The system has been evaluated on the KDD CUP 1999 dataset with the EMSVM [1] classifier. The experimental results show that the proposed system provides high classification accuracy and a low false alarm rate when applied to the reduced feature subsets.
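The ranking step described above can be sketched as follows. This is a minimal illustration of Minkowski-distance feature ranking, not the paper's actual algorithm: each feature is scored by the Minkowski distance between its class-conditional means (binary case for brevity), and the toy data is made up rather than drawn from KDD CUP 1999.

```python
# Hypothetical sketch: score each feature by the Minkowski distance
# between its per-class mean values; larger distance = more separating.

def minkowski(a, b, p=3):
    """Minkowski distance of order p between two sequences."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

def rank_features(samples, labels, p=3):
    """Rank feature indices by class-mean separation (two classes)."""
    classes = sorted(set(labels))
    n_feat = len(samples[0])
    means = {}
    for c in classes:
        rows = [s for s, l in zip(samples, labels) if l == c]
        means[c] = [sum(r[i] for r in rows) / len(rows) for i in range(n_feat)]
    scores = [minkowski([means[classes[0]][i]], [means[classes[1]][i]], p)
              for i in range(n_feat)]
    return sorted(range(n_feat), key=lambda i: scores[i], reverse=True)

# toy data: feature 0 separates the classes, feature 1 does not
X = [[0.0, 5.0], [0.2, 5.1], [9.8, 5.0], [10.0, 4.9]]
y = ["normal", "normal", "attack", "attack"]
print(rank_features(X, y))  # feature 0 ranked first
```

The exhaustive-search stage the abstract mentions would then evaluate combinations of the top-ranked features against a classifier.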
CLUSTERING BASED ATTRIBUTE SUBSET SELECTION USING FAST ALGORITHM (IJCI JOURNAL)
In machine learning and data mining, attribute selection is the practice of selecting a subset of the most consequential attributes for use in model construction. The rationale for using an attribute selection method is that the data encloses many redundant or extraneous attributes, where redundant attributes are those which supply no supplemental information beyond the presently selected attributes, and irrelevant attributes offer no valuable information in any context.
SURVEY PAPER ON OUTLIER DETECTION USING FUZZY LOGIC BASED METHOD (IJCI JOURNAL)
Fuzzy logic can be used to reason like humans and can deal with uncertainty beyond randomness. Outlier detection is a difficult task because of the uncertainty involved in it: the outlier itself is a fuzzy concept and is difficult to determine in a deterministic way. Fuzzy logic systems are therefore very promising, since they directly tackle the situations associated with outliers, addressing the seemingly conflicting goals of (i) removing noise and (ii) smoothing out outliers while preserving other salient features. This paper surveys fuzzy logic methods used for outlier detection by discussing their pros and cons, and is thus a helpful document for new researchers in this field.
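The graded, non-deterministic view of outliers can be illustrated with a tiny sketch. This is an invented example, not a method from any surveyed paper: each point receives a fuzzy membership in the "outlier" set instead of a hard in/out decision, and the `spread` parameter is an assumed tuning knob.

```python
# Illustrative fuzzy outlier score: membership in [0, 1) grows with
# distance from the median; no hard threshold is applied.

def fuzzy_outlier_scores(values, spread=1.0):
    """Map each value to a fuzzy outlier membership in [0, 1)."""
    srt = sorted(values)
    median = srt[len(srt) // 2]
    # squash the distance from the median into [0, 1)
    return [d / (d + spread)
            for d in (abs(v - median) for v in values)]

data = [10, 11, 9, 10, 50]
scores = fuzzy_outlier_scores(data, spread=5.0)
print(scores)   # the extreme value 50 gets the highest membership
```

A crisp detector would force a yes/no answer; here the downstream logic can keep the uncertainty and, for example, smooth rather than delete borderline points.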
The premise of this paper is to discover frequent patterns through the use of data grids in the WEKA 3.8 environment. Workload imbalance occurs due to the dynamic nature of grid computing, hence data grids are used for the creation and validation of data. Association rules are used to extract useful information from large databases. In this paper the researchers generate the best rules using WEKA 3.8 for better performance; WEKA 3.8 is used both to obtain the best rules and to implement the various algorithms.
Improving the performance of Intrusion detection systems (Yasmen Essam)
Intrusion detection systems (IDS) are widely studied by researchers nowadays due to the dramatic growth in network-based technologies. Policy violations and unauthorized access are in turn increasing, which makes intrusion detection systems of great importance. Existing approaches to improving intrusion detection systems focus on feature selection or reduction, since some features are irrelevant or redundant and their removal improves accuracy as well as learning time.
A Survey on Constellation Based Attribute Selection Method for High Dimension... (IJERA Editor)
Attribute selection is an important topic in data mining, because it is an effective way of reducing dimensionality, removing irrelevant and redundant data, and increasing the accuracy of the data. It is the process of identifying a subset of the most useful attributes that produces results compatible with the original entire set of attributes. Cluster analysis, or clustering, is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar, in some sense, to each other than to those in other groups (clusters). There are various approaches and techniques for attribute subset selection, namely the wrapper approach, the filter approach, the Relief algorithm, distributional clustering, etc., but each of them has some disadvantages, such as inability to handle large volumes of data, computational complexity, lack of guaranteed accuracy, difficulty of evaluation, and weak redundancy detection. To address some of these issues, this paper proposes a technique that aims to design an effective clustering-based attribute selection method for high-dimensional data. Initially, attributes are divided into clusters using a graph-based clustering method such as the minimum spanning tree (MST). In the second step, the most representative attribute, i.e. the one most strongly related to the target classes, is selected from each cluster to form the subset of attributes. The purpose is to increase accuracy, reduce dimensionality, shorten training time, and improve generalization by reducing overfitting.
Semi-supervised learning approach using modified self-training algorithm to c... (IJECEIAES)
Burst header packet flooding is an attack on optical burst switching (OBS) networks which may cause denial of service. Applying machine learning techniques to detect malicious nodes in OBS networks is relatively new. As finding a sufficient amount of labeled data to perform supervised learning is difficult, semi-supervised methods of learning (SSML) can be leveraged. In this paper, we studied the classical self-training algorithm (ST), which uses the SSML paradigm. Generally, in ST, the available true-labeled data (L) is used to train a base classifier, which then predicts the labels of the unlabeled data (U). A portion of the newly labeled data is removed from U based on prediction confidence and combined with L, and the resulting data is used to re-train the classifier; this process is repeated until convergence. This paper proposes a modified self-training method (MST): we trained multiple classifiers on L in two stages and leveraged agreement among those classifiers to determine labels. The performance of MST was compared with ST on several datasets and a significant improvement was found. We applied MST to a simulated OBS network dataset and achieved very high accuracy with a small number of labeled data points. Finally, we compared this work with some related works.
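The classical ST loop the abstract walks through (train on L, pseudo-label U, move confident points into L, repeat until convergence) can be sketched as below. The 1-nearest-neighbour "classifier" and the inverse-distance confidence rule are placeholders of my choosing, not the paper's base classifier.

```python
# Hedged sketch of classical self-training with a toy 1-NN base model.

def nn_predict(labeled, x):
    """1-NN label plus a confidence (inverse distance to the neighbour)."""
    lx, ly = min(labeled, key=lambda p: abs(p[0] - x))
    return ly, 1.0 / (1.0 + abs(lx - x))

def self_train(L, U, threshold=0.5):
    """Grow L from U by repeatedly adding confident pseudo-labels."""
    L, U = list(L), list(U)
    while U:
        preds = [(x, *nn_predict(L, x)) for x in U]
        confident = [(x, y) for x, y, c in preds if c >= threshold]
        if not confident:        # convergence: nothing confident remains
            break
        L.extend(confident)      # pseudo-labeled points join the labeled set
        U = [x for x in U if x not in dict(confident)]
    return L

labeled = [(0.0, "a"), (10.0, "b")]
unlabeled = [0.5, 1.0, 9.0, 9.5]
model = self_train(labeled, unlabeled)
print(sorted(model))
```

The MST variant the paper proposes would replace the single `nn_predict` with agreement among multiple classifiers trained in two stages.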
EFFICIENT FEATURE SUBSET SELECTION MODEL FOR HIGH DIMENSIONAL DATA (IJCI JOURNAL)
This paper proposes a new method that reduces the size of a high-dimensional dataset by identifying and removing irrelevant and redundant features. Dataset reduction is important in machine learning and data mining. A measure of dependence is used to evaluate the relationship between a feature and the target concept, and between features, for irrelevant and redundant feature removal. The proposed work initially removes all irrelevant features; then a minimum spanning tree of the relevant features is constructed using Prim's algorithm. Splitting the minimum spanning tree based on the dependency between features leads to the generation of forests, and a representative feature from each forest is taken to form the final feature subset.
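The MST pipeline above can be sketched end to end. This is a simplified illustration under stated assumptions: absolute Pearson correlation stands in for the paper's dependence measure, the cut threshold is arbitrary, and the representative per forest is simply the first member rather than the most target-related feature.

```python
# Sketch: Prim's MST over features, cut weak edges, keep one feature
# per resulting forest. Weight = 1 - |correlation| (dependent = close).

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a) ** 0.5
    vb = sum((y - mb) ** 2 for y in b) ** 0.5
    return cov / (va * vb) if va and vb else 0.0

def mst_feature_groups(columns, cut=0.5):
    k = len(columns)
    w = lambda i, j: 1.0 - abs(pearson(columns[i], columns[j]))
    # Prim's algorithm over the complete feature graph
    in_tree, edges = {0}, []
    while len(in_tree) < k:
        i, j = min(((i, j) for i in in_tree for j in range(k)
                    if j not in in_tree), key=lambda e: w(*e))
        in_tree.add(j)
        edges.append((i, j))
    # split the tree: drop edges between weakly dependent features
    kept = [e for e in edges if w(*e) <= cut]
    # connected components of the remaining edges = forests
    parent = list(range(k))
    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x
    for i, j in kept:
        parent[find(j)] = find(i)
    groups = {}
    for f in range(k):
        groups.setdefault(find(f), []).append(f)
    return [g[0] for g in groups.values()]  # one representative per forest

cols = [[1, 2, 3, 4], [2, 4, 6, 8], [5, 1, 4, 2]]  # col 1 duplicates col 0
print(sorted(mst_feature_groups(cols)))
```

Here features 0 and 1 are perfectly correlated, so they fall into one forest and only one of them survives, while the independent feature 2 forms its own forest.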
Introduction to feature subset selection method (IJSRD)
Data mining is a computational process to discover patterns in large data sets. It has various important techniques, one of which is classification, which has recently been receiving great attention in the database community. Classification techniques can solve problems in different fields such as medicine, industry, business, and science. PSO is an optimization method based on social behaviour. Feature selection (FS) is a solution that involves finding a subset of prominent features to improve predictive accuracy and remove redundant features. Rough set theory (RST) is a mathematical tool which deals with the uncertainty and vagueness of decision systems.
Intrusion Detection and Forensics based on decision tree and Association rule... (IJMER)
This paper presents an approach to probe attack detection based on the combination of two techniques, decision trees and association rule mining. This approach proves to be better than the traditional approach of generating rules for a fuzzy expert system by clustering methods: association rule mining selects the best attributes together, and the decision tree identifies the best parameters together, to create the rules for the fuzzy expert system. The rules for the fuzzy expert system are then generated using association rule mining and decision trees. A decision tree is generated for the dataset to find the basic parameters for creating the membership functions of the fuzzy inference system, and membership functions are generated for the probe attack. Based on these rules we created the fuzzy inference system that is used as input to a neuro-fuzzy system: the fuzzy inference system is loaded into the neuro-fuzzy toolbox, and the final ANFIS structure is generated as the outcome of the neuro-fuzzy approach. The experiments and evaluations of the proposed method were done with the NSL-KDD intrusion detection dataset. The experimental results show that the proposed combination of decision trees and association rule mining efficiently detects probe attacks, with better intrusion detection results than other existing methods.
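The support/confidence arithmetic underlying the association-rule step can be shown in a few lines. The connection records below are invented for illustration, not NSL-KDD data, and the attribute names are assumptions.

```python
# Illustrative support and confidence for an association rule
# LHS -> RHS over a set of transactions (here, fake connection records).

def support(transactions, itemset):
    """Fraction of transactions containing every item in the itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, lhs, rhs):
    """P(RHS | LHS) estimated from the transactions."""
    return support(transactions, set(lhs) | set(rhs)) / support(transactions, lhs)

records = [
    {"proto=tcp", "flag=S0", "label=probe"},
    {"proto=tcp", "flag=S0", "label=probe"},
    {"proto=tcp", "flag=SF", "label=normal"},
    {"proto=udp", "flag=SF", "label=normal"},
]
# rule: {proto=tcp, flag=S0} -> {label=probe}
print(confidence(records, {"proto=tcp", "flag=S0"}, {"label=probe"}))  # 1.0
```

Rules with both high support and high confidence would be the ones fed into the fuzzy expert system described above.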
Extended PSO algorithm for improvement problems of the k-means clustering algorithm (IJMIT JOURNAL)
Clustering is an unsupervised process and one of the most common data mining techniques. The purpose of clustering is to group similar data together, so that instances within a cluster are as similar to each other as possible and as different as possible from instances in other clusters. In this paper we focus on partitional k-means clustering; owing to its ease of implementation and high-speed performance on large data sets, it is still very popular 30 years after its development. To address the problem of k-means becoming trapped in local optima, we propose an extended PSO algorithm named ECPSO. Our new algorithm is able to escape from local optima and produces the problem's optimal answer with high probability. The experimental results show that the proposed algorithm performs better than other clustering algorithms, especially on two indices: clustering precision and clustering quality.
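For context, the baseline the abstract is improving on looks like this. The sketch below is plain 1-D k-means, not ECPSO: it converges to whichever local optimum its random initial centroids allow, which is exactly the weakness a PSO-based initialization or search is meant to escape.

```python
# Plain k-means on 1-D points: random init, assign, recompute, repeat.
import random

def kmeans_1d(points, k, iters=50, seed=0):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # assign each point to its nearest centroid
            clusters[min(range(k), key=lambda c: abs(p - centroids[c]))].append(p)
        # recompute centroids (keep old one if a cluster emptied out)
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return sorted(centroids)

data = [1.0, 1.2, 0.8, 10.0, 10.4, 9.6]
print(kmeans_1d(data, k=2))   # two centroids near 1.0 and 10.0
```

On harder, higher-dimensional data the result depends strongly on `seed`; the paper's contribution is replacing that luck with a PSO-driven search for better centroid positions.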
Artificial Neural Content Techniques for Enhanced Intrusion Detection and Pre... (AM Publications)
This paper presents a novel approach for detecting network intrusions based on a competitive training neural network. The performance of this approach is compared to that of the self-organizing map (SOM), a popular unsupervised training algorithm used in intrusion detection. While obtaining a detection rate similar to that of the SOM, the proposed approach uses only one fourth of the SOM's computation time. Furthermore, the clustering result of this method is independent of the number of initial neurons. The approach also exhibits the ability to detect both known and unknown network attacks. The experimental results obtained by applying this approach to the KDD-99 data set demonstrate that it performs exceptionally well in terms of both accuracy and computation time.
A Combined Approach for Feature Subset Selection and Size Reduction for High ... (IJERA Editor)
Selection of relevant features from a given feature set is one of the important issues in the field of data mining as well as classification. In general a dataset may contain many features, but it is not necessary that the whole feature set is important for a particular analysis or decision, because features may share common information or be completely irrelevant to the processing at hand. This generally happens because of improper selection of features during dataset formation, or because of improper availability of information about the observed system. In both cases the data will contain features that only increase the processing burden and may ultimately distort the outcome when used for analysis. For these reasons, methods are required to detect and remove such features; hence in this paper we present an efficient approach that not only removes the unimportant features but also reduces the overall size of the dataset. The proposed algorithm uses information theory to compute the information gain of each feature and a minimum spanning tree to group similar features; in addition, fuzzy c-means clustering is used to remove similar entries from the dataset. Finally, the algorithm is tested with an SVM classifier on 35 publicly available real-world high-dimensional datasets, and the results show that it not only reduces the feature set and data length but also improves the performance of the classifier.
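The information-gain scoring step mentioned above is standard and can be sketched directly: entropy of the class label minus the conditional entropy given a feature. The toy feature/label values are illustrative, not from any of the 35 datasets.

```python
# Information gain of a categorical feature with respect to class labels.
from math import log2
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label sequence, in bits."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def info_gain(feature, labels):
    """H(labels) - H(labels | feature) for one categorical feature."""
    n = len(labels)
    h = entropy(labels)
    for v in set(feature):
        sub = [l for f, l in zip(feature, labels) if f == v]
        h -= len(sub) / n * entropy(sub)
    return h

y      = ["attack", "attack", "normal", "normal"]
useful = ["tcp", "tcp", "udp", "udp"]   # splits the classes perfectly
noisy  = ["a", "b", "a", "b"]           # uncorrelated with the label
print(info_gain(useful, y), info_gain(noisy, y))   # 1.0 vs 0.0
```

Features with near-zero gain are the "unimportant" ones the combined approach discards before the MST grouping stage.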
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
With the development of databases, the volume of data stored in them increases rapidly, and much important information is hidden in these large amounts of data. If that information can be extracted from the database it will create a lot of profit for the organization. The question is how to extract this value; the answer is data mining. There are many technologies available to data mining practitioners, including artificial neural networks, genetic algorithms, fuzzy logic, and decision trees. Many practitioners are wary of neural networks due to their black-box nature, even though they have proven themselves in many situations. This paper is an overview of artificial neural networks and questions their position as a preferred tool of data mining practitioners.
Testing DRAM and Correcting Errors by using Adaptive Technique (IJERA Editor)
DRAM (dynamic random access memory) is the most widely used memory today. Leakage power is the main issue of the DRAM cell, as it affects the performance of the DRAM. This paper introduces a new, adaptive technique in which a spare wire is used to reroute the data in a damaged cell.
Mainly talks about the traffic jams and management countermeasures (IJERA Editor)
With the economic development of China's large and medium-sized cities and the ceaseless expansion of city scale, urban traffic congestion is also growing and has become a bottleneck hindering the further development of cities. At present, solving the urban traffic problem is the first strategic task of congestion governance; relieving congestion, maximizing efficiency, and making travel convenient are the problems to be solved.
Sustainable development in agriculture: a socio-ecological approach (IJERA Editor)
This paper presents a perspective on sustainability in agriculture, one that derives from a notion of development tied to the idea of growth and is supported by technological advances aimed at ensuring sustainable management of natural resources. In this sense, we consider a socio-ecological approach in order to bring together individuals and their environment, showing that this relationship is fundamental to a process of co-evolution in which nature and human beings together can define the organization of society.
A Mathematical Modeling Approach of the Failure Analysis for the Real-Time Me...IJERA Editor
In this paper, a simulation of the Mathematical Model for Real-Time Satellite Launch Platform approach in
Mexico is presented. Mexico holds the fourth best place in the world for building a platform to launch space
satellites, since its geographic location is optimal for its construction. It is essential to have the Probabilistic
Failure Analysis in Space Systems Engineering from its design, in order to minimize risks and avoid any
possible catastrophe. The mathematical approach of Failure Analysis presented throughout this paper is
complementary to the simulation results previously obtained with Windchill Quality Software. The final results
were performed with the Failure Analysis through fault trees (FTA) by means of a probabilistic approach
Quantitative Mathematical Model. This is the first step to propose and build the first Satellite Launch Platform
in Mexico.
Research on the characteristics and evaluation of nightscape along the LRT lineIJERA Editor
With people's increasing demand for nightlife activities, the nightscape has become more important than ever
for enhancing the image of a city. In this study, we analyzed the effects and influence of the landscape lighting
that produces the nightscape and identified the optimal nightscapes along the LRT (Light Rail Transit)
line. We selected the urban landscapes along the LRT wayside as the research objects and used the SD (Semantic
Differential) technique to compare the difference between the daytime and the nighttime landscapes by
vision engineering and measurement psychology. The results are as follows: 1) image evaluation of
the nightscapes received higher estimation than that of daytime landscapes, and the importance of the nightscape
was recognized once again; 2) landscape lighting played an important role in the charming nightscape; 3) the
optimal nightscapes along the LRT routes could be chosen with the results of factor analysis.
Implementation of “Translator Strategy” for Migration of IPv4 to IPv6IJERA Editor
This paper is focused on the Translator strategy for migration of IPv4 to IPv6, implemented in Cisco Packet
Tracer. It describes the design and configuration of network devices and packet transfer between devices of IPv4
and IPv6 networks using NAT-PT as the transition mechanism. The first major version of IP, IPv4, is the dominant
protocol of the internet. IPv6 was developed to deal with the long-anticipated problem of IPv4 running out of
addresses. The migration from IPv4 to IPv6 must be implemented node by node, using auto-configuration
procedures to eliminate the need to configure IPv6 hosts manually.
A Novel Weighted Clustering Based Approach for Improving the Wireless Sensor ...IJERA Editor
Long lifetime and reliability are the key aims of Wireless Sensor Network (WSN) design. Since energy is the
most important resource for prolonging the lifetime of this type of network, recent research has focused on ever
more energy-efficient techniques. The proposed work is a Weighted Clustering Approach based on Weighted
Cluster Head Selection, which is highly energy efficient and reliable in a mobile network scenario. Weight
calculation using different attributes of the nodes, such as SNR (Signal to Noise Ratio), Remaining Energy, Node
Degree, Mobility, and Buffer Length, yields an efficient Cluster Head (CH) at regular intervals of time. CH rotation
helps in optimum utilization of the energy available to all nodes and results in prolonged network lifetime.
Implementation is done using the NS2 network simulator and performance evaluation is carried out in terms of
PDR (Packet Delivery Ratio), End to End Delay, Throughput, and Energy Consumption. The obtained results
show that the proposed work is adaptable for improving performance. In order to justify the solution, the
performance of the proposed technique is compared with that of the traditional approach and is found to be
optimal in comparison to the traditional techniques.
Design and development of high performance panel air filter with experimental...IJERA Editor
In automobile vehicles, mostly plastic-moulded panel filters are used for engine air filtration. Fibrous
structured cellulose media are used with different permeabilities according to the rated air flow required by
the engine. To optimize the pleat design of an automotive panel air filter, it is important to study the
correlation of pressure drop, dust holding capacity and efficiency. The main role of a filter is to
provide the least pressure drop with high dust holding capacity and efficiency. A channel was made for the
testing of different pleat designs. This research comprises the experimental design and evaluation of a filter
element with variable pleat depth and pleat density. This assessment offers the selection of pleat design
according to the performance requirements.
Comparison of Conventional Converter with Three-Phase Single- Stage Ac-Dc PWM...IJERA Editor
The main objective of this concept is to examine the operation of fundamental buck-based three-phase
single-stage ac–dc full-bridge converters. The operation of this fundamental converter is explained and
analyzed, a procedure for the design of its key components is derived and demonstrated with an example,
and general concluding remarks comparing buck-based and boost-based three-phase single-stage ac–dc
full-bridge converters are made.
Leachate Monitoring In The Extractive Industry: A Case Study Of Nigerian Liqu...IJERA Editor
Activities of the extractive industry (NLNG is typical) introduce some chemical substances into the groundwater.
These change the groundwater signature, and bioaccumulation of those classified as hazardous may
result in various health challenges. Seven areas within the plant were identified by NLNG: six as high-risk
pollution areas and one (Nature Park) as a no-pollution-risk area. Groundwater samples were collected from all
seven areas and analyzed for the presence of Cu, Cr, Zn, nitrate, phosphate and pH. Samples from the
no-pollution-risk area served as control. Results were compared with WHO limits. Except for the Cr content, which
was stable, the results showed fluctuations with time, albeit on the increase, though all remained within WHO
limits. The nitrate value is fast approaching the limit and requires urgent attention. Unexpectedly high values of
the measured parameters were observed at Nature Park (the no-pollution-risk area), even beyond the high-risk
pollution areas. This precludes NLNG activities being responsible. The necessity of pre-activity groundwater
quality assessment is thus established. Close monitoring of the groundwater quality of extractive industry zones
is vital for the protection of source quality.
Waste to Wealth; The Utilization of Scrap Tyre as Aggregate in Bituminous Mix...IJERA Editor
The problem associated with solid waste management is on the increase in industries, urban cities and
rural areas. In the United States of America, Asia and Europe, there are hundreds of waste-to-wealth
combustion plants where solid wastes are incinerated. In Nigeria, amidst the increasing importation of
vehicle tyres, such plants are scarcely in existence to enhance the generation of revenue from waste through the
extraction of raw material for the production of lightweight aggregates, printing ink, paints, shoe polish, dry cells
and battery heads. This research paper seeks to utilize vehicle scrap tyre (VST) as aggregate in asphaltic
mixture by adopting the dry process to evaluate the effect of rubber-bitumen interaction on asphaltic concrete
properties. A laboratory investigation used 4.75mm, 2.36mm and 0.600mm chunk tyre particle sizes in modified
asphalt mixtures containing 2%, 4%, 6%, 8% and 10% scrap tyre, with 0% tyre content as the control mixture. The
mixtures were subjected to Marshall tests where the stability, flow, percentage air void, unit weight, void
mineral aggregate (VMA), height of specimen and specific gravity were determined. The results obtained show
that as the tyre percentage increases, the stability, unit weight and specific gravity values decrease. On the other
hand, as the tyre content increases, the flow and height of specimen increase, while the percentage air void and
VMA increase for 4.75mm Tyre Particle Size (TPS) and 2.36mm TPS; for 0.600mm TPS the reverse is the case.
In summary, and in comparison with standard specifications for road construction material, the tyre-modified
specimens remained intact under the Marshall tests; by interpretation, a material possessing such a property
indicates good impact resistance when used as a surface course in flexible pavement. Conclusively, the use of
10% 4.75mm, 4% 2.36mm or 4% 0.600mm TPS by weight of aggregate in asphaltic concrete is recommended
for medium-traffic-volume pavement, which in turn leads to a considerable improvement in sanitation in our
cities in terms of reduced scrap tyre waste and waste-to-wealth generation.
Low Temperature Pyrolysis of Graptolite Argillite (Dictyonema Shale) in Autoc...IJERA Editor
The results of a systematic experimental study on the effects of temperature (340–420 °C) and exposure
time (0–8 h) at nominal temperature on the yield of pyrolysis products from Estonian graptolite argillite (GA),
generated in autoclaves without any solvent, are described. The yields of solid residue (SR), gas, pyrogenetic
water (W) and the benzene-extractable mix of thermobitumen and oil (TBO) were estimated. The compound
groups of TBO were assessed. The highest yield of TBO, 2.18% on a dry GA basis and 13.2% of organic matter
(OM), was obtained at a temperature of 420 °C and a duration of 0.5 h. The main compound groups in TBO
obtained at 400 °C are polar hetero-atomic compounds and polycyclic hydrocarbons, surpassing 45% and 30%
of TBO respectively. The shares of aliphatic and monocyclic hydrocarbons are below 15% of TBO. The yield
of W from GA is about 10-15% of OM. The quantity of OM left in SR after pyrolysis is high, about 65% of
OM. The yield of pyrolysis products from GA and the composition of its TBO are compared with those obtained
under similar conditions from different oil shales: Estonian Kukersite, US Utah Green River, and Jordanian
Attarat.
This article reports on the optical analysis of Cu2+ (0.5 mol %): 59.5B2O3 – 20TeO2 – 10CdO – 10Li2O glass.
The amorphous nature of the glass was confirmed from the XRD spectrum. The absorption spectrum of the
copper glass shows a broad absorption band (2B1g→2B1g) at 829 nm. The emission spectrum of the glass
exhibits a blue emission at 439 nm with an excitation wavelength of 379 nm.
Thermal Simulations of an Electronic System using Ansys IcepakIJERA Editor
In the present electronics industry, component sizes are shrinking to meet the requirement for compact
products with greater performance, which creates various problems from a thermal perspective that must be
solved to give the product better electrical and mechanical performance.
In this paper we study how to overcome the thermal problems of a product, including component
reliability and PCB performance, by using a CFD thermal simulation tool (Ansys Icepak).
CLASSIFIER SELECTION MODELS FOR INTRUSION DETECTION SYSTEM (IDS)ieijjournal1
Any abnormal activity can be assumed to be an anomalous intrusion. In the literature, several techniques and
algorithms have been discussed for anomaly detection. In most cases, true positive and false positive
parameters have been used to compare their performance. However, depending upon the application, a
wrong true positive or wrong false positive may have severe detrimental effects. This necessitates the inclusion
of cost-sensitive parameters in the performance evaluation. Moreover, the most common testing dataset,
KDD-CUP-99, has a huge volume of data, which in turn requires a certain amount of pre-processing. Our work
in this paper starts by enumerating the necessity of cost-sensitive analysis with some real-life examples. After
discussing KDD-CUP-99, an approach is proposed for feature elimination and then feature selection, to reduce
the number of relevant features directly and the size of KDD-CUP-99 indirectly. From the reported
literature, general methods for anomaly detection are selected which perform best for different types of
attacks. These different classifiers are clubbed to form an ensemble. A cost-opportunistic technique is
suggested to allocate relative weights to the classifier ensemble for generating the final result. The cost
sensitivity of true positive and false positive results is analysed, and a method is proposed to select the elements
of the cost-sensitivity metrics to further improve the results and achieve better overall performance. The
impact on the performance trade-off due to incorporating cost sensitivity is discussed.
SURVEY ON CLASSIFICATION ALGORITHMS USING BIG DATASETEditor IJMTER
The data mining environment produces a large amount of data that needs to be analyzed.
Using traditional databases and architectures, it has become difficult to process, manage and analyze
patterns. To gain knowledge about Big Data, a proper architecture should be understood.
Classification is an important data mining technique with broad applications, used to classify the various
kinds of data found in nearly every field of our life. Classification assigns an item to one of a predefined
set of classes according to the features of the item. This paper puts a light on various classification
algorithms, including J48, C4.5 and Naive Bayes, using a large dataset.
Distributed Digital Artifacts on the Semantic WebEditor IJCATR
Distributed digital artifacts incorporate cryptographic hash values into URIs, called trusty URIs, in a distributed
environment, building good-quality, verifiable and unchangeable web resources to prevent the rising
man-in-the-middle attack. The greatest challenge of a centralized system is that it gives users no possibility to
check whether data have been modified, and the communication is limited to a single server. The solution is the
distributed digital artifact system, where resources are distributed among different domains to enable
inter-domain communication. Due to emerging developments in the web, attacks have increased rapidly,
among which the man-in-the-middle attack (MIMA) is a serious issue where user security is under threat. This
work tries to prevent MIMA to an extent by providing self-reference and trusty URIs even in a distributed
environment. Any manipulation of the data is efficiently identified, and any further access to that data is blocked
by informing the user that the uniform location has been changed. The system uses self-reference to contain a
trusty URI for each resource, a lineage algorithm for generating the seed, and the SHA-512 hash generation
algorithm to ensure security. It is implemented on the semantic web, an extension of the World Wide Web, using
RDF (Resource Description Framework) to identify the resource. Hence the framework was developed to
overcome existing challenges by making the digital artifacts on the semantic web distributed, enabling secure
communication between different domains across the network and thereby preventing MIMA.
A Novel Classification via Clustering Method for Anomaly Based Network Intrus...IDES Editor
Intrusion detection in the internet is an active area of research. Intruders can be classified into two types,
namely external intruders, who are unauthorized users of the computers they attack, and internal intruders,
who have permission to access the system but with some restrictions. The aim of this paper is to present a
methodology to recognize attacks during the normal activities in a system. A novel classification via sequential
information bottleneck (sIB) clustering algorithm has been proposed to build an efficient anomaly-based
network intrusion detection model. We have compared our proposed method with other clustering algorithms
like X-Means, Farthest First, Filtered clusters, DBSCAN, K-Means, and EM (Expectation-Maximization)
clustering in order to find the suitability of our proposed algorithm. A subset of the KDDCup 1999 intrusion
detection benchmark dataset has been used for the experiment. Results show that the proposed method is
efficient in terms of detection accuracy and low false positive rate in comparison to the other existing methods.
A NOVEL EVALUATION APPROACH TO FINDING LIGHTWEIGHT MACHINE LEARNING ALGORITHM...IJNSA Journal
Building practical and efficient intrusion detection systems in computer networks is important in industrial areas
today, and machine learning provides a set of effective algorithms to detect network intrusion. To find
appropriate algorithms for building such systems, it is necessary to evaluate various types of machine learning
algorithms based on specific criteria. In this paper, we propose a novel evaluation formula which incorporates six
indexes into a comprehensive measurement - precision, recall, root mean square error, training time, sample
complexity and practicability - in order to find algorithms which have a high detection rate and low training
time, need fewer training samples, and are easy to use in terms of constructing, understanding and analyzing
models. A detailed evaluation process is designed to obtain all necessary assessment indicators, and six kinds of
machine learning algorithms are evaluated. Experimental results illustrate that Logistic Regression shows the
best overall performance.
CLASSIFICATION ALGORITHM USING RANDOM CONCEPT ON A VERY LARGE DATA SET: A SURVEYEditor IJMTER
The data mining environment produces a large amount of data that needs to be
analyzed, and patterns have to be extracted from it to gain knowledge. In this new period, with
the explosion of data both ordered and unordered, it has become difficult to process, manage and
analyze patterns using traditional databases and architectures. To gain knowledge about
Big Data, a proper architecture should be understood. Classification is an important data mining
technique with broad applications, used to classify the various kinds of data found in nearly every
field of our life. Classification assigns an item to one of a predefined set of classes according to the
features of the item. This paper provides an inclusive survey of different classification algorithms,
putting a light on various algorithms including J48, C4.5, the k-nearest neighbour classifier,
Naive Bayes, SVM, etc., using the random concept.
An Empirical Comparison and Feature Reduction Performance Analysis of Intrusi...ijctcm
This paper reports on the empirical evaluation of five machine learning algorithms, J48, BayesNet, OneR, NB and ZeroR, using ten performance criteria: accuracy, precision, recall, F-Measure, incorrectly classified instances, kappa statistic, mean absolute error, root mean squared error, relative absolute error, and root relative squared error. The aim of this paper is to find out which classifier performs better for an intrusion detection system. Machine learning is one of the methods used in intrusion detection systems (IDS). Based on this study, it can be concluded that the J48 decision tree is the most suitable algorithm of the five. In this paper we also compared the performance of IDS classifiers using seven feature reduction techniques.
A Threshold fuzzy entropy based feature selection method applied in various b...IJMER
Large amounts of data have been stored and manipulated using various database
technologies. Processing all the attributes for a particular purpose is a difficult task. To avoid such
difficulties, a feature selection process is applied. In this paper, we collect eight various benchmark
datasets from the UCI repository. Feature selection is carried out using a fuzzy entropy based
relevance measure algorithm and follows three selection strategies: the mean selection strategy, the half
selection strategy, and a neural network for threshold selection. After the features are selected,
they are evaluated using Radial Basis Function (RBF) network, Stacking, Bagging, AdaBoostM1 and Ant-miner
classification methodologies. The test results depict that the neural network for threshold selection
strategy works well in selecting features, and the Ant-miner methodology works best in bringing out better
accuracy with the selected features than processing the original dataset. The obtained results of this
experiment clearly show that Ant-miner is superior to the other classifiers. Thus, the proposed Ant-miner
algorithm could be a more suitable method for producing good results with fewer features than
the original datasets.
NETWORK FAULT DIAGNOSIS USING DATA MINING CLASSIFIERScsandit
Mobile networks are under more pressure than ever before because of the increasing number of
smartphone users and the number of people relying on mobile data networks. With larger
numbers of users, the issue of service quality has become more important for network operators.
Faults in mobile networks that reduce the quality of service must be found within minutes so that
problems can be addressed and networks returned to optimised performance. In this paper, a
method of automated fault diagnosis is presented using decision trees, rules and Bayesian
classifiers for visualization of network faults. Using data mining techniques, the model classifies
optimisation criteria based on the key performance indicator metrics to identify network faults,
supporting the most efficient optimisation decisions. The goal is to help wireless providers to
localize the key performance indicator alarms and determine which Quality of Service factors
should be addressed first and at which locations.
Hybrid optimization of pumped hydro system and solar - Engr. Abdul-Azeez
Advancements in technology unveil a myriad of electrical and electronic breakthroughs geared towards efficiently harnessing limited resources to meet human energy demands. The optimization of hybrid solar PV panels and pumped hydro energy supply systems plays a pivotal role in utilizing natural resources effectively. This initiative not only benefits humanity but also fosters environmental sustainability. The study investigated the design optimization of these hybrid systems, focusing on understanding solar radiation patterns, identifying geographical influences on solar radiation, formulating a mathematical model for system optimization, and determining the optimal configuration of PV panels and pumped hydro storage. Through a comparative analysis approach and eight weeks of data collection, the study addressed key research questions related to solar radiation patterns and optimal system design. The findings highlighted regions with heightened solar radiation levels, showcasing substantial potential for power generation and emphasizing the system's efficiency. Optimizing system design significantly boosted power generation, promoted renewable energy utilization, and enhanced energy storage capacity. The study underscored the benefits of optimizing hybrid solar PV panels and pumped hydro energy supply systems for sustainable energy usage. Optimizing the design of solar PV panels and pumped hydro energy supply systems as examined across diverse climatic conditions in a developing country, not only enhances power generation but also improves the integration of renewable energy sources and boosts energy storage capacities, particularly beneficial for less economically prosperous regions. Additionally, the study provides valuable insights for advancing energy research in economically viable areas. 
Recommendations included conducting site-specific assessments, utilizing advanced modeling tools, implementing regular maintenance protocols, and enhancing communication among system components.
Cosmetic shop management system project report.pdfKamal Acharya
Buying new cosmetic products is difficult. It can even be scary for those who have sensitive skin and are prone to skin trouble. The information needed to alleviate this problem is on the back of each product, but it is tough to interpret those ingredient lists unless you have a background in chemistry.
Instead of buying and hoping for the best, we can use data science to help us predict which products may be good fits for us. The system includes various function programs to perform the above-mentioned tasks.
Data file handling has been used effectively in the program.
The automated cosmetic shop management system deals with the automation of the general workflow and administration process of the shop. The main processes of the system focus on the customer's request, where the system is able to search for the most appropriate products and deliver them to the customer. It helps the employees to quickly identify the cosmetic products that have reached the minimum quantity, keeps track of the expiry date of each cosmetic product, and helps the employees to find the rack number in which a product is placed. It is also a faster and more efficient way of working.
Explore the innovative world of trenchless pipe repair with our comprehensive guide, "The Benefits and Techniques of Trenchless Pipe Repair." This document delves into the modern methods of repairing underground pipes without the need for extensive excavation, highlighting the numerous advantages and the latest techniques used in the industry.
Learn about the cost savings, reduced environmental impact, and minimal disruption associated with trenchless technology. Discover detailed explanations of popular techniques such as pipe bursting, cured-in-place pipe (CIPP) lining, and directional drilling. Understand how these methods can be applied to various types of infrastructure, from residential plumbing to large-scale municipal systems.
Ideal for homeowners, contractors, engineers, and anyone interested in modern plumbing solutions, this guide provides valuable insights into why trenchless pipe repair is becoming the preferred choice for pipe rehabilitation. Stay informed about the latest advancements and best practices in the field.
Welcome to WIPAC Monthly the magazine brought to you by the LinkedIn Group Water Industry Process Automation & Control.
In this month's edition, along with this month's industry news to celebrate the 13 years since the group was created we have articles including
A case study of the use of Advanced Process Control at the Wastewater Treatment works at Lleida in Spain
A look back on an article on smart wastewater networks in order to see how the industry has measured up in the interim around the adoption of Digital Transformation in the Water Industry.
Immunizing Image Classifiers Against Localized Adversary Attacksgerogepatton
This paper addresses the vulnerability of deep learning models, particularly convolutional neural networks
(CNNs), to adversarial attacks and presents a proactive training technique designed to counter them. We
introduce a novel volumization algorithm, which transforms 2D images into 3D volumetric representations.
When combined with 3D convolution and deep curriculum learning optimization (CLO), it significantly improves
the immunity of models against localized universal attacks, by up to 40%. We evaluate our proposed approach
using contemporary CNN architectures and the modified Canadian Institute for Advanced Research (CIFAR-10
and CIFAR-100) and ImageNet Large Scale Visual Recognition Challenge (ILSVRC12) datasets, showcasing
accuracy improvements over previous techniques. The results indicate that the combination of the volumetric
input and curriculum learning holds significant promise for mitigating adversarial attacks without necessitating
adversarial training.
Overview of the fundamental roles in Hydropower generation and the components involved in wider Electrical Engineering.
This paper presents the design and construction of hydroelectric dams from the hydrologist’s survey of the valley before construction, all aspects and involved disciplines, fluid dynamics, structural engineering, generation and mains frequency regulation to the very transmission of power through the network in the United Kingdom.
Author: Robbie Edward Sayers
Collaborators and co editors: Charlie Sims and Connor Healey.
(C) 2024 Robbie E. Sayers
Analysis on different Data mining Techniques and algorithms used in IOT
1. Shweta Bhatia Int. Journal of Engineering Research and Applications www.ijera.com
ISSN: 2248-9622, Vol. 5, Issue 11, (Part - 1) November 2015, pp.82-85
Analysis on different Data mining Techniques and algorithms
used in IOT
Shweta Bhatia1
, Sweety Patel2
1 Assistant Professor, Shri Ramkrishna Institute of Computer Education &Applied Sciences, Sarvajanik
Education Society, Athwagate, Surat, India
2 Assistant Professor, S.V.Patel College of Management and Computer Application, Sumul Dairy Road, Surat.
Abstract
In this paper, we discuss five functionalities of data mining in IOT that affect performance: data
anomaly detection, data clustering, data classification, feature selection and time series prediction. Some
important algorithms for each functionality are reviewed here, showing their advantages and limitations, as
well as some new algorithms that are current research directions. We present a knowledge view of data
mining in IOT.
I. Introduction
The Internet of Things (IOT) and its related
technologies can seamlessly integrate classical
networks with network instruments and devices. The
data in the Internet of Things can be categorized into
several types: RFID data stream, address identifiers,
descriptive data, positional data, environment data
and sensor network data, etc. [1]. Today, IOT brings
great challenges for managing, analysing and
mining data. In IOT systems, data quality
management is a critical technology to provide high-
quality and trusted data to business-level analysis,
optimization and decision making. In order to
improve the quality of data, anomaly detection
techniques are widely used to remove noise and
inaccurate data. For anomaly detection, having more
data means it is easier to detect an unusual event
against the background of normal events [3].
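To make the idea concrete, a minimal global anomaly detector can be sketched as follows (an illustrative example, not an algorithm from the surveyed literature; the 3-standard-deviation threshold is an assumed convention):

```python
import statistics

def zscore_anomalies(values, threshold=3.0):
    """Flag readings whose z-score exceeds the threshold (a global method)."""
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)
    if stdev == 0:
        return []
    return [v for v in values if abs(v - mean) / stdev > threshold]

# Twenty normal sensor readings and one faulty spike.
readings = [10.0] * 20 + [100.0]
print(zscore_anomalies(readings))  # only the spike is reported
```

With more normal data, the mean and standard deviation estimates stabilise, which is why more data makes an unusual event easier to detect.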
Data clustering refers to grouping of data based
on specific features and their values. In IOT, data
clustering is an intermediate step for identifying
patterns in the collected data. It is the most common
process in unsupervised machine learning. Clustering
methods are divided into 4 major categories:
partitioning methods, hierarchical methods, density-
based methods and grid-based methods. Other
clustering techniques also exist, such as fuzzy
clustering, artificial neural networks and genetic
algorithms.
The data classification problem is stated as
follows: given a set of training data points with
associated labels, predict the labels of unlabelled test
instances. Classification algorithms contain two
phases: a training phase and a testing phase. On the
basis of the training data set, a segmentation is
learned that encodes knowledge about the structure
of the groups in the form of a target variable. The
classification problem is therefore referred to as
supervised learning.
Feature selection is the process used in pattern
recognition to identify the attributes that affect a
quality index the most. After some initial
experimentation, feature selection is preferable:
identify which attributes affect a specific problem
most, and then data classification, time series
prediction or anomaly detection can be performed
more easily, since feature selection reduces the
dimensionality of the mining problem. The goal of
feature selection is to find a satisfactory feature
subset from the candidate feature set, so as to reach
optimal classification accuracy while controlling
computational complexity.
A time series is a collection of temporal data
objects, characterized by large data size, high
dimensionality and continuous updating.
Representation, similarity measures and indexing are
the three components on which time series tasks
rely. Time series representation reduces the
dimension and is divided into three categories:
model-based representation, non-adaptive data
representation and adaptive data representation.
Similarity measurement must be carried out in a
proper manner; research directions include
subsequence matching and full sequence matching.
The indexing of time series is linked with the
representation and similarity measure tools [2].
II. Anomaly detection algorithms
Anomaly detection algorithms can be either
global or local. Among the algorithms that can be
used for IOT data anomaly detection are nearest
neighbour-based anomaly detection, clustering-based
anomaly detection, statistical anomaly detection, and
spectral anomaly detection.
NN-based algorithms assign an anomaly score to
data instances relative to their neighbourhood.
Nearest-neighbour (NN) based anomaly detection is
broadly used in areas such as finding similar patches
in images, HTM-based applications for IT analytics,
wireless sensor networks, etc.
[Shweta Bhatia, Int. Journal of Engineering Research
and Applications, ISSN: 2248-9622, Vol. 5, Issue 11
(Part 1), November 2015, www.ijera.com]
NN-based algorithms include: 1. k-NN global
anomaly score, 2. Local Outlier Factor (LOF), 3.
Connectivity-based Outlier Factor (COF), 4. Local
Outlier Probability (LoOP), 5. Influenced Outlierness
(INFLO), and 6. Local Correlation Integral (LOCI) [5].
1. k-NN global anomaly score: the score is either the
distance to the k-th nearest neighbour or the average
distance of the k nearest neighbours. k-NN is the
simplest classification method used in data mining
and is indirectly associated with IOT.
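As a concrete illustration (our own sketch, not taken from the reviewed works), the global k-NN anomaly score using the average distance to the k nearest neighbours can be computed as follows; the toy data set is hypothetical:

```python
import math

def knn_anomaly_scores(points, k):
    """Score each point by the average distance to its k nearest neighbours.

    A larger score means the point lies farther from its neighbourhood
    and is therefore more likely to be an anomaly.
    """
    scores = []
    for p in points:
        dists = sorted(math.dist(p, q) for q in points if q is not p)
        scores.append(sum(dists[:k]) / k)  # average of the k nearest distances
    return scores

# A tight cluster near the origin plus one far-away outlier.
data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
scores = knn_anomaly_scores(data, k=2)
# The outlier (5, 5) receives the highest score.
```

Using the distance to the k-th neighbour instead of the average only changes `sum(dists[:k]) / k` to `dists[k - 1]`.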
2. LOF (Local Outlier Factor): the most prominent
anomaly detection algorithm, able to find local
anomalies. Its computational effort is O(n²). It first
computes the local reachability density of a point p:
LRD_k(p) = 1 / ( Σ_{o ∈ N_k(p)} reach_dist_k(p, o) / |N_k(p)| )
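To make the formula concrete, a minimal sketch (ours, under the usual LOF definitions: reach_dist_k(p, o) = max(k-distance(o), d(p, o)), and LOF as the average ratio of the neighbours' LRD to the point's own LRD) on a hypothetical toy data set:

```python
import math

def lof_scores(points, k):
    """Local Outlier Factor: ratio of the neighbours' local reachability
    density to the point's own (scores well above 1 indicate outliers)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # indices of the k nearest neighbours of each point (excluding itself)
    knn = [sorted(range(n), key=lambda j: dist[i][j])[1:k + 1] for i in range(n)]
    kdist = [dist[i][knn[i][-1]] for i in range(n)]  # distance to k-th neighbour

    def reach_dist(i, j):
        # reachability distance of point i from neighbour j
        return max(kdist[j], dist[i][j])

    # LRD_k(p) = 1 / (sum of reachability distances / |N_k(p)|)
    lrd = [k / sum(reach_dist(i, j) for j in knn[i]) for i in range(n)]
    # LOF_k(p) = average over neighbours o of LRD_k(o) / LRD_k(p)
    return [sum(lrd[j] for j in knn[i]) / (k * lrd[i]) for i in range(n)]

data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
scores = lof_scores(data, k=2)
# Cluster points score near 1; the isolated point (5, 5) scores far above 1.
```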
Statistical anomaly detection is quickly
becoming a required capability in the new world of
the IOT. It is tractable for many special cases, such
as continuous signals, discrete event timing, and user
log files. The idea of statistical anomaly detection is
to encode the patterns of what is normal as a
probabilistic model. On the benefit side, probabilistic
models come with a built-in measure of anomaly for
determining what is anomalous, and they come with
a way of learning from observed data, called a
training algorithm. For instance, a rate model for a
web traffic anomaly detector can do very well at
predicting traffic during late December, even though
it was trained only on the last week of November and
the first week of December [9].
The key property of a probabilistic model is the
constraint that the probabilities of all possible events
must sum to one. Because of this constraint, the
training algorithm concentrates the model's
probability around what is normal, thus making the
modelled probability of anomalies lower.
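A minimal sketch of this idea (our own illustration, assuming a simple Gaussian rate model and hypothetical request counts): the model learned from normal data assigns low likelihood to an unusual observation.

```python
import math

def fit_gaussian(samples):
    """Learn a probabilistic model of 'normal' behaviour: mean and std dev."""
    mean = sum(samples) / len(samples)
    var = sum((x - mean) ** 2 for x in samples) / len(samples)
    return mean, math.sqrt(var)

def log_likelihood(x, mean, std):
    """Log-density of x under the fitted Gaussian; low values flag anomalies."""
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)

# Hypothetical hourly request counts observed during normal operation.
train = [98, 102, 101, 99, 100, 97, 103, 100]
mean, std = fit_gaussian(train)

# Because probability mass concentrates around normal behaviour,
# a sudden traffic burst receives a far lower likelihood than a typical rate.
score_normal = log_likelihood(101, mean, std)
score_burst = log_likelihood(250, mean, std)
```

Thresholding the log-likelihood then gives a simple anomaly detector; richer models (e.g. time-varying rates) follow the same train-then-score pattern.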
Spectral techniques attempt to find an
approximation of the data using a combination of
attributes that captures the bulk of the variability in
the data. These techniques automatically perform
dimensionality reduction, are suitable for handling
high-dimensional data sets, and can be used in an
unsupervised setting. Their disadvantages are that
they are useful only if the normal and anomalous
instances are separable in a lower-dimensional
embedding of the data, and that they have high
computational complexity.
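To make this concrete, here is a small sketch (our own illustration, not from the reviewed works) of spectral anomaly scoring on 2-D data: the first principal component captures the bulk of the variation, and the reconstruction error (distance to that component) serves as the anomaly score.

```python
def pca_anomaly_scores(points):
    """Project 2-D points onto their first principal component and score
    each point by its reconstruction error (distance to that component)."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    centered = [(x - mx, y - my) for x, y in points]
    # 2x2 covariance matrix of the centered data
    cxx = sum(x * x for x, _ in centered) / n
    cyy = sum(y * y for _, y in centered) / n
    cxy = sum(x * y for x, y in centered) / n
    # dominant eigenvector (principal axis) by power iteration
    vx, vy = 1.0, 0.0
    for _ in range(50):
        vx, vy = cxx * vx + cxy * vy, cxy * vx + cyy * vy
        norm = (vx * vx + vy * vy) ** 0.5
        vx, vy = vx / norm, vy / norm
    # residual: component of each point orthogonal to the principal axis
    return [abs(x * (-vy) + y * vx) for x, y in centered]

# Points lying close to the line y = x, plus one point far off that axis.
data = [(-3.0, -3.0), (-1.0, -1.0), (1.0, 1.0), (3.0, 3.0), (2.0, -2.0)]
scores = pca_anomaly_scores(data)
# The off-axis point (2, -2) has the largest reconstruction error.
```

This also shows the stated limitation: the method only works because the anomaly is separable in the low-dimensional embedding.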
III. Data Clustering, Classification and
Feature Selection Algorithm
For real-time IOT data streams, fast density-based
clustering is required, which includes density-based
data stream clustering. These methods are grouped
into density-grid based methods and density-based
microclustering methods. The main advantage of the
density-grid approach is its fast processing time,
which is independent of the number of data points
and depends only on the number of cells. On the
other hand, density-based microclustering methods
keep a summary of the clusters in microclusters and
form the final clusters from them. Taking advantage
of both methods, the authors of [4] proposed and
experimentally evaluated the HDC-Stream (hybrid
density-based clustering for data streams) algorithm.
HDC-Stream searches only the potential list, and if it
cannot find a suitable microcluster, the data point is
mapped to the grid, which acts as the outlier buffer.
In future work, the authors will focus on distributed
HDC-Stream density-based clustering to improve
performance in IOT.
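To illustrate the density-grid idea (a toy sketch of our own in the spirit of density-grid stream methods, not the HDC-Stream algorithm itself): points are mapped to cells, sparse cells are discarded as outliers, and adjacent dense cells are merged into clusters, so the merging cost depends on the number of cells rather than the number of points.

```python
from collections import Counter, deque

def grid_density_clusters(points, cell_size, min_pts):
    """Map points to grid cells, keep cells with at least `min_pts` points,
    and merge neighbouring dense cells into clusters."""
    cells = Counter((int(x // cell_size), int(y // cell_size)) for x, y in points)
    dense = {c for c, n in cells.items() if n >= min_pts}
    clusters, seen = [], set()
    for cell in dense:
        if cell in seen:
            continue
        # flood-fill over 8-connected dense neighbour cells
        cluster, queue = [], deque([cell])
        seen.add(cell)
        while queue:
            cx, cy = queue.popleft()
            cluster.append((cx, cy))
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    nb = (cx + dx, cy + dy)
                    if nb in dense and nb not in seen:
                        seen.add(nb)
                        queue.append(nb)
        clusters.append(cluster)
    return clusters

# Two well-separated dense regions plus a lone outlier point.
pts = [(0.1, 0.1), (0.2, 0.3), (0.4, 0.2),
       (5.1, 5.2), (5.3, 5.1), (5.2, 5.4),
       (9.9, 0.1)]
groups = grid_density_clusters(pts, cell_size=1.0, min_pts=2)
# Two clusters are found; the single outlier cell is discarded.
```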
Widely used classification algorithms for mining
IOT data on the internet are: 1. C4.5 (decision tree),
2. the k-nearest neighbour algorithm, 3. support
vector machines, 4. the Apriori algorithm, and 5. the
AdaBoost algorithm. These classification algorithms
can be applied to different types of data sets, and on
the basis of their performance they are also used to
detect natural disasters such as cloud bursts,
earthquakes, etc.
1. A decision tree classifies instances on the basis of
feature values. For a given set S of cases, C4.5 first
grows an initial tree using a divide-and-conquer
algorithm as follows: A. if all cases in S belong to the
same class, or S is small, the tree is a leaf labelled
with the most frequent class; B. otherwise, select a
test based on a single attribute and split S accordingly.
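The attribute test in step B is typically chosen by information gain (C4.5 refines this to gain ratio); a minimal sketch of gain computation on a hypothetical toy data set (our own illustration):

```python
import math

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return -sum((c / len(labels)) * math.log2(c / len(labels))
                for c in counts.values())

def information_gain(rows, labels, attr):
    """Entropy reduction from splitting `rows` on attribute index `attr`."""
    total = entropy(labels)
    split = {}
    for row, label in zip(rows, labels):
        split.setdefault(row[attr], []).append(label)
    remainder = sum(len(sub) / len(rows) * entropy(sub)
                    for sub in split.values())
    return total - remainder

# Toy weather-style data: attribute 0 ("outlook") predicts the label
# perfectly, while attribute 1 ("windy") carries no information.
rows = [("sunny", True), ("sunny", False), ("rain", True), ("rain", False)]
labels = ["no", "no", "yes", "yes"]
# Splitting on attribute 0 yields pure subsets (gain = 1 bit);
# splitting on attribute 1 yields no reduction (gain = 0).
```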
Some limitations of this algorithm are empty
branches, insignificant branches and overfitting.
Most decision tree algorithms also cannot perform
well on problems that require diagonal partitioning.
2. One of the most common tasks in data mining is to
find frequent itemsets in transaction datasets and
derive association rules. Once the frequent itemsets
are obtained, it is straightforward to generate
association rules, and the Apriori algorithm is helpful
here. The algorithm assumes that items within a
transaction or itemset are sorted in lexicographic
order. Apriori generally performs two steps, a join
step and a prune step, and then calculates frequencies
only for the candidates it has generated, by scanning
the database.
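The join and prune steps can be sketched as follows (our own compact illustration of the classic Apriori scheme, with a hypothetical basket data set):

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Find all frequent itemsets: count 1-itemsets, then repeatedly
    join frequent k-itemsets into (k+1)-candidates and prune any
    candidate with an infrequent subset before scanning the database."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted({i for t in transactions for i in t})

    def support(itemset):
        return sum(itemset <= t for t in transactions)

    frequent = {}
    current = [frozenset([i]) for i in items
               if support(frozenset([i])) >= min_support]
    k = 1
    while current:
        for s in current:
            frequent[s] = support(s)
        # join step: union pairs of frequent k-itemsets into (k+1)-candidates
        candidates = {a | b for a in current for b in current
                      if len(a | b) == k + 1}
        # prune step: every k-subset of a candidate must itself be frequent,
        # and only surviving candidates are counted against the database
        current = [c for c in candidates
                   if all(frozenset(sub) in frequent
                          for sub in combinations(c, k))
                   and support(c) >= min_support]
        k += 1
    return frequent

baskets = [{"milk", "bread"}, {"milk", "bread", "eggs"},
           {"bread", "eggs"}, {"milk", "eggs"}]
freq = apriori(baskets, min_support=2)
# e.g. {"milk", "bread"} is frequent (2 of 4 baskets);
# {"milk", "bread", "eggs"} appears only once and is rejected.
```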
There is growing pressure for data classification
in urgent situations: data classification for breach
response, for e-discovery, and for business continuity
as organizations move towards the cloud [8].
In general there are three classes of feature selection
algorithms: 1. filter methods, 2. wrapper methods
and 3. embedded methods [12].
1. Filter methods apply a statistical measure to
assign a score to each feature. The attributes are
ranked by score and either kept or removed from the
dataset. Examples of filter methods are the
chi-squared test, information gain and correlation
coefficient scores.
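For example, a filter method based on correlation coefficient scores can be sketched as follows (our own illustration with a hypothetical feature matrix):

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between a feature and the target."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Three features per row; the first tracks the target, the third is noise.
X = [[1, 5, 9], [2, 4, 2], [3, 6, 7], [4, 3, 1], [5, 7, 8]]
y = [1.1, 2.0, 2.9, 4.2, 5.0]

# Rank features by absolute correlation with the target; a filter method
# would then keep only the top-ranked features.
scores = [abs(pearson([row[j] for row in X], y)) for j in range(3)]
ranked = sorted(range(3), key=lambda j: -scores[j])
```

Note that the target label is used only for scoring; no model is trained, which is what makes filter methods cheap.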
2. Wrapper methods treat the selection of a set of
attributes as a search problem. Combinations of
features are evaluated and assigned a score based on
model accuracy. The search process may use
different methods, such as best-first search, random
hill-climbing or heuristics. An example of a wrapper
method is the recursive feature elimination
algorithm.
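A minimal wrapper sketch (our own illustration, using greedy forward selection rather than recursive elimination, with leave-one-out 1-NN accuracy as the model score and a hypothetical data set):

```python
import math

def loo_accuracy(X, y, features):
    """Leave-one-out accuracy of a 1-NN classifier using only `features`."""
    correct = 0
    for i in range(len(X)):
        best, pred = math.inf, None
        for j in range(len(X)):
            if i == j:
                continue
            d = sum((X[i][f] - X[j][f]) ** 2 for f in features)
            if d < best:
                best, pred = d, y[j]
        correct += pred == y[i]
    return correct / len(X)

def forward_selection(X, y, n_features):
    """Wrapper search: greedily add the feature whose inclusion most
    improves the model accuracy; stop when no feature helps."""
    selected, best_score = [], 0.0
    candidates = set(range(n_features))
    while candidates:
        f = max(candidates, key=lambda f: loo_accuracy(X, y, selected + [f]))
        score = loo_accuracy(X, y, selected + [f])
        if score <= best_score:
            break
        selected.append(f)
        best_score = score
        candidates.discard(f)
    return selected

# Feature 0 separates the two classes; feature 1 is random noise.
X = [[0.0, 7.1], [0.2, 2.5], [0.1, 9.0], [5.0, 3.3], [5.2, 8.8], [5.1, 0.4]]
y = [0, 0, 0, 1, 1, 1]
chosen = forward_selection(X, y, n_features=2)
# Feature 0 is selected; adding the noise feature does not improve accuracy.
```

Because every candidate subset requires retraining and re-scoring the model, wrappers are more accurate but far costlier than filters.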
3. Embedded methods learn which features best
contribute to the accuracy of the model while the
model itself is being created. The most common are
regularization methods, which introduce additional
constraints into the optimization of a predictive
algorithm that bias the model towards lower
complexity. Examples of this approach are LASSO,
Elastic Net and Ridge Regression [7].
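The shrinkage effect of regularization can be shown with a one-feature ridge regression (our own illustration with hypothetical data; for a single feature without an intercept, the closed form is w = Σxy / (Σx² + λ)):

```python
def ridge_weight(xs, ys, lam):
    """One-feature ridge regression: the L2 penalty `lam` shrinks the
    learned weight toward zero, trading a little bias for lower
    model complexity."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]  # roughly y = 2x

w_ols = ridge_weight(xs, ys, lam=0.0)     # ordinary least squares, ~2.0
w_ridge = ridge_weight(xs, ys, lam=10.0)  # penalised weight, pulled toward 0
```

LASSO uses an L1 penalty instead, which can shrink weights exactly to zero and thereby performs the feature selection itself.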
IV. Time Series Analysis and the Needs of
IOT, Challenges and Conclusion
Deep learning algorithms can be applied to time
series analysis in IOT and smart city domains. A new
deep learning model for temporal patterns has been
proposed, the Recursive Convolutional Bayesian
Model (RCBM), which is capable of addressing two
tasks: identification of multi-scale signatures and
mining of compositional interactions [10]. The major
idea behind RCBM is to build a layered structure of
signature detectors, where each layer is responsible
for a specific time scale.
Trendalyze is also dedicated to developing and
providing an analytic platform for search, visual
discovery and operational monitoring of frequently
occurring patterns in the time series data streams
generated by IOT.
Examples of time series applications include
capacity planning, inventory replenishment, sales
forecasting and future staffing levels.
To address the need to connect large numbers of
IOT devices to application infrastructure, a new
cloud-based approach called SDP has been
presented, with its public-domain project available
for free.
Further challenges in IOT: 1. The most
practical applications are happening in the Industrial
IOT (IIOT), where the possibilities are nearly
limitless: smarter and more efficient factories,
greener energy generation, self-regulating buildings
that optimize energy consumption, cities that adjust
traffic patterns to respond to congestion, etc.;
implementation, however, will be a challenge.
2. Security plays a vital role at all layers of IOT,
down to the devices; no single threat-detection
mechanism can mitigate risks effectively. Major
security challenges are ubiquitous data collection,
the potential for unexpected uses of consumer data,
and heightened security risks. 3. The question in IOT
is not only who owns the data, but who controls and
receives access to that data; from a consumer
perspective this will be a major challenge. 4. Shared
standards and infrastructure are a complex part of
IOT: hardware, sensors, applications and devices
need to be able to communicate across geographies
and verticals. The largest players in the market are
working on developing such standards; AT&T,
Cisco, IBM, Intel and GE are on the way to
improving the integration of the physical and digital
worlds.
V. Conclusion
This paper has focused on algorithms for all the
techniques of data mining that can be applied to
IOT, with their advantages and disadvantages. We
have also covered some future challenges and the
integrated approach required to fulfil the needs of
IOT in the present era.
REFERENCES
[1] Shen Bin, Liu Yuan and Wang Xiaoyi (Ningbo
Institute of Technology, Zhejiang University,
Ningbo, China; College of Management,
Zhejiang University, Hangzhou, China),
"Research on Data Mining Models for the
Internet of Things", IEEE, 2010.
[2] Joshua Cooper and Anne James, "Challenges
for Database Management in the Internet of
Things", IETE Technical Review, September
2009.
[3] Feng Chen, Pan Deng, Jiafu Wan, Daqiang
Zhang, Athanasios V. Vasilakos and Xiaohui
Rong, "Data Mining for the Internet of
Things: Literature Review and Challenges".
[4] Amineh Amini, Hadi Saboohi, Teh Ying Wah
and Tutut Herawan, "A Fast Density-Based
Clustering Algorithm for Real-Time Internet
of Things Stream", Hindawi, 2014.
[5] Mennatallah Amer and Markus Goldstein,
"Nearest-Neighbor and Clustering based
Anomaly Detection Algorithms for
RapidMiner", 2012.
[6] Enzo Busseti, Ian Osband and Scott Wong,
"Deep Learning for Time Series Modeling",
CS 229 Final Project Report, 2012.
[7] Pawan Gupta, Susheel Jain and Anurag Jain,
"A Review of Fast Clustering-Based Feature
Subset Selection Algorithm", November 2014.
[8] Jay Cline, "Growing Pressure for Data
Classification", 2007.
[9] Numenta, "The Science of Anomaly
Detection", white paper, 2015.
[10] Huan-Kai Peng and Radu Marculescu,
"Multi-Scale Compositionality: Identifying
the Compositional Structures of Social
Dynamics Using Deep Learning", April 2015.
[11] Rosaria Silipo, "Data Mining and Predictive
Analysis", October 2014.
[12] Jason Brownlee, "An Introduction to Feature
Selection", 2014.