This document describes a method for improving classifier accuracy by using unlabeled data in addition to a small set of labeled data. The algorithm builds an initial classifier from the labeled data alone, then uses that classifier to assign labels to a larger set of unlabeled data. A new classifier is then built from both the original labeled data and the newly labeled data. Experimental results with three common learning algorithms (neural networks, Naive Bayes, C4.5) on 10 datasets show average accuracy improvements of 5%, 3%, and 8%, respectively, when incorporating unlabeled data. The results indicate that leveraging unlabeled data can significantly boost classifier performance when labeled data is limited.
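The self-training loop described above can be sketched in a few lines. This is an illustrative example with a deliberately simple nearest-centroid learner standing in for the paper's classifiers; all names and the toy data are hypothetical.

```python
import numpy as np

def fit_centroids(X, y):
    """Train a minimal nearest-centroid classifier: one mean vector per class."""
    classes = np.unique(y)
    return classes, np.array([X[y == c].mean(axis=0) for c in classes])

def predict(model, X):
    classes, centroids = model
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[d.argmin(axis=1)]

def self_train(X_lab, y_lab, X_unlab):
    """Fit on labeled data, pseudo-label the unlabeled pool, refit on the union."""
    model = fit_centroids(X_lab, y_lab)
    pseudo = predict(model, X_unlab)
    return fit_centroids(np.vstack([X_lab, X_unlab]),
                         np.concatenate([y_lab, pseudo]))

# Two well-separated blobs: 2 labeled points, 40 unlabeled ones.
rng = np.random.default_rng(0)
X_lab = np.array([[0.0, 0.0], [5.0, 5.0]])
y_lab = np.array([0, 1])
X_unlab = np.vstack([rng.normal(0, 0.5, (20, 2)), rng.normal(5, 0.5, (20, 2))])
model = self_train(X_lab, y_lab, X_unlab)
preds = predict(model, np.array([[0.2, -0.1], [4.8, 5.2]]))  # array([0, 1])
```

The refit centroids are estimated from 42 points instead of 2, which is the whole benefit the abstract measures.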
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
IMAGE CLASSIFICATION USING DIFFERENT CLASSICAL APPROACHES - Vikash Kumar
IMAGE CLASSIFICATION USING KNN, RANDOM FOREST AND SVM ALGORITHMS ON GLAUCOMA DATASETS, EXPLAINING THE ACCURACY, SENSITIVITY, AND SPECIFICITY OF EACH ALGORITHM
Due to the continuous growth of Internet technology, security mechanisms need to be established. The Intrusion Detection System (IDS) is increasingly becoming a crucial component of computer and network security systems. Most existing intrusion detection techniques emphasize building the detection model on all of the features provided, some of which are irrelevant or redundant. This paper proposes to identify the important input features for building an IDS that is computationally efficient and effective. We identify the important attributes for each attack type by analyzing the detection rate, and feed the attack-specific attributes to Naive Bayes and Random Forest classifiers. We perform our experiments on the NSL-KDD intrusion detection data set, which consists of selected records from the complete KDD Cup 1999 intrusion detection dataset.
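The per-attack detection rate used for attribute selection can be illustrated with a minimal Gaussian Naive Bayes on synthetic two-class traffic; the classifier, data, and helper names below are stand-ins, not the paper's code.

```python
import numpy as np

class GaussianNB:
    """Minimal Gaussian Naive Bayes, a stand-in for the paper's classifiers."""
    def fit(self, X, y):
        self.classes = np.unique(y)
        self.mu = np.array([X[y == c].mean(0) for c in self.classes])
        self.var = np.array([X[y == c].var(0) + 1e-9 for c in self.classes])
        self.prior = np.array([(y == c).mean() for c in self.classes])
        return self

    def predict(self, X):
        # log-likelihood of each sample under each class's diagonal Gaussian
        ll = -0.5 * (np.log(2 * np.pi * self.var[None]) +
                     (X[:, None, :] - self.mu[None]) ** 2 / self.var[None]).sum(2)
        return self.classes[(ll + np.log(self.prior)).argmax(1)]

def detection_rate(model, X, y, attack):
    """Fraction of records of one attack type that the model labels correctly."""
    mask = y == attack
    return float((model.predict(X[mask]) == attack).mean())

# Synthetic traffic: class 0 = normal, class 1 = one attack type.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (50, 3)), rng.normal(3, 1, (50, 3))])
y = np.array([0] * 50 + [1] * 50)
rate = detection_rate(GaussianNB().fit(X, y), X, y, attack=1)
```

Repeating `detection_rate` while dropping features one at a time is one way to rank attribute importance per attack type.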
ANALYSIS AND COMPARISON STUDY OF DATA MINING ALGORITHMS USING RAPIDMINER - IJCSEA Journal
A comparison study of algorithms is very much required before implementing them for the needs of any organization. Such comparisons depend on various parameters, such as data frequency, the types of data, and the relationships among the attributes in a given data set. A number of learning and classification algorithms are available to analyse data, learn patterns, and categorize it, but the problem is to find the best algorithm for the problem and the desired output. The desired result has always been higher accuracy in predicting future values or events from the given dataset. The algorithms taken for the comparison study are Neural Net, SVM, Naïve Bayes, BFT, and Decision Stump. These are among the most influential data mining algorithms in the research community and are widely used in the field of knowledge discovery and data mining.
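A comparison study of this kind reduces to estimating each algorithm's accuracy on held-out data. A minimal k-fold cross-validation harness, shown here with a 1-nearest-neighbour learner on toy data (all names hypothetical), sketches the protocol:

```python
import numpy as np

def kfold_accuracy(fit, predict, X, y, k=5, seed=0):
    """k-fold cross-validated accuracy: the comparison metric such studies use."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), k)
    accs = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        model = fit(X[train], y[train])
        accs.append((predict(model, X[test]) == y[test]).mean())
    return float(np.mean(accs))

def fit_1nn(X, y):       # a 1-NN "model" simply memorizes its training data
    return X, y

def predict_1nn(model, X):
    Xtr, ytr = model
    d = ((X[:, None, :] - Xtr[None, :, :]) ** 2).sum(2)
    return ytr[d.argmin(1)]

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 1, (40, 2)), rng.normal(4, 1, (40, 2))])
y = np.array([0] * 40 + [1] * 40)
acc = kfold_accuracy(fit_1nn, predict_1nn, X, y)
```

Running the same harness over each candidate algorithm gives directly comparable scores on one dataset.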
Rainfall Prediction using Data-Core Based Fuzzy Min-Max Neural Network for Cl... - IJERA Editor
This paper proposes a rainfall prediction system based on a classification technique. An advanced, modified neural network called the Data-Core-Based Fuzzy Min-Max Neural Network (DCFMNN) is used for pattern classification and applied to rainfall prediction. The fuzzy min-max neural network (FMNN), which creates hyperboxes for classification and prediction, has a problem with overlapping neurons; this is resolved in the DCFMNN to give greater accuracy. The system consists of hyperbox formation, two kinds of neurons (overlapping neurons and classifying neurons), and classification used for prediction. For each hyperbox, its data core and the geometric center of its data are calculated. The advantages of this method are its high accuracy and strong robustness. According to the evaluation results, the system provides better rainfall prediction and serves as a classification tool in real environments.
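The hyperboxes at the heart of FMNN-style classifiers assign each point a fuzzy membership that is 1 inside the box and decays outside it. The sketch below follows Simpson's original min-max membership, not the data-core variant the paper proposes; `gamma` and the corner values are illustrative:

```python
import numpy as np

def hyperbox_membership(x, v, w, gamma=4.0):
    """Fuzzy membership of x in the hyperbox with min corner v, max corner w."""
    below = np.maximum(0.0, v - x)   # shortfall under the min corner, per dim
    above = np.maximum(0.0, x - w)   # excess over the max corner, per dim
    per_dim = 1.0 - np.minimum(1.0, gamma * (below + above))
    return float(per_dim.mean())

v, w = np.array([0.2, 0.2]), np.array([0.4, 0.4])
inside = hyperbox_membership(np.array([0.3, 0.3]), v, w)    # 1.0 inside the box
outside = hyperbox_membership(np.array([0.9, 0.9]), v, w)   # decays outside
```

Classification picks the class whose hyperbox gives the highest membership; the DCFMNN additionally weights this by the data core of each box.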
A HYBRID MODEL FOR MINING MULTI DIMENSIONAL DATA SETS - Editor IJCATR
This paper presents a hybrid data mining approach based on supervised and unsupervised learning to identify the closest data patterns in a database. The technique achieves the maximum accuracy rate with minimal complexity. The proposed algorithm is compared with traditional clustering and classification algorithms and is also implemented on multidimensional datasets. The implementation results show better prediction accuracy and reliability.
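A hybrid of unsupervised and supervised learning of the kind described can be sketched as: cluster the data, then label each cluster by the majority class of its members. The tiny k-means and the toy data below are illustrative, not the paper's implementation:

```python
import numpy as np

def kmeans(X, k, iters=25, seed=0):
    """Tiny k-means: the unsupervised half of the hybrid approach."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        assign = ((X[:, None] - centers[None]) ** 2).sum(2).argmin(1)
        centers = np.array([X[assign == j].mean(0) if (assign == j).any()
                            else centers[j] for j in range(k)])
    return centers, assign

# Supervised half: each cluster takes the majority class of its members,
# and new points inherit the label of their nearest cluster center.
rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 0.5, (30, 2)), rng.normal(4, 0.5, (30, 2))])
y = np.array([0] * 30 + [1] * 30)
centers, assign = kmeans(X, 2)
label_of = {j: int(np.bincount(y[assign == j]).argmax()) for j in range(2)}
q = np.array([[4.1, 3.9]])
pred = label_of[int(((q[:, None] - centers[None]) ** 2).sum(2).argmin(1)[0])]
```

Prediction then costs only one distance computation per cluster, which is the complexity advantage the abstract claims.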
Mobile Network Coverage Determination at 900MHz for Abuja Rural Areas using A... - ijtsrd
This study proposes Artificial Neural Network (ANN) based field strength prediction models for the rural areas of Abuja, the federal capital territory of Nigeria. The ANN-based models were created on the basis of the Generalized Regression Neural Network (GRNN) and the Multi-Layer Perceptron Neural Network (MLP NN). These networks were created, trained, and tested for field strength prediction using received power data recorded at 900MHz from multiple Base Transceiver Stations (BTSs) distributed across the rural areas. Results indicate that the GRNN- and MLP NN-based models, with Root Mean Squared Error (RMSE) values of 4.78dBm and 5.56dBm respectively, offer a significant improvement over the empirical Hata-Okumura counterpart, which overestimates the signal strength with an RMSE value of 20.17dBm. Deme C. Abraham, "Mobile Network Coverage Determination at 900MHz for Abuja Rural Areas using Artificial Neural Networks", published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-4, Issue-2, February 2020.
URL: https://www.ijtsrd.com/papers/ijtsrd30228.pdf
Paper Url : https://www.ijtsrd.com/computer-science/artificial-intelligence/30228/mobile-network-coverage-determination-at-900mhz-for-abuja-rural-areas-using-artificial-neural-networks/deme-c-abraham
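A GRNN is essentially a Gaussian-kernel weighted average of training targets (the Nadaraya-Watson estimator), which is why it suits field-strength regression. The sketch below uses hypothetical distance-to-power samples, not the study's measured BTS data:

```python
import numpy as np

def grnn_predict(X_train, y_train, X_query, sigma=0.5):
    """GRNN prediction: Gaussian-kernel weighted average of training targets."""
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(2)
    w = np.exp(-d2 / (2 * sigma ** 2))
    return (w @ y_train) / w.sum(1)

# Hypothetical distance (km) -> received power (dBm) samples with a
# log-distance trend plus noise; NOT the study's measured data.
rng = np.random.default_rng(4)
dist = np.linspace(0.1, 5.0, 50)[:, None]
power = -60.0 - 10.0 * np.log10(dist[:, 0]) + rng.normal(0, 1, 50)
pred = grnn_predict(dist, power, np.array([[2.0]]))
```

The single smoothing parameter `sigma` is the only quantity to tune, one reason GRNNs train far faster than an MLP on the same data.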
We propose an algorithm for training Multi-Layer Perceptrons for classification problems, which we named Hidden Layer Learning Vector Quantization (H-LVQ). It consists of applying Learning Vector Quantization to the last hidden layer of an MLP, and it gave very successful results on problems containing a large number of correlated inputs. It was applied with excellent results to the classification of Rutherford backscattering spectra and to a benchmark image recognition problem. It may also be used for efficient feature extraction.
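The LVQ component of H-LVQ updates codebook vectors toward or away from each sample depending on label agreement. A single LVQ1 step, applied here to raw vectors rather than an MLP's hidden activations, might look like this (all names illustrative):

```python
import numpy as np

def lvq1_step(codebooks, labels, x, y, lr=0.1):
    """One LVQ1 update: move the winning codebook toward x when its label
    matches y, and away from x otherwise."""
    i = ((codebooks - x) ** 2).sum(1).argmin()     # winning codebook index
    sign = 1.0 if labels[i] == y else -1.0
    codebooks[i] += sign * lr * (x - codebooks[i])
    return codebooks

codebooks = np.array([[0.0, 0.0], [5.0, 5.0]])
labels = np.array([0, 1])
# A matching sample pulls its nearest codebook 10% of the way toward it.
codebooks = lvq1_step(codebooks, labels, np.array([1.0, 0.0]), 0)
```

In H-LVQ, `x` would be the last hidden layer's activation vector for a training sample, so the codebooks quantize the learned feature space.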
Learning to Balance: Bayesian Meta-Learning for Imbalanced and Out-of-distrib... - MLAI2
While tasks can come with varying numbers of instances and classes in realistic settings, existing meta-learning approaches for few-shot classification assume that the number of instances per task and per class is fixed. Due to this restriction, they learn to utilize the meta-knowledge equally across all tasks, even when the number of instances per task and class varies greatly. Moreover, they do not consider the distributional difference of unseen tasks, for which the meta-knowledge may be less useful depending on task relatedness. To overcome these limitations, we propose a novel meta-learning model that adaptively balances the effects of meta-learning and task-specific learning within each task. Through learned balancing variables, we can decide whether to obtain a solution by relying on the meta-knowledge or on task-specific learning. We formulate this objective in a Bayesian inference framework and tackle it using variational inference. We validate our Bayesian Task-Adaptive Meta-Learning (Bayesian TAML) on two realistic task- and class-imbalanced datasets, on which it significantly outperforms existing meta-learning approaches. A further ablation study confirms the effectiveness of each balancing component and of the Bayesian learning framework.
Deep learning algorithms have drawn the attention of researchers working in computer vision, speech recognition, malware detection, pattern recognition, and natural language processing. In this paper, we present an overview of deep learning techniques, including the convolutional neural network, deep belief network, autoencoder, restricted Boltzmann machine, and recurrent neural network. Current work applying deep learning algorithms to malware detection is surveyed in a literature review, and suggestions for future research are given with full justification. We also present an experimental analysis to show the importance of deep learning techniques.
Abstract—Classical machine learning techniques have been employed repeatedly in intrusion detection, but due to the rising number and sophistication of attacks, more advanced techniques, including ensemble-based methods, neural networks, and deep learning, have been applied. However, there is still a need for improved machine learning approaches that detect attacks more effectively and efficiently. Stacked generalization has been shown to be capable of learning from features and meta-features, but has been limited by the deficiencies of its base classifiers and by a lack of optimization in the choice of meta-feature combination. This paper therefore proposes a stacked generalization ensemble based on a two-tier meta-learner, in which the outputs of a classical stacked ensemble are passed to a multi-feature-based stacked ensemble that is optimized using a grid-search approach. Nine data features and four meta-features derived from Logistic Regression, Support Vector Machine, Naïve Bayes, and Multilayer Perceptron neural networks are used for the classification task. By applying neural networks as the meta-learner for the classification of NSL-KDD data, improved performance is achieved, with accuracy, precision, recall, and F-measure of 0.97, 0.98, 0.98, and 0.98, respectively.
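The core of stacked generalization is that base-classifier outputs become meta-features for a higher-level learner. The single-tier sketch below uses two toy base learners and in-sample meta-features for brevity (real stacking uses out-of-fold predictions, and the paper adds a second, grid-search-optimized tier); every name here is hypothetical:

```python
import numpy as np

def fit_centroid(X, y):
    """Base learner 1: nearest class centroid."""
    cls = np.unique(y)
    mu = np.array([X[y == c].mean(0) for c in cls])
    return lambda Q: cls[((Q[:, None] - mu[None]) ** 2).sum(2).argmin(1)]

def fit_1nn(X, y):
    """Base learner 2: 1-nearest neighbour."""
    return lambda Q: y[((Q[:, None] - X[None]) ** 2).sum(2).argmin(1)]

def stack(base_fits, X, y, meta_fit):
    """Base predictions become meta-features for a second-level learner.
    In-sample meta-features for brevity; real stacking uses out-of-fold ones."""
    models = [f(X, y) for f in base_fits]
    meta = meta_fit(np.column_stack([m(X) for m in models]).astype(float), y)
    return lambda Q: meta(np.column_stack([m(Q) for m in models]).astype(float))

rng = np.random.default_rng(5)
X = np.vstack([rng.normal(0, 1, (30, 2)), rng.normal(3, 1, (30, 2))])
y = np.array([0] * 30 + [1] * 30)
stacked = stack([fit_centroid, fit_1nn], X, y, fit_centroid)
acc = float((stacked(X) == y).mean())
```

The paper's second tier would take these meta-level outputs, concatenate them with selected raw features, and fit an optimized neural network on top.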
International Journal of Computer Science and Information Security,IJCSIS ISSN 1947-5500, Pittsburgh, PA, USA
Task Adaptive Neural Network Search with Meta-Contrastive Learning - MLAI2
Most conventional Neural Architecture Search (NAS) approaches are limited in that they only generate architectures without searching for the optimal parameters. While some NAS methods handle this issue by utilizing a supernet trained on a large-scale dataset such as ImageNet, they may be suboptimal if the target tasks are highly dissimilar from the dataset the supernet is trained on. To address such limitations, we introduce the novel problem of Neural Network Search (NNS), whose goal is to search a model zoo for the optimal pretrained network for a novel dataset and constraints (e.g. number of parameters). We then propose a novel framework to tackle the problem, namely Task-Adaptive Neural Network Search (TANS). Given a model zoo consisting of networks pretrained on diverse datasets, we use a novel amortized meta-learning framework with a contrastive loss to learn a cross-modal latent space, maximizing the similarity between a dataset and a network that performs well on it and minimizing the similarity between irrelevant dataset-network pairs. We validate the effectiveness and efficiency of our method on ten real-world datasets against existing NAS/AutoML baselines. The results show that our method instantly retrieves networks that outperform the baseline models, with significantly fewer training steps needed to reach the target performance, thus minimizing the total cost of obtaining a task-optimal network. Our code and model zoo are available at https://anonymous.4open.science/r/TANS-33D6
A Novel Classification via Clustering Method for Anomaly Based Network Intrus... - IDES Editor
Intrusion detection on the internet is an active area of research. Intruders can be classified into two types: external intruders, who are unauthorized users of the computers they attack, and internal intruders, who have permission to access the system but with some restrictions. The aim of this paper is to present a methodology for recognizing attacks during normal activity in a system. A novel classification via the sequential information bottleneck (sIB) clustering algorithm is proposed to build an efficient anomaly-based network intrusion detection model. We have compared our proposed method with other clustering algorithms, such as X-Means, Farthest First, Filtered Clusters, DBSCAN, K-Means, and EM (Expectation-Maximization) clustering, to assess its suitability. A subset of the KDD Cup 1999 intrusion detection benchmark dataset was used for the experiments. Results show that the proposed method is efficient in terms of detection accuracy and low false positive rate in comparison to the other existing methods.
A novel ensemble modeling for intrusion detection system - IJECEIAES
The vast increase in data flowing through internet services has made computer systems more vulnerable and difficult to protect from malicious attacks, so intrusion detection systems (IDSs) must become more potent in monitoring intrusions. We therefore build an effective IDS architecture that employs a simple classification model and yields low false alarm rates and high accuracy. Notably, IDSs endure enormous amounts of data traffic containing redundant and irrelevant features, which negatively affect performance; good feature selection reduces the unrelated and redundant features and attains better classification accuracy. This paper proposes a novel ensemble model for IDS based on two algorithms: Fuzzy Ensemble Feature Selection (FEFS) and Fusion of Multiple Classifiers (FMC). FEFS is a unification of five feature scores, obtained using feature-class distance functions and aggregated with the fuzzy union operation. The FMC, in turn, is a fusion of three classifiers and works on the basis of an ensemble decision function. Experiments on the KDD Cup 99 data set show that the proposed system is superior to well-known methods such as Support Vector Machines (SVMs), K-Nearest Neighbors (KNN), and Artificial Neural Networks (ANNs). Our examination clearly confirms the value of using ensemble methodology for modeling IDSs; the system is robust and efficient.
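FEFS-style aggregation can be sketched as: compute one or more per-feature relevance scores from feature-class distances, then combine them with the fuzzy union (max) operator. The single score used below is a stand-in for the paper's five:

```python
import numpy as np

def feature_score(X, y):
    """One feature-class distance score: class-mean separation over pooled
    spread (a stand-in for each of the paper's five scores)."""
    cls = np.unique(y)
    mu = np.array([X[y == c].mean(0) for c in cls])
    return np.abs(mu[0] - mu[1]) / (X.std(0) + 1e-9)

def fuzzy_union(*scores):
    """Fuzzy union = elementwise max over normalized score vectors."""
    return np.stack([s / s.max() for s in scores]).max(0)

# Feature 0 separates the classes; feature 1 is pure noise.
rng = np.random.default_rng(6)
informative = np.concatenate([rng.normal(0, 1, 50), rng.normal(4, 1, 50)])
X = np.column_stack([informative, rng.normal(0, 1, 100)])
y = np.array([0] * 50 + [1] * 50)
s = feature_score(X, y)
combined = fuzzy_union(s, s)     # with one score type the union equals it
best = int(combined.argmax())    # the informative feature wins
```

Features whose combined score falls below a threshold would be dropped before the FMC classification stage.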
2nd survey, "Social profile of the Ebre blogger" - Daniel Gil
In: EbreBloc 2009, 3rd meeting of Ebre bloggers (Tortosa and Roquetes, 6 and 7 February 2009)
A survey gathering the main social characteristics of bloggers from the Terres de l'Ebre
Classification Of Iris Plant Using Feedforward Neural Networkirjes
The classification and recognition of type on the basis of individual features and behaviors constitute
a preliminary measure and is an important target in the behavioral sciences. Current statistical methods do not
always yield satisfactory answers. A Feed Forward Artificial Neural Network is the computer model inspired by
the structure of the Human Brain. It views as in the set of artificial nerve cells that are interconnected with the
other neurons. The primary aim of this paper is to demonstrate the process of developing the Artificial Neural
network based classifier which classifies the Iris database. The problem concerns the identification of Iris plant
species on the basis of plant attribute measurements. This paper is related to the use of feed forward neural
networks towards the identification of iris plants on the basis of the following measurements: sepal length, sepal
width, petal length, and petal width. Using this data set a Neural Network (NN) is used for the classification of
iris data set. The EBPA is used for training of this ANN. The results of simulations illustrate the effectiveness of
the neural system in iris class identification.
Performance Evaluation of Classifiers used for Identification of Encryption A...IDES Editor
Evaluating classifier performance is a critical
problem in pattern recognition and machine learning. In this
paper pattern recognition techniques were applied to identify
encryption algorithms. Four different block cipher algorithms
were considered, DES, IDEA, AES, and RC2 operating in
(Electronic Codebook) ECB mode. Eight different classification
techniques were used for this purpose, these are: Naïve
Bayesian (NB), Support Vector Machine (SVM), neural
network (MLP), Instance based learning (IBL), Bagging (Ba),
AdaBoostM1 (MdaBM1), Rotation Forest (RoFo), and Decision
Tree (C4. 5). The result shows that using pattern recognition
is a useful technique to identify the encryption algorithm,
and according to our simulation using one encryption of key
provide better classification than using different keys.
Furthermore, increase the number of the input files will
improve the accuracy.
ntegrating Knowledge Bases with Neural Networks - by Nick Powell:
Knowledge bases are used as the under-pinning for reasoning systems. This talk will describe experiences using deep learning to facilitate knowledge base completion. With an existing knowledge base as a training set, we programmed the neural net as a binary classifier to find likely relationships and then insert them back into the graph. We'll describe lessons learned and next steps.
Predicting rainfall using ensemble of ensemblesVarad Meru
The Paper was done in a group of three for the class project of CS 273: Introduction to Machine Learning at UC Irvine. The group members were Prolok Sundaresan, Varad Meru, and Prateek Jain.
Regression is an approach for modeling the relationship between data X and the dependent variable y. In this report, we present our experiments with multiple approaches, ranging from Ensemble of Learning to Deep Learning Networks on the weather modeling data to predict the rainfall. The competition was held on the online data science competition portal ‘Kaggle’. The results for weighted ensemble of learners gave us a top-10 ranking, with the testing root-mean-squared error being 0.5878.
Analysis of Classification Algorithm in Data Miningijdmtaiir
Data Mining is the extraction of hidden predictive
information from large database. Classification is the process
of finding a model that describes and distinguishes data classes
or concept. This paper performs the study of prediction of class
label using C4.5 and Naïve Bayesian algorithm.C4.5 generates
classifiers expressed as decision trees from a fixed set of
examples. The resulting tree is used to classify future samples
.The leaf nodes of the decision tree contain the class name
whereas a non-leaf node is a decision node. The decision node
is an attribute test with each branch (to another decision tree)
being a possible value of the attribute. C4.5 uses information
gain to help it decide which attribute goes into a decision node.
A Naïve Bayesian classifier is a simple probabilistic classifier
based on applying Baye’s theorem with strong (naive)
independence assumptions. Naive Bayesian classifier assumes
that the effect of an attribute value on a given class is
independent of the values of the other attribute. This
assumption is called class conditional independence. The
results indicate that Predicting of class label using Naïve
Bayesian classifier is very effective and simple compared to
C4.5 classifier
AN IMPROVED METHOD FOR IDENTIFYING WELL-TEST INTERPRETATION MODEL BASED ON AG...IAEME Publication
This paper presents an approach based on applying an aggregated predictor formed by multiple versions of a multilayer neural network with a back-propagation optimization algorithm for helping the engineer to get a list of the most appropriate well-test interpretation models for a given set of pressure/ production data. The proposed method consists of three stages: (1) data decorrelation through principal component analysis to reduce the covariance between the variables and the dimension of the input layer in the artificial neural network, (2) bootstrap replicates of the learning set where the data is repeatedly sampled with a random split of the data into train sets and using these as new learning sets, and (3) automatic reservoir model identification through aggregated predictor formed by a plurality vote when predicting a new class. This method is described in detail to ensure successful replication of results. The required training and test dataset were generated by using analytical solution models. In our case, there were used 600 samples: 300 for training, 100 for cross-validation, and 200 for testing. Different network structures were tested during this study to arrive at optimum network design. We notice that the single net methodology always brings about confusion in selecting the correct model even though the training results for the constructed networks are close to 1. We notice also that the principal component analysis is an effective strategy in reducing the number of input features, simplifying the network structure, and lowering the training time of the ANN. The results obtained show that the proposed model provides better performance when predicting new data with a coefficient of correlation approximately equal to 95% Compared to a previous approach 80%, the combination of the PCA and ANN is more stable and determine the more accurate results with lesser computational complexity than was feasible previously. 
Clearly, the aggregated predictor is more stable and shows less bad classes compared to the previous approach.
The International Journal of Engineering and Science (The IJES)theijes
The International Journal of Engineering & Science is aimed at providing a platform for researchers, engineers, scientists, or educators to publish their original research results, to exchange new ideas, to disseminate information in innovative designs, engineering experiences and technological skills. It is also the Journal's objective to promote engineering and technology education. All papers submitted to the Journal will be blind peer-reviewed. Only original articles will be published.
This work is proposed the feed forward neural network with symmetric table addition method to design the
neuron synapses algorithm of the sine function approximations, and according to the Taylor series
expansion. Matlab code and LabVIEW are used to build and create the neural network, which has been
designed and trained database set to improve its performance, and gets the best a global convergence with
small value of MSE errors and 97.22% accuracy.
Robust Fault-Tolerant Training Strategy Using Neural Network to Perform Funct...Eswar Publications
This paper is intended to introduce an efficient as well as robust training mechanism for a neural network which can be used for testing the functionality of software. The traditional setup of neural network architecture is used constituting the two phases -training phase and evaluation phase. The input test cases are to be trained in first phase and consequently they behave like normal test cases to predict the output as untrained test cases. The test oracle measures the deviation between the outputs of untrained test cases with trained test cases and authorizes a final decision. Our framework can be applied to systems where number of test cases outnumbers the
functionalities or the system under test is too complex. It can also be applied to the test case development when the modules of a system become tedious after modification.
IMPROVING CLASSIFIER ACCURACY USING UNLABELED DATA

Thamar I. Solorio, Olac Fuentes
Department of Computer Science
Instituto Nacional de Astrofísica, Óptica y Electrónica
Luis Enrique Erro #1
Santa María Tonantzintla, Puebla, México
ABSTRACT

This paper describes an algorithm for improving classifier accuracy using unlabeled data. This is of practical significance given the high cost of obtaining labeled data and the large pool of unlabeled data readily available. The algorithm consists of building a classifier using a very small set of previously labeled data, then classifying a larger set of unlabeled data using that classifier, and finally building a new classifier using a combined data set containing the original set of labeled data and the set of previously unlabeled data. The algorithm proposed here was implemented using three well-known learning algorithms as base learners: feedforward neural networks trained with backpropagation, the Naive Bayes classifier, and the C4.5 rule induction algorithm. Preliminary experimental results using 10 datasets from the UCI repository show that using unlabeled data improves the classification accuracy by 5% on average, and that in 80% of the experiments the use of unlabeled data results in an improvement in the classifier's accuracy.

1. INTRODUCTION

One of the problems addressed by machine learning is that of data classification. Since the 1960s, many algorithms for data classification have been proposed. However, all learning algorithms suffer from the same weakness: when the training set is small, classifier accuracy is low. Thus, these algorithms can become an impractical solution due to the need for a very large training set. In many domains, unlabeled data are readily available, but manual labeling is time-consuming, difficult, or even impossible. For example, there are millions of text documents available on the world-wide web, but, for the vast majority, a label indicating their topic is not available. Another example is character recognition: gathering examples of handwritten characters is easy, but manually labeling each character is a tedious task. Something similar occurs in astronomy: thousands of spectra per night can be obtained with an automated telescope, but an astronomer needs several minutes to classify each spectrum manually.

Thus the question is: can we take advantage of the large pool of unlabeled data? It would be extremely useful if we could find an algorithm that allowed improving classification accuracy when the labeled data are insufficient. This is the problem addressed in this paper. We evaluated the impact of incorporating unlabeled data into the learning process using several learning algorithms. Experimental results, averaged over ten learning tasks, show that the classifiers trained with labeled and unlabeled data are more accurate than the ones trained with labeled data only.

Even though the interest in learning algorithms that use unlabeled data is recent, several methods have been proposed. Blum and Mitchell proposed a method for combining labeled and unlabeled data called co-training [1]. This method is targeted at a particular type of problem: classification where the examples can naturally be described using several different types of information. In other words, an instance can be classified using different subsets of the attributes describing that instance. Basically, the co-training algorithm is this: two weak classifiers are built, each one using a different kind of information, and then these classifiers are bootstrapped using unlabeled data. They focused on the problem of web-page classification, where each example can be classified using either the words contained in that page or the links that point to that page.

Nigam et al. proposed a different approach, presenting a theoretical argument showing that useful information about the target function can be extracted from unlabeled data [2]. Their algorithm learns to classify text from labeled and unlabeled documents. The idea in Nigam's approach was to combine the Expectation-Maximization (EM) algorithm with the Naive Bayes classifier. They report an error reduction of up to 30%. In this work we extended this approach, incorporating unlabeled data into three different learning algorithms, and evaluated it using several data sets from the UCI Repository [3].

Unlabeled data have also been used for improving the performance of artificial neural networks. Fardanesh and Ersoy used the backpropagation algorithm, and their results show that the classifier error can be decreased using unlabeled data in some problem domains [4].

The paper is organized as follows: the next section presents the learning algorithms. Section 3 describes how unlabeled data are incorporated into the classifier's training. Section 4 presents experimental results that compare the performance of the algorithms trained using labeled and unlabeled data to those obtained by the classifiers trained with labeled data only. Finally, some conclusions and directions for future work are presented.

2. LEARNING ALGORITHMS

Experiments in this work were made with three of the most successful classification learning algorithms: feedforward neural networks trained with backpropagation, the C4.5 learning algorithm [5], and the Naive Bayes classifier.

2.1 Backpropagation and Feedforward Neural Networks

For problems involving real-valued attributes, Artificial Neural Networks (ANNs) are among the most effective learning methods currently known. Algorithms such as backpropagation use gradient descent or another optimization algorithm to tune network parameters to best fit a training set of input-output pairs. The backpropagation algorithm was applied in this work to a feedforward network containing two layers of sigmoidal units.

[Figure 1. Representation of a feedforward neural network with one hidden layer.]

2.2 Naive Bayes Classifier

The Naive Bayes classifier is a probabilistic algorithm based on the simplifying assumption that the attribute values are conditionally independent given the target values. Even though we know that in practice this assumption does not hold, the algorithm's performance has been shown to be comparable to that of neural networks in some domains [6,7]. The Naive Bayes classifier applies to learning tasks where each instance x can be described as a tuple of attribute values <a1, a2, ..., an> and the target function f(x) can take on any value from a finite set V.

When a new instance x is presented, the Naive Bayes classifier assigns to it the most probable target value by applying this rule:

    F(x) = argmax_{vj in V} P(vj) * prod_i P(ai | vj)

To summarize, the learning task of the Naive Bayes classifier is to build a hypothesis by estimating the different P(vj) and P(ai|vj) terms based on their frequencies over the training data.

2.3 The C4.5 Algorithm

C4.5 is an extension of the decision-tree learning algorithm ID3 [8]. Only a brief description of the method is given here; more information can be found in [5]. The algorithm consists of the following steps:
1. Build the decision tree from the training set (conventional ID3).
2. Convert the resulting tree into an equivalent set of rules. The number of rules is equivalent to the number of possible paths from the root to a leaf node.
3. Prune each rule by removing any preconditions whose removal improves its accuracy, according to a validation set.
4. Sort the pruned rules in descending order according to their accuracy, and consider them in this sequence when classifying subsequent instances.

Since the learning tasks used to evaluate this work involve nominal and numeric values, we implemented the version of C4.5 that handles continuous values.

3. INCORPORATING UNLABELED DATA

The algorithm for combining labeled and unlabeled data is described in this section. We apply the same procedure with each of the three learning algorithms. First, the data set is divided randomly into several groups: one of these groups, with its original classifications, is taken as the training set; another group is separated as the test set; and the remaining data are the unlabeled examples. A classifier C1 is built using the training set and the learning algorithm L1. Then, we use C1 to classify the unlabeled examples. With the labels assigned by C1, we merge both sets into one training set to build a final classifier C2. Finally, the test data are classified using C2.

The process described above was carried out ten times with each learning task, and the overall averages are the results described in the next section.

4. EXPERIMENTAL RESULTS

We used the following datasets from the UCI repository: wine, glass, chess, breast cancer, lymphography, balloons, thyroid disease, tic-tac-toe, ionosphere, and iris. Figure 2 compares the performance of C4.5 trained using the labeled data only with the same algorithm using both labeled and unlabeled data as described in the previous section. One point is plotted for each of the ten learning tasks taken from the Irvine repository of machine learning datasets [3]. We can see that most points lie above the dotted line, which indicates that the error rate of the C4.5 classifier trained with labeled and unlabeled data is smaller than the error of C4.5 trained with labeled data only. Similarly, Figure 3 compares the performance of the Naive Bayes classifier trained using labeled and unlabeled data to that obtained using only labeled data. Again, a lower error can be attained by incorporating unlabeled data. Finally, Figure 4 compares the performance of a neural network trained with unlabeled data incorporated to that of one trained using only labeled data.

As we can see in the three figures, the algorithm that shows the largest improvement with the incorporation of unlabeled data is C4.5. Over the ten learning tasks C4.5 presented an average improvement of 8%, while the average improvements for neural networks and Naive Bayes were 5% and 3% respectively. Table 1 summarizes the results obtained in these experiments.

5. CONCLUSIONS AND FUTURE WORK

We have shown how learning from small sets of labeled training data can be improved upon with the use of larger sets of unlabeled data. Our experimental results using several training sets and three different learning algorithms show that, for the vast majority of the cases, using unlabeled data improves the quality of the predictions made by the algorithms. This is of practical significance in domains where unlabeled data are readily available, but manual labeling may be time-consuming, difficult, or impractical. Present and future work includes:
• Applying this methodology using ensembles of classifiers, where presumably the labeling of the unlabeled data, and thus the final classifications assigned by the algorithm, can be made more accurate.
• Experimental studies to characterize situations in which this approach is not applicable. It is clear that when the set of labeled examples is large enough, or when the pseudo-labels cannot be assigned accurately, the use of unlabeled data cannot improve, and may even decrease, the overall classification accuracy.
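The procedure of Section 3 is a form of self-training, and can be sketched in a few lines of Python. This is a minimal illustration only, not the authors' code: the base learner here is a stand-in 1-nearest-neighbor classifier (the paper used backpropagation, Naive Bayes, and C4.5), and the toy data and all names are assumptions.

```python
import math

def nn1_learner(X, y):
    """Toy base learner (1-nearest neighbor); stands in for any of
    the three base learning algorithms used in the paper."""
    def predict(x):
        _, label = min((math.dist(xi, x), yi) for xi, yi in zip(X, y))
        return label
    return predict

def self_train(learner, labeled_X, labeled_y, unlabeled_X):
    # Step 1: build classifier C1 from the small labeled set.
    c1 = learner(labeled_X, labeled_y)
    # Step 2: let C1 assign pseudo-labels to the unlabeled examples.
    pseudo_y = [c1(x) for x in unlabeled_X]
    # Step 3: merge both sets and build the final classifier C2.
    c2 = learner(labeled_X + unlabeled_X, labeled_y + pseudo_y)
    return c2

# Tiny synthetic task: two well-separated 1-D clusters,
# one labeled example per class plus four unlabeled points.
labeled_X = [(0.0,), (10.0,)]
labeled_y = ["low", "high"]
unlabeled_X = [(1.0,), (2.0,), (9.0,), (11.0,)]

c2 = self_train(nn1_learner, labeled_X, labeled_y, unlabeled_X)
print(c2((1.5,)))   # -> low
print(c2((10.5,)))  # -> high
```

As in the paper, the benefit comes from step 3: C2 is trained on more examples than C1, so when the pseudo-labels are mostly correct the decision boundary improves; when they cannot be assigned accurately, C2 can be worse than C1, which matches the caveat in the conclusions.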
                 Naive Bayes              Neural Networks          C4.5
Dataset          C1      C2      Ratio   C1      C2      Ratio   C1      C2      Ratio
Wine             11.11    6.31   0.57     7.61    7.57   0.99    22.34   20.56   0.92
Glass            69.35   68.82   0.99    25.23   26.21   1.04    61.04   58.60   0.96
Chess            28.35   37.91   1.34      --      --     --     19.58   18.53   0.95
Breast           27.30   27.51   1.01     5.94    5.36   0.90    10.70   10.28   0.96
Lympho           27.82   36.72   1.32      --      --     --     40.17   38.28   0.95
Balloons         28.43   32.18   1.13      --      --     --     32.50   25.00   0.77
Thyroid           9.31    8.44   0.91    10.00    8.18   0.82    18.97   18.46   0.97
Tic-tac-toe      34.87   32.98   0.95      --      --     --     20.94   19.81   0.95
Ionosphere       64.04   64.04   1.00    12.52   12.47   1.00    25.64   21.81   0.85
Iris              4.93    2.64   0.54     9.80    9.42   0.96    18.10   16.76   0.93
Average                          0.97                    0.95                    0.92

Table 1. Comparison of the error rates of the three algorithms. C1 is the classifier built using labeled data only; C2 is the classifier built combining labeled and unlabeled data. The Ratio column presents the result for C2 divided by the corresponding figure for C1. In bold are the lowest error for a given dataset and the largest reduction in error as a fraction of the original error for each learning task. C4.5 shows the best improvement in 60% of the tasks. In 77% of the learning tasks the error was reduced when using unlabeled data, and in 80% of the tasks the best overall results were obtained by a classifier that used unlabeled data.
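The Ratio column and the Average row of Table 1 follow directly from the C1 and C2 error rates (Ratio = C2/C1, averaged per algorithm). As a quick check, the Naive Bayes column can be recomputed from the transcribed values; the dictionary layout below is my own, not from the paper.

```python
# Error rates (C1, C2) transcribed from Table 1, Naive Bayes column.
naive_bayes = {
    "Wine": (11.11, 6.31), "Glass": (69.35, 68.82),
    "Chess": (28.35, 37.91), "Breast": (27.30, 27.51),
    "Lympho": (27.82, 36.72), "Balloons": (28.43, 32.18),
    "Thyroid": (9.31, 8.44), "Tic-tac-toe": (34.87, 32.98),
    "Ionosphere": (64.04, 64.04), "Iris": (4.93, 2.64),
}

# Ratio = C2 / C1 for each dataset, then average over the ten tasks.
ratios = {name: c2 / c1 for name, (c1, c2) in naive_bayes.items()}
average = sum(ratios.values()) / len(ratios)

print(f"{average:.2f}")  # -> 0.97, matching the Average row
```

The same computation over the Neural Networks and C4.5 columns reproduces the 0.95 and 0.92 averages.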
Figure 2. Comparison of C4.5 using labeled data only with C4.5 using unlabeled data. Points above the diagonal line exhibit lower error when C4.5 is given unlabeled data.

Figure 3. Comparison of the Naive Bayes classifier using labeled data only with the Naive Bayes classifier using unlabeled data. Points above the diagonal line exhibit lower error when the classifier is given unlabeled data.

Figure 4. Comparison results for an ANN. Points above the diagonal line exhibit lower error when the ANN is given unlabeled data.

6. ACKNOWLEDGEMENT

We would like to thank CONACyT for partially supporting this work under grant J31877-A.

7. REFERENCES

[1] A. Blum & T. Mitchell, Combining Labeled and Unlabeled Data with Co-Training, Proc. 1998 Conference on Computational Learning Theory, July 1998.
[2] K. Nigam, A. McCallum, S. Thrun & T. Mitchell, Learning to Classify Text from Labeled and Unlabeled Documents, Machine Learning, 1999, 1-22.
[3] C. Merz & P. M. Murphy, UCI repository of machine learning databases, http://www.ics.uci.edu/~mlearn/MLRepository.html, 1996.
[4] M. T. Fardanesh & Okan K. Ersoy, Classification Accuracy Improvement of Neural Network Classifiers by Using Unlabeled Data, IEEE Transactions on Geoscience and Remote Sensing, Vol. 36, No. 3, 1998, 1020-1025.
[5] J. R. Quinlan, C4.5: Programs for Machine Learning (San Mateo, CA: Morgan Kaufmann, 1993).
[6] D. Lewis & M. Ringuette, A comparison of two learning algorithms for text categorization, Third Annual Symposium on Document Analysis and Information Retrieval, 1994, 81-93.
[7] T. Joachims, A probabilistic analysis of the Rocchio algorithm with TFIDF for text categorization, Proc. 1997 International Conference on Machine Learning, 1997.
[8] J. R. Quinlan, Induction of decision trees, Machine Learning, 1(1), 1986, 81-106.