This document presents a study evaluating the performance of machine learning algorithms for network intrusion detection systems (NIDS) using benchmark datasets. Specifically, it applies an AdaBoost-based machine learning algorithm to NIDS and tests its detection accuracy on the KDD Cup 99 and NSL-KDD intrusion detection datasets. The experimental results show that the AdaBoost-based NIDS performs better on the NSL-KDD dataset compared to the KDD Cup 99 dataset, achieving a higher detection rate and lower false alarm rate.
1) Interval classifiers are machine learning algorithms that originated in artificial intelligence research but are now being applied to database mining. They generate decision trees to classify data into intervals based on attribute values.
2) The author implemented the IC interval classifier algorithm and tested it on small datasets, finding higher classification errors than reported in literature due to small training set sizes. Parameter testing showed accuracy improved with larger training sets and more restrictive interval definitions.
3) While efficiency couldn't be fully tested, results suggest interval classifiers may perform well for database applications if further tuned based on dataset characteristics. More research is still needed on algorithm modifications and dynamic training approaches.
Test case optimization in configuration testing using ripper algorithmeSAT Journals
Abstract
Software systems are highly configurable. Although improving the configuration has many advantages, it is difficult to test for the unique errors hiding in configurations. To overcome this problem, combinatorial interaction testing (CIT) selects a strength and computes a covering array which includes all configuration option combinations. However, it poorly identifies the effective configuration space, so the cost required for testing increases. This work uses a hierarchical clustering algorithm and the RIPPER algorithm. The approach yields high-strength interactions that can be missed by the CIT approach, and it identifies the effective configuration space. We evaluated and compared the coverage achieved by CIT and by RIPPER classification with hierarchical clustering. Using this approach we validate loop-based as well as statement-based configurations. Our results strongly suggest that proto-interactions formed by RIPPER classification with hierarchical clustering can cover sets of configurations more effectively than traditional CIT.
Keywords: Configuration options, Hierarchical Clustering, RIPPER Algorithm
ANALYTICAL STUDY OF FEATURE EXTRACTION TECHNIQUES IN OPINION MININGcsandit
Although opinion mining is in a nascent stage of development, the ground is set for dense growth of research in the field. One of the important activities of opinion mining is to extract opinions of people based on characteristics of the object under study. Feature extraction in opinion mining can be done in various ways, such as clustering, support vector machines, etc. This paper is an attempt to appraise the various techniques of feature extraction. The first part discusses various techniques and the second part makes a detailed appraisal of the major techniques used for feature extraction.
Iaetsd performance voting algorithms for software safetyIaetsd Iaetsd
This document summarizes and evaluates different weighted-average voting algorithms for safety-critical systems. It first describes standard weighted-average voting (SWAV), flexible weighted-average voting (FLWAV), and fuzzy weighted-average voting (FZWAV). It then proposes novel algorithms like weighted average voter with soft threshold value (WAVST), fuzzy weighted-average voter with dynamic bandwidth selection (FWAVDB), score-based fuzzy weighted-average voter (SBFWAV), and score-based fuzzy weighted-average voter with dynamic bandwidth and dynamic threshold selection (SBFWAVDBDT). The paper experimentally compares the safety performance of these algorithms on a triple-modular redundant system, finding that SBFWAV
Methodological study of opinion mining and sentiment analysis techniquesijsc
Decision making at both the individual and the organizational level is always accompanied by a search for
others' opinions on the same matter. The tremendous growth of opinion-rich resources such as reviews, forum
discussions, blogs, micro-blogs, and Twitter provides a rich anthology of sentiments. This user-generated
content can be a benefit to the market if its semantic orientations are considered. Opinion mining
and sentiment analysis are the formalization for studying and construing opinions and sentiments. The
digital ecosystem has itself paved the way for the use of the huge volume of opinionated data recorded. This paper is
an attempt to review and evaluate the various techniques used for opinion and sentiment analysis.
This document discusses decision tree induction and rule induction. It describes splitting functions used to split data when building a decision tree, including information gain. It discusses overfitting in decision trees and the need for pruning trees to avoid capturing noise. Common pruning techniques are outlined, including removing internal nodes that do not harm accuracy on a validation set. Decision trees are appropriate when attributes are mixed, the target is discrete, a disjunctive normal form would be effective, and the data may have errors.
Session-Based Recommendations with Recurrent Neural Networks(Balazs Hidasi, ...hyunsung lee
The document summarizes a research paper about using recurrent neural networks for session-based recommendations. It introduces factor model and neighborhood approaches commonly used for recommendations. It then discusses limits of these approaches for session data and how RNNs are well-suited to handle sequential session information. The proposed model uses GRUs to process item sequences, outputs predicted item scores, and is trained on mini-batches with a ranking loss to optimize for next-item predictions. Experiments evaluated the model on two datasets.
This document discusses decision tree algorithms C4.5 and CART. It explains that ID3 has limitations in dealing with continuous data and noisy data, which C4.5 aims to address through techniques like post-pruning trees to avoid overfitting. CART uses binary splits and measures like Gini index or entropy to produce classification trees, and sum of squared errors to produce regression trees. It also performs cost-complexity pruning to find an optimal trade-off between accuracy and model complexity.
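As a rough illustration of the binary splitting described above, the following sketch scores candidate thresholds on one numeric attribute by weighted Gini impurity. It is a minimal example, not the CART implementation itself: the function names `gini` and `best_binary_split` and the toy temperature data are hypothetical, and pruning is not shown.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def best_binary_split(values, labels):
    """Return (threshold, weighted_gini) for the best split 'value <= threshold'.

    Returns (None, impurity_of_all_labels) if no split lowers the impurity.
    """
    pairs = sorted(zip(values, labels))
    best = (None, gini(labels))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between two equal values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical toy data, made up for illustration only.
temperatures = [64, 68, 70, 75, 80, 85]
play = ["yes", "yes", "yes", "no", "no", "no"]
threshold, impurity = best_binary_split(temperatures, play)
print(threshold, impurity)
```

On this toy data the split falls midway between 70 and 75, where both sides contain a single class.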
EFFECTIVENESS PREDICTION OF MEMORY BASED CLASSIFIERS FOR THE CLASSIFICATION O...cscpconf
Classification is a step-by-step practice for allocating a given piece of input to one of the given
categories. Classification is an essential machine learning technique. Many
classification problems occur in different application areas and need to be solved. Different
types of classification algorithms, such as memory-based, tree-based, and rule-based, are widely
used. This work studies the performance of different memory-based classifiers for classification
of a multivariate data set from the UCI machine learning repository using an open source machine
learning tool. A comparison of the different memory-based classifiers used and a practical
guideline for selecting the most suited algorithm for a classification are presented. Apart from that, some empirical criteria for describing and evaluating the best classifiers are discussed.
Optimization of latency of temporal key Integrity protocol (tkip) using graph...ijcseit
The document discusses optimization of latency in the Temporal Key Integrity Protocol (TKIP) using hardware-software co-design and graph theory. It presents a mathematical model to partition TKIP algorithms between hardware and software blocks to minimize latency. Simulation results showed the proposed technique achieved lower latency than a hardware-only implementation, reducing latency from 10 µs to 8 µs. The technique models TKIP modules as a graph and uses algorithms to assign modules to hardware or software based on latency calculations.
This document summarizes an empirical study comparing several supervised machine learning approaches for word sense disambiguation: Naive Bayes, decision tree, decision list, and support vector machine (SVM). The study used a dataset of 15 words annotated with senses from WordNet and Senseval-3. Each approach was implemented and evaluated based on its accuracy in identifying the correct sense of each word. The results showed that the decision list approach achieved the highest overall accuracy of 69.12%, followed by naive Bayes at 58.32%, SVM at 56.11%, and decision tree at 45.14%. Thus, the study concluded that decision list performed best on this dataset for the task of word sense disambiguation.
This document discusses a novel method for intrusion awareness using Distributed Situational Awareness (D-SA). It proposes using D-SA and support vector machines (SVM) for network intrusion detection and classification. The method is evaluated using the KDD Cup 1999 intrusion detection dataset. Experimental results show the proposed D-SA method achieves higher detection rates compared to rule-based classification techniques.
Here are the key calculations:
1) Probability that persons p and q will be at the same hotel on a given day d is 1/100 × 1/100 × 10^-5 = 10^-9, reading the factors as follows: each person visits a hotel on any given day with probability 1/100, and two visitors choose the same hotel with probability 10^-5.
2) Probability that p and q will be at the same hotel on given days d1 and d2 is (10^-9) × (10^-9) = 10^-18, since the events are independent.
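The arithmetic in the two steps above can be checked exactly with rational numbers. This is only a sketch: the variable names are hypothetical, and reading the three factors as "p visits a hotel", "q visits a hotel", and "both pick the same hotel" is an assumption taken from the shape of the product.

```python
from fractions import Fraction

# Assumed interpretation of the three factors in step 1.
p_visit = Fraction(1, 100)          # a given person is at some hotel on a given day
p_same_hotel = Fraction(1, 10**5)   # two visitors end up at the same hotel

# Step 1: both visit a hotel on day d, and it is the same hotel.
p_same_day = p_visit * p_visit * p_same_hotel

# Step 2: the same coincidence on two given days, treated as independent events.
p_two_days = p_same_day * p_same_day

print(p_same_day)   # exact fraction for step 1
print(p_two_days)   # exact fraction for step 2
```

Using `Fraction` avoids floating-point rounding, so the comparison with 10^-9 and 10^-18 is exact.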
This document compares different supervised learning approaches for word sense disambiguation (WSD), including Naive Bayes, Decision Tree, and Decision List classifiers. An experiment is conducted using a dataset of 15 words and their senses from WordNet. The Decision List approach achieves the highest accuracy at 69.12%, followed by Naive Bayes at 58.32% and Decision Tree at 45.14%. While no single approach performed best for all words, overall Decision List provided the most accurate WSD and is presented as the best performing method for this problem among the three approaches studied.
International Journal of Computational Engineering Research(IJCER)ijceronline
This document summarizes a research paper that proposes a novel approach to improve the detection rate and search efficiency of signature-based network intrusion detection systems (NIDS). The approach uses data mining and classification algorithms like C4.5 and ensemble algorithms like MadaBoost to improve detection rates. It also uses a modified signature apriori algorithm to more efficiently search for signatures of related attacks based on known signatures, in order to improve search efficiency. The full paper describes these approaches in more technical detail and evaluates their effectiveness at improving NIDS performance.
This document describes a parameter-less density-based clustering algorithm. It calculates point density by exponentially weighting distances between points. Cluster centers are identified as local maxima in the density surface using gradient ascent from random starting points. The algorithm is compared to DBSCAN and shown to perform better without parameter tuning on stochastic data. Code and examples are provided to cluster data and assign points to centers based on distance.
This document provides an overview of decision tree induction methods and their application to big data. It discusses decision trees as a method for identifying patterns in large datasets that has the advantage of being interpretable. The document describes the basic principles of decision tree induction, different algorithms for constructing decision trees, and measures for evaluating tree performance and structure. It also discusses challenges such as fitting trees to existing expert knowledge and improving classification through feature selection.
Decision trees are a machine learning technique that use a tree-like model to predict outcomes. They break down a dataset into smaller subsets based on attribute values. Decision trees evaluate attributes like outlook, temperature, humidity, and wind to determine the best predictor. The algorithm calculates information gain to determine which attribute best splits the data into the most homogeneous subsets. It selects the attribute with the highest information gain to place at the root node and then recursively builds the tree by splitting on subsequent attributes.
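The attribute selection step described above can be sketched as follows. This is a minimal illustration rather than a full tree builder: the helper names and the six-row toy table are hypothetical, and only the choice of the root attribute is shown.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy of the target minus the weighted entropy after splitting on attribute."""
    base = entropy([row[target] for row in rows])
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


def best_attribute(rows, attributes, target):
    """Pick the attribute with the highest information gain (the root of the tree)."""
    return max(attributes, key=lambda a: information_gain(rows, a, target))


# Hypothetical toy data, made up for illustration only.
rows = [
    {"outlook": "sunny", "wind": "weak", "play": "no"},
    {"outlook": "sunny", "wind": "strong", "play": "no"},
    {"outlook": "overcast", "wind": "weak", "play": "yes"},
    {"outlook": "rain", "wind": "weak", "play": "yes"},
    {"outlook": "rain", "wind": "strong", "play": "no"},
    {"outlook": "overcast", "wind": "strong", "play": "yes"},
]
print(best_attribute(rows, ["outlook", "wind"], "play"))
```

A full induction algorithm would apply `best_attribute` recursively to each subset produced by the chosen split, stopping when a subset contains a single class or no attributes remain.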
Deep Reinforcement Learning based Recommendation with Explicit User-ItemInter...Kishor Datta Gupta
Recommendation is crucial in both academia and industry, and various techniques are proposed such as content-based collaborative filtering, matrix factorization, logistic regression, factorization machines, neural networks and multi-armed bandits. However, most of the previous studies suffer from two limitations: (1) considering the recommendation as a static procedure and ignoring the dynamic interactive nature between users and the recommender systems; (2) focusing on the immediate feedback of recommended items and neglecting the long-term rewards. To address the two limitations, in this paper we propose a novel recommendation framework based on deep reinforcement learning, called DRR. The DRR framework treats recommendation as a sequential decision making procedure and adopts an “Actor-Critic” reinforcement learning scheme to model the interactions between the users and recommender systems, which can consider both the dynamic adaptation and long-term rewards. Furthermore, a state representation module is incorporated into DRR, which can explicitly capture the interactions between items and users. Three instantiation structures are developed. Extensive experiments on four real-world datasets are conducted under both the offline and online evaluation settings. The experimental results demonstrate the proposed DRR method indeed outperforms the state-of-the-art competitors.
This work proposes a feed-forward neural network with the symmetric table addition method to design the
neuron synapse algorithm for sine function approximation, based on the Taylor series
expansion. Matlab code and LabVIEW are used to build the neural network, which has been
designed and trained on a data set to improve its performance, and which achieves the best global convergence with
a small MSE and 97.22% accuracy.
Interval Type-2 Fuzzy Logic Systems (IT2 FLSs) have shown popularity, superiority, and greater accuracy in a number of applications in the last decade. This is due to their ability to cope adequately with uncertainty and imprecision when compared with their type-1 counterpart. In this paper, an Interval Type-2 Fuzzy Logic System (IT2FLS) is employed for remote vital signs monitoring and prediction of shock level in cardiac patients. The conventional Type-1 Fuzzy Logic System (T1FLS) is also applied to the prediction problems for comparison purposes. The cardiac patients’ health datasets were used to perform an empirical comparison on the developed system. The results of the study indicated that IT2FLS could cope with more information and handle more uncertainties in health data than T1FLS. The statistical evaluation using performance metrics indicated a minimal error with IT2FLS compared to its counterpart, T1FLS. It was generally observed that the shock level prediction experiment for cardiac patients showed the superiority of the IT2FLS paradigm over T1FLS.
On the use of voice activity detection in speech emotion recognitionjournalBEEI
Emotion recognition through speech has many potential applications; however, the challenge comes from achieving a high emotion recognition rate while using limited resources or in the presence of interference such as noise. In this paper we have explored the possibility of improving speech emotion recognition by utilizing the voice activity detection (VAD) concept. The emotional voice data from the Berlin Emotion Database (EMO-DB) and a custom-made database, LQ Audio Dataset, are first preprocessed by VAD before feature extraction. The features are then passed to the deep neural network for classification. In this paper, we have chosen MFCC to be the sole determinant feature. From the results obtained with and without VAD, we have found that VAD improved the recognition rate of 5 emotions (happy, angry, sad, fear, and neutral) by 3.7% when recognizing clean signals, while the effect of using VAD when training a network with both clean and noisy signals improved our previous results by 50%.
Introduction to Some Tree based Learning MethodHonglin Yu
Random Forest, Boosted Trees and other ensemble learning methods build multiple models to improve predictive performance over single models. They combine "weak learners" like decision trees into a "strong learner". Random Forest adds randomness by selecting a random subset of features at each split. Boosting trains trees sequentially on weighted data from previous trees. Both reduce variance compared to bagging. Random Forest often outperforms Boosting while being faster to train. Neural networks can also be viewed as an ensemble method by combining simple units.
Types of Machine Learnig Algorithms(CART, ID3)Fatimakhan325
The document summarizes several machine learning algorithms used for data mining:
- Decision trees use nodes and edges to iteratively divide data into groups for classification or prediction.
- Naive Bayes classifiers use Bayes' theorem for text classification, spam filtering, and sentiment analysis due to their multi-class prediction abilities.
- K-nearest neighbors algorithms find the closest K data points to make predictions for classification or regression problems.
- ID3, CART, and k-means clustering are also summarized highlighting their uses, advantages, and disadvantages.
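To make the k-nearest neighbors item in the list above concrete, here is a minimal classification sketch using Euclidean distance and a majority vote. The function name `knn_predict`, the default `k=3`, and the toy points are all assumptions made for illustration; the summarized document does not specify a distance measure or a value of K.

```python
from collections import Counter
from math import dist


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k closest training points."""
    by_distance = sorted(
        zip(train_points, train_labels),
        key=lambda pair: dist(pair[0], query),  # Euclidean distance (an assumption)
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two well-separated groups in the plane.
points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(points, labels, (2, 2)))
print(knn_predict(points, labels, (6, 5)))
```

For regression, the vote would be replaced by an average of the neighbors' target values.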
This document summarizes an experimental study on the performance and emissions of a diesel engine fueled with crude rice bran oil methyl ester (CRBOME) and its blends with diesel and kerosene. Tests were conducted on blends containing 20%, 40%, 60%, and 80% CRBOME. Additional tests used blends containing 20%, 40%, 60%, and 80% CRBOME with 5%, 10%, 15%, and 20% kerosene respectively, with the remainder being diesel. The engine was tested at various loads and engine performance measures and emissions were evaluated. Results showed that a blend of 20% CRBOME and diesel had similar performance to diesel alone. Replacing 5% diesel with kerosene
This study examined the scientific attitude of 9th class students based on management, locality, and sex. 300 9th class students were surveyed using a scientific attitude test. The study found that:
1. Management and sex had a significant influence on scientific attitude, with government school students and female students having higher scientific attitudes.
2. Locality did not have a significant influence on scientific attitude.
3. The study concluded that sex, management, and locality should be considered to improve science education and foster scientific attitude among students. Teachers should work to create interest in science for all students.
This document summarizes a research paper on visual cryptography, which is a technique that allows information like images and text to be encrypted in a way that can be decrypted by the human visual system without using computers. It discusses how visual cryptography works by splitting a secret image into random shares, such that overlaying the shares reveals the original secret image. The document then describes the specific SDS algorithm used in the paper for keyless image encryption by sieving, dividing, and shuffling the image pixels into multiple random shares. It concludes by discussing potential applications and areas for further research on visual cryptography.
This document summarizes a journal article about a traffic light control system using radio frequency (RF) for emergency vehicles. The system uses an RF transmitter in emergency vehicles that sends a signal to an RF receiver at an intersection. When the receiver gets the emergency signal, it overrides the normal traffic light sequence and changes the light for the emergency vehicle to green for a set time. This allows emergency vehicles to pass through intersections more quickly. The system was tested up to a range of 20 meters outdoors and 30 meters indoors using a 434MHz RF module and PIC microcontroller. The system aims to help reduce accidents by giving emergency vehicles priority at traffic lights.
This document discusses decision tree algorithms C4.5 and CART. It explains that ID3 has limitations in dealing with continuous data and noisy data, which C4.5 aims to address through techniques like post-pruning trees to avoid overfitting. CART uses binary splits and measures like Gini index or entropy to produce classification trees, and sum of squared errors to produce regression trees. It also performs cost-complexity pruning to find an optimal trade-off between accuracy and model complexity.
EFFECTIVENESS PREDICTION OF MEMORY BASED CLASSIFIERS FOR THE CLASSIFICATION O...cscpconf
Classification is a step by step practice for allocating a given piece of input into any of the given
category. Classification is an essential Machine Learning technique. There are many
classification problem occurs in different application areas and need to be solved. Different
types are classification algorithms like memory-based, tree-based, rule-based, etc are widely
used. This work studies the performance of different memory based classifiers for classification
of Multivariate data set from UCI machine learning repository using the open source machine
learning tool. A comparison of different memory based classifiers used and a practical
guideline for selecting the most suited algorithm for a classification is presented. Apart fromthat some empirical criteria for describing and evaluating the best classifiers are discussed
Optimization of latency of temporal key Integrity protocol (tkip) using graph...ijcseit
The document discusses optimization of latency in the Temporal Key Integrity Protocol (TKIP) using hardware-software co-design and graph theory. It presents a mathematical model to partition TKIP algorithms between hardware and software blocks to minimize latency. Simulation results showed the proposed technique achieved lower latency than a hardware-only implementation, reducing latency from 10us to 8us. The technique models TKIP modules as a graph and uses algorithms to assign modules to hardware or software based on latency calculations.
This document summarizes an empirical study comparing several supervised machine learning approaches for word sense disambiguation: Naive Bayes, decision tree, decision list, and support vector machine (SVM). The study used a dataset of 15 words annotated with senses from WordNet and Senseval-3. Each approach was implemented and evaluated based on its accuracy in identifying the correct sense of each word. The results showed that the decision list approach achieved the highest overall accuracy of 69.12%, followed by SVM at 56.11%, naive Bayes at 58.32%, and decision tree at 45.14%. Thus, the study concluded that decision list performed best on this dataset for the task of word sense disambiguation.
This document discusses a novel method for intrusion awareness using Distributed Situational Awareness (D-SA). It proposes using D-SA and support vector machines (SVM) for network intrusion detection and classification. The method is evaluated using the KDD Cup 1999 intrusion detection dataset. Experimental results show the proposed D-SA method achieves higher detection rates compared to rule-based classification techniques.
Here are the key calculations:
1) Probability that persons p and q will be at the same hotel on a given day d is 1/100 × 1/100 × 10-5 = 10-9, since there are 100 hotels and each person stays in a hotel with probability 10-5 on any given day.
2) Probability that p and q will be at the same hotel on given days d1 and d2 is (10-9) × (10-9) = 10-18, since the events are independent.
This document compares different supervised learning approaches for word sense disambiguation (WSD), including Naive Bayes, Decision Tree, and Decision List classifiers. An experiment is conducted using a dataset of 15 words and their senses from WordNet. The Decision List approach achieves the highest accuracy at 69.12%, followed by Naive Bayes at 58.32% and Decision Tree at 45.14%. While no single approach performed best for all words, overall Decision List provided the most accurate WSD and is presented as the best performing method for this problem among the three approaches studied.
International Journal of Computational Engineering Research(IJCER)ijceronline
This document summarizes a research paper that proposes a novel approach to improve the detection rate and search efficiency of signature-based network intrusion detection systems (NIDS). The approach uses data mining and classification algorithms like C4.5 and ensemble algorithms like MadaBoost to improve detection rates. It also uses a modified signature apriori algorithm to more efficiently search for signatures of related attacks based on known signatures, in order to improve search efficiency. The full paper describes these approaches in more technical detail and evaluates their effectiveness at improving NIDS performance.
This document describes a parameter-less density-based clustering algorithm. It calculates point density by exponentially weighting distances between points. Cluster centers are identified as local maxima in the density surface using gradient ascent from random starting points. The algorithm is compared to DBSCAN and shown to perform better without parameter tuning on stochastic data. Code and examples are provided to cluster data and assign points to centers based on distance.
This document provides an overview of decision tree induction methods and their application to big data. It discusses decision trees as a method for identifying patterns in large datasets that has the advantage of being interpretable. The document describes the basic principles of decision tree induction, different algorithms for constructing decision trees, and measures for evaluating tree performance and structure. It also discusses challenges such as fitting trees to existing expert knowledge and improving classification through feature selection.
Decision trees are a machine learning technique that use a tree-like model to predict outcomes. They break down a dataset into smaller subsets based on attribute values. Decision trees evaluate attributes like outlook, temperature, humidity, and wind to determine the best predictor. The algorithm calculates information gain to determine which attribute best splits the data into the most homogeneous subsets. It selects the attribute with the highest information gain to place at the root node and then recursively builds the tree by splitting on subsequent attributes.
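The attribute selection step described above can be sketched as follows. This is an illustrative example, not code from the summarized document: the toy rows, the attribute names and the function names are assumptions made for the sketch.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy of the target minus the weighted entropy after splitting on attribute."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


# Hypothetical toy data, invented for this sketch.
rows = [
    {"outlook": "sunny", "wind": "weak", "play": "no"},
    {"outlook": "sunny", "wind": "strong", "play": "no"},
    {"outlook": "overcast", "wind": "weak", "play": "yes"},
    {"outlook": "rain", "wind": "weak", "play": "yes"},
    {"outlook": "rain", "wind": "strong", "play": "no"},
    {"outlook": "overcast", "wind": "strong", "play": "yes"},
]

gains = {a: information_gain(rows, a, "play") for a in ("outlook", "wind")}
root = max(gains, key=gains.get)  # attribute placed at the root node
print(gains, root)
```

On this toy data the split on outlook leaves two pure subsets, so it has the larger gain and would be chosen for the root; the same function would then be applied recursively to each subset.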
Deep Reinforcement Learning based Recommendation with Explicit User-ItemInter...Kishor Datta Gupta
Recommendation is crucial in both academia and industry, and various techniques are proposed such as content-based collaborative filtering, matrix factorization, logistic regression, factorization machines, neural networks and multi-armed bandits. However, most of the previous studies suffer from two limitations: (1) considering the recommendation as a static procedure and ignoring the dynamic interactive nature between users and the recommender systems; (2) focusing on the immediate feedback of recommended items and neglecting the long-term rewards. To address the two limitations, in this paper we propose a novel recommendation framework based on deep reinforcement learning, called DRR. The DRR framework treats recommendation as a sequential decision making procedure and adopts an “Actor-Critic” reinforcement learning scheme to model the interactions between the users and recommender systems, which can consider both the dynamic adaptation and long-term rewards. Furthermore, a state representation module is incorporated into DRR, which can explicitly capture the interactions between items and users. Three instantiation structures are developed. Extensive experiments on four real-world datasets are conducted under both the offline and online evaluation settings. The experimental results demonstrate the proposed DRR method indeed outperforms the state-of-the-art competitors.
This work proposes a feed-forward neural network with the symmetric table addition method to design
a neuron synapse algorithm for sine function approximation, based on the Taylor series
expansion. Matlab code and LabVIEW are used to build the neural network, which has been
designed and trained on a data set to improve its performance, and which reaches global convergence with
a small MSE and 97.22% accuracy.
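For reference, the Taylor series expansion of the sine function that the abstract builds on can be sketched as below. This shows only the series itself, not the neural network or the symmetric table addition method; the function name and the number of terms are assumptions made for the sketch.

```python
import math


def taylor_sin(x, terms=8):
    """Approximate sin(x) with the first `terms` terms of its Taylor series at 0."""
    result = 0.0
    term = x  # first term of the series: x^1 / 1!
    for n in range(terms):
        result += term
        # Next term: multiply by -x^2 / ((2n + 2)(2n + 3)).
        term *= -x * x / ((2 * n + 2) * (2 * n + 3))
    return result


x = math.pi / 6
approx = taylor_sin(x)
print(approx, math.sin(x))
```

For small arguments such as pi/6 a handful of terms already agrees with math.sin to many decimal places; larger arguments would need more terms or range reduction first.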
Interval Type-2 Fuzzy Logic Systems (IT2 FLSs) have shown popularity, superiority, and greater accuracy in a number of applications in the last decade. This is due to their ability to cope adequately with uncertainty and imprecision when compared with their type-1 counterparts. In this paper, an Interval Type-2 Fuzzy Logic System (IT2FLS) is employed for remote vital signs monitoring and prediction of shock level in cardiac patients. The conventional Type-1 Fuzzy Logic System (T1FLS) is also applied to the prediction problem for comparison. Cardiac patients’ health datasets were used to perform an empirical comparison on the developed system. The results of the study indicated that IT2FLS could cope with more information and handle more uncertainty in health data than T1FLS. Statistical evaluation using performance metrics indicated a minimal error with IT2FLS compared to its counterpart, T1FLS. Overall, the shock level prediction experiment for cardiac patients showed the superiority of the IT2FLS paradigm over T1FLS.
On the use of voice activity detection in speech emotion recognitionjournalBEEI
Emotion recognition through speech has many potential applications; however, the challenge lies in achieving a high recognition rate with limited resources or in the presence of interference such as noise. In this paper we explore the possibility of improving speech emotion recognition by utilizing the voice activity detection (VAD) concept. The emotional voice data from the Berlin Emotion Database (EMO-DB) and a custom-made database, LQ Audio Dataset, are first preprocessed by VAD before feature extraction. The features are then passed to a deep neural network for classification. We chose MFCC as the sole feature. Comparing the results with and without VAD, we found that VAD improved the recognition rate of 5 emotions (happy, angry, sad, fear, and neutral) by 3.7% when recognizing clean signals, while using VAD when training a network with both clean and noisy signals improved our previous results by 50%.
Introduction to Some Tree based Learning MethodHonglin Yu
Random Forest, Boosted Trees and other ensemble learning methods build multiple models to improve predictive performance over single models. They combine "weak learners" like decision trees into a "strong learner". Random Forest adds randomness by selecting a random subset of features at each split. Boosting trains trees sequentially on weighted data from previous trees. Both reduce variance compared to bagging. Random Forest often outperforms Boosting while being faster to train. Neural networks can also be viewed as an ensemble method by combining simple units.
Types of Machine Learnig Algorithms(CART, ID3)Fatimakhan325
The document summarizes several machine learning algorithms used for data mining:
- Decision trees use nodes and edges to iteratively divide data into groups for classification or prediction.
- Naive Bayes classifiers use Bayes' theorem for text classification, spam filtering, and sentiment analysis due to their multi-class prediction abilities.
- K-nearest neighbors algorithms find the closest K data points to make predictions for classification or regression problems.
- ID3, CART, and k-means clustering are also summarized highlighting their uses, advantages, and disadvantages.
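As an illustration of the K-nearest neighbors item in the list above, a minimal classifier can be sketched as follows. The training points, labels and function name are hypothetical and are not taken from the summarized document.

```python
from collections import Counter
from math import dist


def knn_predict(train, query, k=3):
    """Label a query point by majority vote among its k closest training points."""
    neighbors = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical two-class training data: (feature vector, label).
train = [
    ((1.0, 1.0), "red"),
    ((1.2, 0.8), "red"),
    ((0.9, 1.1), "red"),
    ((5.0, 5.0), "blue"),
    ((5.2, 4.9), "blue"),
    ((4.8, 5.1), "blue"),
]

print(knn_predict(train, (1.1, 1.0)))
print(knn_predict(train, (5.1, 5.0)))
```

The sketch uses Euclidean distance (math.dist, Python 3.8 or later) and an unweighted vote; for regression the vote would be replaced by an average of the neighbors' values.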
This document summarizes an experimental study on the performance and emissions of a diesel engine fueled with crude rice bran oil methyl ester (CRBOME) and its blends with diesel and kerosene. Tests were conducted on blends containing 20%, 40%, 60%, and 80% CRBOME. Additional tests used blends containing 20%, 40%, 60%, and 80% CRBOME with 5%, 10%, 15%, and 20% kerosene respectively, with the remainder being diesel. The engine was tested at various loads and engine performance measures and emissions were evaluated. Results showed that a blend of 20% CRBOME and diesel had similar performance to diesel alone. Replacing 5% diesel with kerosene
This study examined the scientific attitude of 9th class students based on management, locality, and sex. 300 9th class students were surveyed using a scientific attitude test. The study found that:
1. Management and sex had a significant influence on scientific attitude, with government school students and female students having higher scientific attitudes.
2. Locality did not have a significant influence on scientific attitude.
3. The study concluded that sex, management, and locality should be considered to improve science education and foster scientific attitude among students. Teachers should work to create interest in science for all students.
This document summarizes a research paper on visual cryptography, which is a technique that allows information like images and text to be encrypted in a way that can be decrypted by the human visual system without using computers. It discusses how visual cryptography works by splitting a secret image into random shares, such that overlaying the shares reveals the original secret image. The document then describes the specific SDS algorithm used in the paper for keyless image encryption by sieving, dividing, and shuffling the image pixels into multiple random shares. It concludes by discussing potential applications and areas for further research on visual cryptography.
This document summarizes a journal article about a traffic light control system using radio frequency (RF) for emergency vehicles. The system uses an RF transmitter in emergency vehicles that sends a signal to an RF receiver at an intersection. When the receiver gets the emergency signal, it overrides the normal traffic light sequence and changes the light for the emergency vehicle to green for a set time. This allows emergency vehicles to pass through intersections more quickly. The system was tested up to a range of 20 meters outdoors and 30 meters indoors using a 434MHz RF module and PIC microcontroller. The system aims to help reduce accidents by giving emergency vehicles priority at traffic lights.
This document discusses applying a neural network approach to decision making in a self-organizing computing network (SOCN). It proposes using concepts from fuzzy logic and neural networks to build a computing network that can handle mixed data types, like symbolic and numeric data. The network would have input, hidden, and output layers connected by transfer functions. The hidden cells would self-organize based on training data to learn relationships between input and output cells. This approach aims to allow the network to make decisions on data sets with diverse attribute types in a more effective way than other techniques.
This document summarizes a research paper that proposes a security architecture for cloud computing that dynamically configures cryptographic algorithms and keys based on security policies and inputs like network access risk and data sensitivity. The architecture aims to improve security while reducing costs by only using the necessary level of encryption for each situation. It describes using the Blowfish algorithm instead of AES and adjusting the key size from 128 to 448 bits depending on factors like network type and data size. Results show Blowfish has better performance than AES, especially with larger keys on larger amounts of data. The goal is to provide flexible, efficient security tailored to each user's needs.
The document discusses the relationship between design and culture. It explains that design reflects human history and culture and can be used to develop skills and institutions. It also notes that design contributes to human well-being through the quality of products and services. Finally, it concludes that the relationship between design and culture has changed in recent years to foster innovation and development.
This document summarizes an article that proposes modifications to the JPEG 2000 image compression standard to achieve higher compression ratios while maintaining acceptable error rates. The proposed Adaptive JPEG 2000 technique involves pre-processing images with a transfer function to make them more suitable for compression by JPEG 2000. This is intended to provide higher compression ratios than the original JPEG 2000 standard while keeping root mean square error within allowed thresholds. The document provides background on JPEG 2000 and lossy image compression techniques, describes the proposed pre-processing approach, and indicates it was tested on single-channel images.
This document provides a comprehensive review of vision-based hand gesture recognition technology. It discusses different approaches to vision-based hand gesture recognition including appearance-based and model-based approaches. Appearance-based approaches model gestures based on image properties and views, while model-based approaches use 3D models to represent hand posture. The document also reviews several papers on specific hand gesture recognition systems and compares their segmentation methods, feature extraction techniques, representations, and classification algorithms. Finally, it discusses applications of vision-based hand gesture recognition including as an alternative to touchscreens and in areas like sign language recognition, gaming, and robot control.
This document describes an experimental study of heat transfer in a rectangular duct with and without internal V-shaped ribs. Experiments were conducted with air flow in turbulent regime (Reynolds numbers 3000-18000) in smooth duct and ducts with continuous or discrete internal V-shaped ribs. Temperature and pressure measurements were taken to determine heat transfer coefficients and friction factors for different configurations. Results showed that continuous ribs enhanced heat transfer more than discrete ribs, but also increased pressure drop more substantially. Heat transfer enhancement was dependent on rib geometry and position.
This document discusses distributed firewalls as an alternative to traditional firewalls. It provides an overview of distributed firewalls, including that they allow security policies to be centrally defined but enforced across individual endpoints. The key advantages of distributed firewalls are that they do not depend on network topology, protect from internal threats, and avoid bottlenecks since there are multiple secure entry points rather than a single point of failure. The document also reviews related work on distributed firewalls and some of their disadvantages, such as increased complexity if the central management system is compromised.
The document describes the design and simulation of a dual-band microstrip patch antenna with a defected ground structure for STM-1 and cellular applications at 4.9 GHz and 7.6 GHz. A rectangular patch antenna was designed on a dielectric substrate above a ground plane. Two slots were etched into the ground plane to create a defected ground structure. Simulation results showed the antenna achieved return losses of -12.75 dB and -13.01 dB at 4.9 GHz and 7.6 GHz respectively, meeting the design requirements. Parameters like slot width and feed length were optimized to improve impedance matching and bandwidth. The antenna design demonstrates a technique for dual-band operation using a defected ground structure.
This document describes a student attendance recording system using face recognition and GSM technology. The system uses a webcam to capture images of students' faces and matches them to images stored in a database using MATLAB. If a match is found, a text message is sent using a GSM board to notify that the student has attended class. The system aims to automate attendance tracking and address issues with conventional paper-based systems. It provides accurate attendance recording and real-time notification without requiring students to manually sign attendance sheets or carry RFID cards. The results found nearly 100% accurate recognition and authentication of student faces compared to the stored images in testing.
This document discusses various techniques for image contrast enhancement, including contrast stretching, grey level slicing, histogram equalization, local enhancement equalization, image subtraction, and spatial filtering. It provides details on how each technique works and compares their performance both qualitatively and quantitatively using metrics like SNR and PSNR. The conclusion is that contrast stretching generally provides the best enhancement among the techniques compared, but other techniques may be better suited for specific applications.
This document reviews various e-learning methodologies. It discusses asynchronous and synchronous learning methods. Asynchronous methods allow learners and instructors flexibility in time and location through tools like email and discussion forums. Synchronous methods require all participants to be online at the same time, using tools like video conferencing and real-time chat. The document also examines interactions between learners and instructors and among learners, noting benefits of both asynchronous flexibility and synchronous collaboration.
This document proposes and evaluates methods for fusing 3D ear and face biometrics at the score level and feature level for personal authentication. Local 3D features are extracted from ear and face data and fused using root mean square distance matching at the feature level. At the score level, matching scores from ear and face modalities are fused using weighted sum rule techniques. Experiments on a database of 990 ear and face images from 60 individuals show that the multimodal biometrics systems using feature level or score level fusion techniques have lower equal error rates compared to unimodal ear or face systems alone, demonstrating improved performance from fusing the biometric modalities.
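Score-level fusion with a weighted sum rule, as mentioned above, can be sketched roughly as follows. The min-max normalization step, the weight value and the assumption that higher scores mean better matches are choices made for this sketch, not details confirmed by the summarized paper.

```python
def min_max_normalize(scores):
    """Rescale scores to [0, 1] so that both modalities are comparable."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def weighted_sum_fusion(ear_scores, face_scores, w_ear=0.5):
    """Fuse per-candidate scores as w_ear * ear + (1 - w_ear) * face."""
    ear = min_max_normalize(ear_scores)
    face = min_max_normalize(face_scores)
    return [w_ear * e + (1.0 - w_ear) * f for e, f in zip(ear, face)]


# Hypothetical similarity scores for three enrolled candidates.
ear_scores = [0.2, 0.9, 0.4]
face_scores = [10.0, 30.0, 50.0]

fused = weighted_sum_fusion(ear_scores, face_scores, w_ear=0.6)
best = max(range(len(fused)), key=fused.__getitem__)
print(fused, best)
```

If the matchers produce distances rather than similarities, the scores would need to be inverted before fusion, and the weight would normally be tuned on validation data.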
This document summarizes research into synthesizing a silicon carbide composite material for industrial applications. The composite is made up of silicon carbide particles reinforced with fibers, along with binders and fillers. The synthesis process is described in detail, including mixing the constituents, compacting them in a mold, and sintering the compact at increasing temperatures. The final composite material is then characterized through wear and friction tests to evaluate its potential for applications requiring high temperature resistance, such as brake pads and clutch plates.
H-J Enterprises manufactures air to air bushings for voltages ranging from 15kV to 38kV. The document provides detailed specifications for standard and custom bushing assemblies, including dimensions, materials used, and electrical test results. H-J Enterprises also offers electrical testing of bushings, including basic impulse, partial discharge, and cantilever load testing to certify that bushings meet appropriate standards.
Intrusion Detection and Forensics based on decision tree and Association rule...IJMER
This paper presents an approach that combines two techniques, decision trees and
association rule mining, for probe attack detection. This approach proves to be
better than the traditional approach of generating rules for a fuzzy expert system by clustering methods.
Association rule mining is used to select the best attributes together, and a decision tree to identify the
best parameters together, to create the rules for the fuzzy expert system. The rules for the fuzzy expert
system are then generated using association rule mining and decision trees. A decision tree is generated for
the dataset to find the basic parameters for creating the membership functions of the fuzzy inference
system. Membership functions are generated for the probe attack. Based on these rules we have
created the fuzzy inference system that is used as an input to the neuro-fuzzy system. The fuzzy inference
system is loaded into the neuro-fuzzy toolbox as an input, and the final ANFIS structure is generated as the
outcome of the neuro-fuzzy approach. The experiments and evaluations of the proposed method were
done with the NSL-KDD intrusion detection dataset. In the experiments, the proposed approach
based on the combination of decision trees and association rule mining
efficiently detected probe attacks. The experimental results are better for detecting intrusions
than those of other existing methods.
Intrusion Detection System Based on K-Star Classifier and Feature Set ReductionIOSR Journals
Abstract: Network security and Intrusion Detection Systems (IDSs) are an important security-related research
area. This paper applies the K-star algorithm with filtering analysis in order to build a network intrusion detection
system. For our experimental analysis and as a case study, we have used the new NSL-KDD dataset, which is a
modified version of the KDDCup 1999 intrusion detection benchmark dataset. With a split of 66.0% for the
training set and the remainder for the testing set, a 2-class classification has been implemented. WEKA, a
Java-based open source software package consisting of a collection of machine learning algorithms for data mining tasks,
has been used in the testing process. The experimental results show that the proposed approach is very accurate,
with a low false positive rate and a high true positive rate, and that it takes less learning time in comparison with other
existing approaches used for efficient network intrusion detection.
Keywords: Information Gain, Intrusion Detection System, Instance-based classifier, K-Star, Weka.
The document discusses improving network security using machine learning techniques like the Decision Tree (C4.5) algorithm and Genetic Algorithm. It analyzes using these algorithms to classify network connections as normal or attacks like denial-of-service, user-to-root, and probing. The algorithms are trained on the KDD Cup 99 dataset to generate rules to detect different attack types with over 93% accuracy for denial-of-service attacks. The enhanced C4.5 algorithm that uses a gain ratio criterion is shown to outperform the classical C4.5 algorithm in detecting attacks.
IRJET- Intrusion Detection based on J48 AlgorithmIRJET Journal
This document presents a decision tree-based intrusion detection system that uses the J48 algorithm. The system was tested on the NSL-KDD dataset and achieved an accuracy of 96.50% in detecting intrusions. The system uses the Weka tool to implement the J48 decision tree algorithm and generate a classification output identifying normal network connections and different types of attacks. The proposed approach aims to reduce false positives generated by decision trees and outperforms baseline methods according to various evaluation metrics like precision, recall, and accuracy.
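The evaluation metrics named above are computed from the counts of a binary confusion matrix. The following sketch uses made-up counts purely for illustration; they are not results from the summarized paper.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard metrics from true/false positive and negative counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),  # also called detection rate
        "false_positive_rate": fp / (fp + tn),  # also called false alarm rate
    }


# Hypothetical counts: attacks are the positive class.
metrics = classification_metrics(tp=90, fp=5, tn=95, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

A system that aims to reduce false positives, as described above, would mainly be judged on precision and the false positive rate rather than on accuracy alone.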
The document proposes a layering based network intrusion detection system to improve detection of network attacks. It selects a small set of important features for each attack type layer, rather than using all features, to build more efficient intrusion detection models. The system is tested on the NSL-KDD intrusion detection dataset using machine learning classifiers like Naive Bayes and Random Forest. The results show the optimal feature selection approach enhances accuracy while reducing computational requirements compared to using all features.
A Novel Classification via Clustering Method for Anomaly Based Network Intrus...IDES Editor
Intrusion detection in the internet is an active
area of research. Intruders can be classified into two
types, namely external intruders, who are unauthorized
users of the computers they attack, and internal
intruders, who have permission to access the system but
with some restrictions. The aim of this paper is to present
a methodology to recognize attacks during the normal
activities in a system. A novel classification via sequential
information bottleneck (sIB) clustering algorithm has
been proposed to build an efficient anomaly based
network intrusion detection model. We have compared
our proposed method with other clustering algorithms
like X-Means, Farthest First, Filtered clusters, DBSCAN,
K-Means, and EM (Expectation-Maximization)
clustering in order to find the suitability of our proposed
algorithm. A subset of KDDCup 1999 intrusion detection
benchmark dataset has been used for the experiment.
Results show that the proposed method is efficient in
terms of detection accuracy and low false positive rate in
comparison to the other existing methods.
Intrusion Detection System for Classification of Attacks with Cross Validationinventionjournals
Nowadays, due to the rapidly growing use of the internet, the patterns of network attacks are increasing. Various organizations and institutes use the internet to access or share sensitive information over the network. Protecting information from unauthorized users or intruders is an important issue. In this paper, we use the decision tree techniques C4.5 and CART as classifiers for the classification of attacks. We propose an ensemble model that combines C4.5 and Classification and Regression Tree (CART) as a robust classifier for the classification of attacks. We use the NSL-KDD data set, as both a binary and a multiclass problem, with 10-fold cross validation. The proposed ensemble model achieves accuracies of 99.67% and 99.53% on the binary and multiclass NSL-KDD data sets respectively.
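The 10-fold cross validation protocol mentioned above partitions the records into ten folds and uses each fold once as the test set. A minimal index-splitting sketch is shown below; the function name is hypothetical, and the sketch does not shuffle or stratify, which a real experiment would usually do.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    indices = list(range(n_samples))
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


folds = list(k_fold_indices(25, k=10))
for train, test in folds:
    print(len(train), test)
```

The reported accuracy in such a protocol is typically the average over the ten test folds, so every record is used for testing exactly once.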
Nowadays several security tools are used to protect computer systems, computer networks, smart devices, etc. against attackers. An intrusion detection system is one of the tools used to detect attacks. Intrusion detection systems produce a large number of alerts, so security experts cannot investigate the important ones; moreover, many of those alerts are incorrect, i.e. false positives. Alert management systems are sets of approaches used to solve this problem. In this paper a new alert management system is presented. It uses K-nearest neighbor as the core component that classifies generated alerts. The suggested system gives precise results against a huge number of generated alerts. Because of the low classification time per alert, the system could also be used in online systems.
Multi Stage Filter Using Enhanced Adaboost for Network Intrusion DetectionIJNSA Journal
Based on the analysis and distribution of network attacks in the KDDCup99 dataset and real-time traffic, this paper proposes the design of a multi-stage filter, which is an efficient and effective approach for dealing with various categories of attacks in networks. The first stage of the filter is designed using Enhanced Adaboost with a decision tree algorithm to detect the frequent attacks that occur in the network, and the second stage is designed using Enhanced Adaboost with the Naïve Bayes algorithm to detect the moderate attacks. The final stage of the filter, used to detect the infrequent attacks, is designed using the Enhanced Adaboost algorithm with Naïve Bayes as a base learner. The performance of this design is tested with the KDDCup99 dataset and is shown to have a high detection rate with low false alarm rates.
Anomaly detection by using CFS subset and neural network with WEKA tools Drjabez
This document summarizes a research paper that proposes a new approach for anomaly detection in computer networks using CFS subset selection and neural networks with WEKA tools. The proposed approach uses CFS to select important features and neural networks like MLP, logistic regression and ELM for classification. Experiments on datasets show the proposed approach has lower execution time, higher anomaly detection rates, and lower CPU utilization compared to other machine learning methods. The approach effectively detects different types of attacks in computer networks.
CLASSIFIER SELECTION MODELS FOR INTRUSION DETECTION SYSTEM (IDS)ieijjournal1
This document discusses different classifier selection models for intrusion detection systems. It begins by introducing intrusion detection systems and their importance for network security. It then describes reducing the features of the KDD Cup 99 dataset to improve computational efficiency. Fifteen different classifier algorithms are described, including K-Means, Naive Bayes, Decision Trees, Support Vector Machines, and ensemble methods. Two models are proposed for combining classifier results. Simulation results on the KDD Cup 99 dataset show the true positive rates, false positive rates, correctly classified instances, and training times for each attack category and classifier. The best performing classifiers are identified for different intrusion types.
This document discusses using artificial neural networks for network intrusion detection. Specifically, it proposes a hybrid classification model that uses entropy-based feature selection to reduce the dataset, followed by four neural network techniques (RBFN, SOM, SMO, PART) for classification. It provides details on each neural network technique and the overall methodology, which uses 10-fold cross validation to evaluate performance based on standard criteria. The goal is to build an efficient intrusion detection system with low false alarms and high detection rates.
Intrusion Detection System Using Self Organizing Map AlgorithmsEditor IJCATR
This document presents a study on using self-organizing map (SOM) algorithms for intrusion detection systems. The study explores using SOM, an artificial neural network technique, to map high-dimensional network traffic data onto a 2D space to detect anomalies and network attacks. The authors describe the SOM algorithm and evaluate its performance on detecting different types of attacks in a test data set with an accuracy of 50.07% and a false positive rate of 0.06%. The study demonstrates the potential of SOM for building intrusion detection systems capable of handling complex network traffic data.
Image morphing has been the subject of much attention in recent years. It has proven to be a powerful visual effects tool
in film and television, depicting the fluid transformation of one digital image into another. This paper reviews the growth of this field
and describes recent advances in image morphing in terms of three areas: feature specification, warp generation methods, and
transition control. These areas relate to the ease of use and quality of results. We will describe the role of radial basis functions, thin
plate splines, energy minimization, and multilevel free-form deformations in advancing the state-of-the-art in image morphing. A
comparison of various techniques for morphing one digital image into another is made. We will compare various morphing techniques
such as Feature based image morphing, Mesh and Thin Plate Splines based image morphing based on different attributes such as
Computational Time, Visual Quality of Morphs obtained and Complexity involved in Selection of features. We will demonstrate the
pros and cons of various techniques so as to allow the user to make an informed decision to suit his particular needs. Recent work on a
generalized framework for morphing among multiple images will be described.
Intrusion Detection System Using Self Organizing Map AlgorithmsEditor IJCATR
With the rapid expansion of computer usage and computer networks, the security of computer systems has become very
important. Every day new kinds of attacks are faced by industry. Many methods have been proposed for the development of
intrusion detection systems using artificial intelligence techniques. In this paper we look at an algorithm based on neural
networks that is suitable for Intrusion Detection Systems (IDS). The name of this algorithm is "Self Organizing Maps" (SOM). So
far, a wide variety of methods have been used to build intrusion detectors. Among the methods
used to detect attacks, this paper investigates the Self-Organizing Map method.
This document presents a multi-classification approach for detecting network attacks using a layered model. The proposed system consists of two stages - the first stage classifies network records as normal or an attack, while the second stage further classifies any detected attacks into four categories (DoS, Probe, R2L, U2R) using separate layers. Experimental results on the NSL-KDD dataset show the layered approach using the JRip classifier achieved very high classification accuracy of over 99% for each attack category, outperforming existing approaches. The multi-layered model is effective for improving detection of minority attack classes without reducing performance on majority classes.
AN ANN APPROACH FOR NETWORK INTRUSION DETECTION USING ENTROPY BASED FEATURE S...IJNSA Journal
With the increase in Internet users, the number of malicious users is also growing day by day, posing a serious problem in distinguishing between normal and abnormal behavior of users in the network. This has led to the research area of intrusion detection, which essentially analyzes the network traffic and tries to determine normal and abnormal patterns of behavior. In this paper, we have analyzed the standard NSL-KDD intrusion dataset using some neural network based techniques for predicting possible intrusions. Four of the most effective classification methods, namely Radial Basis Function Network, Self-Organizing Map, Sequential Minimal Optimization, and Projective Adaptive Resonance Theory, have been applied. In order to enhance the performance of the classifiers, three entropy based feature selection methods have been applied as preprocessing of data. Performances of different combinations of classifiers and attribute reduction methods have also been compared.
An Approach of Automatic Data Mining Algorithm for Intrusion Detection and P...IOSR Journals
This document summarizes an approach for using data mining algorithms to detect network intrusions and prevent security threats. It analyzes two datasets - one containing 997 records and another containing 11,438 records - using various classification algorithms in Weka to determine the best performing ones. The algorithms examined include PART, SMO, HyperPipes, Filtered Classifier, Random Forest, Naive Bayes Updateable and KStar. Classification rate and false positive rate are used to evaluate performance. The document also discusses related work on intrusion detection using neural networks, genetic algorithms and other approaches.
This document summarizes a research paper on using support vector machines (SVM) for anomaly detection, specifically for credit card fraud detection. It discusses how SVM is a supervised machine learning technique that can handle large, high-dimensional datasets. The document provides an overview of SVM, comparing it to other techniques like neural networks and clustering. It summarizes the methodology used in the research, which applied SVM to real credit card transaction data. The results showed SVM achieved high accuracy and a low false positive rate for fraud detection. In conclusion, the document states that applying this SVM method could help banks better predict fraudulent credit card transactions.
This document discusses using support vector machines (SVM) for anomaly detection, specifically for credit card fraud detection. It contains the following key points:
1. SVM is a supervised machine learning technique that can be used for anomaly detection tasks like fraud detection. It works by mapping data to a higher dimensional space to find a hyperplane that maximizes separation between classes.
2. The document evaluates SVM for credit card fraud detection using real transaction data and compares it to other techniques like neural networks and clustering. SVM achieved higher accuracy and lower false positive rates than these other methods.
3. A theoretical comparison found that SVM requires no parameter tuning, works well in high dimensions, and has lower computational cost than neural networks and
This document summarizes a research paper that examines pricing strategy in a two-stage supply chain consisting of a supplier and retailer. The supplier offers a credit period to the retailer, who then offers credit to customers. A mathematical model is formulated to maximize total profit for the integrated supply chain system. The model considers three cases based on the relative lengths of the credit periods offered at each stage. Equations are developed to represent the profit functions for the supplier, retailer and overall system in each case. The goal is to determine the optimal selling price that maximizes total integrated profit.
The document discusses melanoma skin cancer detection using a computer-aided diagnosis system based on dermoscopic images. It begins with an introduction to skin cancer and melanoma. It then reviews existing literature on automated melanoma detection systems that use techniques like image preprocessing, segmentation, feature extraction and classification. Features extracted in other studies include asymmetry, border irregularity, color, diameter and texture-based features. The proposed system collects dermoscopic images and performs preprocessing, segmentation, extracts 9 features based on the ABCD rule, and classifies images using a neural network classifier to detect melanoma. It aims to develop an automated diagnosis system to eliminate invasive biopsy procedures.
This document summarizes various techniques for image segmentation that have been studied and proposed in previous research. It discusses edge-based, threshold-based, region-based, clustering-based, and other common segmentation methods. It also reviews applications of segmentation in medical imaging, plant disease detection, and other fields. While no single technique can segment all images perfectly, hybrid and adaptive methods combining multiple approaches may provide better results. Overall, image segmentation remains an important but challenging task in digital image processing and computer vision.
This document presents a test for detecting a single upper outlier in a sample from a Johnson SB distribution when the parameters of the distribution are unknown. The test statistic proposed is based on maximum likelihood estimates of the four parameters (location, scale, and two shape) of the Johnson SB distribution. Critical values of the test statistic are obtained through simulation for different sample sizes. The performance of the test is investigated through simulation, showing it performs well at detecting outliers when the contaminant observation represents a large shift from the original distribution parameters. An example application to census data is also provided.
This document summarizes a research paper that proposes a portable device called the "Disha Device" to improve women's safety. The device has features like live location tracking, audio/video recording, automatic messaging to emergency contacts, a buzzer, flashlight, and pepper spray. It is designed using an Arduino microcontroller connected to GPS and GSM modules. When the button is pressed, it sends an alert message with the woman's location, sets off an alarm, activates the flashlight and pepper spray for self-defense. The goal is to provide women a compact, one-click safety system to help them escape dangerous situations or call for help with just a single press of a button.
- The document describes a study that constructed physical fitness norms for female students attending social welfare schools in Andhra Pradesh, India.
- Researchers tested 339 students in classes 6-10 on speed, strength, agility and flexibility tests. Tests included 50m run, bend and reach, medicine ball throw, broad jump, shuttle run, and vertical jump.
- The results showed that 9th class students had the best average time for the 50m run. 10th class students had the highest flexibility on average. Strength and performance generally improved with increased class level.
This document summarizes research on downdraft gasification of biomass. It discusses how downdraft gasifiers effectively convert solid biomass into a combustible producer gas. The gasification process involves pyrolysis and reactions between hot char and gases that produce CO, H2, and CH4. Downdraft gasifiers are well-suited for biomass gasification due to their simple design and ability to manage the gasification process with low tar production. The document also reviews previous studies on gasifier configuration upgrades and their impact on performance, and the principles of downdraft gasifier operation.
This document summarizes the design and manufacturing of a twin spindle drilling attachment. Key points:
- The attachment allows a drilling machine to simultaneously drill two holes in a single setting, improving productivity over a single spindle setup.
- It uses a sun and planet gear arrangement to transmit power from the main spindle to two drilling spindles.
- Components like gears, shafts, and housing were designed using Creo software and manufactured. Drill chucks, bearings, and bits were purchased.
- The attachment was assembled and installed on a vertical drilling machine. It is aimed at improving productivity in mass production applications by combining two drilling operations into one setup.
The document presents a comparative study of different gantry girder profiles for various crane capacities and gantry spans. Bending moments, shear forces, and section properties are calculated and tabulated for 'I'-section with top and bottom plates, symmetrical plate girder, 'I'-section with 'C'-section top flange, plate girder with rolled 'C'-section top flange, and unsymmetrical plate girder sections. Graphs of steel weight required per meter length are presented. The 'I'-section with 'C'-section top flange profile is found to be optimized for biaxial bending but rolled sections may not be available for all spans.
This document summarizes research on analyzing the first ply failure of laminated composite skew plates under concentrated load using finite element analysis. It first describes how a finite element model was developed using shell elements to analyze skew plates of varying skew angles, laminations, and boundary conditions. Three failure criteria (maximum stress, maximum strain, Tsai-Wu) were used to evaluate first ply failure loads. The minimum load from the criteria was taken as the governing failure load. The research aims to determine the effects of various parameters on first ply failure loads and validate the numerical approach through benchmark problems.
This document summarizes a study that investigated the larvicidal effects of Aegle marmelos (bael tree) leaf extracts on Aedes aegypti mosquitoes. Specifically, it assessed the efficacy of methanol extracts from A. marmelos leaves in killing A. aegypti larvae (at the third instar stage) and altering their midgut proteins. The study found that the leaf extract achieved 50% larval mortality (LC50) at a concentration of 49 ppm. Proteomic analysis of larval midguts revealed changes in protein expression levels after exposure to the extract, suggesting its bioactive compounds can disrupt the midgut. The aim is to identify specific inhibitor proteins in the midg
This document presents a system for classifying electrocardiogram (ECG) signals using a convolutional neural network (CNN). The system first preprocesses raw ECG data by removing noise and segmenting the signals. It then uses a CNN to extract features directly from the ECG data and classify arrhythmias without requiring complex feature engineering. The CNN architecture contains 11 convolutional layers and is optimized using techniques like batch normalization and dropout. The system was tested on ECG datasets and achieved classification accuracy of over 93%, demonstrating its effectiveness at automated ECG classification.
This document presents a new algorithm for extracting and summarizing news from online newspapers. The algorithm first extracts news related to the topic using keyword matching. It then distinguishes different types of news about the same topic. A term frequency-based summarization method is used to generate summaries. Sentences are scored based on term frequency and the highest scoring sentences are selected for the summary. The algorithm was evaluated on news datasets from various newspapers and showed good performance in intrinsic evaluation metrics like precision, recall and F-score. Thus, the proposed method can effectively extract and summarize online news for a given keyword or topic.
1. International Journal of Research in Advent Technology, Vol.2, No.2, April 2014
E-ISSN: 2321-9637
A Comparative Performance Evaluation of Machine
Learning-Based NIDS on Benchmark Datasets
Dharmaraj R.Patil1, Tareek M.Pattewar2
Department of Computer Engineering, R.C.Patel Institute of Technology, Shirpur, M.S., India.
Email: dharmaraj.rcpit@gmail.com1
Department of Information Technology, R.C.Patel Institute of Technology, Shirpur, M.S., India.
Email: tareekpattewar@gmail.com2
Abstract- As network-based computer systems play increasingly vital roles in modern society, they have
become the targets of malicious activities, and both industry and the research community have placed more
emphasis on solving network intrusion detection problems. Machine learning algorithms have proved to be an
important tool in network intrusion detection. In this paper we present an application of an
AdaBoost-based machine learning algorithm to network intrusion detection. Network intrusion detection is
a classification problem, and the AdaBoost-based algorithm has good classification accuracy, with a
high detection rate and a low false-alarm rate. The algorithm combines weak
classifiers for continuous features and weak classifiers for categorical features into a strong classifier. We have
developed the AdaBoost-based NIDS and tested the system on the KDDCup’99 and NSL-KDD intrusion detection
datasets. A comparative performance evaluation of the NIDS on both datasets is shown. The experimental
results show that the AdaBoost-based NIDS performs better on the NSL-KDD dataset than on the
KDDCup’99 dataset.
Index Terms- Machine Learning, Network Intrusion Detection, AdaBoost Algorithm, Detection Rate,
False-alarm Rate.
1. INTRODUCTION
An intrusion occurs when somebody (a “hacker” or
“cracker”) attempts to break into or misuse your
system. The word “misuse” is broad and can cover
anything from something severe, such as stealing
confidential data, to something minor, such as
misusing your email system for spam.
An “Intrusion Detection System (IDS)” is a system
for detecting such intrusions. There are two types of
intrusion detection systems: host-based systems,
which base their decisions on the information
obtained from a single host, and network-based
systems, which obtain data by monitoring the traffic
in the network to which the hosts are
connected [1].
1.1 Host-based Intrusion Detection Systems
Host-based IDSs are installed on the host they are
intended to monitor. The host can be a server,
workstation or any networked device. A HIDS installs
as a service or daemon, or modifies the underlying
operating system’s kernel or application to gain first
inspection authority. A HIDS may also include the
ability to sniff network traffic intended for the
monitored host. Application attacks can include
memory modifications, maliciously crafted
application requests, buffer overflows or
file-modification attempts. A HIDS can inspect each
incoming command, looking for signs of
maliciousness, or simply track unauthorized file
changes.
1.2 Network-based Intrusion Detection Systems
Network-based IDSs work by capturing and
analyzing network packets as they travel on the wire.
Unlike a HIDS, a NIDS is designed to protect more than
one host. It can protect a group of computer hosts,
like a server farm, or monitor an entire network.
Captured traffic is compared against protocol
specifications and normal traffic trends, or the packets’
payload data is examined for malicious content. If a
security threat is noted, the event is logged and an
alert is generated.
1.3 Features of Network Intrusion Detection System
Some of the important features of a Network
Intrusion Detection System are as follows [1],
· It should be fault tolerant and run
continuously with minimal human
supervision.
· A Network Intrusion Detection System must
be able to recover from the crashes, either
accidental or caused by malicious activity.
· A Network Intrusion Detection System must
be able to detect any modifications forced on
the IDS by an attacker.
· It should impose minimal overhead on the
system.
· It should be configurable so as to accurately
implement the security policies of the
system.
· It should be easy to use by the operator.
· It should be capable of detecting different types
of attacks and must not recognize any
legitimate activity as an attack.
2. MATERIALS AND METHODS
2.1 Boosting
Boosting is a general method for improving the
accuracy of any given learning algorithm. It
refers to a general and provably effective method of
producing a very accurate prediction rule by
combining rough and moderately inaccurate rules.
Boosting has its roots in a theoretical framework for
studying machine learning called the “PAC” learning
model [8]. With the help of boosting, a “weak”
learning algorithm can be “boosted” into an arbitrarily
accurate “strong” learning algorithm. Here decision
stumps are used as weak learners. They can
be combined into a strong learning algorithm for
better classification accuracy [2].
2.2 Introduction to AdaBoost Algorithm
The AdaBoost algorithm, introduced in 1995 by
Freund and Schapire [9], solved many of the practical
difficulties of the earlier boosting algorithms. The
algorithm takes as input a training set (x1, y1), …, (xm,
ym), where each xi belongs to some domain or instance
space X, and each label yi is in some label set
Y. Assume Y = {-1, +1}. AdaBoost calls a given weak
or base learning algorithm, here a decision stump,
repeatedly in a series of rounds t = 1, …, T. One of the
main ideas of the algorithm is to maintain a
distribution or set of weights over the training set. The
weight of this distribution on training example i on
round t is denoted Dt(i). Initially all weights are set
equally, but on each round the weights of incorrectly
classified examples are increased so that the weak
learner is forced to focus on the hard examples in the
training set. The weak learner’s job is to find a weak
hypothesis [2],
ht: X → {-1, +1}, appropriate for the distribution Dt. The
goodness of a weak hypothesis is measured by its error,
$$\varepsilon_t = \Pr_{i \sim D_t}\left[ h_t(x_i) \neq y_i \right] = \sum_{i \,:\, h_t(x_i) \neq y_i} D_t(i). \quad (1)$$
AdaBoost works by combining several “votes”.
Instead of using support vectors, AdaBoost uses weak
learners.
Fig.1: Neither h1 nor h2 is a perfect learner; AdaBoost combines
them to obtain a “good” learner
Figure 1 illustrates how AdaBoost combines two
learners, h1 and h2. It initially chooses the learner that
classifies more data correctly. In the next step, the
data is re-weighted to increase the “importance” of
misclassified samples. This process continues and at
each step the weight of each weak learner among
other learners is determined.
2.3 Introduction to Weak Classifiers
Here decision stumps are used as weak classifiers.
A decision stump is a decision tree with a root node
and two leaf nodes. For each feature in the input data,
a decision stump is constructed [3]. The decision
stumps for categorical features and for continuous
features are given as follows.
2.3.1 Decision stumps for categorical features
A categorical feature f can only take finitely many
discrete values. A decision stump corresponds to a
partition of the range of f into two non-overlapping
subsets $C_p^f$ and $C_n^f$. Let X be the feature vector,
and Xf be the component of X which corresponds to
feature f. Then the decision stump corresponding to
$C_p^f$ and $C_n^f$ is described as follows [3],
$$h_f(X) = \begin{cases} +1, & X_f \in C_p^f \\ -1, & X_f \in C_n^f \end{cases} \quad (2)$$
Let $\varepsilon^{+}_{h_f}$ and $\varepsilon^{-}_{h_f}$ denote the false-classification rates
of the decision stump $h_f$ for normal and attack
samples, respectively. The optimal subsets $\hat{C}_p^f$ and
$\hat{C}_n^f$ that correspond to the optimal decision stump $\hat{h}_f$ are
determined by minimizing the sum of the false-classification
rates for the normal and attack samples,
$$(\hat{C}_p^f, \hat{C}_n^f) = \arg\min_{(C_p^f,\, C_n^f)} \left( \varepsilon^{+}_{h_f} + \varepsilon^{-}_{h_f} \right). \quad (3)$$
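The minimization in (3) separates per feature value: each value adds its share of normal samples to the error if it is placed in the attack subset, and its share of attack samples if it is placed in the normal subset. The following is a minimal Python sketch of that idea, not the authors' implementation. The function names, the tie-breaking rule and the treatment of unseen values (predicted as attack) are assumptions made for illustration.

```python
from collections import Counter

def fit_categorical_stump(values, labels):
    """Return the set of feature values for which the stump predicts +1 (normal).

    values: observed values of one categorical feature
    labels: +1 for normal samples, -1 for attack samples
    """
    normal = Counter(v for v, y in zip(values, labels) if y == +1)
    attack = Counter(v for v, y in zip(values, labels) if y == -1)
    n_normal = sum(normal.values()) or 1   # guard against an empty class
    n_attack = sum(attack.values()) or 1
    positive_subset = set()
    for v in set(values):
        # Putting v in the attack subset misclassifies its normal samples;
        # putting v in the normal subset misclassifies its attack samples.
        cost_if_attack = normal[v] / n_normal
        cost_if_normal = attack[v] / n_attack
        if cost_if_normal <= cost_if_attack:   # ties go to "normal" (assumption)
            positive_subset.add(v)
    return positive_subset

def predict_categorical(positive_subset, value):
    # Values never seen in training fall into the attack subset (assumption).
    return +1 if value in positive_subset else -1

# Toy data for a protocol-type feature (hypothetical values).
protocols = ["tcp", "tcp", "udp", "icmp", "icmp", "tcp"]
classes = [+1, +1, +1, -1, -1, -1]
normal_values = fit_categorical_stump(protocols, classes)
print(sorted(normal_values))
```

Because the total cost is a sum of independent per-value terms, choosing the cheaper side for each value gives a partition that minimizes the sum of the two false-classification rates on the training data.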
2.3.2 Decision stumps for continuous features
For a continuous feature f, given a segmentation
value θ, a decision stump hf can be constructed as [3],
(4)
Where Xf denotes the component of feature vector X,
which corresponds to feature f.
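A segmentation value θ for a continuous feature can be chosen by scanning candidate thresholds and keeping the one with the smallest sum of false-classification rates, mirroring the criterion used for categorical features. The sketch below is illustrative only. The direction of the comparison (called polarity here) is an assumption of this sketch, so both directions are tried, and the function name and the toy data are hypothetical.

```python
def fit_continuous_stump(values, labels):
    """Return (theta, polarity, cost) with the smallest summed error rates.

    polarity = +1: predict +1 (normal) when x <= theta, otherwise -1
    polarity = -1: predict -1 (attack) when x <= theta, otherwise +1
    """
    n_normal = sum(1 for y in labels if y == +1) or 1
    n_attack = sum(1 for y in labels if y == -1) or 1
    best = None
    for theta in sorted(set(values)):          # candidate thresholds
        for polarity in (+1, -1):
            miss_normal = miss_attack = 0
            for x, y in zip(values, labels):
                pred = polarity if x <= theta else -polarity
                if pred != y:
                    if y == +1:
                        miss_normal += 1
                    else:
                        miss_attack += 1
            cost = miss_normal / n_normal + miss_attack / n_attack
            if best is None or cost < best[2]:
                best = (theta, polarity, cost)
    return best

# Toy data: a connection-count feature where large counts are attacks.
counts = [0, 1, 2, 50, 60, 70]
classes = [+1, +1, +1, -1, -1, -1]
theta, polarity, cost = fit_continuous_stump(counts, classes)
print(theta, polarity, cost)
```

This brute-force scan costs O(n²) per feature; sorting the samples once and updating the error counts incrementally would be the usual optimization for datasets of the size used here.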
2.4 Working of the Algorithm
The algorithm works as follows [3],
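1) Initialize weights $w_i(1)$ $(i = 1, \ldots, n)$ satisfying $\sum_{i=1}^{n} w_i(1) = 1$.
2) Do the following for $t = 1, \ldots, T$.
a) Let $\varepsilon_j$ be the sum of the weighted classification
errors for the weak classifier $h_j$,
$$\varepsilon_j = \sum_{i=1}^{n} w_i(t)\, I[y_i \neq h_j(x_i)], \quad (5)$$
where
$$I[\gamma] = \begin{cases} 1, & \gamma \text{ is true} \\ 0, & \gamma \text{ is false} \end{cases} \quad (6)$$
Choose, from the constructed weak classifiers, the weak
classifier $h(t)$ that minimizes the sum of the weighted
classification errors,
$$h(t) = \arg\min_{h_j \in H} \varepsilon_j. \quad (7)$$
b) Calculate the sum of the weighted classification
errors $\varepsilon(t)$ for the chosen weak classifier $h(t)$.
c) Let
$$\alpha(t) = \frac{1}{2} \ln\!\left( \frac{1 - \varepsilon(t)}{\varepsilon(t)} \right). \quad (8)$$
d) Update the weights by
$$w_i(t+1) = \frac{w_i(t) \exp\!\left( -\alpha(t)\, y_i\, h(t)(x_i) \right)}{Z(t)}, \quad (9)$$
where $Z(t)$ is a normalization factor,
$$Z(t) = \sum_{k=1}^{n} w_k(t) \exp\!\left( -\alpha(t)\, y_k\, h(t)(x_k) \right). \quad (10)$$
3) The strong classifier is defined by
$$H(x) = \operatorname{sign}\!\left( \sum_{t=1}^{T} \alpha(t)\, h(t)(x) \right). \quad (11)$$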
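The loop above can be sketched in a few lines of Python. This is a minimal illustration of standard discrete AdaBoost over a fixed pool of threshold stumps, not the system described in this paper (which was written in Java). The stump encoding `(feature_index, theta, polarity)`, the uniform initial weights, the clamping of the error away from 0 and 1, and the toy data are all assumptions of this sketch.

```python
import math

def stump_predict(stump, x):
    feature, theta, polarity = stump
    return polarity if x[feature] <= theta else -polarity

def adaboost_train(X, y, stumps, rounds):
    n = len(X)
    w = [1.0 / n] * n                       # step 1: weights sum to 1
    ensemble = []                           # list of (alpha, stump)
    for _ in range(rounds):
        # step 2a: weighted error of each candidate; keep the smallest
        errors = [sum(wi for wi, xi, yi in zip(w, X, y)
                      if stump_predict(s, xi) != yi) for s in stumps]
        j = min(range(len(stumps)), key=errors.__getitem__)
        # clamp so the logarithm below is always defined (assumption)
        eps = min(max(errors[j], 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - eps) / eps)          # step 2c
        # step 2d: raise weights of misclassified samples, then normalize
        w = [wi * math.exp(-alpha * yi * stump_predict(stumps[j], xi))
             for wi, xi, yi in zip(w, X, y)]
        z = sum(w)
        w = [wi / z for wi in w]
        ensemble.append((alpha, stumps[j]))
    return ensemble

def adaboost_predict(ensemble, x):
    score = sum(alpha * stump_predict(s, x) for alpha, s in ensemble)
    return 1 if score >= 0 else -1          # sign(0) mapped to +1 (assumption)

# Toy one-feature problem that no single threshold stump can solve.
X = [[v] for v in range(10)]
y = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
stumps = [(0, theta, pol) for pol in (+1, -1) for theta in range(10)]

strong = adaboost_train(X, y, stumps, rounds=3)
print([adaboost_predict(strong, x) for x in X])
```

On this toy set the best single stump misclassifies three of the ten points, while the weighted vote of three stumps reproduces every training label, which is the behaviour the weight update is designed to produce.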
3. ARCHITECTURE OF NIDS USING
ADABOOST-BASED ALGORITHM
Considering the characteristics of the AdaBoost
algorithm and of intrusion detection systems,
the model of the system consists of four parts:
feature extraction, data labeling, design of the weak
classifiers, and construction of the strong classifier, as
shown in figure 2 [3].
Fig. 2. Architecture of NIDS using AdaBoost algorithm.
3.1 Feature Extraction
Each network connection record contains 41 features,
which can be classified into three groups:
3.1.1 Basic features
This category encapsulates all the attributes that
can be extracted from a TCP/IP connection.
3.1.2. Traffic Features
This category includes features that are computed
with respect to a window interval and is divided into
two groups,
a. “Same Host” features
These features examine only the connections in
the past 2 seconds that have the same destination host as
the current connection, and calculate statistics related
to protocol behavior, service etc.
b. “Same Service” features
These features examine only the connections in
the past 2 seconds that have the same service as the
current connection.
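As an illustration of how such window-based traffic features can be derived, the sketch below counts, for each connection, how many connections in the preceding 2 seconds share its destination host or its service. The record layout `(timestamp, dst_host, service)`, the function name, and the choice to count the current connection itself are assumptions of this sketch, not details taken from the datasets' feature extraction code.

```python
def window_counts(connections, window=2.0):
    """connections: list of (timestamp_seconds, dst_host, service), sorted by time.

    Returns one (same_host_count, same_service_count) pair per connection,
    computed over the connections seen in the last `window` seconds.
    """
    out = []
    start = 0                                # index of the oldest record still in the window
    for i, (t, host, service) in enumerate(connections):
        while connections[start][0] < t - window:
            start += 1                       # drop records older than the window
        recent = connections[start:i + 1]    # includes the current connection (assumption)
        same_host = sum(1 for _, h, _ in recent if h == host)
        same_service = sum(1 for _, _, s in recent if s == service)
        out.append((same_host, same_service))
    return out

# Hypothetical connection log.
log = [
    (0.0, "A", "http"),
    (0.5, "A", "ftp"),
    (1.0, "B", "http"),
    (3.0, "A", "http"),
]
print(window_counts(log))
```

Rate-style statistics, such as the fraction of these recent connections that ended in an error, can be computed over the same `recent` slice.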
3.1.3 Content Features
Unlike most of the DoS and probing attacks, the
R2L and U2R attacks do not show any frequent
sequential intrusion patterns. This is because the DoS
and probing attacks involve many connections to the
same host in a very short period of time, whereas the
R2L and U2R attacks are embedded in the data
portions of the packets and normally involve only a
single connection. To detect these kinds of attacks, we
need features that look for suspicious
behavior in the data portion. These features are called
content features.
3.2 Data Labeling
The AdaBoost algorithm labels a set of data as
either normal or an attack. The normal data samples
are labeled as “+1” and attack data samples are
labeled as “-1”.
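A small helper makes this labeling convention concrete. It is a sketch under the assumption that the raw class label is a string such as `normal` or an attack name; the public KDD Cup'99 files write labels with a trailing period (for example `normal.`), which the helper strips. The function name is hypothetical.

```python
def to_adaboost_label(raw_label):
    """Map a dataset class label to +1 (normal) or -1 (any attack)."""
    cleaned = raw_label.strip().rstrip(".").lower()
    return +1 if cleaned == "normal" else -1

print([to_adaboost_label(s) for s in ["normal.", "neptune.", "normal", "satan"]])
```

Collapsing every attack type into a single "-1" class matches the two-class setting assumed by the AdaBoost formulation above; a per-category breakdown would need a separate mapping.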
3.3 Design of Weak Classifiers
For classification of the intrusive data, the
AdaBoost algorithm requires a group of weak
classifiers. The weak classifier’s classification
accuracy is relatively low.
3.4 Construction of Strong Classifier
In the AdaBoost algorithm a strong classifier is
constructed by combining the weak classifiers. The
strong classifier has higher classification accuracy than
each weak classifier. The strong classifier is
trained using training sample data. A test data
sample is then input to the strong classifier, which
classifies it as a “normal” or “attack” sample.
4. KDDCUP’99 AND NSL-KDD DATASETS
4.1 KDD Cup’99 Dataset
This data set was derived from the 1998 DARPA
Intrusion Detection Evaluation Program held by MIT
Lincoln Labs. The dataset was created by simulating
a military network environment in which a typical
U.S. Air Force LAN was subjected to simulated
attacks. Raw TCP/IP dump data was gathered. The
data is approximately 4 GB of compressed TCP dump
data, which covers 7 weeks of network traffic and
comprises about 5 million connection records. For
each TCP/IP connection, 41 quantitative and
qualitative features were extracted. The KDD dataset is
divided into training and testing record sets. The
attacks include the four most common categories of
attacks [4], [5] given as follows,
4.1.1 Denial of Service Attacks (DoS)
It is an attack in which the attacker makes some
computing or memory resource too busy or too full to
handle legitimate requests, or denies legitimate users
access to a machine. e.g. back, neptune, land etc.
4.1.2 User to Root Attack (U2R)
It is a class of exploit in which the attacker starts
out with access to a normal user account on the
system and is able to exploit some vulnerability to
gain root access to the system. e.g. loadmodule, perl,
ps etc.
4.1.3 Remote to Local Attack (R2L)
This attack occurs when an attacker who has the
ability to send packets to a machine over a network
but who does not have an account on that machine
exploits some vulnerability to gain local access as a
user of that machine. e.g. ftpwrite, httptunnel, imap
etc.
4.1.4 Probing Attack
It is an attempt to gather information about a
network of computers for the apparent purpose of
circumventing its security controls. e.g. nmap, satan,
mscan etc.
4.2 NSL-KDD Dataset
NSL-KDD is a dataset suggested to solve some of
the inherent problems of the KDD’99 dataset [5],
[6]. The NSL-KDD dataset has the following
advantages over the original KDD’99 dataset.
i) It does not include redundant records in the training
set, so the classifiers will not be biased towards more
frequent records.
ii) There are no duplicate records in the proposed test
sets, therefore the performance of the learners is not
biased by the methods which have better detection
rates on the frequent records.
iii) The number of selected records from each
difficulty level group is inversely proportional to the
percentage of the records in the original KDD dataset.
As a result, the classification rates of distinct machine
learning methods vary in a wider range, which makes
an accurate evaluation of different learning
techniques more efficient.
iv) The number of records in the training and testing
sets is reasonable, which makes it affordable to run
the experiments on the complete set without the need
to randomly select a small portion.
v) Statistical observations: one of the most important
deficiencies in the KDD dataset is the huge number of
redundant records, which causes the learning
algorithms to be biased towards the frequent records
and thus prevents them from learning infrequent
records, which are usually more harmful to networks,
such as U2R and R2L attacks.
Table I
Statistics of redundant records in the KDD Training Dataset [5]
          Original Records   Distinct Records   Reduction Rate
Attacks   3,925,650          262,178            93.32%
Normal    972,781            812,814            16.44%
Total     4,898,431          1,074,992          78.05%
Table II
Statistics of redundant records in the KDD testing Dataset [5]
          Original Records   Distinct Records   Reduction Rate
Attacks   250,436            29,378             88.26%
Normal    60,591             47,911             20.92%
Total     311,027            77,289             75.15%
Table I and Table II show the statistics of the
redundant records in the KDD Cup’99 training and
testing datasets.
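The reduction rate in these tables is the share of records removed when duplicates are dropped, i.e. (original − distinct) / original. The short Python sketch below recomputes it from the training-set counts of Table I; the function and variable names are illustrative only.

```python
def reduction_rate(original, distinct):
    """Percentage of records removed when duplicate records are dropped."""
    return 100.0 * (original - distinct) / original

# Record counts copied from Table I (KDD training dataset).
kdd_train = {
    "Attacks": (3_925_650, 262_178),
    "Normal": (972_781, 812_814),
    "Total": (4_898_431, 1_074_992),
}

for name, (original, distinct) in kdd_train.items():
    print(f"{name}: {reduction_rate(original, distinct):.2f}%")
```

The much higher rate for attack records than for normal records is what motivates the bias argument in point v) above: a few heavily repeated attack records dominate the original training set.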
5. EXPERIMENTAL ANALYSIS
The system is developed on a Pentium IV computer
with a 2.6 GHz processor and 1 GB RAM, using JDK 1.6. We
utilize the KDDCup’99 and NSL-KDD datasets [4],
[6] to test the Network Intrusion Detection System
using the AdaBoost-based machine learning algorithm.
We have taken 10% training and testing data from the
KDDCup’99 data to test the system. The results are
given in figure 3. We have taken 20% of the NSL-KDD
training dataset as input to train the system and the
NSL-KDD testing dataset to test the system. The
results are given in figure 4. The comparative
performance of other learning algorithms with the
AdaBoost algorithm on the KDDCup’99 and NSL-KDD
datasets is given in figure 3 and figure 4. Figure 5
illustrates the comparative performance of the system
on the KDDCup’99 and NSL-KDD datasets. Two indices
are commonly used to judge the accuracy of a
network intrusion detection system. One is the detection
rate (DR) [10],
$$\text{DR} = \frac{\text{number of detected attacks}}{\text{number of attacks}} \quad (12)$$
and the other is the false alarm rate,
$$\text{False Alarm Rate} = 1 - \frac{\text{number of correctly classified normal connections}}{\text{number of normal connections}}. \quad (13)$$
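Both indices can be computed directly from true and predicted labels. The sketch below assumes the usual definitions, which are stated here as assumptions: the detection rate is the fraction of attack records that are flagged as attacks, and the false alarm rate is the fraction of normal records that are flagged as attacks. Labels follow the +1 (normal) / -1 (attack) convention used earlier; the function name and sample data are hypothetical.

```python
def detection_metrics(y_true, y_pred):
    """Return (detection_rate, false_alarm_rate) for +1/-1 labels."""
    attack_preds = [p for t, p in zip(y_true, y_pred) if t == -1]
    normal_preds = [p for t, p in zip(y_true, y_pred) if t == +1]
    # Share of attack records correctly flagged as attacks.
    detection_rate = sum(1 for p in attack_preds if p == -1) / len(attack_preds)
    # One minus the share of normal records correctly passed as normal.
    false_alarm_rate = 1 - sum(1 for p in normal_preds if p == +1) / len(normal_preds)
    return detection_rate, false_alarm_rate

truth = [-1, -1, -1, -1, +1, +1, +1, +1]
preds = [-1, -1, -1, +1, +1, +1, +1, -1]
dr, far = detection_metrics(truth, preds)
print(f"DR = {dr:.2%}, false alarm rate = {far:.2%}")
```

Reporting the two numbers together matters because a detector can trivially raise its detection rate by flagging everything, at the cost of a false alarm rate near 100%.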
Fig. 3. Detection results of other learning algorithms with
AdaBoost-based NIDS on the KDDCup’99 test data.
Fig. 4. Detection results of other learning algorithms with
AdaBoost-based NIDS on the NSL-KDD test data.
Fig. 5. Classification performance of AdaBoost-based NIDS on
KDDCup’99 and NSL-KDD test data.
Fig. 6. False-Alarm rates of the system on KDDCup’99 and NSL-KDD
datasets.
The above figures show the classification accuracy of
various learning algorithms and the AdaBoost-based
NIDS on the KDDCup’99 and NSL-KDD test datasets.
The initial classification results of the AdaBoost-based
NIDS on the KDDCup’99 and NSL-KDD test datasets are
86.27% and 90.00% respectively, which are
comparable with those of other learning algorithms. Figure 5
shows the classification performance of the AdaBoost-based
NIDS over the KDDCup’99 and NSL-KDD test
datasets. Due to the redundant records in
KDDCup’99, the system performance is biased and it
gives a detection rate of only 86.27%, whereas the
detection rate of the system increases to 90.00%
on the NSL-KDD dataset. Hence NSL-KDD
is a good benchmark dataset for testing
intrusion detection systems, as it gives the real
performance of the systems without bias.
6. JUSTIFICATION DIFFERENCE
1. The performance of the AdaBoost-based NIDS
improves on the NSL-KDD dataset. The detection
rate of the system on the KDDCup’99 dataset is
86.27%, which improves to 90.00% on the NSL-KDD
dataset. This indicates that the NSL-KDD dataset is
a good benchmark dataset for testing intrusion
detection systems.
2. The false-alarm rate of the system also improves
on the NSL-KDD dataset. The false-alarm rate on
KDDCup’99 is 3.71%, which decreases to
3.38% on the NSL-KDD dataset, as shown in figure
6.
3. The number of records in the training and testing
sets of the NSL-KDD dataset is reasonable, i.e. 125,973
records in the training set and 22,544 records in the
testing set, which makes it affordable to run the
system on the complete set without the need to
randomly select a small portion.
7. CONCLUSION
This paper deals with the initial design of a Network
Intrusion Detection System based on an AdaBoost-based
machine learning algorithm. We have developed this
system using Java and the KDDCup’99 and NSL-KDD
intrusion datasets. Our initial experimental
results are comparable with those of other
learning algorithms evaluated on the KDDCup’99 and
NSL-KDD test datasets. The system shows better
detection rate and false-alarm rate results on the
NSL-KDD dataset than on the KDDCup’99
dataset.
REFERENCES
[1] S. Chebrolu, A. Abraham and J.P.
Thomas, “Feature deduction and Ensemble
design of Intrusion Detection Systems”,
Computers & Security, Vol.24, Sept.2004.
[2] Y. Freund, R. E. Schapire, “A short Introduction
to Boosting”, Journal of Japanese Society for
Artificial Intelligence, Sept.1999
[3] Weiming Hu, Wei Hu, “AdaBoost-Based
Algorithm for Network Intrusion Detection”,
IEEE Transactions on Systems, Man and
Cybernetics, Part B: Cybernetics, Vol.38, April
2008.
[4] KDDCup 1999 Data,
http://www.kdd.ics.uci.edu/databases/kddcup99/kddup99.html, 1999.
[5] M. Tavallaee, E. Bagheri, W. Lu, and A.
Ghorbani, “A Detailed Analysis of the KDD
CUP 99 Data Set”, Second IEEE Symposium
on Computational Intelligence for Security and
Defense Applications (CISDA), 2009.
[6] “Nsl-kdd data set for network-based intrusion
detection systems.” Available on:
http://nsl.cs.unb.ca/NSL-KDD/, March 2009.
[7] Elkan, Charles, “Results of the KDD’99
classifier learning”, SIGKDD Explorations, 2000.
[8] L. G. Valiant, “A theory of the learnable”,
Communications of the ACM, 27(11):1134–
1142, November 1984.
[9] Yoav Freund and Robert E. Schapire, “A decision
theoretic generalization of on-line learning and
an application to boosting”, Journal of Computer
and System Sciences, 55(1):119–139, August
1997.
[10] W. Hu and W. M. Hu, “HIGCALS: a hierarchical
Graph-theoretic clustering active learning
System”, in Proc. IEEE Int. Conf. Syst., Man,
Cybern, 2006, vol. 5, pp. 3895–3900.
[11] Dorothy E. Denning, “An Intrusion-
Detection Model”, IEEE Transactions on
Software Engineering, Volume SE-13, No. 2,
February 1987, pp. 222-232.
[12] G. Vigna and R. A. Kemmerer, “NetSTAT: A
network-based intrusion detection approach”, in
Proceedings of Computer Security Applications
Conference, December. 1998, pp. 25– 34.
[13] W. Lee, S. J. Stolfo and K. Mok, “A data mining
Framework for building intrusion detection
Models”, in Proceedings of IEEE Symposium
on Security and Privacy, May 1999, pp. 120–
132.
[14] K. Sequeira and M. Zaki, “Admit: Anomaly-
Based Data Mining for Intrusions”, in
Proceedings of the eighth ACM SIGKDD
International Conference on Knowledge
Discovery and Data Mining, ACM Press, 2002,
pp. 386–395.