REDUCING THE COGNITIVE LOAD ON ANALYSTS THROUGH HAMMING DISTANCE BASED ALERT ... (IJNSA Journal)
Previous work introduced the idea of grouping alerts at a Hamming distance of 1 to achieve lossless alert aggregation; such aggregated meta-alerts were shown to increase alert interpretability. However, a mean of 84,023 daily Snort alerts was reduced to a still formidable 14,099 meta-alerts. In this work, we address this limitation by investigating several approaches that all contribute to reducing the burden on the analyst and providing timely analysis. We explore minimizing the number of both alerts and data elements by aggregating at Hamming distances greater than 1. We show how increasing bin sizes can improve aggregation rates, and we provide a new aggregation algorithm that operates up to an order of magnitude faster at Hamming distance 1. Lastly, we demonstrate the broad applicability of this approach through empirical analysis of Windows security alerts, Snort alerts, netflow records, and DNS logs. The result is a reduction in the cognitive load on analysts, achieved by minimizing both the overall number of alerts and the number of data elements that must be reviewed for an analyst to evaluate the set of original alerts.
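As a rough illustration of the idea (not the authors' implementation), the sketch below treats each alert as a tuple of fields and greedily folds alerts within a given Hamming distance of a meta-alert's first member into that meta-alert, wildcarding the differing fields. The field layout and the `*` wildcard convention are assumptions for this example.

```python
def hamming(a, b):
    """Number of positions at which two equal-length alert tuples differ."""
    return sum(x != y for x, y in zip(a, b))

def aggregate(alerts, max_dist=1):
    """Greedy sketch: fold each alert into the first meta-alert whose
    representative is within max_dist; differing fields become wildcards."""
    meta = []  # list of (pattern, member_alerts)
    for alert in alerts:
        for i, (pattern, members) in enumerate(meta):
            if hamming(alert, members[0]) <= max_dist:
                merged = tuple(p if p == a else "*" for p, a in zip(pattern, alert))
                members.append(alert)
                meta[i] = (merged, members)
                break
        else:
            meta.append((alert, [alert]))
    return meta

alerts = [
    ("snort", "10.0.0.1", "80", "sid:1000"),
    ("snort", "10.0.0.2", "80", "sid:1000"),   # differs only in source IP
    ("snort", "10.0.0.9", "443", "sid:2000"),
]
metas = aggregate(alerts)
# the first two alerts collapse into ("snort", "*", "80", "sid:1000")
```

Aggregation is lossless in the sense that the original member alerts are retained alongside each wildcarded pattern; raising `max_dist` above 1 trades fewer meta-alerts for more wildcards per pattern.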
Detection of Outliers in Large Dataset using Distributed Approach (Editor IJMTER)
In this paper, a distributed method is introduced for detecting distance-based outliers in very large data sets. The approach is based on the concept of an outlier detection solving set, a small subset of the data set that can also be employed for predicting novel outliers. The method exploits parallel computation to obtain vast time savings. Indeed, beyond preserving the correctness of the result, the proposed scheme exhibits excellent performance. From a theoretical point of view, for common settings, our algorithm is expected to be at least three orders of magnitude faster than the classical nested-loop approach to detecting outliers. Experimental results show that the algorithm is efficient and that its running time scales quite well for an increasing number of nodes. We also discuss a variant of the basic strategy that reduces the amount of data to be transferred, improving both the communication cost and the overall runtime. Importantly, the solving set computed in a distributed environment has the same quality as that produced by the corresponding centralized method.
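For reference, the "classical nested-loop" baseline the abstract compares against can be sketched as follows: a point is a distance-based outlier when it has fewer than a required number of neighbors within a given radius. The radius and neighbor-count parameters here are illustrative, not taken from the paper.

```python
import math

def nested_loop_outliers(points, radius, min_neighbors):
    """O(n^2) nested-loop baseline: a point is a distance-based outlier
    if fewer than min_neighbors other points lie within radius of it."""
    outliers = []
    for p in points:
        count = sum(
            1 for q in points
            if q is not p and math.dist(p, q) <= radius
        )
        if count < min_neighbors:
            outliers.append(p)
    return outliers

pts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
print(nested_loop_outliers(pts, radius=1.0, min_neighbors=1))  # [(5.0, 5.0)]
```

The quadratic cost of this double loop is exactly what the solving-set and distributed approaches in the paper aim to avoid.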
High performance intrusion detection using modified k-means & Naïve Bayes (eSAT Journals)
Abstract
Internet technology is growing at an exponential rate, making the data security of computer systems more complex and critical. Multiple methodologies have been implemented for this purpose in recent years, as detailed in [1], [3]. The availability of larger bandwidth has connected many large computer server networks worldwide, increasing the need to secure data, and an intrusion detection system (IDS) is one of the most effective techniques for maintaining the security of a computer system. The proposed system is designed to help identify malicious behavior and improper use of computer systems. In this report we propose a hybrid technique for intrusion detection using data mining algorithms. Our main objective is a complete analysis of an intrusion detection dataset to test the implemented system. We propose a new methodology in which modified k-means is used for clustering and Naïve Bayes for classification. These two data mining techniques are used for intrusion detection in a large, horizontally distributed database.
Keywords: Intrusion Detection, Modified K-Means, Naïve Bayes
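The clustering half of this hybrid can be sketched with plain k-means on one-dimensional data. Note this is generic k-means, not the paper's "modified" variant (which is not specified in the abstract), and the Naïve Bayes classification stage is omitted; the sample durations are invented for illustration.

```python
import statistics

def k_means_1d(values, k, iters=20):
    """Plain 1-D k-means sketch: assign each value to its nearest centroid,
    then recompute each centroid as the mean of its cluster."""
    step = max(1, len(values) // k)
    centroids = sorted(values)[::step][:k]    # spread initial centroids over the data
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in values:
            idx = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[idx].append(v)
        centroids = [statistics.mean(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

# hypothetical connection durations: two obvious groups (normal vs. suspicious)
cents, groups = k_means_1d([1, 2, 1.5, 50, 52, 51], k=2)
```

In the hybrid scheme described by the abstract, the resulting clusters would then feed a Naïve Bayes classifier that labels traffic as normal or intrusive.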
Implementation of digital image watermarking techniques using dwt and dwt svd... (eSAT Journals)
Abstract
These days there is enormous use of digital content in every field. Data handled on the web and in multimedia network systems is in digital form. Digital watermarking is the technology of embedding information in digital content that we want to protect from illegal copying. Digital image watermarking hides information in any form (text, image, audio, or video) within an original image without degrading its perceptual quality. In the case of the Discrete Wavelet Transform (DWT), the original image is decomposed in order to embed the watermark. In the hybrid scheme (DWT-SVD), the image is first decomposed by DWT and the watermark is then embedded in the singular values obtained by applying Singular Value Decomposition (SVD). DWT and SVD are used in combination to enhance the quality of the watermarking. The techniques are compared on the basis of the Peak Signal to Noise Ratio (PSNR) value at different values of the scaling factor; a high PSNR value is desired because it indicates good imperceptibility of the technique.
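The PSNR metric used here (and by several other papers in this listing) has a standard definition that is easy to compute. A minimal sketch, assuming 8-bit pixel intensities given as flat lists:

```python
import math

def psnr(original, distorted, max_value=255.0):
    """Peak Signal-to-Noise Ratio between two equal-sized images,
    given as flat lists of pixel intensities: 10*log10(MAX^2 / MSE)."""
    mse = sum((o - d) ** 2 for o, d in zip(original, distorted)) / len(original)
    if mse == 0:
        return float("inf")          # identical images
    return 10 * math.log10(max_value ** 2 / mse)

orig = [100, 150, 200, 250]
mark = [101, 149, 200, 251]          # lightly watermarked pixels
print(round(psnr(orig, mark), 2))    # ≈ 49.38 dB
```

Higher PSNR means the watermarked image is closer to the original, which is why the abstract treats a high PSNR as evidence of imperceptibility.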
Novel Methodology of Data Management in Ad Hoc Network Formulated using Nanos... (Drjabez)
In an ad hoc network of nanosensors for wastage detection, clustering assists in nodal communication and in organizing the data fetched by the nanosensors in the network. Traditional cluster formation techniques fail to form clusters precisely. The data from the nanosensors, which act as the nodes of the network, must be distinctively assigned to clusters. The dynamic path selection cluster achieves this distinct assignment by dynamically creating a path to the data as an initial step and then redirecting the data to the appropriate cluster according to the prepared scheme.
A COST EFFECTIVE COMPRESSIVE DATA AGGREGATION TECHNIQUE FOR WIRELESS SENSOR N... (ijasuc)
In a wireless sensor network (WSN) there are two main problems in employing conventional compression techniques: the compression performance depends to a large extent on the organization of the routes, and the efficiency of an in-network data compression scheme is not determined solely by the compression ratio but also by the computational and communication overheads. In compressive data aggregation, data is gathered at some intermediate node where its size is reduced by applying a compression technique without losing any information from the complete data. In our previous work, we developed an adaptive traffic-aware aggregation technique in which the aggregation can switch adaptively between structured and structure-free, depending on the traffic load. In this paper, as an extension to our previous work, we provide a cost-effective compressive data gathering technique to handle the traffic load, using a structured data aggregation scheme. We also design a technique that effectively reduces the computation and communication costs involved in the compressive data gathering process. Compressive data gathering provides compressed sensor readings to reduce global data traffic and distributes energy consumption evenly to prolong the network lifetime. Simulation results show that our proposed technique improves the delivery ratio while reducing energy consumption and delay.
Data Accuracy Models under Spatio - Temporal Correlation with Adaptive Strate... (IDES Editor)
Wireless sensor nodes continuously observe and sense statistical data from the physical environment, but the degree of accuracy of the data sensed collaboratively by the sensor nodes is a big issue for wireless sensor networks. Hence, in this paper we describe accuracy models for sensor networks collecting accurate data from the physical environment under two conditions. First, we propose an accuracy model that requires a priori knowledge of the statistical data of the physical environment, called the Estimated Data Accuracy (EDA) model. Simulation results show that the EDA model can sense more accurate data from the physical environment than the other information accuracy models in the network. Moreover, using the EDA model, there exists an optimal set of sensor nodes that is adequate to achieve approximately the same data accuracy level as the full network. We also simulate the EDA model under the threat of malicious attacks on the network due to an extreme physical environment. Second, we propose another accuracy model using the steepest descent method, called the Adaptive Data Accuracy (ADA) model, which does not require any a priori information about input signal statistics. We show that, using the ADA model, there exists an optimal set of sensor nodes that measures accurate data and is sufficient to achieve the same data accuracy level as the network. Further, in the ADA model we can reduce the amount of data transmitted by this optimal set of sensor nodes using a model called the Spatio-Temporal Data Prediction (STDP) model. The STDP model captures the spatial and temporal correlation of the sensed data to reduce communication overhead under data reduction strategies. Finally, using the STDP model, we illustrate a mechanism to trace malicious nodes in the network under an extreme physical environment. Computer simulations illustrate the performance of the EDA, ADA, and STDP models.
An Analytical Model of Latency and Data Aggregation Tradeoff in Cluster Based... (ijsrd.com)
Sensor networks are collections of sensor nodes that cooperatively send sensed data to a base station. As sensor nodes are battery driven, efficient utilization of power is essential in order to use networks for a long duration. It is therefore necessary to reduce data traffic inside sensor networks, thereby reducing the amount of data that must be sent to the base station. The main goal of data aggregation algorithms is to gather and aggregate data in an energy-efficient manner so that network lifetime is enhanced. In wireless sensor networks, periodic data sampling leads to an enormous collection of raw facts, the transmission of which would rapidly deplete the sensors' power; a fundamental design challenge for wireless sensor networks (WSNs) is to maximize their lifetime. Data aggregation has emerged as a basic approach in WSNs to reduce the number of transmissions by sensor nodes and hence minimize overall power consumption in the network. Data aggregation is affected by several factors, such as the placement of aggregation points, the aggregation function, and the density of sensors in the network. In this paper, an analytical model of a wireless sensor network is developed and its performance is analyzed for varying degrees of aggregation and latency parameters. The overall performance of the proposed methods is evaluated using the MATLAB simulator in terms of aggregation cycles, average packet drops, transmission cost, and network lifetime. Finally, simulation results establish the validity and efficiency of the approach.
TWO LEVEL DATA FUSION MODEL FOR DATA MINIMIZATION AND EVENT DETECTION IN PERI... (pijans)
Periodic wireless sensor networks are used for many applications, such as continuous weather monitoring, cattle monitoring, water quality monitoring, and event detection. Due to continuous monitoring, a large amount of redundant data is transferred over the network, which depletes energy resources. Data fusion is used with the aim of deleting redundant and inconsistent data and providing more accurate data to the sink; however, a degree of redundancy also improves data quality. A two-level data fusion model is therefore proposed in this paper to transfer minimized data while accurately determining events with minimum delay. The model employs cluster-based data fusion at two levels: the first at the sensor nodes (SNs) and the second at the cluster head (CH) node. At the first level of fusion, each SN sends the single most common measurement to the CH by using a similarity function. The second level of fusion is applied at the CH to remove similar multi-attribute measurements. Also at the CH node, multiple correlation is used to accurately distinguish abnormal events from normal events or outliers within minimum delay. Experimental results on real data validate that the proposed data fusion model is better in terms of data transferred over the network, redundancy, and energy consumption than the prefix filtering technique (PFF). The model is also used to accurately detect early events in case of emergency.
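The first-level fusion step ("send the single most common measurement") can be sketched very simply. The bucketing tolerance used as the similarity function here is an assumption for illustration, not the paper's actual function, and the temperature samples are invented.

```python
from collections import Counter

def sensor_level_fusion(readings, tolerance=0.5):
    """First-level fusion at a sensor node: bucket near-equal readings
    (a crude similarity function) and report the most common bucket value."""
    buckets = Counter(round(r / tolerance) * tolerance for r in readings)
    value, count = buckets.most_common(1)[0]
    return value

# ten temperature samples; the node forwards a single value to the cluster head
samples = [21.1, 21.2, 21.1, 21.3, 21.1, 25.0, 21.2, 21.1, 21.2, 21.1]
print(sensor_level_fusion(samples))  # 21.0
```

Ten raw readings collapse to one transmitted value, which is exactly the data-minimization effect the abstract claims at level one; the outlying 25.0 reading is suppressed rather than forwarded.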
Experimental study of Data clustering using k-Means and modified algorithms (IJDKP)
The k-means clustering algorithm is an old algorithm that has been intensely researched owing to its ease and simplicity of implementation. Clustering algorithms have broad appeal and usefulness in exploratory data analysis. This paper presents the results of an experimental study of different approaches to k-means clustering, comparing results on different datasets using the original k-means and other modified algorithms implemented in MATLAB R2009b. The results are evaluated on performance measures such as the number of iterations, the number of points misclassified, accuracy, the Silhouette validity index, and execution time.
Time Orient Multi Attribute Sensor Selection Technique For Data Collection In... (inventionjournals)
IRJET-Security Based Data Transfer and Privacy Storage through Watermark Dete... (IRJET Journal)
Gowtham T. and Pradeep Kumar G., "Security Based Data Transfer and Privacy Storage through Watermark Detection", International Research Journal of Engineering and Technology (IRJET), Volume 2, Issue 01, April 2015. e-ISSN: 2395-0056, p-ISSN: 2395-0072. www.irjet.net. Published by Fast Track Publications.
Abstract
Digital watermarking has been proposed as a technology to ensure copyright protection by embedding an imperceptible, yet detectable, signal in visual multimedia content such as images or video. Security is a key aspect in every field, and privacy is a critical issue when data owners outsource data storage or processing to a third-party computing service. Several attempts have been made to increase security and to avoid data loss. The existing system attains its solution up to a level at which the parameters can be further refined. In this paper, improvements are made to the successive compressive sensing (CS) reconstruction stage and to the Peak Signal-to-Noise Ratio (PSNR). Another consideration is to increase the CS rate by de-emphasizing the effect of predictive variables that become uncorrelated with the measurement data, which eliminates the need for CS reconstruction.
Data Retrieval Scheduling For Unsynchronized Channel in Wireless Broadcast Sy... (IJERA Editor)
Wireless data broadcast disseminates data to a large number of mobile clients. In many information services, users may query multiple data items at a time. The environment under consideration is asymmetric, in that the information server has much more bandwidth available than the clients. To maximize the number of downloads within a given deadline, we define a problem called largest number data retrieval (LNDR); we prove that the decision version of LNDR is NP-hard and investigate approximation algorithms for it. We also define another problem, called minimum cost data retrieval (MCDR), which aims at downloading a set of requested data items with the least response time and energy consumption, and we study the data scheduling problem over unsynchronized channels at the server side. In the proposed system, LNDR and MCDR are applied in both push-based and pull-based broadcast models over unsynchronized channels. The proposed approximation algorithms efficiently schedule the retrieval of multiple data items from multiple channels. When the time needed for channel switching can be ignored, an optimal maximum-matching algorithm is exhibited for LNDR that requires only polynomial time; when the switching time cannot be neglected, we provide simulation results to demonstrate the practical efficiency of the proposed algorithms.
An Efficient Frame Embedding Using Haar Wavelet Coefficients And Orthogonal C... (IJERA Editor)
Digital media, applications, copyright defense, and multimedia security have become a vital aspect of our daily life. Digital watermarking is a technology used for the copyright protection of digital applications. In this work we deal with a process able to mark digital pictures with visible and semi-visible hidden information, called a watermark; this process may be the basis of a complete copyright protection system. Watermarking is implemented here using Haar wavelet coefficients and principal component analysis. Experimental results show high imperceptibility: there is no noticeable difference between the watermarked video frames and the original frames in the case of invisible watermarking, and vice versa for the semi-visible implementation. The watermark is embedded in the lower frequency band of the wavelet-transformed cover image. The combination of the two transform algorithms has been found to improve the performance of the watermarking algorithm. The robustness of the watermarking scheme is analyzed by means of two distinct performance measures, namely Peak Signal to Noise Ratio (PSNR) and Normalized Coefficient (NC).
Models and approaches for Differential Power Analysis (Andrej Šimko)
There are many side-channel attacks that employ power analysis to discover secret keys. This paper describes various leakage models (Hamming weight, Hamming distance, switching distance, and signed distance) and approaches, such as Simple Power Analysis (SPA); different kinds of Differential Power Analysis (DPA): first-order, Single-Exponent Multiple-Data (SEMD), Multiple-Exponent Single-Data (MESD), Zero-Exponent Multiple-Data (ZEMD), high-order, and automated template attacks; Correlation Power Analysis (CPA); and Mutual Information Analysis (MIA). It closes with a chapter on countermeasures against side-channel attacks.
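The Hamming-weight and Hamming-distance leakage models mentioned above are simple enough to sketch directly: the attacker predicts power consumption from the number of set bits in an intermediate value, or from the number of bits that toggle in a register. The placeholder S-box below stands in for a real cipher table and is purely illustrative.

```python
def hamming_weight(x):
    """Number of set bits: the classic power-consumption model for a register."""
    return bin(x).count("1")

def hamming_distance(before, after):
    """Bits that toggle when a register updates, as in the Hamming-distance model."""
    return hamming_weight(before ^ after)

# hypothetical first-round DPA target: leakage of sbox[plaintext ^ key_guess]
SBOX = list(range(256))  # identity placeholder, stands in for a real S-box table

def predicted_leakage(plaintext_byte, key_guess):
    return hamming_weight(SBOX[plaintext_byte ^ key_guess])

print(hamming_weight(0b1011))            # 3
print(hamming_distance(0b1100, 0b1010))  # 2
```

In CPA, predictions like `predicted_leakage` for every key guess are correlated against measured power traces; the correct guess yields the strongest correlation.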
A time efficient and accurate retrieval of range aggregate queries using fuzz... (IJECEIAES)
The massive growth of big data makes it difficult to analyse and retrieve useful information from the available data. Existing approaches cannot guarantee efficient retrieval of data from the database. In existing work, stratified sampling is used to partition tables in terms of strata variables. However, the k-means clustering algorithm cannot guarantee efficient retrieval, since choosing centroids in a large volume of data is difficult, and limited knowledge about the strata variable can lead to less efficient partitioning of the tables. The proposed methodology overcomes this problem by introducing FCM clustering in place of k-means, which can cluster large volumes of data that are similar in nature. The stratification problem is overcome by introducing a post-stratification approach, which leads to efficient selection of the strata variable. This methodology yields an efficient retrieval process for user queries in less time and with more accuracy.
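The key difference between FCM (fuzzy c-means) and k-means is the membership update: each point receives a degree of membership in every cluster rather than a hard assignment. A minimal 1-D sketch of the standard FCM membership formula, with illustrative data (not the paper's):

```python
def fcm_memberships(points, centroids, m=2.0):
    """Fuzzy c-means membership update: u_ik = 1 / sum_j (d_ik/d_jk)^(2/(m-1)).
    Each point gets a degree of membership in every cluster."""
    memberships = []
    for x in points:
        dists = [abs(x - c) for c in centroids]
        if 0.0 in dists:                       # point sits exactly on a centroid
            memberships.append([1.0 if d == 0 else 0.0 for d in dists])
            continue
        row = []
        for d_i in dists:
            denom = sum((d_i / d_j) ** (2 / (m - 1)) for d_j in dists)
            row.append(1 / denom)
        memberships.append(row)
    return memberships

u = fcm_memberships([1.0, 5.0, 3.0], centroids=[1.0, 5.0])
# the midpoint 3.0 belongs equally to both clusters: [0.5, 0.5]
```

Each membership row sums to 1; the fuzziness exponent `m` controls how soft the boundaries are, which is what lets FCM handle data that is "similar in nature" across cluster boundaries.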
Wireless sensor networks (WSNs) are composed of tiny sensors powered by energy-constrained batteries. The continual need to utilize limited bandwidth to absorb most of the packets has led to a great deal of research in the field of data compression. WSNs have been exploited to balance the maintenance of readable information content, with acceptable error, against the cost of the energy consumed by the collecting nodes during compression. In this paper, we compare two compression techniques: JPEG and watermark authentication. We applied both techniques and used error measurement to indicate system performance regardless of the number of execution instructions. We found that it is not enough to use error measurement only; the number of executed instructions must also be considered. The digital watermark approach showed a lower average error compared to the JPEG approach; however, the JPEG approach required fewer instructions than the watermark approach.
Secure data aggregation technique for wireless sensor networks in the presenc... (LogicMindtech Nologies)
THRESHOLD BASED DATA REDUCTION FOR PROLONGING LIFE OF WIRELESS SENSOR NETWORK (pijans)
Abstract
A wireless sensor network is a set of tiny elements, i.e., sensors. WSNs are used for monitoring environmental parameters in fields such as health monitoring, civil construction, military applications, and agriculture. WSNs face challenges such as low processing power, limited memory, limited bandwidth, and battery power. The data monitored by the sensors is sent to the sink for processing. The data sent from a sensor node can be controlled to save energy, as most of the energy is consumed in data transmission and it is not possible to replace the batteries frequently. In this work, threshold-based and adaptive-threshold-based data reduction techniques, combined with an energy-efficient shortest path, are used to minimize the energy consumption of the sensor nodes and the network. The adaptive approach saves energy and reduces data by 30% to 40% compared to the threshold-based approach.
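The fixed-threshold scheme can be sketched in a few lines: a node transmits a reading only when it differs from the last transmitted value by more than the threshold, so the sink reuses the previous value otherwise. The readings and threshold below are illustrative, not from the paper.

```python
def threshold_reduce(samples, threshold):
    """Send a reading only when it differs from the last transmitted value
    by more than the threshold; the sink reuses the old value otherwise."""
    sent = []
    last = None
    for s in samples:
        if last is None or abs(s - last) > threshold:
            sent.append(s)
            last = s
    return sent

readings = [20.0, 20.1, 20.2, 23.0, 23.1, 23.0, 27.5]
print(threshold_reduce(readings, threshold=1.0))  # [20.0, 23.0, 27.5]
```

Seven readings shrink to three transmissions. The adaptive variant the paper compares against would adjust `threshold` at runtime (for example, based on recent signal variability), which is how it achieves the additional 30% to 40% reduction.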
A NOVEL ALERT CORRELATION TECHNIQUE FOR FILTERING NETWORK ATTACKS (IJNSA Journal)
Alert correlation is a high-level alert evaluation technique for managing the large volumes of irrelevant and redundant intrusion alerts raised by Intrusion Detection Systems (IDSs). Recent trends show that pure intrusion detection can no longer satisfy the security needs of organizations. One problem with existing alert correlation techniques is that they group related alerts together without taking their severity into consideration. This paper proposes a novel alert correlation technique that can filter unnecessary and low-impact alerts from a large volume of intrusion alerts. The proposed technique is based on a supervised feature selection method that uses class type to define the correlation between alerts. Alerts of a similar class type are identified using a class label, and class types are further classified based on metric ranks of low, medium, and high level. Findings show that the technique is able to detect and report high-level intrusions.
An Analytical Model of Latency and Data Aggregation Tradeoff in Cluster Based...ijsrd.com
Sensor networks are collections of sensor nodes that cooperatively send sensed data to a base station. As sensor nodes are battery driven, efficient utilization of power is essential to keep networks operating for long durations. It is therefore necessary to reduce data traffic inside sensor networks, thereby reducing the amount of data that must be sent to the base station. The main goal of data aggregation algorithms is to gather and aggregate data in an energy-efficient manner so that network lifetime is enhanced. In wireless sensor networks, periodic data sampling leads to an enormous collection of raw facts whose transmission would rapidly deplete sensor power. A fundamental challenge in the design of wireless sensor networks (WSNs) is to maximize their lifetimes. Data aggregation has emerged as a basic approach in WSNs to reduce the number of transmissions of sensor nodes, and hence minimize overall power consumption in the network. Data aggregation is affected by several factors, such as the placement of aggregation points, the aggregation function, and the density of sensors in the network. In this paper, an analytical model of a wireless sensor network is developed and its performance is analyzed for varying degrees of aggregation and latency parameters. The overall performance of the proposed methods is evaluated using the MATLAB simulator in terms of aggregation cycles, average packet drops, transmission cost and network lifetime. Finally, simulation results establish the validity and efficiency of the approach.
TWO LEVEL DATA FUSION MODEL FOR DATA MINIMIZATION AND EVENT DETECTION IN PERI...pijans
Periodic wireless sensor networks are used for many applications, such as continuous weather monitoring, cattle monitoring, water quality monitoring and event detection. Due to continuous monitoring, a large amount of redundant data is transferred over the network, which depletes energy resources. Data fusion is used with the aim of deleting redundant and inconsistent data and providing more accurate data to the sink; at the same time, some redundancy improves data quality. Therefore, a two-level data fusion model is proposed in this paper to transfer minimized data while retaining the ability to determine events accurately with minimum delay. The model employs cluster-based data fusion at two levels: the first level at the sensor nodes (SNs) and the second at the cluster head (CH) node. At the first level of fusion, each SN sends the single most common measurement to the CH using a similarity function. The second level of fusion is applied at the CH to remove similar multi-attribute measurements. Also at the CH node, multiple correlation is used to accurately distinguish an abnormal event from a normal event or outliers within minimum delay. Experiments performed on real data validate that the proposed data fusion model is better in terms of data transferred over the network, redundancy and energy consumption than the Prefix Filtering (PFF) technique. The model can also accurately detect early events in case of emergency.
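The two fusion levels described above can be sketched minimally. Here the first-level similarity function is plain equality and the second level drops exact duplicates; both are simplifying assumptions made for illustration, not the paper's actual functions:

```python
from collections import Counter

def sensor_level_fusion(samples):
    """Level 1 (at each sensor node): reduce one sampling period's readings
    to the single most common measurement before sending to the CH."""
    return Counter(samples).most_common(1)[0][0]

def cluster_head_fusion(node_reports):
    """Level 2 (at the cluster head): drop duplicate reports so each
    distinct measurement is forwarded to the sink only once."""
    seen, fused = set(), []
    for r in node_reports:
        if r not in seen:
            seen.add(r)
            fused.append(r)
    return fused

# Three sensor nodes each fuse their own readings, then the CH de-duplicates.
reports = [sensor_level_fusion(s) for s in
           [[21, 21, 22, 21], [21, 21, 21], [24, 24, 23]]]
fused = cluster_head_fusion(reports)
```

Under this toy rule, three node reports collapse to two distinct measurements forwarded to the sink, which is the kind of transfer reduction the model targets.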
Experimental study of Data clustering using k- Means and modified algorithmsIJDKP
The k-means clustering algorithm is an old algorithm that has been intensely researched owing to its ease and simplicity of implementation. Clustering algorithms have broad appeal and usefulness in exploratory data analysis. This paper presents the results of an experimental study of different approaches to k-means clustering, comparing results on different datasets using the original k-means and other modified algorithms implemented in MATLAB R2009b. The results are evaluated on performance measures such as number of iterations, number of points misclassified, accuracy, Silhouette validity index and execution time.
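The algorithm under study can be sketched compactly. This is a plain pure-Python rendering of Lloyd's k-means iteration, for illustration only (the paper's experiments used MATLAB R2009b and modified variants not reproduced here):

```python
import random

def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's algorithm on points given as lists of floats."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            d = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[d.index(min(d))].append(p)
        # Update step: move each centroid to its cluster mean.
        new = [
            [sum(col) / len(c) for col in zip(*c)] if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new == centroids:  # converged; iteration count is one of the
            break             # performance measures the paper reports
        centroids = new
    return centroids, clusters

pts = [[0.0], [0.2], [0.1], [10.0], [10.2], [9.9]]
cents, cls = kmeans(pts, 2)
```

On well-separated data like this, the two centroids settle near the means of the two groups regardless of initialization; the modified algorithms the paper compares mainly differ in how that initialization and update are performed.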
Time Orient Multi Attribute Sensor Selection Technique For Data Collection In...inventionjournals
IRJET-Security Based Data Transfer and Privacy Storage through Watermark Dete...IRJET Journal
Gowtham T., Pradeep Kumar G., "Security Based Data Transfer and Privacy Storage through Watermark Detection", International Research Journal of Engineering and Technology (IRJET), Volume 2, Issue 01, April 2015. e-ISSN: 2395-0056, p-ISSN: 2395-0072. www.irjet.net. Published by Fast Track Publications.
Abstract
Digital watermarking has been proposed as a technology to ensure copyright protection by embedding an imperceptible, yet detectable, signal in visual multimedia content such as images or video. Security is a key aspect in every field, and privacy is a critical issue when data owners outsource data storage or processing to a third-party computing service. Several attempts have been made to strengthen security and avoid data loss. The existing system has attained a solution only up to the level at which further parameter refinement remains possible. In this paper, improvements are made to the successive compressive sensing (CS) reconstruction stage and the Peak Signal-to-Noise Ratio (PSNR). Another consideration is increasing the CS rate by de-emphasizing the effect of predictive variables that become uncorrelated with the measurement data, which eliminates the need for CS reconstruction.
Data Retrieval Scheduling For Unsynchronized Channel in Wireless Broadcast Sy...IJERA Editor
Wireless data broadcast disseminates data to a large number of mobile clients. In many information services, users may query multiple data items at a time. The environment under consideration is asymmetric in that the information server has much more bandwidth available than the clients. To maximize the number of downloads completed by a given deadline, we define a problem called largest number data retrieval (LNDR). We prove that the decision version of LNDR is NP-hard and investigate an approximation algorithm for it. We also define another problem called minimum cost data retrieval (MCDR), which aims at downloading a set of requested data items with the least response time and energy consumption; this is the data scheduling problem over unsynchronized channels at the server side. In the proposed system, LNDR and MCDR are addressed in both push-based and pull-based broadcast systems. The proposed approximation algorithms efficiently schedule the retrieval of multiple data items from multiple unsynchronized channels. When the time needed for channel switching can be ignored, an optimal maximum-matching algorithm requiring only polynomial time is exhibited for LNDR; when switching time cannot be neglected, the approximation algorithms apply. Finally, simulation results demonstrate the practical efficiency of the proposed algorithms.
An Efficient Frame Embedding Using Haar Wavelet Coefficients And Orthogonal C...IJERA Editor
Digital media, applications, copyright defense, and multimedia security have become a vital aspect of our daily life. Digital watermarking is a technology used for the copyright protection of digital applications. In this work we deal with a process able to mark digital pictures with visible and semi-visible hidden information, called a watermark; this process may form the basis of a complete copyright protection system. Watermarking is implemented here using Haar wavelet coefficients and principal component analysis. Experimental results show high imperceptibility: there is no noticeable difference between the watermarked video frames and the original frames in the invisible case, and the reverse holds for the semi-visible implementation. The watermark is embedded in the lower frequency band of the wavelet-transformed cover image. The combination of the two transform algorithms has been found to improve the performance of the watermarking algorithm. The robustness of the watermarking scheme is analyzed by means of two distinct performance measures, namely Peak Signal-to-Noise Ratio (PSNR) and Normalized Coefficient (NC).
Models and approaches for Differential Power AnalysisAndrej Šimko
There are many side-channel attacks, which employ power analysis for discovering secret keys. This paper describes various methods (Hamming weight, Hamming distance, Switching distance and Signed distance) and approaches, like Simple Power Analysis (SPA); different kinds of Differential Power Analysis (DPA) - first order, Single-Exponent Multiple-Data Attack (SEMD), Multiple-Exponent Single-Data Attack (MESD), Zero-Exponent Multiple-Data Attack (ZEMD), High-Order, automated template; Correlation Power Analysis (CPA); Mutual Information Analysis (MIA). At the end, there is a chapter on countermeasures against the side channel attacks.
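The leakage models listed above can be illustrated minimally. In DPA, the hypothetical power consumption of a register update is commonly modeled as the Hamming weight of the new value, or as the Hamming distance between the old and new values (the number of bit flips). A short sketch:

```python
def hamming_weight(v):
    """Number of set bits: classic leakage model for a bus precharged to 0."""
    return bin(v).count("1")

def hamming_distance(old, new):
    """Bits that flip on a register update: leakage model for CMOS toggling,
    i.e. the Hamming weight of the XOR of the two values."""
    return hamming_weight(old ^ new)

# Example: an 8-bit register updated from 0b10110100 to 0b10011100.
w = hamming_weight(0b10110100)
d = hamming_distance(0b10110100, 0b10011100)
```

In a DPA attack, these per-hypothesis predictions are correlated against measured power traces; the switching-distance and signed-distance models the paper also surveys refine this by weighting 0-to-1 and 1-to-0 transitions differently.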
A time efficient and accurate retrieval of range aggregate queries using fuzz...IJECEIAES
Massive growth in big data makes it difficult to analyse and retrieve useful information from the set of available data. Existing approaches cannot guarantee efficient retrieval of data from the database. In existing work, stratified sampling is used to partition tables in terms of stratification variables. However, the k-means clustering algorithm cannot guarantee efficient retrieval, since choosing centroids in a large volume of data is difficult, and limited knowledge about the stratification variables may lead to less efficient partitioning of the tables. These problems are overcome in the proposed methodology by introducing FCM clustering instead of k-means, which can cluster large volumes of data that are similar in nature, and by introducing a post-stratification approach that leads to efficient selection of the stratification variables. The methodology yields an efficient retrieval process for user queries in less time and with greater accuracy.
A wireless sensor network (WSN) is composed of tiny sensors powered by energy-constrained batteries. The continual need to utilize limited bandwidth to absorb most of the packets has led to a great deal of research in the field of data compression. WSNs must balance the maintenance of readable information content, with acceptable error, against the cost of the energy consumed by the collecting nodes during compression. In this paper, we compare two compression techniques: JPEG and watermark authentication. We applied both techniques and used error measurement to indicate system performance regardless of the number of executed instructions. We found that it is not enough to use error measurement only; the number of executed instructions must also be considered. The digital watermark approach showed a lower average error than the JPEG approach; however, the JPEG approach required fewer instructions than the watermark approach.
Secure data aggregation technique for wireless sensor networks in the presenc...LogicMindtech Nologies
THRESHOLD BASED DATA REDUCTION FOR PROLONGING LIFE OF WIRELESS SENSOR NETWORKpijans
ABSTRACT
A wireless sensor network is a set of tiny elements, i.e. sensors. WSNs are used for monitoring environmental parameters in fields such as health monitoring, civil construction, military applications and agriculture. WSNs face challenges including limited processing power, memory, bandwidth and battery life. The data monitored by the sensors is sent to the sink for processing. Since maximum energy is consumed in transmitting data and batteries cannot be replaced frequently, the amount of data sent from a sensor node can be controlled to save energy. In this work, threshold-based and adaptive-threshold-based data reduction techniques, combined with an energy-efficient shortest path, are used to minimize the energy consumption of the sensor nodes and the network. The adaptive approach saves energy and reduces data by 30% to 40% compared to the threshold-based approach.
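The threshold-based reduction idea can be sketched minimally, assuming a simple rule of our own: a node transmits a reading only when it moves more than a fixed threshold away from the last transmitted value. The adaptive variant in the paper adjusts this threshold dynamically, which is not reproduced here.

```python
def threshold_reduce(readings, threshold):
    """Suppress transmissions that stay within `threshold` of the last
    value actually sent; only significant changes reach the sink."""
    sent = [readings[0]]          # the first reading is always transmitted
    for r in readings[1:]:
        if abs(r - sent[-1]) > threshold:
            sent.append(r)
    return sent

# Seven temperature samples collapse to three transmissions.
temps = [20.0, 20.1, 20.2, 21.5, 21.6, 23.0, 23.1]
tx = threshold_reduce(temps, 1.0)
```

The sink can reconstruct an approximate signal by holding the last received value, trading a bounded error (at most the threshold) for fewer radio transmissions, which dominate a node's energy budget.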
REDUCING THE COGNITIVE LOAD ON ANALYSTS THROUGH HAMMING DISTANCE BASED ALERT ...IJNSA Journal
Previous work introduced the idea of grouping alerts at a Hamming distance of 1 to achieve lossless alert aggregation; such aggregated meta-alerts were shown to increase alert interpretability. However, a mean of 84,023 daily Snort alerts was reduced to a still-formidable 14,099 meta-alerts. In this work, we address this limitation by investigating several approaches that all contribute towards reducing the burden on the analyst and providing timely analysis. We explore minimizing the number of both alerts and data elements by aggregating at Hamming distances greater than 1. We show how increasing bin sizes can improve aggregation rates, and we provide a new aggregation algorithm that operates up to an order of magnitude faster at Hamming distance 1. Lastly, we demonstrate the broad applicability of this approach through empirical analysis of Windows security alerts, Snort alerts, NetFlow records, and DNS logs. The result is a reduction in the cognitive load on analysts, achieved by minimizing the overall number of alerts and the number of data elements that need to be reviewed in order for an analyst to evaluate the set of original alerts.
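The distance-1 grouping the abstract describes can be sketched in a few lines. This is a minimal illustrative sketch, not the paper's actual algorithm: the alert fields, the greedy choice of a group representative, and the merge rule are all assumptions made for demonstration. Alerts whose field tuples differ from a group's representative in at most one position are merged, and each differing field collects the set of observed values, keeping the aggregation lossless.

```python
def hamming(a, b):
    """Number of positions at which two equal-length alert tuples differ."""
    return sum(x != y for x, y in zip(a, b))

def aggregate(alerts, max_dist=1):
    """Greedy lossless aggregation: an alert joins the first group whose
    representative is within max_dist; otherwise it starts a new group."""
    groups = []  # each group: (representative tuple, list of member alerts)
    for alert in alerts:
        for rep, members in groups:
            if hamming(rep, alert) <= max_dist:
                members.append(alert)
                break
        else:
            groups.append((alert, [alert]))
    # Build meta-alerts: a field stays scalar if all members agree on it,
    # and becomes the set of observed values where they disagree.
    metas = []
    for rep, members in groups:
        cols = [set(vals) for vals in zip(*members)]
        metas.append([v.pop() if len(v) == 1 else v for v in cols])
    return metas

# Hypothetical (src_ip, dst_port, signature) alerts, fields assumed for the demo.
alerts = [
    ("10.0.0.1", 80, "SQLi"),
    ("10.0.0.2", 80, "SQLi"),   # differs from the first in one field
    ("10.0.0.9", 443, "XSS"),   # differs in all three fields
]
metas = aggregate(alerts)
```

Three alerts become two meta-alerts, and every original alert can be recovered from the value sets, which is the lossless property the paper builds on; raising `max_dist` above 1 trades interpretability for stronger reduction, as the abstract discusses.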
A NOVEL ALERT CORRELATION TECHNIQUE FOR FILTERING NETWORK ATTACKSIJNSA Journal
Alert correlation is a high-level alert evaluation technique for managing large volumes of irrelevant and redundant intrusion alerts raised by Intrusion Detection Systems (IDSs). Recent trends show that pure intrusion detection can no longer satisfy the security needs of organizations. One problem with existing alert correlation techniques is that they group related alerts together without taking their severity into consideration. This paper proposes a novel alert correlation technique that can filter unnecessary and low-impact alerts out of a large volume of intrusion alerts. The proposed technique is based on a supervised feature selection method that uses class type to define the correlation between alerts. Alerts of similar class type are identified using a class label, and class types are further classified based on their metric ranks of low, medium and high. Findings show that the technique is able to detect and report high-level intrusions.
A NOVEL EVALUATION APPROACH TO FINDING LIGHTWEIGHT MACHINE LEARNING ALGORITHM...IJNSA Journal
Building practical and efficient intrusion detection systems for computer networks is important in industry today, and machine learning provides a set of effective algorithms for detecting network intrusions. To find appropriate algorithms for building such systems, it is necessary to evaluate various types of machine learning algorithms against specific criteria. In this paper, we propose a novel evaluation formula that incorporates six indexes into a comprehensive measurement: precision, recall, root mean square error, training time, sample complexity and practicability. The goal is to find algorithms that have a high detection rate and low training time, need fewer training samples, and are easy to use in terms of constructing, understanding and analyzing models. A detailed evaluation process is designed to obtain all necessary assessment indicators, and six kinds of machine learning algorithms are evaluated. Experimental results illustrate that logistic regression shows the best overall performance.
Defending Reactive Jammers in WSN using a Trigger Identification Service.ijsrd.com
In the last decade, the reactive jamming attack has been among the greatest threats to wireless sensor networks, because it is difficult to detect and defend against and causes massive destruction to legitimate sensor communications. To deactivate reactive jammers efficiently, a new scheme has been proposed and developed that identifies all trigger nodes, i.e. the nodes whose transmissions invoke the jammer. Many existing reactive-jamming defense schemes can benefit from this identification mechanism, and trigger identification can also work as an application-layer service. In this paper, on one side we formulate several optimization problems to provide a complete trigger identification service framework for unreliable wireless sensor networks, and on the other side we provide an improved algorithm with regard to two sophisticated jamming models, in order to enhance its robustness for various network scenarios.
A method to remove chattering alarms using median filtersISA Interchange
Chattering alarms are the most common nuisance alarms and can reduce the usability of, and cause a confidence crisis in, alarm systems for industrial plants. This paper addresses chattering alarm reduction using median filters. Two rules are formulated to design the window size of the median filters. If the alarm probability can be estimated from process data, one rule chooses the window size based on the probability of alarms so as to satisfy requirements on the false alarm rate or missed alarm rate. If only historical alarm data is available, the other rule chooses the window size based on the percentage reduction of chattering alarms computed from the alarm duration distribution. Experimental results for industrial cases testify that the proposed method is effective.
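The core filtering step can be sketched as follows: a sliding median over a 0/1 alarm signal suppresses short chattering bursts while preserving sustained alarm episodes. The window size of 5 below is an arbitrary choice for illustration; the paper's contribution is precisely the two rules for designing that window size, which are not reproduced here.

```python
def median_filter(signal, window):
    """Sliding-window median over a 0/1 alarm signal (odd `window`).
    Isolated blips shorter than a majority of the window are removed."""
    half = window // 2
    out = []
    for i in range(len(signal)):
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        chunk = sorted(signal[lo:hi])  # window is truncated at the edges
        out.append(chunk[len(chunk) // 2])
    return out

# Two single-sample chattering blips around one genuine alarm episode.
raw = [0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0]
filtered = median_filter(raw, 5)
```

The two isolated blips disappear while the sustained episode survives (slightly widened at its edges), which is the usability gain the paper quantifies via false and missed alarm rates.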
High performance intrusion detection using modified k mean & naïve bayeseSAT Journals
Abstract
Internet technology is growing at an exponential rate, making the data security of computer systems more complex and critical. Multiple methodologies have been implemented for this purpose in recent times, as detailed in [1], [3]. The availability of larger bandwidth has connected many large computer server networks worldwide, increasing the need to secure data; an intrusion detection system (IDS) is one of the most efficient techniques for maintaining the security of computer systems. The proposed system is designed to help identify malicious behavior and improper use of computer systems. In this report we propose a hybrid technique for intrusion detection using data mining algorithms, with the main objective of performing a complete analysis of an intrusion detection dataset to test the implemented system. We propose a new methodology in which modified k-means is used for clustering and Naïve Bayes for classification. These two data mining techniques are used for intrusion detection in a large horizontally distributed database.
Keywords: Intrusion Detection, Modified K-Means, Naïve Bayes
ADRISYA: A FLOW BASED ANOMALY DETECTION SYSTEM FOR SLOW AND FAST SCANIJNSA Journal
Attackers perform port scans to find reachability, liveness and running services in a system or network. Present-day scanning tools provide different scanning options and are capable of evading various security tools such as firewalls, IDSs and IPSs. In order to detect and prevent attacks in their early stages, accurate detection of scanning activity in real time is essential. In this paper we present a flow-based protocol behaviour analysis system to detect TCP-based slow and fast scans. The system provides a scalable, accurate and generic solution to TCP-based scanning by means of automatic behaviour analysis of the network traffic. The detection capability of the proposed system is compared with SNORT, and the results prove the system's higher detection rate.
A NOVEL SCHEME FOR ACCURATE REMAINING USEFUL LIFE PREDICTION FOR INDUSTRIAL I...ijaia
In the era of the fourth industrial revolution, measuring and ensuring the reliability, efficiency and safety of industrial systems and components is one of the uppermost concerns. In addition, predicting the performance degradation or remaining useful life (RUL) of equipment over time, based on its historical sensor data, enables companies to greatly reduce their maintenance costs: they can prevent costly unexpected breakdowns and become more profitable and competitive in the marketplace. This paper introduces a deep-learning-based method that combines CNN (Convolutional Neural Network) and LSTM (Long Short-Term Memory) networks to predict RUL for industrial equipment. The proposed method does not depend on any degradation trend assumptions, and it can learn complex temporal, representative and distinguishing patterns in the sensor data. To evaluate its efficiency and effectiveness, we evaluated it in two different experiments: RUL estimation and predicting the status of IoT devices over a two-week period. Experiments are conducted on NASA's publicly available turbofan engine dataset. Based on the experimental results, the deep-learning-based approach achieved high prediction accuracy. Moreover, the results show that the method outperforms standard, well-accepted machine learning algorithms and achieves competitive performance compared to state-of-the-art methods.
An intrusion detection algorithm for amiIJCI JOURNAL
Nowadays, the use of smart metering devices to manage a wide variety of subscribers (reading the measuring devices, billing, and managing subscriber disconnection and connection) is an important issue for energy utilities. The performance of these intelligent systems is based on information transfer in an information technology context, so data reported over the network must be managed to avoid malicious activities, including issues that could affect the system's quality of service. In this paper, to control the reported data and ensure the veracity of the obtained information, an intrusion detection system based on a support vector machine (SVM) and principal component analysis (PCA) is proposed to recognize and identify intrusions and attacks in the smart grid. The operation of intrusion detection systems with different SVM kernels, when SVM and PCA are used simultaneously, is studied. To evaluate the algorithm, numerical simulations based on the KDD99 dataset are carried out for five different kernels in an intrusion detection system using SVM with PCA. A comparative analysis of the presented intrusion detection algorithm is also conducted in terms of response time, the rate of increase in network efficiency, the increase in system error, and the differences arising from using or not using PCA. The results indicate that the correct detection rate and the attack-error detection rate take their best values when PCA is used, and that when the kernel is of the radial type, the SVM algorithm reduces the time for data analysis and enhances the performance of intrusion detection.
Analysis on different Data mining Techniques and algorithms used in IOTIJERA Editor
In this paper, we discuss five functionalities of data mining in the IoT that affect performance: data anomaly detection, data clustering, data classification, feature selection, and time series prediction. Some important algorithms for each functionality are also reviewed here, showing their advantages and limitations, as well as some new algorithms that are current research directions. We thereby present a knowledge view of data mining in the IoT.
COPYRIGHTThis thesis is copyright materials protected under the .docxvoversbyobersby
COPYRIGHT
This thesis is copyright material protected under the Berne Convention, the Copyright Act 1999 and other international and national enactments in that behalf on intellectual property. It may not be reproduced by any means, in full or in part, except for short extracts in fair dealing for research or private study, critical scholarly review or discourse with acknowledgment, and with the written permission of the Dean, School of Graduate Studies, on behalf of both the author and XXX XXX University.
ABSTRACT
With the fast-growing internet, the risk of intrusion has also increased; as a result, the intrusion detection system (IDS) is an admired key research field. IDSs are used to identify any suspicious activity or patterns in a network or machine that attempt to breach security features or compromise the machine. IDSs typically use all the features of the data, yet it is a keen observation that not all features are of equal relevance for the detection of attacks, and not every feature contributes significantly to system performance. The main aim of this work is to develop an efficient denial-of-service (DoS) network intrusion classification model. The specific objectives included: to analyse the existing literature on intrusion detection systems, covering the techniques used to model IDSs, the types of network attacks, the performance of various machine learning tools, and how network intrusion detection systems are assessed; to find the top network traffic attributes that can be used to model denial-of-service intrusion detection; and to develop a machine learning model for the detection of denial-of-service network intrusions. Methods: The research design was experimental, and data was collected by simulation using the NSL-KDD dataset. By implementing a Correlation Feature Selection (CFS) mechanism with three search algorithms, a smallest set of features was selected, containing the features chosen most frequently. Findings: The smallest subset of features chosen is the most nominal among all the feature subsets found. Further, the performance of Artificial Neural Network (ANN), decision tree, Support Vector Machine (SVM) and K-Nearest Neighbour (KNN) classifiers is compared for the 7 subsets found by the filter model and for all 41 attributes. Results: The outcome indicates a remarkable improvement in the performance metrics used for comparison of the classifiers.
The results show that using 17/18 selected features improves DoS-type classification accuracies compared to using the 41 features in the NSL-KDD dataset. It was further observed that an ensemble of three classifiers with decision fusion performs better than a single classifier for DoS-type classification. Among the machine learning tools experimented with, ANN achieved the best classification accuracies, followed by SVM and DT; KNN registered the lowest classification accuracies. Application: The proposed work with such an improved detection rate and lesser classification time and lar.
ESTIMATING HANDLING TIME OF SOFTWARE DEFECTScsandit
The problem of accurately predicting handling time for software defects is of great practical
importance. However, it is difficult to suggest a practical generic algorithm for such estimates,
due in part to the limited information available when opening a defect and the lack of a uniform
standard for defect structure. We suggest an algorithm to address these challenges that is
implementable over different defect management tools. Our algorithm uses machine learning
regression techniques to predict the handling time of defects based on past behaviour of similar
defects. The algorithm relies only on a minimal set of assumptions about the structure of the
input data. We show how an implementation of this algorithm predicts defect handling time with
promising accuracy results
International Journal of Network Security & Its Applications (IJNSA), Vol.6, No.5, September 2014

REDUCING THE COGNITIVE LOAD ON ANALYSTS THROUGH HAMMING DISTANCE BASED ALERT AGGREGATION
Peter Mell1, Richard Harang2
1 National Institute of Standards and Technology, Gaithersburg, MD, USA
2 U.S. Army Research Laboratory, Baltimore, Maryland, USA
ABSTRACT
Previous work introduced the idea of grouping alerts at a Hamming distance of 1 to achieve lossless alert
aggregation; such aggregated meta-alerts were shown to increase alert interpretability. However, a mean
of 84023 daily Snort alerts was reduced to a still formidable 14099 meta-alerts. In this work, we address
this limitation by investigating several approaches that all contribute towards reducing the burden on the
analyst and providing timely analysis. We explore minimizing the number of both alerts and data elements
by aggregating at Hamming distances greater than 1. We show how increasing bin sizes can improve
aggregation rates. And we provide a new aggregation algorithm that operates up to an order of magnitude
faster at Hamming distance 1. Lastly, we demonstrate the broad applicability of this approach through
empirical analysis of Windows security alerts, Snort alerts, netflow records, and DNS logs. The result is a
reduction in the cognitive load on analysts by minimizing the overall number of alerts and the number of
data elements that need to be reviewed in order for an analyst to evaluate the set of original alerts.
KEYWORDS
Alert aggregation, Cognitive load, Hamming Distance, Hypergraphs, Security logs
1. INTRODUCTION
Human review of security logs is a difficult and labor-intensive process. This is especially true in
the area of intrusion detection systems (IDSs), which often suffer from extremely high false positive
rates (see, e.g., [1] [2] [3] [4], among others). This problem is exacerbated in signature-based
systems such as Snort1 [5], where broadly-written rules may trigger repeatedly on innocuous
packets. This large number of false positive results creates a significant workload for IDS analysts,
who must sort through them in order to locate the relatively few true positives. It is thus
desirable to automate abstraction and correlation of the logs so as to enable analysts to make
efficient decisions.
The work of [6] mitigated this problem by providing an algorithm to aggregate Snort intrusion
detection alerts with discretely-valued fields by combining those that are at most Hamming distance
1 apart. This approach was shown effective in both reducing the number of resulting meta-alerts
that need to be reviewed by analysts and in increasing their interpretability (see [6] for example
meta-alerts). Related alerts were merged, reducing human analysis time and enabling more
rapid identification of significant threats. Furthermore, the reduction of alerts to meta-alerts was
lossless in that the original alerts could be reconstructed from the meta-alerts. This was important
because it provided analysts all alert information when making relevancy decisions. The method
achieved an average alert reduction of 83.2 % when applied to 30 days of Snort alerts
grouped by hour, with an execution complexity of O(n²) where n represents the number of alerts.

1 Any mention of commercial products or reference to commercial organizations is for information only; it does not
imply recommendation or endorsement by the U.S. government nor does it imply that the products mentioned are
necessarily the best available for the purpose.

DOI : 10.5121/ijnsa.2014.6503
While the reduction on Snort alerts using a Hamming distance of 1 was significant, the remaining
number of meta-alerts was still considerable (although it should be understood that their data was
taken from a large enterprise-scale production network). For example, a mean 24 hour time slice
of 84023 alerts was reduced to a mean of 14099 meta-alerts by aggregating on hourly batches.
Reviewing such a number of meta-alerts on a daily basis still represents a challenge despite the
improvement.
In this work, we address this limitation by investigating several approaches that all contribute
towards reducing the cognitive load on the analyst, by both reducing the overall number of alerts
and reducing the number of data elements that need to be reviewed in order for an analyst to
evaluate the set of original alerts. We also investigate approaches that provide more timely
analysis. In performing this exploration, we uncover several new aggregation capabilities not
available in the original work.
1) We explore how to decrease the number of meta-alerts down to some user defined
maximum by varying the Hamming distance used for aggregation (the original work operated
only at a Hamming distance of 1). We find that increasing the Hamming distance monotonically
decreases the number of meta-alerts, while at the same time increasing the level of abstraction of
those alerts. Also, we empirically discover that there exists an operating point that presents a
minimum number of overall data elements, consistently at low Hamming distances (universally
between 1 and 4 in all of our data sets). This is important because the number of data elements
reflects the amount of work that must be done by the analyst.
2) We explore how increasing bin sizes reduces the number of meta-alerts (something not
tested in [6]). For example, for Hamming distance 1 aggregation using the original algorithm, the
aggregation rate for Snort alerts rises from a mean of 83.2 % for hourly bins to 96.1 % for daily
bins. We can thus reduce the mean daily time slice of 14099 meta-alerts from [6] down to a mean
of 3276 meta-alerts. This is a much more manageable workload for human review by a large enterprise.
3) Lastly, we present a new O(nlogn) hypergraph based aggregation algorithm that can run up
to an order of magnitude faster at small Hamming distances, thus facilitating on-demand analysis
of larger time slices. Furthermore, its construction allows for a streaming mode where alerts are
analyzed upon arrival and aggregated meta-alerts dumped on demand, a capability that cannot be
replicated using the intrinsically batch-oriented approach in [6]. The new algorithm provides
slightly better aggregation (a 4.9 % improvement for daily bins of Snort alerts at a Hamming
distance of 1).
The original research also focused solely on a single Snort IDS data set. In this work, we demonstrate
more general applicability of the Hamming distance aggregation approach by applying it to
three new data types: Windows security alerts, Cisco version 5 Netflow, and Domain Name Service
(DNS) request logs. We also reevaluate the Snort data set from [6] for comparative purposes.
Lastly, we improve upon the original work by providing a public domain Python 2.7.3 implementation
for our new aggregation algorithm (available at [7]).
In summary, the primary contributions of this paper are:
An analysis of how increasing Hamming distance decreases the number of meta-alerts. Also,
we empirically discover that there exist operating points that yield a minimum number of
overall data elements at consistently low Hamming distances (between 1 and 4 in all of our
data sets).
The discovery that increased bin sizes reduces the number of meta-alerts.
Provision of a new O(nlogn) hypergraph based aggregation algorithm that has streaming mode
capabilities, better aggregation, and up to an order of magnitude improvement in runtime at
small Hamming distances.
Evidence of the applicability of Hamming distance aggregation to four data sets: Snort,
Windows security events, Netflow, and DNS.
Provision of public domain Python 2.7.3 code for our new aggregation algorithm.
The impact of these results is as follows. The results that minimize the number of alerts to review
reduce the context switching that an analyst must perform between reading distinct alerts. The
results that minimize the number of data elements to review reduce the overall amount of data
that an analyst must ingest to understand the set of original alerts (and a certain view of their
relationships). The results that increase the speed of analysis (and offer a streaming mode
operation) reduce the latency in which an analyst receives the data to review. The provision of
working code enables immediate testing and use by operational groups.
The development of the work is as follows. In section 2, we discuss related work. In section 3, we
provide a definition and an example of Hamming distance aggregation. In section 4 we present a
new Hamming distance aggregation algorithm and section 5 discusses our expansion to the
previously published algorithm. In section 6 we describe the input data sets and our experiment;
section 7 provides the results. Although covered in [6], section 8 reviews the intuition behind the
Hamming distance aggregation approach. We also provide our experience with its use and
feedback from operational analysts. Section 9 summarizes our conclusions.
2. RELATED WORK
The problem of alert aggregation has been addressed from a variety of perspectives. A more
general overview is presented in [6]; in brief, alert aggregation approaches fall into two
broad categories.

The first category involves expert knowledge used to construct a system for classifying,
correlating, and ranking alerts based on external knowledge of existing attacks [3] [8] [9] [10]
[11] [12], network vulnerabilities [3] [9], or both [10]. Rules may also be aggregated using
user-defined similarity metrics based on expert knowledge [3], and some alerts may be
ignored outright if prior knowledge suggests that they are irrelevant [13].
The second category encompasses probabilistic and data mining approaches, which are used to
group and aggregate alerts without requiring the construction and maintenance of external data
stores or schema. IDS alerts are often discrete in nature, however, which poses a challenge
to such methods [14]. Nevertheless, a variety of approaches [1] [15] [16] have been developed in
this area that have been empirically demonstrated to be effective, even when they rely on
assumptions (such as a Gaussian distribution of continuous features) that, while functional in
practice, are clearly not accurate in a theoretical sense.
Recent work somewhat related is found in [17]. They identify frequent patterns in alert logs by
use of the Frequent Pattern Outlier Factor (as developed by [18]), classifying variable-length
alerts by identical subsequences of values and ranking them by the frequency of the subsequences
in the data. The top arbitrary quantile of alerts with the least common patterns are presented as
‘candidate true alerts’ and the rest discarded. While our work also classifies on identical
subsequences, our goal is aggregation of similar alerts into a reduced set of meta-alerts without
discarding any data.
3. OVERVIEW OF HAMMING DISTANCE AGGREGATION
Hamming distance aggregation at an aggregation distance of d may merge a group of alerts into a
meta-alert if the alerts have at most d fields that do not match. An alert may be merged only with
itself to create a meta-alert that covers only that single alert. We define an optimal Hamming
distance aggregation as the smallest possible set of meta-alerts that cover all original alerts at some
maximum Hamming distance d.

As an example, consider the alerts in Table 1. Alerts 4 and 5 may be aggregated at Hamming
distance 1 into meta-alert (B,F,*), where * is equal to the set (J,K), because they disagree only on
one column, column 3. Alert 1 cannot be aggregated because it has two columns, 2 and 3, whose values
do not match any other alerts. Similarly, alerts 2 and 3 may not be aggregated with any
other alerts because each disagrees with all other alerts on 2 columns.
Table 1. Example set of alerts to be aggregated

Alert Number  Column 1  Column 2  Column 3
1             A         C         G
2             A         D         H
3             A         E         I
4             B         F         J
5             B         F         K
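The aggregation rule above can be illustrated with a small sketch (illustrative only, not the paper's public-domain implementation); the alert tuples are taken directly from Table 1:

```python
from itertools import combinations

def hamming(a, b):
    """Number of fields on which two equal-length alerts disagree."""
    return sum(x != y for x, y in zip(a, b))

def merge(a, b):
    """Merge two alerts, wildcarding ('*') the fields that differ."""
    return tuple(x if x == y else "*" for x, y in zip(a, b))

alerts = [("A","C","G"), ("A","D","H"), ("A","E","I"), ("B","F","J"), ("B","F","K")]

# Pairs within Hamming distance 1 are candidates for merging into a meta-alert.
pairs = [(a, b) for a, b in combinations(alerts, 2) if hamming(a, b) <= 1]
print(pairs)                                # only alerts 4 and 5 qualify
print(merge(("B","F","J"), ("B","F","K")))  # ('B', 'F', '*')
```

As in the table, only alerts 4 and 5 disagree on a single column, so (B,F,*) is the only multi-alert merge available at distance 1.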
4. HYPERGRAPH-BASED ALGORITHM
This section provides a new hypergraph based Hamming distance aggregation algorithm that
operates in O(nlogn) through leveraging a set cover approximation algorithm.
4.1 Leveraging Set Cover
In the set cover problem as described in [19] we are provided “a finite set X and a family F of
subsets of X, such that every element of X belongs to at least one subset in F.” The problem is to
find the minimal subset, S, of F such that S includes all elements of X. Thus, the union of the
subsets in S must be equal to X. More formally, given X and F as described above where
X = ⋃_{S'∈F} S', we wish to find a minimal S ⊆ F such that X = ⋃_{S'∈S} S'. The set cover problem is known
to be NP-complete [20] and is thus not currently tractable in polynomial time.
It can be proven that Hamming distance aggregation can be reduced to an instance of the set
cover problem (proof omitted for space). Conceptually, each alert is encoded as an element of X.
Each possible grouping of alerts separated by a Hamming distance of at most d is encoded as an
element of F. Identification of all such possible groups can be done in O(n) time where n is the
number of alerts, as described later in this paper. Through this, any Hamming distance
aggregation problem can be reduced to a set cover problem in polynomial time. It therefore follows that
any algorithm that can solve the set cover problem may be adapted to solve the Hamming
distance problem. There exists a widely cited greedy approximation algorithm for set cover [21], and
in this paper we propose to use this approach to approximate optimal Hamming distance
aggregation.
The greedy approximation algorithm for set cover is as follows [19]:
Set Cover Approximation (X, F)
1. Remaining = X
2. Soln_set = None
3. While Remaining is not empty
a. Select an element, S, of F that maximally covers the elements in Remaining
b. Remove from Remaining the elements in S
c. Add to Soln_set the subset S
4. Return Soln_set
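The pseudocode above translates directly into a short Python sketch (illustrative; a naive O(|X|·|F|) selection rather than the optimized implementation discussed below):

```python
def greedy_set_cover(X, F):
    """Greedy set cover: repeatedly pick the subset of F that covers
    the most still-uncovered elements of X."""
    remaining = set(X)
    soln_set = []
    while remaining:
        # Select the subset that maximally covers the remaining elements.
        best = max(F, key=lambda s: len(remaining & set(s)))
        if not remaining & set(best):
            raise ValueError("F does not cover X")
        soln_set.append(best)
        remaining -= set(best)
    return soln_set

X = {1, 2, 3, 4, 5}
F = [{1, 2, 3}, {2, 4}, {3, 4}, {4, 5}]
print(greedy_set_cover(X, F))  # [{1, 2, 3}, {4, 5}]
```

Here the greedy choice happens to be optimal; in general the greedy result is only an approximation of the minimal cover.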
As analyzed in [19] there exists an implementation that has time complexity O(Σ_{S∈F}|S|); with
Hamming distance aggregation the size of S may be equal to n, the number of alerts. Furthermore,
the number of elements in F is always greater than or equal to n since in the worst case each alert
may be merged only with itself to form a meta-alert. Using this analysis, the greedy set cover
approximation algorithm has a time complexity of O(n²) when applied to Hamming distance
aggregation. A major contribution of our work is to use unique characteristics of Hamming distance
aggregation to reduce this time complexity to O(nlogn).
4.2 Algorithm Description
In this section, we describe a high level approach using hypergraphs to aggregate alerts at varying
Hamming distances. We then prove that the algorithm always extracts a meta-alert representing
the largest available grouping of unaggregated alerts during each iteration, thus implementing the
greedy approximation to the set cover problem. In the next section, we provide an efficient
implementation and evaluate the algorithmic complexity.
In order to reliably extract all meta-alerts using a Hamming distance of d, we construct a
hypergraph over the space of alerts, where each alert represents a node in the hypergraph, and each
edge connects all nodes that are at most Hamming distance d from each other. If our alert set to be
aggregated has n alerts with m columns, we thus construct a hypergraph with n nodes, each node
having mCd edges connecting it with other nodes in the graph (where mCd is the binomial
coefficient). Each hyperedge thus represents a potential meta-alert and is labeled with the value of the
meta-alert. Note that the field(s) that differ between the alerts covered by a hyperedge are denoted
in a meta-alert by the wildcard ‘*’ character. The hyperedge label then represents all alerts
covered by it.
Using the example of the previously discussed Table 1 for illustrative purposes with a Hamming
distance of 1, alert 1 would be entered as a node labeled with the tuple (A,C,G), and be covered
by three hyperedges: (*,C,G), (A,*,G), and (A,C,*). Similarly, alert 5 in Table 1 would be
entered as a node labeled (B,F,K) and be covered by hyperedges (*,F,K), (B,*,K), and (B,F,*).
Note that alert 4 in Table 1 with label (B,F,J) would also be covered by hyperedge (B,F,*),
indicating that both nodes covered by that hyperedge could be aggregated into a single meta-alert.
Hyperedge (B,F,*) is the only hyperedge covering more than one node. The complete Hamming
distance 1 aggregation hypergraph for the alerts is provided in Figure 1.

The power of this construction is that the largest meta-alert at some Hamming distance d can be
easily determined by finding the largest hyperedge and examining the covered nodes.
Figure 1. Hypergraph for Alert Aggregation Example

We now present our high level algorithm for Hamming distance 1 aggregation that implements
the greedy set cover approximation algorithm:

1. In the initial collection step, we assemble the alerts into a hypergraph.
2. To extract the meta-alert covering the most available alerts, we identify the hyperedge with
the largest incident node count; this edge and all nodes incident upon this edge are removed
from the graph and processed into a meta-alert. In case of a tie, we arbitrarily choose a
candidate meta-alert. This handling of ties is consistent with published implementations of the
greedy set cover approximation algorithm [19].
3. The statistics of the hypergraph are updated to reflect the removed nodes. In particular, the
incident node count for each hyperedge must be updated to reflect the removal of the nodes
associated with the meta-alerts.
4. Proceed from step 2 in the updated hypergraph.

This algorithm is straightforward and will always produce the single largest meta-alert possible in
the remaining data for each iteration. However, its performance is significantly impacted by the
search for the next hyperedge to convert to a meta-alert as well as the process of updating the
hypergraph after the removal of the largest meta-alert. The next section discusses optimizations
to our algorithm to address these computational challenges and presents time complexity results.

4.3 Implementation Details

Assume that there are n alerts with m data fields per alert being aggregated at a Hamming
distance d. Our implementation of the high level algorithm is given below.

Step 1: Hypergraph Construction

The hypergraph is constructed by iterating through each alert (e.g., (A,C,G)) and generating every
possible hyperedge label (e.g., for d=1, (*,C,G), (A,*,G), and (A,C,*)) in O(n×(mCd)) time. Each
hyperedge label is used as a key for inserting the alert into a hash table referred to as ‘hdict’. The
hdict keys represent the hyperedges and the values are themselves smaller hash tables
representing the nodes covered by the respective hyperedge. Thus, the total complexity of
hypergraph construction is O(n×(mCd)).
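Hyperedge label generation and hdict construction can be sketched as follows (an illustrative Python sketch; the paper's implementation uses Python 2.7.3, and this simplified version stores sets of alert indices rather than nested hash tables):

```python
from itertools import combinations
from collections import defaultdict

def hyperedge_labels(alert, d):
    """All hyperedge labels covering an alert: wildcard every choice of d fields."""
    labels = []
    for cols in combinations(range(len(alert)), d):
        labels.append(tuple("*" if i in cols else v for i, v in enumerate(alert)))
    return labels

def build_hdict(alerts, d):
    """hdict: hyperedge label -> set of indices of alerts covered by that hyperedge."""
    hdict = defaultdict(set)
    for idx, alert in enumerate(alerts):
        for label in hyperedge_labels(alert, d):
            hdict[label].add(idx)
    return hdict

alerts = [("A","C","G"), ("A","D","H"), ("A","E","I"), ("B","F","J"), ("B","F","K")]
hdict = build_hdict(alerts, 1)
print(hdict[("B","F","*")])  # {3, 4} -- alerts 4 and 5 share this hyperedge
```

Each alert contributes mCd labels, so construction touches n×(mCd) hash-table entries, matching the stated complexity.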
Step 2: Singleton Removal
This step extracts as meta-alerts all alerts (or groups of identical alerts) that cannot be aggregated
with any other alerts because no hyperedge connects them to another distinct alert. This can be
done in O(n×(mCd)) time.
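Singleton removal can be sketched as follows (illustrative, assuming a simplified hdict mapping hyperedge labels to sets of alert indices; identical duplicate alerts are treated as distinct indices here, unlike the paper's grouped handling):

```python
def remove_singletons(hdict):
    """Extract alerts whose every covering hyperedge reaches no other alert.
    hdict maps hyperedge label -> set of covered alert indices (assumed structure).
    Mutates hdict in place and returns the singleton alert indices."""
    # For each alert, record the size of the largest hyperedge covering it.
    largest = {}
    for ids in hdict.values():
        for i in ids:
            largest[i] = max(largest.get(i, 0), len(ids))
    singletons = {i for i, size in largest.items() if size == 1}
    # Drop singleton alerts; each becomes its own meta-alert.
    for label in list(hdict):
        hdict[label] -= singletons
        if not hdict[label]:
            del hdict[label]
    return singletons

hdict = {
    ("*", "C", "G"): {0}, ("A", "*", "G"): {0}, ("A", "C", "*"): {0},
    ("B", "F", "*"): {3, 4}, ("B", "*", "J"): {3}, ("B", "*", "K"): {4},
}
print(remove_singletons(hdict))  # {0} -- alert 0 cannot aggregate with anything
```

One pass over all n×(mCd) (label, index) pairs suffices, consistent with the stated O(n×(mCd)) bound.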
Step 3: Sorted Hyperedge Tree
Next we create a red-black tree to sort the hyperedges by size. For all keys in the hdict, we look
up the number of alerts covered by each hyperedge in O(n×(mCd)). This creates all key-value
pairs of the form (hyperedge label, number of covered alerts). Each key-value pair is inserted into
the tree sorted by the number of covered alerts in O(n×(mCd)×log(n×(mCd))).
Step 4: Extract Meta-Alerts
We now iteratively identify the largest hyperedge, create an associated meta-alert, and call step 5
to update the data structures to reflect removal of the hyperedge and all covered alerts. This can
be done in O(n×log(n×(mCd))). In each meta-alert we store the hyperedge label and the values
corresponding to the '*' fields (we don't store each alert separately), unless all alerts in the hyperedge are identical, in which case we simply store a single copy of the alert along with the number of times it is repeated.
Step 5: Data Structure Update
For each meta-alert extracted in step 4, this step must update the data structures to make them
ready for identification and removal of the next largest hyperedge. Doing this involves three
parts: 1) removing the chosen hyperedge from the tree, 2) removing the chosen hyperedge from
hdict, and 3) removing the covered alerts from all other hyperedge entries in hdict and then updating the hyperedge sizes in the tree. An amortized analysis yields O(n×(mCd)×log(n×(mCd))).
All five steps together then produce a time complexity for the entire algorithm of
O(n×(mCd)×log(n×(mCd))). We treat m as a constant since for a particular data set, the number of
fields will be fixed. For d=1 then, we get O(n log n), which empirically gives us up to an order of magnitude improvement over the O(n²) Hamming distance 1 aggregation in [6].
Note that, in contrast to the aggregation process described in [6], which evaluates a single column at a time across all alerts, the present method can operate in an incremental manner. At any point in the collection process, the construction of the hypergraph in step 1 may be briefly suspended and an arbitrary number of meta-alerts may be extracted, at each stage updating the hypergraph to account for the removal of the alerts covered by the meta-alert (a single iteration of step 3, followed by alternating steps 4 and 5 for as many meta-alerts as desired). The updated hypergraph may then immediately begin accepting additional alerts.
To prevent the mCd growth from overwhelming the execution time, we have coded in a threshold value. If mCd exceeds the threshold, we permanently wildcard the field with the greatest entropy and then try to run the algorithm again with (m-1)C(d-1). If this value still exceeds the threshold, we repeat the process iteratively, walking down the mCd curve until we fall below the threshold. This thresholding mechanism is crucial for enabling fast operation at mid-range Hamming distances.
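The threshold walk-down might look like the following sketch, which uses Shannon entropy over each field's observed values to pick the field to wildcard (the entropy measure and the tuple representation of alerts are our assumptions about details not fully specified here):

```python
from collections import Counter
from math import comb, log2

def apply_threshold(alerts, m, d, threshold):
    """While C(m, d) exceeds the threshold, permanently wildcard the
    highest-entropy field and retry with C(m-1, d-1).
    Returns the surviving field indices and the reduced distance."""
    fields = list(range(m))
    while d > 0 and comb(len(fields), d) > threshold:
        def entropy(j):
            # Shannon entropy of the value distribution in field j.
            counts = Counter(a[j] for a in alerts)
            n = len(alerts)
            return -sum(c / n * log2(c / n) for c in counts.values())
        fields.remove(max(fields, key=entropy))  # permanently wildcarded
        d -= 1
    return fields, d
```

For the 24-field EVTX data at d=4 with a threshold of 250, C(24,4)=10626 and C(23,3)=1771 both exceed the threshold, but C(22,2)=231 does not, so two fields are wildcarded.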
5. COLUMN-BASED ALGORITHM
The prior approach presented in [6] was designed to work only at a Hamming distance of 1. Like our algorithm, it iteratively attempts to extract the largest available meta-alert until no alerts are left. It identifies the largest available meta-alert by iteratively identifying columns having maximal sets of alerts with the same value. More specifically, it finds the alert column with the maximum number of identical values and creates a subset of the alert database based on alerts with that value in that column. This procedure is done repeatedly and the set of alerts to be aggregated into a particular meta-alert narrows with each iteration. When one final column is left, unique values have been identified for all other columns. The algorithm has now identified a meta-alert with an alert Hamming distance of 1.
For our work, we expand this algorithm to aggregate over variable Hamming distances. We do this by stopping the column identification and subset operation once m-d columns have been chosen and the values "fixed" for those fields to create the next meta-alert. Unfortunately, this algorithm will not always extract the largest grouping from the set of available alerts, as proven in Theorem 1 below. Thus, it does not implement its greedy set cover approach perfectly, and the size of the successively extracted meta-alerts may not be monotonically decreasing. In our empirical studies, the size was not monotonically decreasing, and the algorithm was thus non-optimal with respect to following the set cover approximation algorithm.
Theorem 1: The column-based algorithm may not choose the largest available alert grouping.
Proof by counterexample: Assume that the algorithm will always select the largest available alert
grouping at Hamming distance 1 and consider the data shown previously in Table 1. It will first
pick column 1 because it has three “A” values and it will subset alerts 1, 2, and 3 because they
have that value. It will then look at columns 2 and 3 for those alerts and will be unable to identify
any alerts to aggregate because they are all at Hamming distance 2. The resulting meta-alert for
this iteration will contain only a single alert (a random choice of alert 1, 2, or 3) in order to meet
the Hamming distance 1 constraint. However, this is a contradiction to our assumption that the
greedy algorithm always extracts the largest set of alerts first because the actual largest grouping
with a Hamming distance of 1 is a grouping with alerts 4 and 5. Q.E.D.
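The counterexample can be exercised in code. The alert values below are hypothetical, constructed only to match the structure the proof describes for Table 1 (three alerts sharing a column-1 value but pairwise at Hamming distance 2, plus two alerts at distance 1); the function is a sketch of one greedy extraction by the column-based algorithm:

```python
def column_greedy_pick(alerts, m, d):
    """One iteration of the column-based selection, extended to distance d:
    repeatedly fix the (column, value) pair shared by the most alerts,
    subsetting until m - d columns are fixed. Ties broken arbitrarily."""
    subset = list(alerts)
    free = set(range(m))
    while len(free) > d:
        # Find the (column, value) pair with the most identical entries.
        col, val = max(((j, v) for j in free
                        for v in {a[j] for a in subset}),
                       key=lambda cv: sum(a[cv[0]] == cv[1] for a in subset))
        subset = [a for a in subset if a[col] == val]
        free.remove(col)
    return subset

# Hypothetical data matching the structure described for Table 1.
alerts = [('A', 'B', 'C'), ('A', 'D', 'E'), ('A', 'F', 'G'),
          ('H', 'X', 'Y'), ('I', 'X', 'Y')]
group = column_greedy_pick(alerts, m=3, d=1)
# Greedy fixes column 1 = 'A' first and is then forced down to a single
# alert, even though the largest distance-1 grouping is the last two alerts.
```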
6. EXPERIMENTAL DESIGN
We apply variable Hamming aggregation to four distinct sets of data. To show general applicability of the approach, we evaluate DNS request logs, Cisco v5 Netflow data, and Microsoft Windows security logs. For comparative purposes, we evaluate the Snort IDS dataset from [6].
The Windows logs were Event Viewer security logs (.EVTX) monitoring the file system accesses of 11 workstations over a period of 6 months in 2013. We performed extensive tests on aggregating other Windows security event types but omit the results due to space limitations (the results are similar except for variance in the optimal Hamming distance required to minimize the number of data elements). We chose file system access events for this work as an example Windows alert type with both a large number of instances in the data set as well as a large number of data fields.
Table 2. Experiment Data Sources

Data source                   Number of fields   Number of records
Snort alerts                  11                 2,300,000
DNS request logs              7                  6,310,856
Cisco Netflow records*        13                 14,066,423
Windows file system events    24                 78,485

*Typical Cisco v5 Netflow records contain 22 fields; 9 of which (2 mask fields, 2 padding fields, 2 ASN-related fields, 2 interface-related fields, and the nexthop field) could not be populated due to hardware limitations.
Our experiment compares our hypergraph aggregation algorithm against the column-based algorithm in [6] (which, as discussed previously, we modified to enable aggregation across variable Hamming distances). We focus on varying the Hamming distance and batch sizes in order to see the effect on the number of meta-alerts and data fields presented to the analysts while recording execution times.
All experiments were performed on commodity computers using 3GHz quad-core Intel processors and 8GB of RAM running Python version 2.7.2.
7. EXPERIMENTAL RESULTS
The hypergraph approach of the present work, even when threshold values are used, reliably produces aggregation results that are as good as or better than the work of [6]. This is with respect to both the number of meta-alerts created and the total reduction in data elements. We also find that increasing batch sizes results in a significant reduction in the number of meta-alerts and a lesser reduction in the number of data elements provided to the analysts (for both algorithms). Universally across our 4 data sets, the optimal Hamming distance for reduction of data elements was small, varying from 1 to 4. With respect to execution time, our hypergraph-based algorithm can be orders of magnitude faster at lower Hamming distances. It is these Hamming distances that should be most useful, as the alerts are abstracted the least and the number of data fields is minimal. In some low Hamming distance cases, the column-based approach was incomputable, making use of the hypergraph algorithm necessary. Additionally, the preference towards larger batch sizes further supports use of our O(n log n) hypergraph approach over the O(n²) column-based algorithm. We now discuss the empirical results supporting these findings.
7.1 Reduction in Number of Meta Alerts Through Increasing Hamming Distance
As the Hamming distance of aggregation increases, the total number of meta-alerts drops in a
non-increasing fashion, to the trivial minimum of 1 when the Hamming distance is equal to the
number of columns (since in a data set with m columns, the maximum possible Hamming distance between two items is m, and hence all items may be collected within a single meta-alert).
Results for Snort, DNS, and Flow data are provided in Figure 2, below. Note that in each case,
there is a Hamming distance at which the number of meta-alerts drops dramatically; in all cases
that we examined, this distance corresponds to the optimal Hamming distance with respect to
total number of data elements presented (see the following section).
Figure 2. Reduction in number of meta-alerts as Hamming distance increases.
7.2 Reduction in Number of Data Elements Through Adjusting Hamming Distance

We empirically discovered that, for each data type, there exists a Hamming distance that minimizes the number of data elements presented to the user. Furthermore, these global optimal Hamming distances are small (between 1 and 4) for all data types, as shown in Table 3, even though the number of fields varies between 7 and 24. This is important because these Hamming distances are those where the meta-alerts are abstracted the least and are most likely to be operationally useful. Note the correlation in Table 3 between the optimal Hamming distance and the number of fields.
Table 3. Optimal Hamming Distance for Hypergraph Algorithm Data Element Reduction

Data source        Number of fields   Optimal Hamming distance
Snort alerts       11                 2
DNS request logs   7                  1
Flow records       13                 3
EVTX records       24                 4
Discussed further in later sections, the column-based approach of [6] has a significant disadvantage with respect to execution time at these lower Hamming distances. We now look at data element reduction curves and compare the two algorithms. Figure 3, Figure 4, and Figure 5 show the proportion of displayed data elements in the aggregated results as compared to the original Snort alerts, DNS requests, and flow files, respectively. In all cases with the exception of flow data, note that when the values are distinguishable, the hypergraph-based algorithm displays fewer elements than the column-based approach, even though the hypergraph approach is thresholding the total number of permitted hyperedges. In the case of flow data, the column-based algorithm outperforms the hypergraph approach for several values of Hamming distance; this is attributable to the thresholding, and the advantage to the column-based algorithm vanishes if the threshold is expanded to 500 (not shown). It is also worth noting that for the flow data, the total aggregation time required at Hamming distance 4 (the optimal distance for the column-based algorithm) is 123.6s for the column-based algorithm, versus 60.7s for the hypergraph approach, using a batch size of 10000. Due to the poor scaling of the column-based approach (discussed in more detail below), this further limits its usefulness in this case.
Figure 3. Data element reduction for Snort alerts using a batch size of 10000. Hypergraph aggregation thresholded at 250 hyperedges per node.

Figure 4. Data element reduction for DNS requests using a batch size of 10000. Hypergraph aggregation thresholded at 250 hyperedges per node.

Figure 5. Data element reduction for flow data using a batch size of 10000. Hypergraph aggregation thresholded at 250 hyperedges per node.
7.3 Reduction in Number of Meta Alerts Through Increasing Batch Sizes

The reduction in the number of meta-alerts can also be influenced by the number of alerts available for aggregation. As additional alerts are added to the set available for aggregation, the likelihood that two alerts will fall within the minimum required Hamming distance of each other increases, and so the average number of meta-alerts may be expected to approach some limit determined by the inherent redundancy of the data. Figure 6 displays this effect for the DNS, Snort alert, and Netflow logs, each aggregated under their respective optimal Hamming distances, using a threshold of 250; comparisons with the column-based algorithm are not provided due to prohibitive running times. Note that at a batch size of 1000 alerts the algorithm is still capable of substantial reduction of alerts; however, as more alerts are added to the collection, the number of meta-alerts remaining relative to the original number of alerts decreases sharply for both the Snort alert and the Netflow data sets, and by a slight but measurable amount for the DNS request record data set.
Figure 6. Ratio of meta-alerts to alerts for DNS, Snort, and Flow data types at optimal Hamming distances.

Thus, following the selection of an optimal Hamming distance, the overall aggregation performance of the algorithm may be further improved by increasing the total number of alerts provided to it. While there is little reason to suspect that the column-based algorithm would not see similar improvements with respect to aggregation effectiveness (and limited experimentation – not shown – confirms that at the batch sizes tested the same monotonic effect appears to hold), the high time complexity of that algorithm generally rapidly reduces the utility of this approach. The improved time complexity of the hypergraph algorithm allows much greater flexibility in the selection of the batch size to aggregate, thus enabling further gains in aggregation.

7.4 Reduction in Number of Data Elements Through Increasing Batch Sizes

While selecting the correct Hamming distance can allow for optimization of the number of data elements presented to an analyst, the total number of alerts that are presented to the algorithm for aggregation can also have a significant impact. Figure 7 shows the impact of batch size on the number of data elements for comparatively small batch sizes across three data types, while Figure 8 shows the impact for a larger range of batch sizes for Snort alerts.

In minimizing the number of data elements presented to an analyst, our aggregation system should use as large a batch size as is feasible given time and memory constraints at the optimal Hamming distance for the data type in question. The optimal Hamming distance does not appear empirically to change with variations in batch size, and so may be rapidly selected via a test batch.

Figure 7. Data Element Reduction as Batch Size Increases
Figure 8. Data Element Reduction for Snort alerts (Hamming distance 2) with Larger Batch Sizes (Hypergraph Algorithm Thresholded at 250)
7.5 Execution Time Comparison

We compared the execution time of the column-based approach of [6] against our hypergraph algorithm using all four data sets. We focused on both varying the number of alerts processed as well as the Hamming distance used. In Figure 9, we show the execution time of both algorithms on an increasing number of Snort alerts. We chose a Hamming distance of 2 because that was empirically discovered to optimize the number of data fields presented to analysts (discussed earlier). The column-based approach takes increasingly more time, as was expected given its O(n²) complexity, compared to the O(n log n) hypergraph approach.
Figure 9. Aggregation of Snort Alerts at Hamming distance 2

The difference between the algorithms is even more pronounced at Hamming distance 1. Revisiting the analysis of [6], at a Hamming distance of 1 (not shown) the column-based algorithm is over an order of magnitude slower for daily alert bins (containing a mean of 84023 alerts).

In Figure 10 we explore this variance in execution time relative to the Hamming distance using the EVTX data. The column-based approach performs extremely well at higher Hamming distances but the execution time increases dramatically at low Hamming distances. It fails even to complete at Hamming distances less than 3 due to the EVTX data having 24 columns, the largest among our data sets. The column-based approach does complete at all Hamming distances for the other data sets, but can be over an order of magnitude slower. The hypergraph algorithm performs very fast at low Hamming distances but slows down in the mid-ranges, preventing its completion for some data sets due to the combinatorial term in the complexity analysis unless thresholding is used. Using thresholding, the hypergraph algorithm can execute fast for all data sets and Hamming distances. Note how thresholding causes spikes in the execution time as the combinatorial term is decreased to values below the threshold. At the minimum threshold value of 1, the execution time becomes so small as to overlap the x-axis in Figure 10.
The lower Hamming distances are ideal candidates for operational use. At these distances, the
alerts are abstracted the least and thus directly provide details to aid human analysis. Given the
inability of the column-based approach to process certain scenarios (especially those most likely to be
operationally useful), we claim that the hypergraph algorithm is necessary for variable Hamming
distance alert aggregation and not just an incremental improvement over our expanded version of
the column based approach.
Figure 10. Mean Execution Time to Aggregate EVTX Alerts
8. INTUITIVE UNDERSTANDING AND OPERATIONAL USE
While explained in [6], we review here the intuition behind the usefulness of the Hamming distance aggregation approach in reducing the cognitive burden on analysts. We also discuss our experience with the approach and qualitative results from operational use.
In enterprises with well-funded security teams, there may exist a requirement to review every
alert produced from a certain set of sensors. The example set of operational alerts from [6] measured 2.3 million Snort alerts a month. For each alert, an analyst would have to review the corresponding set of values in the data fields. To evaluate the next alert, the analyst would have to repeat this procedure. The alerts are thus evaluated independently and there is a mental context
switch between alerts. Related alerts are often scattered throughout the data set, requiring explicit
searching by the analyst to find them.
Hamming distance aggregation groups together related alerts into meta-alerts (it doesn't delete any alerts or data). A grouping of alerts shares many data fields in common. An analyst only has to read and absorb each common data field once for the entire group. In analyzing each individual alert, only the unique field values need to be perused. In some cases, the analyst can discount an entire group based on the common values and avoid even reading the unique values. This can happen, for example, in a horizontal attack scenario where the unique values are the individual addresses attacked. Based on the common values, the analysts can evaluate the group of alerts. Without Hamming distance aggregation, such related alerts would be scattered and hidden among many other alerts.
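This reading-cost argument can be made concrete with a small sketch (the horizontal-scan alerts below are hypothetical): the analyst reads each common field value once per group, plus one value per alert for each wildcarded field:

```python
def data_elements(meta_alerts):
    """Count the field values an analyst must read: each common
    (non-wildcard) label value once per group, plus one value per
    covered alert for each wildcarded field."""
    total = 0
    for label, covered in meta_alerts:
        common = sum(1 for v in label if v != '*')
        total += common + label.count('*') * len(covered)
    return total

# Hypothetical horizontal scan: 100 alerts identical except the target.
scan = [('snort', 'scan', '10.0.0.%d' % i) for i in range(100)]
raw = len(scan) * len(scan[0])           # 300 values read without grouping
meta = [(('snort', 'scan', '*'), scan)]  # one Hamming distance 1 meta-alert
print(raw, data_elements(meta))          # → 300 102
```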
Analysts often find that Hamming distance aggregation creates large groupings out of the many 'trash' alerts that can be discarded after an evaluation of the common fields. Related attacks may end up in small groups. Unusual attacks tend not to be grouped and stand out to analysts because of the small group size. This said, Hamming distance aggregation does not guarantee to put
'trash' in large groups and interesting alerts in small groups. A more significant benefit is in bringing together duplicate but scattered data field values so that each value can be read only once by the analyst (as opposed to being re-read every time an instance of the duplicated value is
encountered). Such benefits enhance the ability of large security teams to review, for example, the
2.3 million alerts per month cited in [6].
To validate our results empirically (though qualitatively), our Hamming distance aggregation
code was provided to operational analysts who used it on actual security logs taken from a large
enterprise network. Their feedback supports the usefulness of the approach through informal
evaluation. Qualitative feedback indicates that Hamming distance based alert reduction at low
Hamming distances both reduces analysis time and enhances interpretation of the alerts. Quantification of these observations requires a formal human study, which was outside the scope of our experiments, and so must be addressed in future work.
9. CONCLUSIONS
Variable Hamming distance alert aggregation can reduce the cognitive load on analysts with respect to minimizing the number of alerts and data elements. Our aggregation algorithms enable efficient human review of large sets of original alerts without removing or abstracting away any data. While the algorithm in [6] successfully reduced the number of data elements, it only considered a Hamming distance of 1, limiting its ability to effectively aggregate some data without manual adjustments. In addition, its limited scaling with respect to the number of data points as well as the dimensionality of that data restricts its applicability to large data sets, which we demonstrate further reduces its ability to effectively aggregate them. We present an algorithm with improved worst-case time complexity capable of handling arbitrary Hamming distance aggregation, as well as an approximate implementation that can significantly reduce the constant terms in the time and memory complexity to no greater than a specified maximum value. We demonstrate that even the approximate version of the algorithm has performance with respect to aggregation at least equal to that of the original algorithm in [6] while improving time requirements for optimal Hamming distances, particularly when using large batch sizes. This improvement in time complexity allows for aggregation of larger batches of data, thus further improving the effectiveness of aggregation in practice. Our algorithm has further benefits in its ability to handle on-line or streaming data, which we propose to examine further in future work. Future work will also examine the relationship between processing time, assessment accuracy, and perceived effort on the part of analysts under different levels of alert aggregation and data element reduction.
10. ACKNOWLEDGMENTS
This research was sponsored by the U.S. National Institute of Standards and Technology and the
Army Research Labs, and was partially accomplished under Army Contract Number W911QX-
07-F-0023. The views and conclusions contained in this document are those of the authors, and
should not be interpreted as representing the official policies, either expressed or implied, of the
Army Research Laboratory or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes, notwithstanding any copyright notation
hereon.
11. REFERENCES
[1] H. Farhadi, M. AmirHaeri and M. Khansari, Alert Correlation and Prediction Using Data Mining and HMM, The ISC International Journal of Information Security, pp. 1-25, 2011.
[2] H. Debar and A. Wespi, Aggregation and Correlation of Intrusion-Detection Alerts, in Recent Advances in Intrusion Detection, Springer, 2001, pp. 85-103.
[3] F. Cuppens, Managing alerts in a multi-intrusion detection environment, in Proceedings of the 17th
Annual Computer Security Applications Conference, 2001.
[4] J. J. Treinen and R. Thurimella, A framework for the application of association rule mining in large intrusion detection infrastructures, in Recent Advances in Intrusion Detection, Berlin Heidelberg, Springer-Verlag, 2006, pp. 1-18.
[5] M. Roesch, Snort -- lightweight intrusion detection for networks, Proceedings of the 13th USENIX conference on System administration, pp. 229-238, 1999.
[6] R. E. Harang and P. Guarino, Clustering of Snort Alerts to Identify Patterns and Reduce Analyst
Workload, MILCOM proceedings, 2012.
[7] P. Mell, Hyperagg: A Python Program for Efficient Alert Aggregation, 2013. [Online]. Available:
http://csrc.nist.gov/researchcode/hyperagg-mell-20131227.zip
[8] J. Zhou, M. Heckman, B. Reynolds, A. Carlson and M. Bishop, Modeling network intrusion detection alerts for correlation, ACM Transactions on Information and System Security, vol. 10, no. 1, 2007.
[9] A. Siraj and R. B. Vaughn, Alert Correlation with Abstract Incident Modeling in a Multi-Sensor
Environment, International Journal of Computer Science and Network Security, pp. 8-19, 2007.
[10] A. Siraj and R. Vaughn, A cognitive model for alert correlation in a distributed environment, Intelligence and Security Informatics, pp. 1017-1028, 2005.
[11] B. Morin, L. Me, H. Debar and M. Ducasse, M2D2: A formal data model for IDS alert correlation, Proceedings of the 5th international conference on Recent advances in intrusion detection, pp. 115-137, 2002.
[12] P. Ning, D. Xu, C. G. Healey and R. S. Amant, Building Attack Scenarios through Integration of
Complementary Alert, Proceedings of the 11th Annual Network and Distributed System Security
Symposium, 2004.
[13] T. Chyssler, S. Burschka, M. Semling, T. Lingvall and K. Burbeck, Alarm Reduction and Correlation in Intrusion Detection Systems, in DIMVA, Dortmund, Germany, 2004.
[14] B. Morel, Anomaly Based Intrusion Detection and Artificial Intelligence, in Intrusion Detection
Systems.
[15] A. Hofmann and B. Sick, Online Intrusion Alert Aggregation with Generative Data Stream Modeling, IEEE Transactions on Dependable and Secure Computing, pp. 282-294, 2011.
[16] L. Wang, A. Liu and S. Jajodia, An Efficient and Unified Approach to Correlating, Hypothesizing, and Predicting Intrusion Alerts, Computer Security -- ESORICS 2005, pp. 247-266, 2005.
[17] H. Gabra, A. Bahaa-Eldin and H. Mohamed, Data Mining Based Technique for IDS Alerts Classification, arXiv preprint arXiv:1211.1158, 2012.
[18] Z. He, X. Xu, Z. J. Huang and S. Deng, FP-outlier: frequent pattern based outlier detection, Computer Science and Information Systems/ComSIS, vol. 2, no. 1, pp. 103-118, 2005.
[19] T. Cormen, C. Leiserson and R. Rivest, in Introduction to Algorithms, The MIT Press, McGraw-Hill
Book Company, 1994.
[20] V. Chvatal, A Greedy Heuristic for the Set-Covering Problem, Mathematics of Operations Research, vol. 4, no. 3, pp. 233-235, 1979.
[21] P. Slavik, A tight analysis of the greedy algorithm for set cover, in STOC '96 Proceedings of the
twenty-eighth annual ACM symposium on Theory of computing, 1996.