A hybrid anomaly detection model using G-LDA

Presentation for the paper: Bhavesh Kasliwal, Shraey Bhatia, Shubham Saini, I. Sumaiya Thaseen, Ch. Aswani Kumar, "A Hybrid Anomaly Detection Model using G-LDA," 4th IEEE International Advance Computing Conference, 21-22 Feb 2014.


  1. A Hybrid Anomaly Detection Model using G-LDA
     Bhavesh Kasliwal, Shraey Bhatia, Shubham Saini, I. Sumaiya Thaseen, Ch. Aswani Kumar
     VIT University – Chennai
  2. Typical IDS
     Data Collection → Data Pre-Processing → Intrusion Identification → Response
     This work mainly focused on Intrusion Identification.
  3. Architecture
  4. Attribute Selection
     “With more data, the simpler solution can be more accurate than the sophisticated solution.”
     • Selection process based on the means and modes of the numeric attributes.
     • An attribute is kept when the mode values of the anomaly and normal patterns contrast, while the corresponding class means are inclined towards those modes.
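The slide does not give an exact selection criterion, so the sketch below is only one reading of the mean/mode contrast idea: keep a numeric attribute when the per-class modes differ and each class mean sits close to its own mode. The DataFrame `df` (KDD Cup '99 records with a `label` column of "normal"/"anomaly"), the column list `numeric_cols`, and the one-standard-deviation closeness test are all illustrative assumptions.

```python
# Hypothetical sketch of the mean/mode-based attribute selection
# (assumed criterion; the slide does not specify thresholds).
import pandas as pd

def select_attributes(df: pd.DataFrame, numeric_cols, label_col="label"):
    selected = []
    for col in numeric_cols:
        normal = df.loc[df[label_col] == "normal", col]
        anomaly = df.loc[df[label_col] == "anomaly", col]
        # Mode per class; pandas can return several modes, take the first.
        n_mode = normal.mode().iloc[0]
        a_mode = anomaly.mode().iloc[0]
        # Keep the attribute if the class modes contrast and each class mean
        # is "inclined towards" its mode (here: within one std. deviation).
        if n_mode != a_mode \
                and abs(normal.mean() - n_mode) <= normal.std() \
                and abs(anomaly.mean() - a_mode) <= anomaly.std():
            selected.append(col)
    return selected
```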
  5. Selected Attributes
     logged_in, serror_rate, srv_serror_rate, same_srv_rate, diff_srv_rate, dst_host_serror_rate, dst_host_srv_serror_rate
     A strong contrast between the trends of a selected and a discarded attribute is visible.
  6. Training Set Selection (using LDA)
     • Latent Dirichlet Allocation is a generative model that allows sets of observations to be explained by unobserved groups that explain why some parts of the data are similar.
     • Apply LDA (separately on anomaly and normal packets) to obtain 200 sets of 10 packets each. Each set is dominated by a particular packet type.
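As a rough sketch of this step, the snippet below treats each packet as a small document of attribute-value tokens, fits a 200-topic LDA model, and keeps the 10 packets most strongly associated with each topic. The use of gensim and the `fi=value` tokenisation are assumptions; the presentation does not name the LDA implementation it used.

```python
# Hedged sketch: LDA-based training-set selection (gensim and the
# tokenisation scheme are assumptions, not the authors' implementation).
from gensim import corpora, models

def lda_packet_sets(packets, num_topics=200, packets_per_set=10):
    # packets: list of records, each a list of attribute values.
    docs = [[f"f{i}={v}" for i, v in enumerate(p)] for p in packets]
    dictionary = corpora.Dictionary(docs)
    corpus = [dictionary.doc2bow(d) for d in docs]
    lda = models.LdaModel(corpus, num_topics=num_topics, id2word=dictionary)

    # Score every packet against every topic, then keep the strongest packets.
    per_topic = {t: [] for t in range(num_topics)}
    for idx, bow in enumerate(corpus):
        for topic, prob in lda.get_document_topics(bow, minimum_probability=0.0):
            per_topic[topic].append((prob, idx))
    return {t: [i for _, i in sorted(scores, reverse=True)[:packets_per_set]]
            for t, scores in per_topic.items()}

# Run separately on the normal and the anomaly packets, as the slide states:
# anomaly_sets = lda_packet_sets(anomaly_packets)
# normal_sets  = lda_packet_sets(normal_packets)
```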
  7. Sample LDA Output
     Topic 0th:
     0,icmp,eco_i,SF,18,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,1,1,0,1,0.25,0,0,0,0,anomaly
     0,icmp,eco_i,SF,18,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,1,1,0,1,0.25,0,0,0,0,anomaly
     0,icmp,eco_i,SF,18,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,1,1,0,1,0.25,0,0,0,0,anomaly
     0,icmp,eco_i,SF,18,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,1,1,0,1,0.25,0,0,0,0,anomaly
     0,icmp,eco_i,SF,18,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,1,1,0,1,0.25,0,0,0,0,anomaly
     0,icmp,eco_i,SF,18,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,1,1,0,1,0.26,0,0,0,0,anomaly
     0,icmp,eco_i,SF,18,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,1,1,0,1,0.26,0,0,0,0,anomaly
     0,tcp,telnet,S0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,125,13,1,1,0,0,0.1,0.06,0,255,0.03,0.07,0,0,1,1,0,0,anomaly
     0,tcp,uucp,S0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,135,9,1,1,0,0,0.07,0.06,0,255,0.04,0.07,0,0,1,1,0,0,anomaly
     0,tcp,vmnet,S0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,258,10,1,1,0,0,0.04,0.05,0,255,0.04,0.05,0,0,1,1,0,0,anomaly
     Topic 1th:
     0,tcp,finger,S0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,14,3,1,1,0,0,0.21,0.29,0,255,0.25,0.02,0.01,0,1,1,0,0,anomaly
     0,tcp,finger,S0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,246,20,1,1,0,0,0.08,0.06,0,255,0.08,0.07,0,0,1,1,0,0,anomaly
     0,icmp,ecr_i,SF,1032,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,511,511,0,0,0,0,1,0,0,255,0.55,0.01,0.55,0,0,0,0,0,anomaly
     0,icmp,ecr_i,SF,1032,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,511,511,0,0,0,0,1,0,0,255,0.56,0.02,0.56,0,0.01,0,0,0,anomaly
     0,icmp,ecr_i,SF,1032,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,511,511,0,0,0,0,1,0,0,255,0.6,0.01,0.6,0,0,0,0,0,anomaly
  8. Genetic Algorithm
  9. Genetic Algorithm
     • Applied on normal and anomaly packets separately
     • Threshold value taken for providing a negative weight
     • Run for 3 generations
     • Top 3 values for anomaly and normal packets used
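The slides do not describe the chromosome encoding or the fitness function, so the following is only a generic skeleton of the loop outlined here: evolve a population for 3 generations and keep the top 3 individuals, run once over the normal packet sets and once over the anomaly packet sets. `fitness`, `crossover`, and `mutate` are placeholder callables the reader would have to supply.

```python
# Generic GA skeleton only; the encoding, fitness, crossover and mutation
# used in the paper are not given on the slides.
import random

def run_ga(population, fitness, crossover, mutate,
           generations=3, keep_top=3, mutation_rate=0.1):
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: max(2, len(ranked) // 2)]   # keep the fitter half
        children = []
        while len(children) < len(population):
            a, b = random.sample(parents, 2)
            child = crossover(a, b)
            if random.random() < mutation_rate:
                child = mutate(child)
            children.append(child)
        population = children
    # "Top 3 values for anomaly and normal packets used"
    return sorted(population, key=fitness, reverse=True)[:keep_top]
```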
  10. Identifying Nature of Incoming Packet
      • For each selected attribute value Fi in the incoming packet:
        ◦ If Fi ∈ Vi: Si = (A × Frequency of Fi in Anomaly) − (Frequency of Fi in Normal)
        ◦ Else: Si = 0
      • C = Σ Si
      • If C > 0, then Anomaly; else Normal
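A minimal sketch of this decision rule is below. The frequency tables `anomaly_freq` and `normal_freq`, mapping (attribute index, value) pairs to frequencies drawn from the GA-selected values Vi, are an assumed data layout; A is the additional weight discussed on the next slide.

```python
# Sketch of the scoring rule: C = sum(Si), classify as anomaly if C > 0.
# The dictionary layout of the frequency tables is an assumption.
def classify_packet(packet, anomaly_freq, normal_freq, A=1.0):
    score = 0.0  # C
    for i, value in enumerate(packet):                  # selected attribute values Fi
        key = (i, value)
        if key in anomaly_freq or key in normal_freq:   # Fi ∈ Vi
            score += A * anomaly_freq.get(key, 0) - normal_freq.get(key, 0)
        # otherwise Si = 0 and the attribute contributes nothing
    return "anomaly" if score > 0 else "normal"
```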
  11. Additional Weight
      • Multiplied with the anomaly frequency.
      • Why? Generic anomalies have diverse values, unlike normal packets, whose values fall within a particular range.
      • A trade-off between accuracy and the false positive rate is required.
  12. Additional Weight
  13. Results
      • Tested against 50,000 anomaly and 50,000 normal packets from the KDD Cup '99 dataset.
      • 88.5% accuracy with a 6% false positive rate (FPR).
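For reference, the two reported metrics follow the usual definitions; the snippet below only shows those formulas, since the slide does not give the underlying confusion-matrix counts.

```python
# Metric definitions only; the per-class TP/FP/TN/FN counts behind the
# reported 88.5% accuracy and 6% FPR are not given on the slide.
def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

def false_positive_rate(fp, tn):
    return fp / (fp + tn)

# With 50,000 normal test packets, a 6% FPR corresponds to roughly
# 0.06 * 50_000 = 3,000 normal packets flagged as anomalies.
```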
  14. Future Work
      • Focus on specific anomaly types
      • Better attribute selection algorithm?
        ◦ oneR
        ◦ Entropy based
        ◦ Chi-squared
        ◦ randomForest
      • Better classification technique?
        ◦ Clustering – Hierarchical, K-Means
        ◦ Decision Trees
  15. QUESTIONS?
