Feature selection for detection of peer-to-peer botnet traffic

Slides of my paper presentation at the ACM Compute 2013 conference (22-24 Aug 2013, Vellore, India).

  1. Feature Selection for Detection of Peer-to-Peer Botnet Traffic
     Pratik Narang, Jagan Mohan Reddy, Chittaranjan Hota
     BITS Pilani, Hyderabad Campus (narangpratik@gmail.com)
     ACM Compute 2013, Vellore, 23rd August 2013
  2. Outline
     • Introduction
       o P2P Networks
       o P2P Botnets
     • Work overview
     • Related Work
     • Our work
       o Generating traffic
       o Feature extraction & selection
       o Evaluation of feature selection techniques
       o Future scope of work
  3. What is a P2P Network? [Figure: peers A to H in a P2P overlay layer, mapped onto the native IP layer across autonomous systems AS1 to AS6]
  4. Generic P2P Architecture [Figure with components: Capability & Configuration; Peer Role Selection; Operating System; NAT/Firewall Traversal; Routing and Forwarding; Neighbor Discovery; Join/Leave; Bootstrap; Overlay Messaging API; Content Storage; Search API]
  5. Uses & Misuses
  6. Traditional Botnets [Figure: botnet diagram, labelled "Bot-Master"]
  7. Peer-to-Peer Botnets [Figure: botnet diagram, labelled "Bot-Master"]
  8. Work overview
     • Evaluation of 3 feature selection algorithms:
       o Correlation-based Feature Selection
       o Consistency-based Subset Evaluation
       o Principal Component Analysis
     • Models built with 3 machine learning algorithms:
       o Naïve Bayes classifier
       o Bayes Networks
       o C4.5 Decision trees
     • Performance evaluation for the detection of some recent and well-known P2P botnets.
  9. Related work
     • Early work using feature selection algorithms [1] [2] used the DARPA dataset, which is no longer suitable for today's security research.
     • Early approaches to P2P botnet detection [3] applied static, port-based analysis, which is easily defeated by modern botnets.
     • Recent work [4] [5] has employed machine learning and data mining techniques for the detection of P2P botnets.
  10. Our work [Figure: processing pipeline, listed here from input to output]
      • Network captures (jNetPcap Library with Java module)
      • Flow Extraction: <Source IP, Source port, Destination IP, Destination port, Protocol>
      • Feature Extraction: source min. packet size, dest. TCP Push flag count, source avg. packet size, dest. total volume, duration, …
      • Feature Selection: Correlation-based Feature Selection, Consistency-based Subset Evaluation, Principal Component Analysis
      • Machine Learning Algorithms: Bayes Network, Naïve Bayes, C4.5 Decision Trees
  11. Generating Traffic [Figure: network diagram. Labels: Botnet traffic generation; Internet; Info. Sec. Lab; Dist. Sys. Lab; Multimedia Lab; Hostels Wing; Data collection for P2P and web traffic; Anonymization (Anon tool); Botnet detection module; Firewall; Core Switch 6509; Distribution Switch 4500; Access Switch 2500; Content Mgmt.; Application Servers; DB Cluster; IDS; Ethernet]
  12. Dataset
      Data              | Application                                      | Number of flows
      Benign data       | HTTP, HTTPS, SMTP, FTP, POP                      | 30,000
      Benign data       | P2P apps: eMule, BitTorrent, Mute, Gnutella etc. | 50,000
      Botnet data [4,5] | Zero Access                                      | 720
      Botnet data [4,5] | SkyNet                                           | 770
      Botnet data [4,5] | Waledac                                          | 80,000
      Botnet data [4,5] | Storm                                            | 220,000
  13. Feature Extraction & Selection
      • A 'flow' is defined by: <Source IP, Source port, Dest. IP, Dest. port, Protocol>
      • Features extracted from each flow:
        o Packet count (bi-directional)
        o Packet size in bytes: min, max, mean and standard deviation (bi-directional)
        o Total volume in bytes (bi-directional)
        o Inter-arrival times: min, max, mean and standard deviation (bi-directional)
        o TCP Push flag count (bi-directional)
        o Duration of the flow (no context of direction)
      • Total: 23 features extracted from each flow
  14. Feature Extraction & Selection
      • Three feature selection techniques used:
        1. Correlation-based Feature Selection (CFS)
        2. Consistency-based Subset Evaluation (CSE)
        3. Principal Component Analysis (PCA)
      • Evaluated with three algorithms:
        1. Naïve Bayes
        2. Bayes Network
        3. C4.5 Decision Trees
  15. Feature Extraction & Selection
      • CFS (search method: best first search; 5 features): source packet count, source min. packet size, source max. packet size, dest. max. packet size, source inter-arrival time std.
      • CSE (search method: best first search; 8 features): source min. packet size, source max. packet size, dest. max. packet size, source avg. packet size, dest. avg. packet size, source max. inter-arrival time, flow duration, source volume
      • PCA (no search method; 12 features): a linear combination of features
  16. Evaluation of Feature Selection Techniques
      • Metrics: $\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$ and $\text{Detection rate} = \frac{TP}{TP + FN}$
      • Accuracy in %, by classification algorithm (series in chart legend order):
        Feature set | NaiveBayes | BayesNet | C4.5
        Full        | 85.2       | 97.08    | 98.23
        CFS         | 81.51      | 95.92    | 98.18
        CSE         | 80.24      | 96.2     | 98.23
        PCA         | 82.16      | 96.67    | 98.17
      • Detection rate in %, by classification algorithm (series in chart legend order):
        Feature set | NaiveBayes | BayesNet | C4.5
        Full        | 98.9       | 96.9     | 98.9
        CFS         | 95.2       | 95.3     | 98.9
        CSE         | 96.1       | 95.7     | 99
        PCA         | 95.4       | 96.2     | 98.9
  17. Evaluation of Feature Selection Techniques [Figures: normalized classification speed and normalized build times for NaiveBayes, BayesNet and C4.5, each with the Full, CFS, CSE and PCA feature sets]
  18. Primary Observations
  19. Future Scope
      • Ensemble of classifiers (work in progress; paper submitted to I-CARE 2013)
      • Close-to-real-time detection tool (work in progress)
      • Space-efficient data structures
  20. References
      1. A. H. Sung and S. Mukkamala. The feature selection and intrusion detection problems. In Advances in Computer Science - ASIAN 2004. Higher-Level Decision Making, pages 468–482. Springer, 2005.
      2. S. Chebrolu, A. Abraham, and J. P. Thomas. Feature deduction and ensemble design of intrusion detection systems. Computers & Security, 24(4):295–307, 2005.
      3. R. Schoof and R. Koning. Detecting peer-to-peer botnets. University of Amsterdam, 2007.
      4. S. Saad, I. Traore, A. Ghorbani, B. Sayed, D. Zhao, W. Lu, J. Felix, and P. Hakimian. Detecting P2P botnets through network behavior analysis and machine learning. In Privacy, Security and Trust (PST), 2011 Ninth Annual International Conference on, pages 174–180. IEEE, 2011.
      5. B. Rahbarinia, R. Perdisci, A. Lanzi, and K. Li. PeerRush: Mining for unwanted P2P traffic. In DIMVA, 2013.
  21. Contact: narangpratik@gmail.com. Visit our research group: www.netclique.in
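
The flow definition and the 23 per-flow features listed on slide 13 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slides say the authors used the jNetPcap library with a Java module and do not show their code, so the packet tuple layout, the function name `extract_flow_features`, the feature names, and the use of the population standard deviation are all hypothetical. The first packet seen for a 5-tuple is assumed to define the "source" direction.

```python
from statistics import mean, pstdev


def _stats(values):
    """Return (min, max, mean, population std) as floats; zeros if empty."""
    if not values:
        return (0.0, 0.0, 0.0, 0.0)
    return (float(min(values)), float(max(values)),
            float(mean(values)), float(pstdev(values)))


def extract_flow_features(packets):
    """Group packets into bidirectional flows and compute 23 features each.

    Each packet is a hypothetical tuple:
    (time_s, src_ip, src_port, dst_ip, dst_port, protocol, size_bytes, push_flag)
    """
    flows = {}  # 5-tuple of the first packet -> packets per direction
    for time_s, sip, sport, dip, dport, proto, size, push in sorted(
            packets, key=lambda p: p[0]):
        key = (sip, sport, dip, dport, proto)
        reverse = (dip, dport, sip, sport, proto)
        if key in flows:
            direction = "src"
        elif reverse in flows:
            key, direction = reverse, "dst"
        else:
            flows[key] = {"src": [], "dst": []}
            direction = "src"
        flows[key][direction].append((time_s, size, push))

    features = {}
    for key, dirs in flows.items():
        row = {}
        for name, pkts in dirs.items():
            times = [t for t, _, _ in pkts]
            sizes = [s for _, s, _ in pkts]
            gaps = [b - a for a, b in zip(times, times[1:])]  # inter-arrival
            row[f"{name}_pkt_count"] = len(pkts)
            row[f"{name}_volume"] = sum(sizes)
            row[f"{name}_push_count"] = sum(1 for _, _, p in pkts if p)
            for stat, value in zip(("min", "max", "mean", "std"), _stats(sizes)):
                row[f"{name}_size_{stat}"] = value
            for stat, value in zip(("min", "max", "mean", "std"), _stats(gaps)):
                row[f"{name}_iat_{stat}"] = value
        all_times = [t for pkts in dirs.values() for t, _, _ in pkts]
        row["duration"] = max(all_times) - min(all_times)  # direction-agnostic
        features[key] = row
    return features


# Made-up packets: two from the initiator, one reply.
packets = [
    (0.0, "10.0.0.1", 1234, "10.0.0.2", 80, "TCP", 60, False),
    (0.5, "10.0.0.2", 80, "10.0.0.1", 1234, "TCP", 1500, True),
    (1.5, "10.0.0.1", 1234, "10.0.0.2", 80, "TCP", 100, True),
]
features = extract_flow_features(packets)
```

Each direction contributes 11 values (packet count, volume, Push flag count, four size statistics, four inter-arrival statistics), which together with the duration gives the 23 features named on the slide.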

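For the CFS row on slide 15, the sketch below illustrates the general idea behind correlation-based feature selection: prefer subsets whose features correlate with the class but not with each other. It uses the commonly cited merit heuristic, merit = k·r_cf / sqrt(k + k(k−1)·r_ff), where r_cf is the mean feature-class correlation and r_ff the mean feature-feature correlation. Several things here are assumptions rather than the authors' setup: absolute Pearson correlation stands in for the correlation measure (the slides do not name one), a greedy forward search replaces the best first search listed on the slide, and the function names and toy data are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def cfs_merit(columns, labels, subset):
    """CFS merit of a feature subset, using absolute Pearson correlation."""
    k = len(subset)
    if k == 0:
        return 0.0
    r_cf = sum(abs(pearson(columns[f], labels)) for f in subset) / k
    pairs = list(combinations(subset, 2))
    r_ff = (sum(abs(pearson(columns[a], columns[b])) for a, b in pairs)
            / len(pairs)) if pairs else 0.0
    return k * r_cf / sqrt(k + k * (k - 1) * r_ff)


def greedy_cfs(columns, labels, min_gain=1e-9):
    """Greedy forward search: add the feature that raises the merit most."""
    selected, best = [], 0.0
    remaining = list(columns)
    while remaining:
        merit, feature = max(
            (cfs_merit(columns, labels, selected + [f]), f) for f in remaining)
        if merit - best <= min_gain:  # no meaningful improvement: stop
            break
        selected.append(feature)
        remaining.remove(feature)
        best = merit
    return selected, best


# Toy data: "a_copy" is redundant with "a"; "noise" is weakly related.
columns = {
    "a": [1, 2, 3, 4, 5, 6],
    "a_copy": [2, 4, 6, 8, 10, 12],
    "noise": [3, 1, 3, 1, 3, 1],
}
labels = [0, 0, 0, 1, 1, 1]
selected, merit = greedy_cfs(columns, labels)
```

In this toy example only one of the two perfectly correlated features is kept, because adding the redundant copy does not raise the merit and adding the weak feature lowers it.
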
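
The two metrics on slide 16, accuracy (TP + TN) / (TP + TN + FP + FN) and detection rate TP / (TP + FN), can be computed from predicted and true labels as in the following sketch. The function names, the label strings "botnet" and "benign", and the sample predictions are made up for illustration and are not taken from the paper's results.

```python
def confusion_counts(y_true, y_pred, positive="botnet"):
    """Return (tp, tn, fp, fn), treating `positive` as the botnet class."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1
        elif truth != positive and pred != positive:
            tn += 1
        elif truth != positive and pred == positive:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Fraction of all flows that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def detection_rate(tp, fn):
    """Fraction of botnet flows that were detected (true positive rate)."""
    return tp / (tp + fn)


# Made-up labels for five flows.
y_true = ["botnet", "botnet", "botnet", "benign", "benign"]
y_pred = ["botnet", "botnet", "benign", "benign", "botnet"]
tp, tn, fp, fn = confusion_counts(y_true, y_pred)
acc = accuracy(tp, tn, fp, fn)
rate = detection_rate(tp, fn)
```

Reporting both values matters for a dataset like the one on slide 12, where the classes are unevenly sized: accuracy alone can look high even when many botnet flows are missed.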