
2014.7.9 Detecting P2P Botnets through Network Behavior Analysis and Machine Learning



lab presentation

Published in: Software


  1. 1. Detecting P2P Botnets through Network Behavior Analysis and Machine Learning Sherif Saad, Issa Traore et al. 2011 PST (Ninth Annual International Conference on Privacy, Security and Trust)
  2. 2. Outline • Introduction • Related Work • Network Behavior Analysis • Experiment and Evaluation • Conclusion
  3. 3. Introduction • IRC- and HTTP-based botnets are vulnerable because they are built on highly centralized architectures. • The current trend in botnet communication is toward Peer-to-Peer architectures. • A botmaster can inject commands into any part of a P2P botnet.
  4. 4. Centralized architecture
  5. 5. Decentralized architecture
  6. 6. Botnet Lifecycle • Leonard et al. divided the botnet lifecycle into three phases: formation, C&C communication, and attack. • Most recent research detects botnets during the formation or attack phase. • This paper focuses on detecting bots during the C&C phase.
  7. 7. Formation Phase • Injection, unwanted binary download, web browsing, etc. • Diagram labels: Compromised, Binary server
  8. 8. C&C Communication Phase • Propagate instructions • Periodical connection, update status. • Diagram labels: Compromised, C&C server
  9. 9. Attack Phase • DDoS attack, spread spam, or steal personal user information. • Diagram labels: Compromised computers, Victim
  10. 10. Related Work • Several studies have shown that network traffic identification can effectively distinguish between different classes of network applications. • Recently, much of the literature in this field has focused on analyzing P2P botnets.
  11. 11. Using Network Behavior Analysis to Detect Botnets • It’s possible to detect bots during any phase of their lifecycle. • It’s less expensive than other approaches, such as deep payload analysis or capturing and studying live bots with honeynets.
  12. 12. Detecting Bots During C&C Phase • Allows detection of bots that were missed during the formation phase, before they launch an attack and cause damage.
  13. 13. Network Behavior Analysis • In general, there are three categories of network traffic identification methods: • Port-based analysis • Protocol-based analysis • Behavior-based analysis • Network traffic information can usually be retrieved easily from various network devices without significantly affecting network performance or service availability.
  14. 14. Network Behavior Analysis • Each major existing botnet (for instance, Storm and Zeus) implements its own specific C&C architecture. • Such architectures tend to exhibit distinguishing behaviors that can be captured by analyzing network traffic characteristics. • Specific traffic characteristics can therefore be used to distinguish botnet traffic from other network application traffic.
  15. 15. Traffic Characteristics • Payload size • Number of packets • Duplicated packet lengths • Concurrent active ports
  16. 16. Feature Selection • Flow-based features • Used to link flows to a specific class of network traffic, such as P2P traffic or non-P2P traffic. • Host-based features • Occur in the communications between hosts. • Identify hosts with shared communication patterns. • 17 features extracted.
  17. 17. Flow-Based Features • Source IP, Source Port, Destination IP, Destination Port, Protocol. • Packet Length, Average Packet Length, Length of First Packet. • Total Number of Packets per Flow. • Total Number of Bytes per Flow. • Incoming Packets over Outgoing Packets. • Packets of Same Length over Total Number of Packets in Same Flow. • Total Bytes of All Packets over Total Number of Packets in Same Flow.
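The ratio-style flow features on slide 17 can be illustrated with a minimal sketch. The packet tuple layout, the function name `flow_features`, and the dictionary keys are assumptions made for illustration and do not come from the paper. In particular, "packets of same length" is read here as the count of the most frequent packet length in the flow, which is only one possible interpretation.

```python
from collections import Counter, defaultdict

# Hypothetical packet record (not from the paper):
# (src_ip, src_port, dst_ip, dst_port, protocol, length_in_bytes)

def flow_features(packets):
    """Group packets into bidirectional flows and compute per-flow features."""
    flows = defaultdict(list)
    for src_ip, src_port, dst_ip, dst_port, proto, length in packets:
        a, b = (src_ip, src_port), (dst_ip, dst_port)
        # Direction-independent key so both directions land in one flow.
        key = (min(a, b), max(a, b), proto)
        flows[key].append(((src_ip, src_port), length))

    features = {}
    for key, pkts in flows.items():
        initiator = pkts[0][0]  # endpoint that sent the first packet seen
        lengths = [length for _, length in pkts]
        outgoing = sum(1 for sender, _ in pkts if sender == initiator)
        incoming = len(pkts) - outgoing
        # Assumed reading of "packets of same length": most frequent length.
        same_length = Counter(lengths).most_common(1)[0][1]
        features[key] = {
            "total_packets": len(pkts),
            "total_bytes": sum(lengths),
            "first_packet_length": lengths[0],
            # Total bytes over total packets, i.e. average packet length.
            "avg_packet_length": sum(lengths) / len(pkts),
            # outgoing >= 1 because the initiator sent the first packet.
            "in_out_ratio": incoming / outgoing,
            "same_length_ratio": same_length / len(pkts),
        }
    return features
```

Keying flows on the unordered endpoint pair keeps request and response packets in the same flow, which the incoming-over-outgoing ratio needs.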
  18. 18. Host-Based Features • Ratio of the Number of Source Ports to the Number of Destination Ports. • The Number of Connections over the Number of Destination IPs. • The Sum of Different Transmission Protocols Used per Destination IP over the Total Number of Destination IPs. • The Number of Destination IPs Connected to the Same Open Port in the Monitored Host over the Total Number of Open Ports in the Monitored Host.
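The four host-based ratios on slide 18 can be sketched as follows. The record layouts (`outgoing`, `inbound`), the function name `host_features`, and the feature keys are hypothetical. The sketch reads the last feature as the average number of distinct remote IPs per open port on the monitored host, which is an assumption about the slide's wording.

```python
from collections import defaultdict

def _ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0

def host_features(outgoing, inbound):
    """Compute host-level ratios for one monitored host.

    outgoing: list of (src_port, dst_ip, dst_port, protocol) tuples.
    inbound:  list of (local_open_port, remote_ip) tuples.
    Both layouts are illustrative assumptions.
    """
    src_ports = {c[0] for c in outgoing}
    dst_ips = {c[1] for c in outgoing}
    dst_ports = {c[2] for c in outgoing}

    # Distinct transport protocols used toward each destination IP.
    protos_per_ip = defaultdict(set)
    for _, dst_ip, _, proto in outgoing:
        protos_per_ip[dst_ip].add(proto)

    # Distinct remote IPs seen on each open local port.
    remotes_per_port = defaultdict(set)
    for port, remote_ip in inbound:
        remotes_per_port[port].add(remote_ip)

    return {
        "src_to_dst_port_ratio": _ratio(len(src_ports), len(dst_ports)),
        "connections_per_dst_ip": _ratio(len(outgoing), len(dst_ips)),
        "protocols_per_dst_ip": _ratio(
            sum(len(p) for p in protos_per_ip.values()), len(dst_ips)),
        "remote_ips_per_open_port": _ratio(
            sum(len(r) for r in remotes_per_port.values()),
            len(remotes_per_port)),
    }
```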
  19. 19. Experiment • Datasets • Malware traffic • From the French chapter of the Honeynet Project, involving the Storm and Waledac botnets. • These traces contain none of the regular benign traffic that would typically occur in a real-world scenario. • Non-malicious traffic • Labeled dataset from the Traffic Lab at Ericsson Research in Hungary. • User-generated normal traffic • The traffic in the dataset should be intermixed as if both kinds of traffic were occurring at the same time on the same machines.
  20. 20. Malware Network Traffic • The trace file corresponds to the C&C and attack phases of the Storm and Waledac botnets, as the botmaster used this machine to spread spam.
  21. 21. Malware Network Traffic
  22. 22. Non-Malicious Traffic • Contains over a million packets of general traffic that ranges from web browsing to P2P traffic and gaming such as World of Warcraft. • Every packet was labeled with the originating or the target process running on the test machines.
  23. 23. Non-Malicious Traffic
  24. 24. Datasets Merging • Mapped the IP addresses of the infected machines to two of the machines in the benign dataset. • Replayed all of the trace files with the tcpreplay tool on the same network interface card. • Used a capture tool, such as Wireshark, to listen on the network interface and write the output to a file.
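The effect of the merging step on slide 24 can be approximated offline. The presentation describes replaying the traces with tcpreplay and recapturing them. The sketch below is a stand-in for that procedure, not the authors' method: it rewrites IP addresses in already-parsed packet records and interleaves the traces by timestamp. The record layout and the names `remap_ips` and `merge_traces` are hypothetical, and the traces are assumed to be timestamp-sorted on a common clock.

```python
import heapq

# Hypothetical packet record: (timestamp, src_ip, dst_ip, length_in_bytes)

def remap_ips(packets, ip_map):
    """Return packets with source and destination IPs rewritten via ip_map."""
    return [
        (ts, ip_map.get(src, src), ip_map.get(dst, dst), length)
        for ts, src, dst, length in packets
    ]

def merge_traces(*traces):
    """Merge timestamp-sorted traces into one timestamp-sorted trace."""
    return list(heapq.merge(*traces, key=lambda pkt: pkt[0]))
```

A usage example with made-up addresses:

```python
malicious = [(0.5, "66.0.0.1", "8.8.8.8", 100), (2.0, "8.8.8.8", "66.0.0.1", 200)]
benign = [(1.0, "192.168.0.2", "9.9.9.9", 50)]
merged = merge_traces(remap_ips(malicious, {"66.0.0.1": "192.168.0.2"}), benign)
```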
  25. 25. Datasets Merging
  26. 26. Evaluation • Parsed the network traffic dataset and extracted 129,453 feature vectors, labeled into three classes: botnet C&C, non-P2P traffic, and normal P2P traffic. • Used 10-fold cross-validation and machine learning tools such as Weka to evaluate the approach.
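The 10-fold cross-validation on slide 26 can be sketched in plain Python. The authors used Weka. The code below is an independent illustration, and the nearest-centroid classifier is a toy stand-in rather than one of the algorithms evaluated in the paper. All function names are hypothetical, and the sketch assumes there are at least `k` samples.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and split them into k near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def cross_validate(samples, labels, train_fn, predict_fn, k=10, seed=0):
    """Return mean accuracy over k folds; each fold is the test set once."""
    folds = k_fold_indices(len(samples), k, seed)
    accuracies = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        model = train_fn([samples[j] for j in train_idx],
                         [labels[j] for j in train_idx])
        correct = sum(predict_fn(model, samples[j]) == labels[j]
                      for j in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / k

def train_centroids(samples, labels):
    """Toy classifier: store the mean feature vector of each class."""
    sums, counts = {}, {}
    for x, y in zip(samples, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for d, value in enumerate(x):
            acc[d] += value
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

def predict_centroid(model, x):
    """Predict the class whose centroid is closest in squared distance."""
    return min(model,
               key=lambda y: sum((a - b) ** 2 for a, b in zip(model[y], x)))
```

Every feature vector is used for testing exactly once and for training in the other nine rounds, so the reported accuracy does not depend on a single train/test split.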
  27. 27. Evaluation
  28. 28. Evaluation
  29. 29. Conclusion • The authors design a model that uses network traffic characteristics to detect P2P botnets (Storm and Waledac). • They experiment with five popular machine learning algorithms (MLAs) to classify malicious traffic.