BITS Pilani
Hyderabad Campus
ML Approaches to P2P Botnet
Detection
by
Vansh Khurana
Mentors: Dr. Chittaranjan Hota, Pratik Narang and Team
CS C441 / CS F441 Second Semester 2013-14 BITS Pilani, Hyderabad Campus
• Introduction to P2P
• Analyzing available Dataset
• Deep Dive into P2P Botnets
• Classification Algorithms
• System Design
• Prelim Results with Extensive Feature Set
• Curse of Dimensionality
• Ensemble Classifier
• Future Work
Today’s Agenda
• P2P Networks
• Botnets
• Malicious Activities
Introduction
• Decentralized and distributed network architecture
• Peers act as both suppliers and consumers of resources
• Better resource utilization
• No Central Coordination
• Applications:
• Instant Messaging Systems: Skype
• Digital currency: Bitcoin
• Wireless community networks: Netsukuku
• Foreign Currency Exchange Market Place: CurrencyFair
• Content Delivery: Torrent Applications
• File sharing: Gnutella, DC++
• And many more….
P2P Networks
• A network of compromised machines (bots) controlled by a
bot master
• Typical Formation: Spam Example
• A botnet operator sends out viruses or worms, infecting ordinary users'
computers, whose payload is a malicious application—the bot.
• The bot on the infected PC logs into a particular C&C server.
• A spammer purchases the services of the botnet from the operator.
• The spammer provides the spam messages to the operator, who instructs the
compromised machines via the control panel on the web server, causing them
to send out spam messages
• Should we care? Absolutely!
• Privacy invasion: hacked accounts, weak passwords, credential reuse, social
engineering attacks
• Financial theft: 10 days of Torpig data valued at $83K to $8.3M (2009)
Botnets
• Responsible for (non-exhaustive list):
• Large-scale network probing (i.e., scanning activities)
• Launching Distributed Denial of Service (DDoS) attacks
• Sending large-scale unsolicited emails (SPAM)
• Click-fraud campaign
• Information theft
• Spyware
• Adware
Shift from a for-fun activity towards a profit-oriented business
Malicious Activities
• Obtained from University of Georgia: Babak Rahbarinia et al.
• Benign applications: eMule, FrostWire, uTorrent, Vuze
• ~130 GB of raw log files
• Malicious botnet data:
• Storm, ~9 GB
• Waledac, ~1 GB
• Zeus, ~100 MB
• Nugache, ~100 MB
• Data for botnets contains only C&C messages
• We build training models to represent real-world behaviour (~80:20)
More on Botnets next!
Analyzing available Dataset
• P2P Botnets: Lifecycle
• Botnets of Interest
• Storm
• Waledac
• Zeus
• Nugache
Deep Dive into P2P Botnets
• Initial infection: exploit a known vulnerability, granting additional capabilities
to the attacker on the target system
• Secondary injection: leverage newly acquired access to execute additional
scripts or programs, which then fetch a malicious binary from a known
location
• Connection: the bot attempts to establish a connection to the command and
control server through a variety of methods
• Malicious command and control: doing the damage
• Update and maintenance: bots are commanded to update their binaries,
typically to defend against new attacks or to improve their functionality.
P2P Botnets: Lifecycle
• Email Spam Botnet (2007)
• Extent of Damage: 1 million to 50 million computer
systems
• Methodology
• Observed to be defending itself, and attacking computer systems that
scanned for Storm virus-infected computer systems online.
• DDoS counter-attacks, to maintain its own internal integrity
• Fools the antivirus on local system: Actual processes do nothing
• Mostly uses UDP as underlying transport layer protocol
Storm
• Email spam botnet (first seen in 2008; taken down in 2010)
• Extent: ~70K to 90K infected machines
• Infection method:
• Email
• Fake websites
• Bundled with other threats, such as Trojan.Peacomm, W32.Downadup,
and Trojan.Bredolab
• Typically sends a log file every 30 minutes
• Mostly operates over TCP
• More here:
http://www.symantec.com/security_response/writeup.jsp?docid=2008-122308-1429-99
Waledac
• Often used to steal banking information through man-in-the-browser
keystroke logging, form grabbing, and web page injection (2007)
• Spreads mainly through drive-by downloads and phishing
schemes
• Extent: 3.6 million machines in the US alone; operates very stealthily
• Uses both TCP and UDP, so the flow concept fails
Zeus
• One of the first sophisticated P2P botnets (2006)
• Created by Jason Michael Milmont, when he was 16!
• TCP port 8 bot: listens on port 8
• Not used here: the log files don't represent the theoretical
information
• More here:
http://www.symantec.com/security_response/writeup.jsp?docid=2006-043016-0900-99&tabid=2
Nugache
DEMO 1: Botnet Data Analysis
• Decision Trees:
• Based on information gain
• Fast; ignore irrelevant features
• But can overfit. We use REP trees and set a maximum depth to avoid this
• K-Nearest Neighbour:
• Inherently simple; overfitting is controlled through the choice of k
• Artificial Neural Networks:
• Handle a large number of features well
• Hidden-layer size set heuristically:
(No. of Features + No. of Class Labels) / 2
• SVM:
• Performs complex kernel-based data transformations, then finds an
optimal boundary between the possible outputs based on these transformations
• Pairwise classification approach (one-versus-one)
Classification Algorithms
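As a minimal sketch of the information-gain criterion named for decision trees above (not the REP Tree or J-48 implementation used in this work), the Python below scores a single threshold split on one feature. The function names and the toy flow-duration values are assumptions made purely for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels, threshold):
    """Entropy reduction from splitting the samples at `value <= threshold`."""
    left = [y for v, y in zip(values, labels) if v <= threshold]
    right = [y for v, y in zip(values, labels) if v > threshold]
    n = len(labels)
    # Weighted entropy of the two partitions (empty partitions are skipped).
    remainder = sum(len(part) / n * entropy(part) for part in (left, right) if part)
    return entropy(labels) - remainder


# Toy example (assumed values): flow durations for benign vs. bot traffic.
durations = [1, 2, 3, 10, 11, 12]
classes = ["benign", "benign", "benign", "bot", "bot", "bot"]

gain_good = information_gain(durations, classes, threshold=5)  # clean split
gain_poor = information_gain(durations, classes, threshold=1)  # lopsided split
print(gain_good, gain_poor)
```

A tree learner would evaluate many such candidate splits and pick the one with the highest gain at each node.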
• Salient Features
• Background
• Modules
System Design
• No reliance on Encryption and Deep Packet
Inspection
• Intuitive and Simplistic Model to solve a complex
problem
• Model Bot Behaviour accurately
• Explored Feature Set extensively (~75 Features)
• Explored Non Network based features
• Compression Ratio
• Signal Processing Approach to model network behaviour
• More on this later…
• Most importantly: Achieved Good Results
Salient Features
• Shannon’s Source Coding Theorem
• The expected length L of an encoding of X with
associated probability function p(x) is given by:
• Bot data expected to be more uniform, hence should give
more compression
Background
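A rough sketch of how a compression-ratio feature could be computed, assuming the payload bytes of a conversation are available and using zlib as a stand-in compressor (the slides do not say which compressor was used). `shannon_entropy` and `compression_ratio` are hypothetical helper names, and both payloads are made-up examples rather than data from the dataset.

```python
import math
import random
import zlib
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Empirical entropy in bits per byte: H = -sum p(x) * log2 p(x)."""
    if not data:
        return 0.0
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size (lower means more compressible)."""
    if not data:
        return 1.0
    return len(zlib.compress(data)) / len(data)


# Assumed toy payloads: a repetitive, bot-like message stream vs. varied bytes.
repetitive_payload = b"PING 0001 PONG 0001 " * 200
rng = random.Random(0)
varied_payload = bytes(rng.randrange(256) for _ in range(4000))

print(compression_ratio(repetitive_payload), compression_ratio(varied_payload))
print(shannon_entropy(repetitive_payload), shannon_entropy(varied_payload))
```

Under the slide's hypothesis, the more regular bot traffic would show the lower ratio, as the repetitive payload does here.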
• Discrete Fourier Transform:
• Converts a finite list of equally spaced samples of a function into the list
of coefficients of a finite combination of complex sinusoids, ordered by
their frequencies
• Time domain to frequency domain to extract hidden patterns in botnet communication
• The network communication between a pair of nodes is
treated as a `signal'.
• Given a time sequence X = X(0), X(1), ..., X(w), its Discrete
Fourier Transform (DFT) is given as:
• The first few DFT coefficients contain most of the energy
Background
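A minimal sketch of the signal-processing idea, assuming the "signal" is a list of per-interval payload sizes exchanged between one pair of nodes. It uses the standard unnormalised DFT, X_k = sum over t of x(t) * exp(-2*pi*i*k*t / n); the normalisation used on the original slide is not recoverable, so treat this as one common convention. The function names and sample values are illustrative assumptions.

```python
import cmath
import math


def dft(signal):
    """Naive O(n^2) discrete Fourier transform of a real-valued sequence."""
    n = len(signal)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal))
        for k in range(n)
    ]


def leading_energy_fraction(signal, k):
    """Share of total spectral energy held by the first k DFT coefficients."""
    energy = [abs(c) ** 2 for c in dft(signal)]
    total = sum(energy)
    return sum(energy[:k]) / total if total else 0.0


# Assumed toy signal: payload sizes (bytes) in consecutive time bins.
payload_sizes = [100, 102, 98, 101, 99, 100, 103, 97]

coefficients = dft(payload_sizes)
features = [abs(c) for c in coefficients[:2]]  # magnitudes of the 1st and 2nd coefficients
print(features, leading_energy_fraction(payload_sizes, 1))
```

For a nearly constant signal like this one, almost all of the energy sits in the first coefficient, which is why only the leading coefficients need to be kept as features.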
Modules
• Dataset Description
• Used Extensive Feature Set
Prelim results (ACM DEBS)
Prelim Results (ACM DEBS)
Overall Precision and Recall
DEMO 2:
System Design & Prelim Results
• Principal Component Analysis
• Feature Selection Algorithms
Curse of Dimensionality
• Statistical procedure that uses orthogonal transformation to
convert a set of observations of possibly correlated variables
into a set of values of linearly uncorrelated variables
called principal components
• Converted 76 Features to 28 features retaining 95% Variance
• Classifier Accuracy:
• J-48: 97%
• REP Tree: 94.17%
• SVM: 80.61%
• K-NN: 93.29%
• Bayesian Networks: 94.65%
Principal Component Analysis
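A small sketch of how a "retain 95% of the variance" cut-off can be chosen, assuming NumPy is available and that the feature matrix has one row per sample. This is not the tool used to obtain the 76-to-28 reduction reported above; `components_for_variance`, `project`, and the synthetic data are assumptions for illustration, and in practice features on different scales would usually be standardised first.

```python
import numpy as np


def components_for_variance(X: np.ndarray, target: float = 0.95) -> int:
    """Smallest number of principal components whose variance share >= target."""
    centred = X - X.mean(axis=0)
    # Singular values of the centred data give the per-component variances.
    singular = np.linalg.svd(centred, compute_uv=False)
    variance = singular ** 2 / (len(X) - 1)
    cumulative = np.cumsum(variance) / variance.sum()
    return min(int(np.searchsorted(cumulative, target)) + 1, len(cumulative))


def project(X: np.ndarray, k: int) -> np.ndarray:
    """Project the centred data onto its first k principal components."""
    centred = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return centred @ vt[:k].T


# Synthetic data (assumed): two perfectly correlated features plus small noise.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 1))
X = np.hstack([base, 2 * base, 0.01 * rng.normal(size=(200, 1))])

k = components_for_variance(X, target=0.95)
reduced = project(X, k)
print(k, reduced.shape)
```

Because two of the three synthetic features are perfectly correlated, a single component already carries nearly all of the variance.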
• Best First Search Based Feature Selection
• Also explored random forest based importance evaluation
• Selected Features are:
• Flow duration, MEDIAN_INTERARRIVAL_TIME,
AVG_PAYLOAD_SIZE, AVG_PAYLOAD_SIZE_SENDING, AVG_PAYLOAD_SIZE_RECVING,
PRIME_WAVE_MAGNITUDE_PAYLOAD, PRIME_WAVE_MAGNITUDE_IAT,
BYTES_SENT_PER_SEC, BYTES_RECVD_PER_SEC,
DFT_Payload (1st and 2nd coefficients), DFT_IAT (1st and 2nd coefficients),
Compression
• Classifier Accuracy:
• J-48: 99.7%
• REP Tree: 99.58%
• SVM: 84%
• KNN: 99.5%
• ANN: 92%
Feature Selection
DEMO 3:
PCA and Feature Selection
• Construct a set of classifiers from the training data
• Predict class label of previously unseen records by
aggregating predictions made by multiple classifiers
Ensemble Classifier
• Ensemble methods work better with ‘unstable
classifiers’
• Classifiers that are sensitive to minor perturbations
in the training set
• Examples:
– Decision trees
– Rule-based
– Artificial neural networks
Ensemble Classifier
• One way to force a learning algorithm to construct multiple
hypotheses is to run the algorithm several times and provide
it with somewhat different data in each run. This idea is used
in the following methods:
• Majority Voting
• Bagging
• Randomness Injection
• Feature-Selection Ensembles
• Error-Correcting Output Coding.
Ensemble Classifier
Why does majority voting work?
• Suppose there are 25 base classifiers
– Each classifier has error rate ε = 0.35
– Assume errors made by classifiers are uncorrelated
– Probability that the ensemble classifier makes a wrong prediction
(i.e. at least 13 of the 25 base classifiers are wrong):
$$P(X \ge 13) = \sum_{i=13}^{25} \binom{25}{i}\, \varepsilon^{i} (1-\varepsilon)^{25-i} \approx 0.06$$
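The majority-voting error for 25 independent base classifiers can be checked numerically. The sketch below sums the binomial tail for the case where 13 or more classifiers are wrong; `ensemble_error` is a hypothetical helper name, and the independence of errors is the same assumption the slide makes.

```python
from math import comb


def ensemble_error(n_classifiers: int, eps: float) -> float:
    """Probability that a majority of independent classifiers are wrong."""
    majority = n_classifiers // 2 + 1  # 13 when n_classifiers is 25
    return sum(
        comb(n_classifiers, i) * eps ** i * (1 - eps) ** (n_classifiers - i)
        for i in range(majority, n_classifiers + 1)
    )


error_25 = ensemble_error(25, 0.35)
print(round(error_25, 2))
```

The result is far below the individual error rate of 0.35, but only because the errors are assumed uncorrelated and each classifier is better than chance; at ε = 0.5 the ensemble gains nothing.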
Bagging
• Employs simplest way of combining predictions that
belong to the same type.
• Combining can be realized with voting or averaging
• Each model receives equal weight
• “Idealized” version of bagging:
– Sample several training sets of size n (instead of just
having one training set of size n)
– Build a classifier for each training set
– Combine the classifier’s predictions
• This improves performance in almost all cases if the
learning scheme is unstable (e.g. decision trees)
Bagging classifiers
Classifier generation:
  Let n be the size of the training set.
  For each of t iterations:
    Sample n instances with replacement from the training set.
    Apply the learning algorithm to the sample.
    Store the resulting classifier.
Classification:
  For each of the t classifiers:
    Predict class of instance using classifier.
  Return class that was predicted most often.
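The bagging procedure above can be sketched in plain Python. The base learner here is a one-feature threshold rule (a decision stump) chosen only to keep the example short; it is an assumption, not the learner used in this work, and `train_stump`, `bagging_fit`, `bagging_predict`, and the toy data are hypothetical.

```python
import random
from collections import Counter


def train_stump(sample):
    """Assumed base learner: the single-feature threshold rule with fewest errors."""
    best = None
    for f in range(len(sample[0][0])):
        for x, _ in sample:
            thr = x[f]
            left = [y for xi, y in sample if xi[f] <= thr]
            right = [y for xi, y in sample if xi[f] > thr]
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0] if right else l_lab
            errors = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, thr, l_lab, r_lab)
    _, f, thr, l_lab, r_lab = best
    return lambda x: l_lab if x[f] <= thr else r_lab


def bagging_fit(train, t, learner=train_stump, seed=0):
    """Train t classifiers, each on n instances sampled with replacement."""
    rng = random.Random(seed)
    n = len(train)
    return [learner([rng.choice(train) for _ in range(n)]) for _ in range(t)]


def bagging_predict(classifiers, x):
    """Return the class predicted most often across the classifiers."""
    return Counter(clf(x) for clf in classifiers).most_common(1)[0][0]


# Toy training set (assumed): one feature per flow, two classes.
train = [((0.1,), "benign"), ((0.2,), "benign"), ((0.3,), "benign"),
         ((0.8,), "bot"), ((0.9,), "bot"), ((1.0,), "bot")]
models = bagging_fit(train, t=11)
print(bagging_predict(models, (0.05,)), bagging_predict(models, (0.95,)))
```

An odd number of classifiers avoids tied votes in the two-class case.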
Why does bagging work?
• Bagging reduces variance by voting/
averaging, thus reducing the overall expected
error
– In the case of classification there are pathological
situations where the overall error might increase
– Usually, the more classifiers the better
Stacking
• Uses meta learner instead of voting to
combine predictions of base learners
– Predictions of base learners (level-0 models) are
used as input for meta learner (level-1 model)
• Base learners usually different learning
schemes
• Hard to analyze theoretically: “black magic”
• Genetic Search and Greedy Classification (June)
• Suggested Improvements and Proposals:
• Clustering to scale up
• Binning to compute Compression Ratio
• Smoothing the DFT curve
• Explore parson’s coding theory
• Repo link (Stay Tuned!)
• https://github.com/vansh21k/P2P-Botnet-Detection-Project
Future Work
[1] H. Hang, X. Wei, M. Faloutsos, and T. Eliassi-Rad. Entelecheia: Detecting p2p botnets in their waiting stage. In IFIP Networking Conference, 2013, pages 1-9, 2013.
[2] J. Kang and J.-Y. Zhang. Application entropy theory to detect new peer-to-peer botnet with multi-chart cusum. In Electronic Commerce and Security, 2009.
[3] C. Kanich, N. Weaver, D. McCoy, T. Halvorson, C. Kreibich, K. Levchenko, V. Paxson, G. M. Voelker, and S. Savage. Show me the money: Characterizing spam-advertised revenue. In USENIX Security Symposium, pages 15
[4] B. Rahbarinia, R. Perdisci, A. Lanzi, and K. Li. Peerrush: Mining for unwanted p2p traffic. In Detection of Intrusions and Malware, and Vulnerability Assessment, pages 62-82. Springer, 2013.
[5] C. Rossow, D. Andriesse, T. Werner, B. Stone-Gross, D. Plohmann, C. J. Dietrich, and H. Bos. Sok: P2pwned-modeling and evaluating the resilience of peer-to-peer botnets. In Security and Privacy (SP), 2013 IEEE Symposium on, pages 97-111. IEEE, 2013.
[6] S. Saad, I. Traore, A. Ghorbani, B. Sayed, D. Zhao, W. Lu, J. Felix, and P. Hakimian. Detecting p2p botnets through network behavior analysis and machine learning. In Privacy, Security and Trust (PST), 2011 Ninth Annual International Conference on, pages 174-180. IEEE, 2011.
[7] R. Schoof and R. Koning. Detecting peer-to-peer botnets. University of Amsterdam, 2007. Technical report.
[8] F. Tegeler, X. Fu, G. Vigna, and C. Kruegel. BotFinder: Finding bots in network traffic without deep packet inspection. In Proceedings of the 8th international conference on Emerging networking experiments and technologies, pages 349-360. ACM, 2012.
[9] T.-F. Yen and M. K. Reiter. Are your hosts trading or plotting? Telling p2p file-sharing and bots apart. In Distributed Computing Systems (ICDCS), 2010 IEEE 30th International Conference on, pages 241-252. IEEE, 2010.
[10] X. Yu, X. Dong, G. Yu, Y. Qin, D. Yue, and Y. Zhao. Online botnet detection based on incremental discrete fourier transform. Journal of Networks, 5(5), 2010.
[11] J. Zhang, R. Perdisci, W. Lee, X. Luo, and U. Sarfraz. Building a scalable system for stealthy p2p-botnet detection. Information Forensics and Security, IEEE Transactions on, 9(1):27-38, 2014.
[12] J. Zhang, R. Perdisci, W. Lee, U. Sarfraz, and X. Luo. Detecting stealthy p2p botnets using statistical traffic fingerprints. In Dependable Systems & Networks (DSN), 2011 IEEE/IFIP 41st International Conference on, pages 121-132. IEEE, 2011.
[13] S. Zhang. Conversation-based p2p botnet detection with decision fusion. Master's thesis, Fredericton: University of New Brunswick, 2013.
References

More Related Content

Similar to ML Approaches to P2P Botnet Detection

Cybersecurity Opportunities Challenges APNIC
Cybersecurity Opportunities Challenges APNICCybersecurity Opportunities Challenges APNIC
Cybersecurity Opportunities Challenges APNIC
APNIC
 
Cont-Forensic-Analytics-Dipto-14Apr2015-post
Cont-Forensic-Analytics-Dipto-14Apr2015-postCont-Forensic-Analytics-Dipto-14Apr2015-post
Cont-Forensic-Analytics-Dipto-14Apr2015-postDipto Chakravarty
 
Synthetic Data Generation with DoppelGanger
Synthetic Data Generation with DoppelGangerSynthetic Data Generation with DoppelGanger
Synthetic Data Generation with DoppelGanger
QuantUniversity
 
Advanced Dos Detection (Icghc2016)
 Advanced Dos Detection (Icghc2016) Advanced Dos Detection (Icghc2016)
Advanced Dos Detection (Icghc2016)
Manuel J Peter
 
Defense against botnets
Defense against botnetsDefense against botnets
Defense against botnets
Vaibhav Ahlawat
 
Akshaya's Resume
Akshaya's ResumeAkshaya's Resume
Akshaya's Resume
AkshayaKrishnamoorth
 
Akshaya resume
Akshaya resumeAkshaya resume
Akshaya resume
AkshayaKrishnamoorth
 
Segmented Federated Learning
Segmented Federated LearningSegmented Federated Learning
Segmented Federated Learning
YuweiSun5
 
Abhishek presentation october 2013
Abhishek presentation october 2013Abhishek presentation october 2013
Abhishek presentation october 2013Pratik Narang
 
Penetration Testing Services Technical Description Cyber51
Penetration Testing Services Technical Description Cyber51Penetration Testing Services Technical Description Cyber51
Penetration Testing Services Technical Description Cyber51
martinvoelk
 
My PhD thesis defense presentation
My PhD thesis defense presentationMy PhD thesis defense presentation
My PhD thesis defense presentation
Suman Srinivasan
 
Defcon 22-wesley-mc grew-instrumenting-point-of-sale-malware
Defcon 22-wesley-mc grew-instrumenting-point-of-sale-malwareDefcon 22-wesley-mc grew-instrumenting-point-of-sale-malware
Defcon 22-wesley-mc grew-instrumenting-point-of-sale-malware
DaveEdwards12
 
Combating Cyberattacks through Network Agility and Automation
Combating Cyberattacks through Network Agility and AutomationCombating Cyberattacks through Network Agility and Automation
Combating Cyberattacks through Network Agility and Automation
Sagi Brody
 
IoT Workshop Nashville
IoT Workshop NashvilleIoT Workshop Nashville
IoT Workshop Nashville
Mike Branstein
 
Malicious Domain Profiling
Malicious Domain Profiling Malicious Domain Profiling
Malicious Domain Profiling
E Hacking
 
PeerShark - Detecting Peer-to-Peer Botnets by Tracking Conversations
PeerShark - Detecting Peer-to-Peer Botnets by Tracking ConversationsPeerShark - Detecting Peer-to-Peer Botnets by Tracking Conversations
PeerShark - Detecting Peer-to-Peer Botnets by Tracking Conversations
Pratik Narang
 
Ccna sec 01
Ccna sec 01Ccna sec 01
Ccna sec 01
EduclentMegasoftel
 
Comp tia network_n10-005
Comp tia network_n10-005Comp tia network_n10-005
Comp tia network_n10-005sunil kumar
 
IoT Workshop Louisville
IoT Workshop LouisvilleIoT Workshop Louisville
IoT Workshop Louisville
Mike Branstein
 

Similar to ML Approaches to P2P Botnet Detection (20)

Cybersecurity Opportunities Challenges APNIC
Cybersecurity Opportunities Challenges APNICCybersecurity Opportunities Challenges APNIC
Cybersecurity Opportunities Challenges APNIC
 
Cont-Forensic-Analytics-Dipto-14Apr2015-post
Cont-Forensic-Analytics-Dipto-14Apr2015-postCont-Forensic-Analytics-Dipto-14Apr2015-post
Cont-Forensic-Analytics-Dipto-14Apr2015-post
 
Synthetic Data Generation with DoppelGanger
Synthetic Data Generation with DoppelGangerSynthetic Data Generation with DoppelGanger
Synthetic Data Generation with DoppelGanger
 
Advanced Dos Detection (Icghc2016)
 Advanced Dos Detection (Icghc2016) Advanced Dos Detection (Icghc2016)
Advanced Dos Detection (Icghc2016)
 
Defense against botnets
Defense against botnetsDefense against botnets
Defense against botnets
 
Akshaya's Resume
Akshaya's ResumeAkshaya's Resume
Akshaya's Resume
 
Akshaya resume
Akshaya resumeAkshaya resume
Akshaya resume
 
Segmented Federated Learning
Segmented Federated LearningSegmented Federated Learning
Segmented Federated Learning
 
Abhishek presentation october 2013
Abhishek presentation october 2013Abhishek presentation october 2013
Abhishek presentation october 2013
 
Penetration Testing Services Technical Description Cyber51
Penetration Testing Services Technical Description Cyber51Penetration Testing Services Technical Description Cyber51
Penetration Testing Services Technical Description Cyber51
 
My PhD thesis defense presentation
My PhD thesis defense presentationMy PhD thesis defense presentation
My PhD thesis defense presentation
 
RPKI Tutorial
RPKI Tutorial RPKI Tutorial
RPKI Tutorial
 
Defcon 22-wesley-mc grew-instrumenting-point-of-sale-malware
Defcon 22-wesley-mc grew-instrumenting-point-of-sale-malwareDefcon 22-wesley-mc grew-instrumenting-point-of-sale-malware
Defcon 22-wesley-mc grew-instrumenting-point-of-sale-malware
 
Combating Cyberattacks through Network Agility and Automation
Combating Cyberattacks through Network Agility and AutomationCombating Cyberattacks through Network Agility and Automation
Combating Cyberattacks through Network Agility and Automation
 
IoT Workshop Nashville
IoT Workshop NashvilleIoT Workshop Nashville
IoT Workshop Nashville
 
Malicious Domain Profiling
Malicious Domain Profiling Malicious Domain Profiling
Malicious Domain Profiling
 
PeerShark - Detecting Peer-to-Peer Botnets by Tracking Conversations
PeerShark - Detecting Peer-to-Peer Botnets by Tracking ConversationsPeerShark - Detecting Peer-to-Peer Botnets by Tracking Conversations
PeerShark - Detecting Peer-to-Peer Botnets by Tracking Conversations
 
Ccna sec 01
Ccna sec 01Ccna sec 01
Ccna sec 01
 
Comp tia network_n10-005
Comp tia network_n10-005Comp tia network_n10-005
Comp tia network_n10-005
 
IoT Workshop Louisville
IoT Workshop LouisvilleIoT Workshop Louisville
IoT Workshop Louisville
 

Recently uploaded

special B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdfspecial B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdf
Special education needs
 
Guidance_and_Counselling.pdf B.Ed. 4th Semester
Guidance_and_Counselling.pdf B.Ed. 4th SemesterGuidance_and_Counselling.pdf B.Ed. 4th Semester
Guidance_and_Counselling.pdf B.Ed. 4th Semester
Atul Kumar Singh
 
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
MysoreMuleSoftMeetup
 
The Diamond Necklace by Guy De Maupassant.pptx
The Diamond Necklace by Guy De Maupassant.pptxThe Diamond Necklace by Guy De Maupassant.pptx
The Diamond Necklace by Guy De Maupassant.pptx
DhatriParmar
 
A Survey of Techniques for Maximizing LLM Performance.pptx
A Survey of Techniques for Maximizing LLM Performance.pptxA Survey of Techniques for Maximizing LLM Performance.pptx
A Survey of Techniques for Maximizing LLM Performance.pptx
thanhdowork
 
Biological Screening of Herbal Drugs in detailed.
Biological Screening of Herbal Drugs in detailed.Biological Screening of Herbal Drugs in detailed.
Biological Screening of Herbal Drugs in detailed.
Ashokrao Mane college of Pharmacy Peth-Vadgaon
 
Marketing internship report file for MBA
Marketing internship report file for MBAMarketing internship report file for MBA
Marketing internship report file for MBA
gb193092
 
Acetabularia Information For Class 9 .docx
Acetabularia Information For Class 9  .docxAcetabularia Information For Class 9  .docx
Acetabularia Information For Class 9 .docx
vaibhavrinwa19
 
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
EugeneSaldivar
 
The approach at University of Liverpool.pptx
The approach at University of Liverpool.pptxThe approach at University of Liverpool.pptx
The approach at University of Liverpool.pptx
Jisc
 
Chapter -12, Antibiotics (One Page Notes).pdf
Chapter -12, Antibiotics (One Page Notes).pdfChapter -12, Antibiotics (One Page Notes).pdf
Chapter -12, Antibiotics (One Page Notes).pdf
Kartik Tiwari
 
BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...
BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...
BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...
Nguyen Thanh Tu Collection
 
Model Attribute Check Company Auto Property
Model Attribute  Check Company Auto PropertyModel Attribute  Check Company Auto Property
Model Attribute Check Company Auto Property
Celine George
 
Unit 2- Research Aptitude (UGC NET Paper I).pdf
Unit 2- Research Aptitude (UGC NET Paper I).pdfUnit 2- Research Aptitude (UGC NET Paper I).pdf
Unit 2- Research Aptitude (UGC NET Paper I).pdf
Thiyagu K
 
Advantages and Disadvantages of CMS from an SEO Perspective
Advantages and Disadvantages of CMS from an SEO PerspectiveAdvantages and Disadvantages of CMS from an SEO Perspective
Advantages and Disadvantages of CMS from an SEO Perspective
Krisztián Száraz
 
Embracing GenAI - A Strategic Imperative
Embracing GenAI - A Strategic ImperativeEmbracing GenAI - A Strategic Imperative
Embracing GenAI - A Strategic Imperative
Peter Windle
 
Best Digital Marketing Institute In NOIDA
Best Digital Marketing Institute In NOIDABest Digital Marketing Institute In NOIDA
Best Digital Marketing Institute In NOIDA
deeptiverma2406
 
Operation Blue Star - Saka Neela Tara
Operation Blue Star   -  Saka Neela TaraOperation Blue Star   -  Saka Neela Tara
Operation Blue Star - Saka Neela Tara
Balvir Singh
 
2024.06.01 Introducing a competency framework for languag learning materials ...
2024.06.01 Introducing a competency framework for languag learning materials ...2024.06.01 Introducing a competency framework for languag learning materials ...
2024.06.01 Introducing a competency framework for languag learning materials ...
Sandy Millin
 
Unit 8 - Information and Communication Technology (Paper I).pdf
Unit 8 - Information and Communication Technology (Paper I).pdfUnit 8 - Information and Communication Technology (Paper I).pdf
Unit 8 - Information and Communication Technology (Paper I).pdf
Thiyagu K
 

Recently uploaded (20)

special B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdfspecial B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdf
 
Guidance_and_Counselling.pdf B.Ed. 4th Semester
Guidance_and_Counselling.pdf B.Ed. 4th SemesterGuidance_and_Counselling.pdf B.Ed. 4th Semester
Guidance_and_Counselling.pdf B.Ed. 4th Semester
 
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46

ML Approaches to P2P Botnet Detection

  • 1. BITS Pilani Hyderabad Campus ML Approaches to P2P Botnet Detection by Vansh Khurana Mentors: Dr. Chittaranjan Hota, Pratik Narang and Team
  • 2. Today’s Agenda (CS C441 / CS F441, Second Semester 2013-14, BITS Pilani, Hyderabad Campus)
      • Introduction to P2P
      • Analyzing available Dataset
      • Deep Dive into P2P Botnets
      • Classification Algorithms
      • System Design
      • Prelim Results with Extensive Feature Set
      • Curse of Dimensionality
      • Ensemble Classifier
      • Future Work
  • 3. Introduction
      • P2P Networks
      • Botnets
      • Malicious Activities
  • 4. P2P Networks
      • Decentralized and distributed network architecture
      • Peers act as both suppliers and consumers of resources
      • Better resource utilization
      • No central coordination
      • Applications:
          • Instant messaging systems: Skype
          • Digital currency: Bitcoin
          • Wireless community networks: Netsukuku
          • Foreign currency exchange marketplace: CurrencyFair
          • Content delivery: Torrent applications
          • File sharing: Gnutella, DC++
          • And many more
  • 6. Botnets
      • A network of compromised machines (bots) controlled by a bot master
      • Typical formation (spam example):
          • A botnet operator sends out viruses or worms whose payload is a malicious application, the bot, infecting ordinary users' computers.
          • The bot on the infected PC logs into a particular C&C server.
          • A spammer purchases the services of the botnet from the operator.
          • The spammer provides the spam messages to the operator, who instructs the compromised machines via the control panel on the web server, causing them to send out spam messages.
      • Should we care? Absolutely!
          • Privacy invasion: hacked accounts, weak passwords, credential reuse, social engineering attacks
          • Financial theft: 10 days of Torpig data valued at $83K to $8.3M (2009)
  • 8. Malicious Activities
      • Responsible for (non-exhaustive list):
          • Large-scale network probing (i.e., scanning activities)
          • Launching Distributed Denial of Service (DDoS) attacks
          • Sending large-scale unsolicited emails (spam)
          • Click-fraud campaigns
          • Information theft
          • Spyware
          • Adware
      • Shift from a for-fun activity towards a profit-oriented business
  • 10. Analyzing available Dataset
      • Obtained from University of Georgia: Babak Rahbarinia et al.
      • Benign applications: eMule, FrostWire, uTorrent, Vuze
      • ~130 GB of raw log files
      • Malicious botnet data:
          • Storm, ~9 GB
          • Waledac, ~1 GB
          • Zeus, ~100 MB
          • Nugache, ~100 MB
      • Data for botnets contain only C&C messages
      • We build training models to represent real-world behaviour: ~80:20
      • More on botnets next!
  • 11. Deep Dive into P2P Botnets
      • P2P Botnets: Lifecycle
      • Botnets of Interest
          • Storm
          • Waledac
          • Zeus
          • Nugache
  • 12. P2P Botnets: Lifecycle
      • Initial infection: exploit a known vulnerability to grant the attacker additional capabilities on the target system
      • Secondary injection: leverage the newly acquired access to execute additional scripts or programs, which then fetch a malicious binary from a known location
      • Connection: the bot attempts to establish a connection to the command and control server through a variety of methods
      • Malicious command and control: doing the damage
      • Update and maintenance: bots are commanded to update their binaries, typically to defend against new attacks or to improve their functionality
  • 14. Storm
      • Email spam botnet (2007)
      • Extent of damage: 1 million to 50 million computer systems
      • Methodology:
          • Observed to defend itself by attacking computer systems that scanned for Storm-infected machines online
          • DDoS counter-attacks, to maintain its own internal integrity
          • Fools the antivirus on the local system: the actual processes do nothing
      • Mostly uses UDP as the underlying transport layer protocol
  • 16. Waledac
      • Email spam botnet (2010)
      • Extent: ~70K to 90K
      • Infection method:
          • Email
          • Fake websites
          • Bundled with other threats, such as Trojan.Peacomm, W32.Downadup, and Trojan.Bredolab
      • Typically sends a log file every 30 minutes
      • Mostly operates over TCP
      • More here: http://www.symantec.com/security_response/writeup.jsp?docid=2008-122308-1429-99
  • 18. Zeus
      • Often used to steal banking information through man-in-the-browser keystroke logging, form grabbing, and web page injection (2007)
      • Spread mainly through drive-by downloads and phishing schemes
      • Extent: 3.6 million in the US alone; operates very stealthily
      • Uses both TCP and UDP protocols, so the flow concept fails
  • 20. Nugache
      • One of the first sophisticated P2P botnets (2006)
      • Created by Jason Michael Milmont, when he was 16!
      • TCP port 8 bot, listens on port 8
      • Not used here: the log files don’t represent theoretical information
      • More here: http://www.symantec.com/security_response/writeup.jsp?docid=2006-043016-0900-99&tabid=2
  • 21. BITS Pilani Hyderabad Campus DEMO 1: Botnet Data Analysis
  • 23. Classification Algorithms
      • Decision Trees:
          • Based on information gain
          • Fast; ignore irrelevant features
          • But can overfit. We use REP Trees and set a maximum depth to avoid this
      • K-Nearest Neighbour:
          • Inherently simple, doesn’t overfit
      • Artificial Neural Networks:
          • Handle a large number of features well
          • Hidden layer size set heuristically: (No. of Features + No. of Class Labels) / 2
      • SVM:
          • Performs complex kernel-based data transformations, then finds an optimal boundary between the possible outputs based on these transformations
          • Pairwise classification approach (one-versus-one)
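As a rough illustration of the "based on information gain" criterion for decision trees, the sketch below computes the entropy of a label set and the gain of one candidate split. The function names and the toy labels are assumptions made for this example, not taken from the slides or from any particular toolkit.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of its children."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted


# Hypothetical labels for four flows, and two candidate splits of them.
parent = ["bot", "bot", "benign", "benign"]
perfect_split = [["bot", "bot"], ["benign", "benign"]]
useless_split = [["bot", "benign"], ["bot", "benign"]]

print(information_gain(parent, perfect_split))
print(information_gain(parent, useless_split))
```

A tree learner would evaluate many candidate splits this way and pick the one with the highest gain at each node.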
  • 25. System Design
      • Salient Features
      • Background
      • Modules
  • 26. Salient Features
      • No reliance on deep packet inspection, so encrypted traffic is not an obstacle
      • Intuitive and simple model to solve a complex problem
      • Models bot behaviour accurately
      • Explored the feature set extensively (~75 features)
      • Explored non-network-based features:
          • Compression ratio
          • Signal processing approach to model network behaviour
          • More on this later
      • Most importantly: achieved good results
  • 28. Background
      • Shannon’s Source Coding Theorem
          • The expected length L of an encoding of X with associated probability function p(x) is given by: (equation not preserved in this transcript)
      • Bot data is expected to be more uniform (repetitive), hence should compress better
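The compression intuition can be sketched with the standard library. This is only an illustration under stated assumptions: `zlib` stands in for whatever compressor the project used, the ratio is defined here as compressed size divided by original size (lower means more compressible), and both byte strings are made-up stand-ins for real payloads.

```python
import random
import zlib


def compression_ratio(payload: bytes) -> float:
    """Compressed size divided by original size; lower means more compressible."""
    return len(zlib.compress(payload)) / len(payload)


# Hypothetical stand-in for repetitive bot C&C traffic.
repetitive = b"PING" * 1024

# Hypothetical stand-in for highly varied traffic: seeded pseudo-random bytes.
rng = random.Random(0)
varied = bytes(rng.randrange(256) for _ in range(4096))

print(compression_ratio(repetitive))
print(compression_ratio(varied))
```

The repetitive payload shrinks to a small fraction of its size, while the pseudo-random one barely compresses at all, which is the contrast the compression feature is meant to capture.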
  • 29. Background
      • Discrete Fourier Transform:
          • Converts a finite list of equally spaced samples of a function into the list of coefficients of a finite combination of complex sinusoids, ordered by their frequencies
          • Moves from the time domain to the frequency domain to extract hidden patterns in botnet communication
          • The network communication between a pair of nodes is treated as a "signal"
          • Given a time sequence X = X(0), X(1), ..., X(w), its Discrete Fourier Transform (DFT) is given as: (equation not preserved in this transcript)
          • The first few DFT coefficients contain most of the energy
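A minimal sketch of the DFT idea, assuming the common unnormalized convention X_f = Σ_t x(t)·exp(−2πi·f·t/n); the slides' own formula and normalization are not available here, so treat the convention as an assumption. The payload sizes are hypothetical.

```python
import cmath


def dft(samples):
    """Unnormalized DFT: X_f = sum over t of x(t) * exp(-2j*pi*f*t/n)."""
    n = len(samples)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * f * t / n) for t, x in enumerate(samples))
        for f in range(n)
    ]


def energy_fraction(coefficients, k):
    """Share of the total spectral energy held by the first k coefficients."""
    energies = [abs(c) ** 2 for c in coefficients]
    return sum(energies[:k]) / sum(energies)


# Hypothetical payload sizes (bytes) for successive packets between two nodes.
payload_sizes = [120, 118, 121, 119, 120, 122, 118, 120]
coefficients = dft(payload_sizes)

# For a near-constant signal, almost all energy sits in the zero-frequency term.
print(energy_fraction(coefficients, 2))
```

Keeping only the first one or two coefficients per signal gives a compact numeric summary of the communication pattern between a pair of nodes.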
  • 31. Modules
  • 33. Prelim Results (ACM DEBS)
      • Dataset description
      • Used extensive feature set
  • 34. Prelim Results (ACM DEBS): Overall Precision and Recall
  • 35. BITS Pilani Hyderabad Campus DEMO 2: System Design & Prelim Results
  • 37. Curse of Dimensionality
      • Principal Component Analysis
      • Feature Selection Algorithms
  • 38. Principal Component Analysis
      • Statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components
      • Reduced 76 features to 28 components while retaining 95% of the variance
      • Classifier accuracy:
          • J-48: 97%
          • REP Tree: 94.17%
          • SVM: 80.61%
          • K-NN: 93.29%
          • Bayesian Networks: 94.65%
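The "retain 95% of the variance" rule can be sketched as picking the smallest number of components whose cumulative variance reaches the target. The per-component variances below are hypothetical, and the function name is an assumption for this example; the actual decomposition in the project was presumably done by a toolkit.

```python
def components_for_variance(variances, target=0.95):
    """Smallest number of components whose cumulative variance share reaches target."""
    ordered = sorted(variances, reverse=True)
    total = sum(ordered)
    running = 0.0
    for k, variance in enumerate(ordered, start=1):
        running += variance
        if running / total >= target:
            return k
    return len(ordered)


# Hypothetical variances explained by each principal component.
component_variances = [5.0, 3.0, 1.6, 0.2, 0.2]
print(components_for_variance(component_variances))
```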
  • 40. Feature Selection
      • Best First Search based feature selection
      • Also explored random forest based importance evaluation
      • Selected features are:
          • Flow duration, MEDIAN_INTERARRIVAL_TIME, AVG_PAYLOAD_SIZE, AVG_PAYLOAD_SIZE_SENDING, AVG_PAYLOAD_SIZE_RECVING, PRIME_WAVE_MAGNITUDE_PAYLOAD, PRIME_WAVE_MAGNITUDE_IAT, BYTES_SENT_PER_SEC, BYTES_RECVD_PER_SEC, DFT_Payload (1st and 2nd coefficient), DFT_IAT (1st and 2nd coefficient), Compression
      • Classifier accuracy:
          • J-48: 99.7%
          • REP Tree: 99.58%
          • SVM: 84%
          • KNN: 99.5%
          • ANN: 92%
  • 41. BITS Pilani Hyderabad Campus DEMO 3: PCA and Feature Selection
  • 43. Ensemble Classifier
      • Construct a set of classifiers from the training data
      • Predict the class label of previously unseen records by aggregating predictions made by multiple classifiers
  • 44. Ensemble Classifier
      • Ensemble methods work better with "unstable classifiers"
      • Classifiers that are sensitive to minor perturbations in the training set
      • Examples:
          • Decision trees
          • Rule-based
          • Artificial neural networks
  • 45. Ensemble Classifier
      • One way to force a learning algorithm to construct multiple hypotheses is to run the algorithm several times and provide it with somewhat different data in each run. This idea is used in the following methods:
          • Majority Voting
          • Bagging
          • Randomness Injection
          • Feature-Selection Ensembles
          • Error-Correcting Output Coding
  • 46. Why does Majority Voting work?
      • Suppose there are 25 base classifiers
          • Each classifier has error rate ε = 0.35
          • Assume errors made by classifiers are uncorrelated
          • Probability that the ensemble classifier makes a wrong prediction: $P(X \ge 13) = \sum_{i=13}^{25} \binom{25}{i}\,\varepsilon^{i}(1-\varepsilon)^{25-i} = 0.06$
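The majority-voting error can be checked numerically. The sketch below sums the binomial tail for the case where more than half of the classifiers are wrong, under the same independence assumption as the slide; the function name is made up for this example.

```python
from math import comb


def ensemble_error(n_classifiers, error_rate):
    """Probability that a strict majority of independent classifiers is wrong."""
    majority = n_classifiers // 2 + 1
    return sum(
        comb(n_classifiers, i)
        * error_rate ** i
        * (1 - error_rate) ** (n_classifiers - i)
        for i in range(majority, n_classifiers + 1)
    )


# 25 base classifiers, each wrong 35% of the time, errors uncorrelated.
print(round(ensemble_error(25, 0.35), 2))
```

With a single classifier the function simply returns the base error rate, so the drop to roughly 0.06 comes entirely from voting across uncorrelated errors.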
  • 47. Bagging
      • Employs the simplest way of combining predictions that belong to the same type
      • Combining can be realized with voting or averaging
      • Each model receives equal weight
      • "Idealized" version of bagging:
          • Sample several training sets of size n (instead of just having one training set of size n)
          • Build a classifier for each training set
          • Combine the classifiers' predictions
      • This improves performance in almost all cases if the learning scheme is unstable (e.g. decision trees)
  • 48. Bagging classifiers
      • Classifier generation:
          • Let n be the size of the training set.
          • For each of t iterations:
              • Sample n instances with replacement from the training set.
              • Apply the learning algorithm to the sample.
              • Store the resulting classifier.
      • Classification:
          • For each of the t classifiers: predict class of instance using classifier.
          • Return class that was predicted most often.
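The bagging procedure above can be sketched in a few lines. The base learner here is a one-feature threshold "stump", chosen only to keep the example short; the slides do not say which base learner was bagged, and the data points are hypothetical.

```python
import random
from collections import Counter


def train_stump(sample):
    """Fit a threshold classifier on (x, label) pairs: predict 1 when x > threshold."""
    best_threshold, best_errors = None, None
    for threshold in sorted({x for x, _ in sample}):
        errors = sum((x > threshold) != bool(label) for x, label in sample)
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold


def bagging_fit(training_set, t, rng):
    """Train t stumps, each on n instances drawn with replacement."""
    n = len(training_set)
    classifiers = []
    for _ in range(t):
        sample = [rng.choice(training_set) for _ in range(n)]
        classifiers.append(train_stump(sample))
    return classifiers


def bagging_predict(classifiers, x):
    """Return the class predicted most often across all stored classifiers."""
    votes = Counter(int(x > threshold) for threshold in classifiers)
    return votes.most_common(1)[0][0]


# Hypothetical one-feature data: label 1 for larger values.
data = [(0.1, 0), (0.2, 0), (0.3, 0), (0.4, 0),
        (0.6, 1), (0.7, 1), (0.8, 1), (0.9, 1)]
model = bagging_fit(data, t=25, rng=random.Random(0))
print(bagging_predict(model, 0.05), bagging_predict(model, 0.95))
```

An odd t avoids tied votes in the two-class case.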
  • 49. Why does bagging work?
      • Bagging reduces variance by voting/averaging, thus reducing the overall expected error
          • In the case of classification there are pathological situations where the overall error might increase
          • Usually, the more classifiers the better
  • 50. Stacking
      • Uses a meta learner instead of voting to combine predictions of base learners
          • Predictions of base learners (level-0 models) are used as input for the meta learner (level-1 model)
      • Base learners are usually different learning schemes
      • Hard to analyze theoretically: "black magic"
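A toy sketch of stacking, under heavy simplifying assumptions: the two level-0 models are hand-written rules with made-up thresholds, and the level-1 model is just a lookup table from the tuple of base predictions to the most common label on a hold-out set. Real stacking would train both levels with proper learners; every name and value below is hypothetical.

```python
from collections import Counter, defaultdict


def fit_meta(base_learners, holdout):
    """Level-1 model: map each tuple of base predictions to its majority label."""
    table = defaultdict(Counter)
    for features, label in holdout:
        key = tuple(learner(features) for learner in base_learners)
        table[key][label] += 1
    return {key: counts.most_common(1)[0][0] for key, counts in table.items()}


def stacked_predict(base_learners, meta, features, default=0):
    """Feed base predictions to the meta model; fall back to default if unseen."""
    key = tuple(learner(features) for learner in base_learners)
    return meta.get(key, default)


# Hypothetical level-0 models (1 = bot, 0 = benign).
def by_payload(features):
    return int(features["avg_payload"] < 100)


def by_duration(features):
    return int(features["duration"] > 60)


base = [by_payload, by_duration]
holdout = [
    ({"avg_payload": 50, "duration": 120}, 1),
    ({"avg_payload": 60, "duration": 90}, 1),
    ({"avg_payload": 80, "duration": 10}, 0),
    ({"avg_payload": 500, "duration": 200}, 0),
    ({"avg_payload": 700, "duration": 5}, 0),
]
meta = fit_meta(base, holdout)
print(stacked_predict(base, meta, {"avg_payload": 40, "duration": 300}))
```

On this data the meta model learns to flag a flow only when both base rules agree, which plain voting between two learners could not express.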
  • 52. Future Work
      • Genetic Search and Greedy Classification (June)
      • Suggested improvements and proposals:
          • Clustering to scale up
          • Binning to compute compression ratio
          • Smoothing the DFT curve
          • Explore parson's coding theory
      • Repo link (stay tuned!):
          • https://github.com/vansh21k/P2P-Botnet-Detection-Project
  • 54. References
      • [1] H. Hang, X. Wei, M. Faloutsos, and T. Eliassi-Rad. Entelecheia: Detecting p2p botnets in their waiting stage. In IFIP Networking Conference, 2013, pages 1-9, 2013.
      • [2] J. Kang and J.-Y. Zhang. Application entropy theory to detect new peer-to-peer botnet with multi-chart cusum. In Electronic Commerce and Security, 2009.
      • [3] C. Kanich, N. Weaver, D. McCoy, T. Halvorson, C. Kreibich, K. Levchenko, V. Paxson, G. M. Voelker, and S. Savage. Show me the money: Characterizing spam-advertised revenue. In USENIX Security Symposium, pages 15
      • [4] B. Rahbarinia, R. Perdisci, A. Lanzi, and K. Li. PeerRush: Mining for unwanted p2p traffic. In Detection of Intrusions and Malware, and Vulnerability Assessment, pages 62-82. Springer, 2013.
      • [5] C. Rossow, D. Andriesse, T. Werner, B. Stone-Gross, D. Plohmann, C. J. Dietrich, and H. Bos. SoK: P2PWNED - modeling and evaluating the resilience of peer-to-peer botnets. In Security and Privacy (SP), 2013 IEEE Symposium on, pages 97-111. IEEE, 2013.
      • [6] S. Saad, I. Traore, A. Ghorbani, B. Sayed, D. Zhao, W. Lu, J. Felix, and P. Hakimian. Detecting p2p botnets through network behavior analysis and machine learning. In Privacy, Security and Trust (PST), 2011 Ninth Annual International Conference on, pages 174-180. IEEE, 2011.
      • [7] R. Schoof and R. Koning. Detecting peer-to-peer botnets. University of Amsterdam, 2007. Technical report.
      • [8] F. Tegeler, X. Fu, G. Vigna, and C. Kruegel. BotFinder: Finding bots in network traffic without deep packet inspection. In Proceedings of the 8th international conference on Emerging networking experiments and technologies, pages 349-360. ACM, 2012.
      • [9] T.-F. Yen and M. K. Reiter. Are your hosts trading or plotting? Telling p2p file-sharing and bots apart. In Distributed Computing Systems (ICDCS), 2010 IEEE 30th International Conference on, pages 241-252. IEEE, 2010.
  • 55. References
      • [10] X. Yu, X. Dong, G. Yu, Y. Qin, D. Yue, and Y. Zhao. Online botnet detection based on incremental discrete fourier transform. Journal of Networks, 5(5), 2010.
      • [11] J. Zhang, R. Perdisci, W. Lee, X. Luo, and U. Sarfraz. Building a scalable system for stealthy p2p-botnet detection. Information Forensics and Security, IEEE Transactions on, 9(1):27-38, 2014.
      • [12] J. Zhang, R. Perdisci, W. Lee, U. Sarfraz, and X. Luo. Detecting stealthy p2p botnets using statistical traffic fingerprints. In Dependable Systems & Networks (DSN), 2011 IEEE/IFIP 41st International Conference on, pages 121-132. IEEE, 2011.
      • [13] S. Zhang. Conversation-based p2p botnet detection with decision fusion. Master's thesis, Fredericton: University of New Brunswick, 2013.