The document discusses machine learning approaches to peer-to-peer botnet detection. It begins with introductions to P2P networks, botnets, and malicious activities. It then analyzes available datasets on benign and malicious P2P traffic. Various classification algorithms and a system design for P2P botnet detection are discussed. Preliminary results using an extensive feature set are presented, along with addressing the curse of dimensionality through PCA and feature selection. The document concludes with a discussion of ensemble classifiers.
6. CS C441 / CS F441 Second Semester 2013-14 BITS Pilani, Hyderabad Campus
Botnets
• A network of compromised machines (bots) controlled by a bot master
• Typical formation (spam example):
– A botnet operator sends out viruses or worms whose payload is a malicious application, the bot, infecting ordinary users' computers.
– The bot on the infected PC logs into a particular C&C server.
– A spammer purchases the services of the botnet from the operator.
– The spammer provides the spam messages to the operator, who instructs the compromised machines via the control panel on the web server, causing them to send out spam messages.
• Should we care? Absolutely!
– Privacy invasion: hacked accounts, weak passwords, credential reuse, social engineering attacks
– Financial theft: 10 days of Torpig data valued at $83K to $8.3M (2009)
8. Malicious Activities
• Responsible for (non-exhaustive list):
– Large-scale network probing (i.e., scanning activities)
– Launching Distributed Denial of Service (DDoS) attacks
– Sending large-scale unsolicited emails (spam)
– Click-fraud campaigns
– Information theft
– Spyware
– Adware
• Shift from a for-fun activity towards a profit-oriented business
9. CS C441 / CS F441 Second Semester 2013-14 BITS Pilani, Hyderabad Campus
• Introduction to P2P
• Analyzing available Dataset
• Deep Dive into P2P Botnets
• Classification Algorithms
• System Design
• Prelim Results with Extensive Feature Set
• Curse of Dimensionality
• Ensemble Classifier
• Future Work
10. Analyzing available Dataset
• Obtained from University of Georgia: Babak Rahbarinia et al.
• Benign applications: eMule, FrostWire, uTorrent, Vuze
• ~130 GB of raw log files
• Malicious botnet data:
– Storm, ~9 GB
– Waledac, ~1 GB
– Zeus, ~100 MB
– Nugache, ~100 MB
• Data for botnets contain only C&C messages
• We build training models to represent real-world behaviour: ~80:20
• More on botnets next!
11. Deep Dive into P2P Botnets
• P2P Botnets: Lifecycle
• Botnets of Interest
– Storm
– Waledac
– Zeus
– Nugache
12. P2P Botnets: Lifecycle
• Initial infection: exploit a known vulnerability to grant the attacker additional capabilities on the target system
• Secondary injection: leverage the newly acquired access to execute additional scripts or programs, which then fetch a malicious binary from a known location
• Connection: the bot attempts to establish a connection to the command and control server through a variety of methods
• Malicious command and control: doing the damage
• Update and maintenance: bots are commanded to update their binaries, typically to defend against new attacks or to improve their functionality
14. Storm
• Email spam botnet (2007)
• Extent of damage: 1 million to 50 million computer systems
• Methodology:
– Observed defending itself by attacking computer systems that scanned online for Storm-infected machines
– DDoS counter-attacks to maintain its own internal integrity
– Fools the antivirus on the local system: actual processes do nothing
• Mostly uses UDP as the underlying transport-layer protocol
16. Waledac
• Email spam botnet (2010)
• Extent: ~70K to 90K
• Infection methods:
– Email
– Fake websites
– Bundled with other threats, such as Trojan.Peacomm, W32.Downadup, and Trojan.Bredolab
• Typically sends a log file every 30 minutes
• Mostly operates over TCP
• More here: http://www.symantec.com/security_response/writeup.jsp?docid=2008-122308-1429-99
18. Zeus
• Often used to steal banking information through man-in-the-browser keystroke logging, form grabbing, and web page injection (2007)
• Spread mainly through drive-by downloads and phishing schemes
• Extent: 3.6 million in the US alone; operates very stealthily
• Uses both TCP and UDP, so the flow concept fails
20. Nugache
• One of the first sophisticated P2P botnets (2006)
• Created by Jason Michael Milmont when he was 16!
• Known as the "TCP port 8 bot": listens on TCP port 8
• Not used: the log files don't represent the theoretical information
• More here: http://www.symantec.com/security_response/writeup.jsp?docid=2006-043016-0900-99&tabid=2
23. Classification Algorithms
• Decision Trees:
– Based on information gain
– Fast; ignore irrelevant features
– Can overfit; we use REP Trees and set a maximum depth to avoid this
• K-Nearest Neighbour:
– Inherently simple; tends not to overfit for a suitably large k
• Artificial Neural Networks:
– Handle a large number of features well
– Number of hidden-layer units set heuristically: (No. of Features + No. of Class Labels) / 2
• SVM:
– Performs complex kernel-based data transformations, then finds an optimal boundary between the possible outputs based on these transformations
– Pairwise classification approach (one-versus-one)
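The two quantitative ideas on this slide, information gain for decision-tree splits and the hidden-layer sizing heuristic, can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the feature is assumed to be categorical, and rounding the heuristic down to an integer is an assumption.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy reduction obtained by splitting `labels` on a categorical feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted average entropy of the subsets produced by the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def hidden_units(n_features, n_classes):
    """Slide heuristic: (No. of Features + No. of Class Labels) / 2.

    Integer division is an assumption; the slide does not say how to round.
    """
    return (n_features + n_classes) // 2
```

A feature that separates the classes perfectly yields the full entropy of the labels as gain, while a feature unrelated to the labels yields zero; a tree learner picks the split with the highest gain at each node.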
25. System Design
• Salient Features
• Background
• Modules
26. Salient Features
• No reliance on Encryption and Deep Packet Inspection
• Intuitive and simplistic model to solve a complex problem
• Models bot behaviour accurately
• Explored the feature set extensively (~75 features)
• Explored non-network-based features:
– Compression ratio
– Signal-processing approach to model network behaviour
– More on this later…
• Most importantly: achieved good results
28. Background
• Shannon's Source Coding Theorem
– The expected length L of an encoding of X with associated probability function p(x) is given by: $L = \sum_{x} p(x)\, l(x)$, where $l(x)$ is the length of the codeword assigned to x
• Bot data is expected to be more uniform, and hence should compress better
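One way to turn the compressibility idea into a feature is to compare the compressed and raw sizes of a payload. The sketch below uses `zlib` from the Python standard library; the choice of compressor and the function name `compression_ratio` are assumptions, since the slides do not say how the compression feature was computed.

```python
import zlib


def compression_ratio(payload: bytes) -> float:
    """Compressed size divided by original size; lower means more compressible.

    zlib (DEFLATE) is an assumed compressor, not one named in the slides.
    """
    if not payload:
        return 1.0  # convention chosen here for empty input
    return len(zlib.compress(payload)) / len(payload)
```

Highly repetitive data such as a fixed keep-alive message repeated many times produces a ratio close to 0, while random-looking data stays near or slightly above 1.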
29. Background
• Discrete Fourier Transform:
– Converts a finite list of equally spaced samples of a function into the list of coefficients of a finite combination of complex sinusoids, ordered by their frequencies
– Moves from the time domain to the frequency domain to extract hidden patterns in botnet communication
• The network communication between a pair of nodes is treated as a 'signal'.
• Given a time sequence X = X(0), X(1), …, X(w), its Discrete Fourier Transform (DFT) is given, in the standard unnormalized form, as: $\hat{X}(k) = \sum_{t=0}^{w} X(t)\, e^{-2\pi i k t/(w+1)}$, for $k = 0, 1, \dots, w$
• The first few DFT coefficients contain most of the energy
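Treating a per-flow series (for example payload sizes or inter-arrival times) as a signal, the leading DFT magnitudes could be extracted roughly as follows. This is a sketch using only the standard library and the textbook unnormalized DFT; the helper names and the default of two coefficients are assumptions, and a real pipeline would more likely use an FFT routine.

```python
import cmath


def dft(signal):
    """Unnormalized DFT of a numeric sequence (O(n^2) reference version)."""
    n = len(signal)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(signal))
        for k in range(n)
    ]


def leading_magnitudes(signal, count=2):
    """Magnitudes of the first `count` DFT coefficients of the signal."""
    return [abs(c) for c in dft(signal)[:count]]
```

For a constant signal all of the energy sits in coefficient 0, which is the extreme case of the observation that the first few coefficients carry most of the energy.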
33. Prelim results (ACM DEBS)
• Dataset description
• Used extensive feature set
37. Curse of Dimensionality
• Principal Component Analysis
• Feature Selection Algorithms
38. Principal Component Analysis
• Statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components
• Converted 76 features to 28 features, retaining 95% of the variance
• Classifier accuracy:
– J-48: 97%
– REP Tree: 94.17%
– SVM: 80.61%
– K-NN: 93.29%
– Bayesian Networks: 94.65%
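The "retaining 95% variance" step amounts to keeping the smallest number of leading components whose eigenvalues account for that share of the total. A minimal sketch, assuming the eigenvalues of the feature covariance matrix have already been computed elsewhere (the function name and example values are hypothetical):

```python
def components_for_variance(eigenvalues, target=0.95):
    """Smallest number of leading components whose variance share reaches `target`."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for k, value in enumerate(ordered, start=1):
        running += value
        if running / total >= target:
            return k
    return len(ordered)
```

With eigenvalues 6, 3, 0.8 and 0.2, the first two components explain 90% of the variance and the first three explain 98%, so a 95% target keeps three components.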
40. Feature Selection
• Best-first-search-based feature selection
• Also explored random-forest-based importance evaluation
• Selected features:
– Flow duration, MEDIAN_INTERARRIVAL_TIME, AVG_PAYLOAD_SIZE, AVG_PAYLOAD_SIZE_SENDING, AVG_PAYLOAD_SIZE_RECVING, PRIME_WAVE_MAGNITUDE_PAYLOAD, PRIME_WAVE_MAGNITUDE_IAT, BYTES_SENT_PER_SEC, BYTES_RECVD_PER_SEC, DFT_Payload (1st and 2nd coefficients), DFT_IAT (1st and 2nd coefficients), Compression
• Classifier accuracy:
– J-48: 99.7%
– REP Tree: 99.58%
– SVM: 84%
– K-NN: 99.5%
– ANN: 92%
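A best-first search over feature subsets can be sketched roughly as below. This is a generic illustration, not the authors' implementation: best_first_select, the score callback and the max_stale stopping rule are all assumptions. In practice score would typically be something like the cross-validated accuracy of a classifier trained on the candidate subset.

```python
import heapq

def best_first_select(features, score, max_stale=5):
    """Best-first search over feature subsets.

    score: hypothetical callable mapping a frozenset of features to a number
    (higher is better). The search stops after max_stale consecutive
    expansions that do not improve on the best subset found so far.
    """
    start = frozenset()
    best_subset, best_score = start, score(start)
    # Max-heap via negated scores; the counter breaks ties between equal scores.
    open_heap = [(-best_score, 0, start)]
    visited = {start}
    counter = 1
    stale = 0
    while open_heap and stale < max_stale:
        _, _, subset = heapq.heappop(open_heap)
        improved = False
        for feature in features:
            if feature in subset:
                continue
            child = subset | {feature}
            if child in visited:
                continue
            visited.add(child)
            child_score = score(child)
            heapq.heappush(open_heap, (-child_score, counter, child))
            counter += 1
            if child_score > best_score:
                best_subset, best_score = child, child_score
                improved = True
        stale = 0 if improved else stale + 1
    return best_subset, best_score
```

Unlike plain greedy forward selection, the open list lets the search back up to an earlier subset when the current branch stops improving.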
43. Ensemble Classifier
• Construct a set of classifiers from the training data
• Predict the class label of previously unseen records by aggregating the predictions made by multiple classifiers
44. Ensemble Classifier
• Ensemble methods work better with "unstable classifiers"
• These are classifiers that are sensitive to minor perturbations in the training set
• Examples:
– Decision trees
– Rule-based classifiers
– Artificial neural networks
45. Ensemble Classifier
• One way to force a learning algorithm to construct multiple hypotheses is to run the algorithm several times and provide it with somewhat different data in each run. This idea is used in the following methods:
– Majority Voting
– Bagging
– Randomness Injection
– Feature-Selection Ensembles
– Error-Correcting Output Coding
46. Why does majority voting work?
• Suppose there are 25 base classifiers
– Each classifier has error rate ε = 0.35
– Assume errors made by classifiers are uncorrelated
– Probability that the ensemble classifier makes a wrong prediction, i.e. that at least 13 of the 25 base classifiers are wrong:
$$P(X \ge 13) = \sum_{i=13}^{25} \binom{25}{i} \varepsilon^{i} (1-\varepsilon)^{25-i} = 0.06$$
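The binomial tail on this slide can be checked numerically. The sketch below uses only the Python standard library. The function name ensemble_error is an assumption, and the code assumes an odd number of classifiers so that ties cannot occur.

```python
from math import comb

def ensemble_error(n_classifiers, eps):
    """Probability that a majority of n independent base classifiers,
    each with error rate eps, are wrong on the same instance."""
    majority = n_classifiers // 2 + 1  # 13 when n_classifiers is 25
    return sum(
        comb(n_classifiers, i) * eps**i * (1 - eps) ** (n_classifiers - i)
        for i in range(majority, n_classifiers + 1)
    )

# 25 base classifiers with error rate 0.35, as on the slide.
print(round(ensemble_error(25, 0.35), 2))
```

The ensemble only beats a single classifier when ε is below 0.5. At ε = 0.5 the majority is wrong half of the time, and the independence assumption rarely holds exactly in practice.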
47. Bagging
• Employs the simplest way of combining predictions from models of the same type
• Combining can be realized with voting or averaging
• Each model receives equal weight
• "Idealized" version of bagging:
– Sample several training sets of size n (instead of just having one training set of size n)
– Build a classifier for each training set
– Combine the classifiers' predictions
• This improves performance in almost all cases if the learning scheme is unstable (e.g. decision trees)
48. Bagging classifiers
Classifier generation
  Let n be the size of the training set.
  For each of t iterations:
    Sample n instances with replacement from the training set.
    Apply the learning algorithm to the sample.
    Store the resulting classifier.
Classification
  For each of the t classifiers:
    Predict class of instance using classifier.
  Return class that was predicted most often.
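The pseudocode above translates fairly directly into Python. The sketch below uses only the standard library. The names bagging_fit, bagging_predict and learn_stump are assumptions, and the one-feature decision stump is a deliberately tiny stand-in for a real base learner such as a decision tree.

```python
import random
from collections import Counter

def bagging_fit(train, learn, t=25, seed=0):
    """train: list of (x, label) pairs; learn: callable that takes a sample
    and returns a predict function. Returns the t stored classifiers."""
    rng = random.Random(seed)
    n = len(train)
    models = []
    for _ in range(t):
        sample = rng.choices(train, k=n)  # n instances, with replacement
        models.append(learn(sample))
    return models

def bagging_predict(models, x):
    """Return the class predicted most often by the stored classifiers."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]

def learn_stump(sample):
    """Hypothetical base learner: a threshold rule on one numeric feature."""
    labels = sorted({y for _, y in sample})
    best = None
    for threshold in sorted({x for x, _ in sample}):
        for low in labels:
            for high in labels:
                errors = sum(
                    (low if x <= threshold else high) != y for x, y in sample
                )
                if best is None or errors < best[0]:
                    best = (errors, threshold, low, high)
    _, threshold, low, high = best
    return lambda x: low if x <= threshold else high
```

With a training list of (feature value, label) pairs, bagging_fit(train, learn_stump) builds the ensemble and bagging_predict(models, x) returns the majority vote for a new value x.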
49. Why does bagging work?
• Bagging reduces variance by voting/averaging, thus reducing the overall expected error
– In the case of classification there are pathological situations where the overall error might increase
– Usually, the more classifiers the better
50. Stacking
• Uses a meta learner instead of voting to combine the predictions of base learners
– Predictions of base learners (level-0 models) are used as input for the meta learner (level-1 model)
• Base learners are usually different learning schemes
• Hard to analyze theoretically: "black magic"
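The level-0 / level-1 split can be illustrated with a very small sketch. Everything here is an assumption made for illustration: the name fit_stacking, the use of a held-out set, and in particular the level-1 model, which is just a lookup table from tuples of base predictions to the most common true label rather than a real learning scheme.

```python
from collections import Counter, defaultdict

def fit_stacking(base_models, holdout):
    """base_models: list of level-0 predict functions.
    holdout: list of (x, label) pairs not used to train the base models.
    Returns a level-1 predict function."""
    table = defaultdict(Counter)
    for x, y in holdout:
        # The level-0 predictions are the input features of the meta learner.
        key = tuple(model(x) for model in base_models)
        table[key][y] += 1
    # Fall back to the most common label for unseen prediction patterns.
    fallback = Counter(y for _, y in holdout).most_common(1)[0][0]

    def predict(x):
        key = tuple(model(x) for model in base_models)
        if key in table:
            return table[key].most_common(1)[0][0]
        return fallback

    return predict
```

The point of the meta learner is that it can learn how reliable each base learner is, for example that an instance is malicious only when two base learners agree, which a plain vote between them cannot express.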
52. Future Work
• Genetic Search and Greedy Classification (June)
• Suggested improvements and proposals:
– Clustering to scale up
– Binning to compute Compression Ratio
– Smoothing the DFT curve
– Explore parson's coding theory
• Repo link (Stay Tuned!)
– https://github.com/vansh21k/P2P-Botnet-Detection-Project