● Authors
○ Ivan Letteri
○ Giuseppe Della Penna
○ Pasquale Caianiello
Feature Selection Strategies for HTTP Botnet Detection
University of L’Aquila (Italy)
Roadmap
● Goal
○ Classify the traffic
● Develop
○ Feature extraction and detection
● Challenge
○ HTTP botnet detection
- Classify the traffic generated by botnets using Machine Learning models
- Develop a system that extracts features for the detection of malicious traffic
- Identify the traffic generated by camouflaged bots within normal HTTP traffic
Related Work
● Feature Importance
○ MIFS-ND
○ mRMR
○ Max info index
● Feature Selection
○ Hoque et al.
○ Peng et al.
○ Mitra et al.
- Hoque et al.: packet size-based features, average bytes and variance of bytes per packet
- Peng et al.: length in bytes, number of packets, flow duration, TCP flags, length of flow
- Mitra et al.: entropy of packet sizes
“a good data source builds … a good classifier”
roBOTNETwork in a nutshell
● DDoS
● Mining cryptocurrency
● Steal sensitive data
● Send spam
.....
- roBOT NETwork: a huge network of compromised devices connected to the Internet
- controlled by a single entity called the Botmaster
- for benevolent and malicious purposes
Botnet life-cycle
● Steps
1. Initial infection
2. Secondary Injection
- Initial Infection: the process during which the victim’s machine is compromised
- Secondary Injection: the victim downloads, executes and installs a copy of the bot binary code
Botnet life-cycle
● Steps
1. Initial infection
2. Secondary Injection
3. Connection
4. Attack Command
5. Update & Maintenance
- Connection: the bot contacts its C&C server to announce its presence (rallying mechanism)
- Attack Command: the botmaster sends commands that give rise to attacks (DDoS, spam, phishing, etc.)
- Update & Maintenance: the last step, which keeps the bots active and updated
Dataset Construction
● Raw Dataset
○ realistic traffic
○ MCFProject
- Stratosphere Project: a behavioral-based intrusion detection system that uses ML
- Packet Capture (*.pcap): the file format produced by APIs for capturing network traffic
- Pandas: a data manipulation library highly optimized for performance
Dataset Construction
● Raw Dataset
○ realistic traffic
○ MCFProject
● Develop
○ Feature extraction and detection
- Flow: <Source IP, Source Port, Destination IP, Destination Port, Protocol>
- Time Windows: set to 15 minutes, since web sessions typically have such a duration
- Filter: all data not required is removed from the flow sets (e.g., UDP packets)
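The flow definition, time windowing and filtering described on this slide can be sketched as follows. This is a minimal illustration and not the authors' extraction system: the packet record layout (the keys `ts`, `src_ip`, `src_port`, `dst_ip`, `dst_port`, `proto`) and the function name `build_flows` are assumptions made for the example, and it uses only the Python standard library rather than Pandas.

```python
from collections import defaultdict

WINDOW_SECONDS = 15 * 60  # 15-minute time window, as stated on the slide


def build_flows(packets):
    """Group packet records into flows, one bucket per (time window, 5-tuple).

    Each packet is a dict with hypothetical keys: ts (seconds), src_ip,
    src_port, dst_ip, dst_port, proto.
    """
    flows = defaultdict(list)
    for pkt in packets:
        # Filter step: drop data that is not required (here, anything non-TCP).
        if pkt["proto"] != "TCP":
            continue
        window = int(pkt["ts"] // WINDOW_SECONDS)
        key = (pkt["src_ip"], pkt["src_port"],
               pkt["dst_ip"], pkt["dst_port"], pkt["proto"])
        flows[(window, key)].append(pkt)
    return dict(flows)


# Made-up packets, for illustration only.
http = {"src_ip": "10.0.0.5", "src_port": 50000,
        "dst_ip": "93.184.216.34", "dst_port": 80, "proto": "TCP"}
dns = {"src_ip": "10.0.0.5", "src_port": 40000,
       "dst_ip": "8.8.8.8", "dst_port": 53, "proto": "UDP"}
packets = [
    {"ts": 10.0, **http},
    {"ts": 12.5, **http},
    {"ts": 20.0, **dns},      # removed by the filter
    {"ts": 1000.0, **http},   # same 5-tuple, but in the next 15-minute window
]
flows = build_flows(packets)
```

With these sample packets the UDP record is discarded and the same 5-tuple yields two flows, one per time window.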
Final HTTP-botnet dataset
● Raw Dataset
○ realistic traffic
○ MCFProject
● Develop
○ Feature extraction and detection
● Balanced Dataset
○ 50% HTTP botnet
○ 50% normal traffic
Eight Features Selected & Extracted
● 2 entropy features
○ Packet count
○ Time gap
- Entropy of packet count: aggregates the flows that have the same destination address
- Entropy of time gap: derived from the interval between the end of a flow and the beginning of the next
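A small sketch of how the two entropy features might be computed. The slides do not give the exact formula, so this assumes plain Shannon entropy over the observed values; the helper names `shannon_entropy` and `time_gaps` and the sample numbers are illustrative only.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of values."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def time_gaps(flows):
    """Gaps between the end of one flow and the start of the next.

    flows is a list of (start, end) timestamps, sorted by start time.
    Overlapping flows contribute a gap of zero.
    """
    return [max(0.0, nxt[0] - cur[1]) for cur, nxt in zip(flows, flows[1:])]


# Packet counts of flows aggregated by destination address (made-up values).
regular_counts = [4, 4, 4, 4]    # perfectly regular behaviour
varied_counts = [3, 7, 4, 12]    # irregular behaviour

# Flows towards one destination as (start, end) seconds (made-up values).
gaps = time_gaps([(0, 5), (65, 70), (130, 133)])
gap_entropy = shannon_entropy(gaps)
```

Identical packet counts or identical gaps give an entropy of zero, while four distinct, equally frequent values give 2 bits, which is the intuition for using entropy to expose highly regular, automated traffic.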
Features based on TCP Packet Ratios
● 2 entropy features
○ Packet count
○ Time gap
● 3 TCP flow features
○ In/Out tcp pkts
○ Ratio TCP
○ OneWay TCP pkt
- I/O ratio: helps to identify the communication between a bot and its C&C
- Ratio TCP: helps to discover DDoS botnet attacks
- OneWay ratio TCP: helps to identify a larger-than-usual number of failed or half-open, one-way connections
Features based on the TCP flags
● 2 entropy features
○ Packet count
○ Time gap
● 3 TCP flow features
○ In/Out tcp pkts
○ Ratio TCP
○ OneWay TCP pkt
● 3 TCP flags features
○ SYN flag active
○ FIN flag active
○ PSH flag active
- SYN flag set: a SYN flood attack sends a huge number of SYN requests
- FIN flag set: in a FIN flood attack, bots send a large number of spoofed FIN packets
- PSH flag set: the receiver is forced to flush its buffer even if it is not full
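The three flag features can be illustrated by counting, per flow, the packets in which each flag is active. The bit masks are the standard TCP header values; the function name `flag_counts` and the sample flow are assumptions made for this sketch, and the authors' feature names and normalisation may differ.

```python
# Standard TCP header flag bits.
FIN = 0x01
SYN = 0x02
PSH = 0x08


def flag_counts(flag_bytes):
    """Count the packets of a flow that have SYN, FIN and PSH set.

    flag_bytes holds one integer per packet: the TCP flags byte.
    """
    return {
        "syn": sum(1 for f in flag_bytes if f & SYN),
        "fin": sum(1 for f in flag_bytes if f & FIN),
        "psh": sum(1 for f in flag_bytes if f & PSH),
    }


# A made-up flow: SYN, SYN+ACK, ACK, PSH+ACK, PSH+ACK, FIN+ACK.
flow_flags = [0x02, 0x12, 0x10, 0x18, 0x18, 0x11]
counts = flag_counts(flow_flags)
```

For the sample flow this yields two SYN packets, one FIN packet and two PSH packets; a flood attack would show up as an unusually high count for one of the flags.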
Exploratory Data Analysis
● Scatter Matrix
○ Covariance
○ Dimensionality reduction
● Values distribution
○ Boxplot
● Correlation matrix
○ data uncertainty
- Scatter Matrix: provides an estimate of the covariance matrix and is used in dimensionality reduction
- Boxplot: captures the distribution of the data efficiently
- Correlation Matrix: useful for Mutual Information analysis to measure dependence
Feature Selection via 4 Decision Trees & XGBoost
● Decision Trees
○ Extra Trees
○ Gradient Boost
○ Ada Boost
○ Random Forest
● XGBoost
○ Gradient Boosting Trees
- Decision Trees: feature importance is computed with the scikit-learn library
- XGBoost: an algorithm which predicts a target variable by combining the estimates of a set of weaker models
Feature Selection through Decision Trees
● Decision Trees
○ Extra Trees
○ Gradient Boost
○ Ada Boost
○ Random Forest
● XGBoost
○ Gradient Boosting Trees
● Evaluation
○ Feature Importance average
- Filter out the features which we consider less relevant for HTTP botnet detection
- IOratioTcp, nTcpFinal and Hcount are removed
- nTcpPsh seems to be not so important, although only slightly less than Hcount
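The "feature importance average" step can be sketched as below: per-model importance scores are averaged and the lowest-ranked features are dropped. In practice the scores would typically come from the `feature_importances_` attribute of fitted scikit-learn or XGBoost tree ensembles; here they are hard-coded, made-up numbers with placeholder feature and model names, so nothing in this example reflects the authors' actual results.

```python
def average_importance(per_model):
    """Average the importance of each feature across all models."""
    features = list(next(iter(per_model.values())))
    n_models = len(per_model)
    return {
        f: sum(scores[f] for scores in per_model.values()) / n_models
        for f in features
    }


def drop_lowest(avg, k):
    """Return (kept, removed) after removing the k lowest-ranked features."""
    ranked = sorted(avg, key=avg.get)  # ascending average importance
    removed = ranked[:k]
    kept = [f for f in avg if f not in removed]
    return kept, removed


# Hypothetical scores; placeholder model and feature names.
per_model = {
    "model_a": {"f1": 0.40, "f2": 0.30, "f3": 0.20, "f4": 0.10},
    "model_b": {"f1": 0.10, "f2": 0.50, "f3": 0.30, "f4": 0.10},
}
avg = average_importance(per_model)
kept, removed = drop_lowest(avg, k=1)
```

Averaging over several ensembles reduces the dependence of the ranking on any single model's idiosyncrasies, which appears to be the motivation for combining four tree-based models with XGBoost.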
Feature Selection through Mutual Information
● Partition Information
○ partition entropy
● Conditional Information
○ conditional entropy
● Mutual Information
- Information H(i): where i is a feature and ∆i is the partition induced by i
- Conditional Information: let i, o be features; the conditional entropy is defined as the amount of uncertainty ...
- Mutual Information: the uncertainty in the partition ∆i that is removed by knowing ∆o, and vice versa
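The three quantities above can be made concrete with the standard Shannon definitions, assuming each feature has been discretised so that equal values fall in the same block of its partition. The formulas on the original slides are not reproduced in the text, so this is a sketch of the textbook definitions and not necessarily the authors' exact formulation; the sample vectors are made up.

```python
import math
from collections import Counter


def entropy(xs):
    """H(X): entropy (bits) of the partition induced by the values of xs."""
    n = len(xs)
    return -sum((c / n) * math.log2(c / n) for c in Counter(xs).values())


def conditional_entropy(xs, ys):
    """H(X | Y) = H(X, Y) - H(Y): uncertainty left in X once Y is known."""
    return entropy(list(zip(xs, ys))) - entropy(ys)


def mutual_information(xs, ys):
    """I(X; Y) = H(X) - H(X | Y): uncertainty in X removed by knowing Y."""
    return entropy(xs) - conditional_entropy(xs, ys)


i = [0, 0, 1, 1]   # a feature
o = [0, 0, 1, 1]   # induces the same partition as i
r = [0, 1, 0, 1]   # induces a partition independent of i

mi_same = mutual_information(i, o)
mi_indep = mutual_information(i, r)
```

When two features induce the same partition, knowing one removes all uncertainty about the other (1 bit here); when the partitions are independent, the mutual information is zero. The measure is symmetric, which matches the "vice versa" in the definition.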
Feature Selection through Mutual Information
Hgap
IOratioTcp
ratio_Tcp
● Partition Information
○ partition entropy
● Conditional Information
○ conditional entropy
● Mutual Information
- By removing features one by one, in order of lowest score,
- and running our MLP classifier, we get the corresponding performance metrics
Experimentation
● MLP
○ 8 neurons in the input layer
○ 3 hidden layers
○ 1 neuron in the output layer
● Activation function
○ ReLU
○ Sigmoid
● Loss function
○ binary cross entropy
- Hidden layer sizes: 24 (3f), 16 (2f), 8 (1f); learning rate set to 0.001
- Binary cross entropy as the loss function, with the Adam optimizer
- 150 training epochs on 70% of the dataset
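To make the architecture concrete, here is a dependency-free sketch of the forward pass and the loss for an 8-24-16-8-1 network. Several points are assumptions not stated on the slide: ReLU is applied to the hidden layers and the sigmoid to the output neuron, weights are drawn uniformly at random, and training with Adam is not shown at all. A real experiment would use a deep learning framework instead.

```python
import math
import random

# Input layer, three hidden layers, output layer (sizes from the slide).
LAYER_SIZES = [8, 24, 16, 8, 1]


def relu(x):
    return x if x > 0.0 else 0.0


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_params(sizes, seed=0):
    """Random weights and zero biases; the initialisation is an assumption."""
    rng = random.Random(seed)
    params = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_out)]
        params.append((weights, [0.0] * n_out))
    return params


def forward(params, x):
    """ReLU on the hidden layers, sigmoid on the single output neuron."""
    last = len(params) - 1
    for idx, (weights, biases) in enumerate(params):
        z = [sum(w * v for w, v in zip(row, x)) + b
             for row, b in zip(weights, biases)]
        act = sigmoid if idx == last else relu
        x = [act(v) for v in z]
    return x[0]


def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Loss for one sample; y_pred is clamped to avoid log(0)."""
    y_pred = min(max(y_pred, eps), 1.0 - eps)
    return -(y_true * math.log(y_pred)
             + (1 - y_true) * math.log(1.0 - y_pred))


params = init_params(LAYER_SIZES)
sample = [0.1] * 8                     # one made-up 8-feature vector
p_bot = forward(params, sample)        # predicted probability of "botnet"
loss = binary_cross_entropy(1, p_bot)  # loss if the true label is botnet
```

The sigmoid output can be read as the probability that a flow is botnet traffic, which is why binary cross entropy is the natural loss for this two-class problem.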
Information Relevance Score
● Classification Accuracy & Loss
○ MLP performance metrics
○ Progressively removing the lowest ranked features
○ Feat. Importance vs Mutual Informat.
- the mutual information-based technique gets almost the same accuracy as the feature importance-based one
- removing the three features with the lowest ranking, the accuracy is only 0.03% less than the one obtained with the feature importance ranking, but there is a further (0.72%) reduction of the loss
Conclusions
● Focus
○ HTTP botnet detection
● Experimentation
○ Feature import.
○ Mutual Inform.
● Results
○ Feature relevance scores
- HTTP botnets grow year after year
- the mutual information strategy wins over the decision trees feature importance
- the results are exposed by observing the accuracy and loss metrics of the MLP model
Thank You
● for watching
● for listening
● ... your attention
- github.com/IvanLetteri/ - https://ivanletteri.it
- www.linkedin.com/in/ivan-letteri-6516b427/
