Real-Time Classification of Malicious URLs
Daniyar Mukhanov, Chandan Gowda
Introduction
- Malicious software in online social networks (OSNs)
Malicious websites are among the top three threats to enterprise security
- Koobface virus: an anagram of the word “Facebook”
Koobface
Twitter
Cybercriminals can piggyback on major events to share malicious URLs
Aim of paper
Develop a real-time machine classification system to distinguish between malicious
and benign URLs within seconds of the URL being clicked
Train several machine classification models on data collected during two large
sporting events:
- Super Bowl
- Cricket World Cup
Related Work
- Malware propagation and Social networks
- Classifying malicious web pages
Malware propagation and Social networks
- Low degree of connections is not an obstacle
- Highly clustered networks slow propagation
- Large-scale events are ideal for spreading malware
Classifying malicious webpages
Static analysis of scripts embedded within a web page
Static code analysis to detect evasive malware
Honeypots to interact with malicious content, and anti-virus software to analyse it
Static code vs. run-time analysis
Data collection
American Super Bowl: training data
Cricket World Cup: test data
- #superbowlXLIX: 122,542 URL-containing tweets
- #CWC15: 7,961 URL-containing tweets
Identifying malicious URLs
- Client-side honeypot system
- Low interaction honeypots and high interaction honeypots
- The Capture HPC toolkit
- Each URL is visited for 5 minutes
Architecture for suspicious URL annotation
- Capture HPC operates in VM
- Users can specify their own omission or inclusion rules
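The honeypot decides whether a URL is suspicious by comparing the system state before and after the visit (see the Editor's Notes) while ignoring changes matched by user-defined omission rules. The sketch below illustrates that idea only; it is not Capture HPC's implementation. The snapshot format (category to set of entries), the regex-based rules, and the function names `diff_snapshots` and `is_suspicious` are hypothetical, and for simplicity it only reports added entries, not modified or deleted ones.

```python
import re
from typing import Dict, Iterable, List, Set

# Hypothetical snapshot format: category -> set of entries seen at that moment,
# e.g. {"file": {...}, "registry": {...}, "process": {...}}.
Snapshot = Dict[str, Set[str]]


def diff_snapshots(before: Snapshot, after: Snapshot,
                   omission_rules: Iterable[str] = ()) -> List[str]:
    """List entries that appear after the visit but not before it,
    ignoring any entry that matches an omission rule (a regex)."""
    rules = [re.compile(rule) for rule in omission_rules]
    changes: List[str] = []
    for category in sorted(after):
        added = after[category] - before.get(category, set())
        for entry in sorted(added):
            if any(rule.search(entry) for rule in rules):
                continue  # known-benign noise the analyst chose to omit
            changes.append(f"{category}:{entry}")
    return changes


def is_suspicious(before: Snapshot, after: Snapshot,
                  omission_rules: Iterable[str] = ()) -> bool:
    """Flag the URL if any change survives the omission filter."""
    return bool(diff_snapshots(before, after, omission_rules))


if __name__ == "__main__":
    before = {"file": {"C:/Windows/explorer.exe"}, "process": {"explorer.exe"}}
    after = {
        "file": {"C:/Windows/explorer.exe", "C:/Temp/cache.tmp", "C:/Temp/payload.exe"},
        "process": {"explorer.exe", "payload.exe"},
    }
    print(diff_snapshots(before, after, omission_rules=[r"\.tmp$"]))
    # ['file:C:/Temp/payload.exe', 'process:payload.exe']
```

Here the omission rule suppresses the temporary-file change, so only the dropped executable and its new process are reported.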
Sampling and Feature Identification
• Data was collected from Twitter using Tweepy.
• Data from one event was used to train a classifier, and data from the other event was used to test the model’s generalizability.
• The Super Bowl training data contained 1,000 malicious and 1,000 benign URLs.
• The Cricket World Cup test data contained 891 malicious and 1,100 benign URLs.
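As a quick check on the sample sizes above, the share of malicious URLs in each set can be computed directly. This uses only the counts stated on the slide; `class_share` is an illustrative helper name.

```python
def class_share(malicious: int, benign: int) -> float:
    """Fraction of malicious URLs in a labelled sample."""
    return malicious / (malicious + benign)


train_share = class_share(1000, 1000)  # Super Bowl training sample
test_share = class_share(891, 1100)    # Cricket World Cup test sample
print(f"train: {train_share:.3f}, test: {test_share:.3f}")
# train: 0.500, test: 0.448
```

The training set is exactly balanced, while the test set leans slightly towards benign URLs (891 of 1,991 are malicious).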
Sampling and Feature Identification
- 80% of URLs from the Cricket World Cup were found to be malicious
Metrics:
- CPU
- Connection established
- Port Number
- Process ID
- Remote IP
- Network Interface
- Bytes sent/received
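Because the classifiers are evaluated at time points such as t = 30 s and t = 60 s, the metrics listed above have to be aggregated over the activity logged up to time t. The sketch below shows one plausible aggregation. The `LogRecord` fields, the assumption that byte counts are per-sample deltas, and the choice of mean/count/distinct aggregates are all hypothetical, not the authors' feature definitions.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class LogRecord:
    # Hypothetical record format; field names are illustrative only.
    time: float                   # seconds since the URL was opened
    cpu: float                    # CPU usage (%) at this sample
    pid: int                      # process ID
    port: int                     # local port number
    remote_ip: str
    interface: str                # network interface
    bytes_sent: int               # assumed: bytes sent since the previous sample
    bytes_received: int           # assumed: bytes received since the previous sample
    connection_established: bool


FEATURES = ("mean_cpu", "connections", "distinct_ports", "distinct_pids",
            "distinct_remote_ips", "distinct_interfaces",
            "bytes_sent", "bytes_received")


def features_at(records: List[LogRecord], t: float) -> Dict[str, float]:
    """Aggregate every record logged up to time t into one feature vector."""
    window = [r for r in records if r.time <= t]
    if not window:
        return dict.fromkeys(FEATURES, 0.0)
    return {
        "mean_cpu": sum(r.cpu for r in window) / len(window),
        "connections": float(sum(r.connection_established for r in window)),
        "distinct_ports": float(len({r.port for r in window})),
        "distinct_pids": float(len({r.pid for r in window})),
        "distinct_remote_ips": float(len({r.remote_ip for r in window})),
        "distinct_interfaces": float(len({r.interface for r in window})),
        "bytes_sent": float(sum(r.bytes_sent for r in window)),
        "bytes_received": float(sum(r.bytes_received for r in window)),
    }
```

Calling `features_at(logs, 30)` and `features_at(logs, 60)` on the same log yields the inputs a classifier would see at each cut-off.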
Baseline Model Selection
Data modelling is intended to:
• Extract features from machine activity that help predict malicious behaviour during an interaction with a URL
• Connect the dots between machine activity and malicious behaviour
• Compare generative vs. discriminative models
• The acquired data can include logs of machine activity recorded even while the system is idle.
• Hence the logs are likely to contain noise as well as malicious behaviour.
Statistics for Training and Test Datasets
● The mean recorded values for CPU usage, bytes/packets
sent/received, and ports used vary widely between the two datasets.
● However, the standard deviations are very similar for
both datasets.
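The comparison above (means that differ between datasets, standard deviations that match) can be reproduced per feature with Python's standard `statistics` module. The rows below are toy numbers chosen to show that pattern; they are not the values from the slide's table, and `describe`/`compare` are illustrative helper names.

```python
from statistics import mean, stdev
from typing import Dict, List, Tuple


def describe(rows: List[Dict[str, float]]) -> Dict[str, Tuple[float, float]]:
    """Per-feature (mean, sample standard deviation); needs at least two rows."""
    return {
        name: (mean([r[name] for r in rows]), stdev([r[name] for r in rows]))
        for name in rows[0]
    }


def compare(train: List[Dict[str, float]], test: List[Dict[str, float]]) -> None:
    """Print mean and standard deviation side by side for each feature."""
    a, b = describe(train), describe(test)
    for name in a:
        print(f"{name:>6}: mean {a[name][0]:8.1f} vs {b[name][0]:8.1f} | "
              f"sd {a[name][1]:7.1f} vs {b[name][1]:7.1f}")


# Toy data: the test set is shifted upwards but has the same spread.
train_rows = [{"cpu": 10, "bytes": 100}, {"cpu": 20, "bytes": 300}]
test_rows = [{"cpu": 40, "bytes": 1100}, {"cpu": 50, "bytes": 1300}]
compare(train_rows, test_rows)
```

A shift in means with unchanged spread is exactly the kind of difference that can hurt a model trained on one event and tested on another.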
Baseline Model Selection
• The datasets contained a balanced number of malicious and benign activity logs, but the recorded activity was largely benign, since even logs for malicious URLs include idle-state activity and noise.
• This could reduce the effectiveness of a discriminative classifier, which must identify decision boundaries where the inputs may not be linearly separable.
• A generative model may therefore be better suited to this case.
Choosing classifiers
Generative Models
1. Bayesian Classifier
2. Naïve Bayesian Classifier
Discriminative Models
1. J48 Decision Tree
2. Multilayer Perceptron (MLP)
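To make the generative side of the comparison concrete, the sketch below is a minimal Gaussian Naïve Bayes classifier written from scratch: it models P(features | class) as a product of independent per-feature Gaussians and picks the class with the highest posterior via Bayes' rule. It is an illustration only, not the Weka classifiers used in the slides; the "Bayesian Classifier" listed above is not sketched here, and the toy data is invented.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence


class GaussianNaiveBayes:
    """Generative classifier: class prior times independent Gaussian likelihoods."""

    def __init__(self, var_floor: float = 1e-9):
        self.var_floor = var_floor  # avoids division by zero for constant features

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[str]) -> "GaussianNaiveBayes":
        by_class: Dict[str, List[Sequence[float]]] = defaultdict(list)
        for row, label in zip(X, y):
            by_class[label].append(row)
        self.priors: Dict[str, float] = {}
        self.means: Dict[str, List[float]] = {}
        self.vars: Dict[str, List[float]] = {}
        for label, rows in by_class.items():
            n = len(rows)
            columns = list(zip(*rows))
            self.priors[label] = n / len(X)
            self.means[label] = [sum(c) / n for c in columns]
            self.vars[label] = [
                max(sum((v - m) ** 2 for v in c) / n, self.var_floor)
                for c, m in zip(columns, self.means[label])
            ]
        return self

    def _log_joint(self, row: Sequence[float], label: str) -> float:
        """log P(class) + sum of log Gaussian densities of each feature."""
        lp = math.log(self.priors[label])
        for x, m, v in zip(row, self.means[label], self.vars[label]):
            lp += -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        return lp

    def predict(self, row: Sequence[float]) -> str:
        return max(self.priors, key=lambda label: self._log_joint(row, label))


# Toy features: [CPU %, bytes received]
X = [[1.0, 100.0], [2.0, 120.0], [1.5, 110.0],
     [8.0, 5000.0], [9.0, 5200.0], [8.5, 5100.0]]
y = ["benign"] * 3 + ["malicious"] * 3
model = GaussianNaiveBayes().fit(X, y)
print(model.predict([8.7, 5150.0]))
```

The independence assumption is what separates Naïve Bayes from a full Bayesian classifier, which can capture conditional dependencies between variables.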
Baseline Model Results: Generative Models
The low error rates of the Bayesian model at t = 60 s during the training phase suggest that:
1. The features used to build the models are predictive of malicious activity.
2. Malicious activity occurs within the first 60 seconds of interaction.
3. There are conditional dependencies between the variables.
Baseline Model Results: Discriminative Models
• The MLP reaches a precision of 0.720 at t = 30 s, only slightly below its optimum level, which demonstrates the model’s ability to reduce false positives early on.
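Precision is TP / (TP + FP), so a high precision early on means few benign URLs are wrongly flagged as malicious. The helper below computes precision and recall for the malicious class; the labels in the example are invented for illustration.

```python
from typing import Sequence, Tuple


def precision_recall(y_true: Sequence[str], y_pred: Sequence[str],
                     positive: str = "malicious") -> Tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)  # false alarms
    fn = sum(t == positive and p != positive for t, p in pairs)  # missed attacks
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


y_true = ["malicious", "malicious", "benign", "benign", "malicious"]
y_pred = ["malicious", "benign", "malicious", "benign", "malicious"]
print(precision_recall(y_true, y_pred))  # two TPs, one FP, one FN
```

Each false positive removed raises precision without changing recall, which is why precision is the natural metric for the claim above.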
Classifier Performance over time
● This chart shows the number of correctly classified
instances at incremental points in time.
● Discriminative models outperform generative
models.
● This suggests that certain malicious activities
are linearly separable from benign behaviour.
● The Naïve Bayes model performs poorly.
● The MLP model outperforms all the other
classifiers.
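The "performance over time" evaluation can be sketched as a loop that, for each cut-off t, classifies every URL using only the activity observed up to t and records the fraction classified correctly. The per-second readings and the threshold classifier below are hypothetical stand-ins for the real logs and trained models.

```python
from typing import Callable, Dict, Sequence


def accuracy_over_time(
    series: Sequence[Sequence[float]],         # per-URL readings, one per second (hypothetical)
    labels: Sequence[str],
    classify: Callable[[Sequence[float]], str],
    cutoffs: Sequence[int],
) -> Dict[int, float]:
    """For each cut-off t, classify using only the first t readings of each URL."""
    results: Dict[int, float] = {}
    for t in cutoffs:
        correct = sum(classify(s[:t]) == label for s, label in zip(series, labels))
        results[t] = correct / len(labels)
    return results


def toy_classifier(window: Sequence[float]) -> str:
    # Hypothetical rule: flag a URL once total bytes received exceed 100.
    return "malicious" if sum(window) > 100 else "benign"


series = [[10, 10, 10, 200, 200], [5, 5, 5, 5, 5], [60, 60, 60, 60, 60]]
labels = ["malicious", "benign", "malicious"]
print(accuracy_over_time(series, labels, toy_classifier, cutoffs=[1, 2, 5]))
```

With these toy readings accuracy rises from one third at t = 1 to all three URLs at t = 5, mirroring how malicious activity becomes visible only after some seconds of interaction.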
Model Analysis
● The MLP has 9 hidden nodes; the table shows the
weight each node assigns to each
class (benign/malicious).
● Node 9 stands out, with a higher weight
for malicious behaviour.
NODE WEIGHTS BY CLASS
Model Analysis
● Node 9 has the highest weight for the bytes-received
variable.
● Compare this with Node 3’s weights for bytes sent/received
and packets sent/received.
● This is an interesting finding, as Node 9
is associated with malicious links.
● The most important discovery concerns the connection
attribute, which is weighted highly for Node 1.
● Remote IP and bytes sent also receive sharply
higher weights, which is suggestive of an attack.
MLP ANALYSIS
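The weight inspection described on these slides (find the hidden node most strongly tied to the malicious class, then list the inputs that feed it most strongly) can be sketched as two small lookups. The weights below are toy values for a 3-node network, not the 9-node weights from the slides, and ranking by raw weight (rather than absolute value) is an assumption that mirrors the "highest value" wording.

```python
from typing import Dict, List, Tuple


def top_node_for_class(output_weights: Dict[str, List[float]], cls: str) -> int:
    """1-based index of the hidden node with the largest weight into `cls`."""
    weights = output_weights[cls]
    return max(range(len(weights)), key=weights.__getitem__) + 1


def top_inputs(input_weights: List[Dict[str, float]], node: int,
               k: int = 3) -> List[Tuple[str, float]]:
    """The k input features with the largest weights into a 1-based hidden node."""
    ranked = sorted(input_weights[node - 1].items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:k]


# Toy weights for a hypothetical 3-node hidden layer.
output_weights = {"benign": [0.4, 1.1, -0.3], "malicious": [0.2, -0.5, 2.7]}
input_weights = [
    {"bytes_received": 0.1, "remote_ip": 0.3, "connection": 0.2, "bytes_sent": 0.0},
    {"bytes_received": 0.5, "remote_ip": -0.2, "connection": 0.1, "bytes_sent": 0.3},
    {"bytes_received": 3.2, "remote_ip": 1.5, "connection": 0.4, "bytes_sent": 1.1},
]
node = top_node_for_class(output_weights, "malicious")
print(node, top_inputs(input_weights, node, k=2))
```

Raw MLP weights are only a rough guide to feature importance, because inputs interact through the hidden layer and depend on how the inputs were scaled.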
Sampled learning
Correctly classified instances with sampled
training data
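The slides do not say how the training data was sampled. One common choice, shown here purely as an assumption, is stratified sampling: draw the same fraction from each class so that the sampled training set keeps the original malicious/benign balance. `stratified_sample` is an illustrative name.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def stratified_sample(items: Sequence[T], labels: Sequence[str], fraction: float,
                      seed: int = 0) -> List[Tuple[T, str]]:
    """Draw `fraction` of the items from each class, preserving class balance."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)  # fixed seed for reproducible samples
    by_class: Dict[str, List[T]] = defaultdict(list)
    for item, label in zip(items, labels):
        by_class[label].append(item)
    sample: List[Tuple[T, str]] = []
    for label in sorted(by_class):
        group = by_class[label]
        k = max(1, round(len(group) * fraction))
        sample.extend((item, label) for item in rng.sample(group, k))
    return sample


# Example: a 10% sample of a balanced 1,000 + 1,000 training set.
urls = list(range(2000))
url_labels = ["malicious"] * 1000 + ["benign"] * 1000
subset = stratified_sample(urls, url_labels, 0.1)
print(len(subset), sum(label == "malicious" for _, label in subset))
```

Training on several such fractions and plotting correctly classified instances would show how much data each model needs.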
Conclusion
- The endpoint of a URL is not clear from the tweet itself
- The MLP model performed best on unseen data (72%)
- The Bayesian approach performed best in the early stages of interaction (66%)
- Twitter has recently introduced new policies to protect users from harm


Editor's Notes

  1. A snapshot of the memory, executables and registry of the honeypot computer is recorded before crawling a site. After visiting the site, the state of memory, executables, and registry is recorded and compared to the previous snapshot. The changes are analyzed to determine if the visited site installed any malware onto the client honeypot computer.