SlideShare a Scribd company logo
1 of 24
Beyond Blacklists: Malicious Url Detection Using
Machine Learning
Who am I ?
• Info security Investigator @ Cisco.
• Completed Mtech from IIT Jodhpur in 2014.
• Areas of interest include machine learning,
computer vision and A.I.
• Email : satyamiitj89@gmail.com
Malicious websites
Phishing : which one is real ??
Visiting Malicious Websites
What we want ?
Problem in a Nutshell
6
 URL features to identify malicious Web sites
 No context, no content
 Different classes of URLs
 Benign, spam, phishing, exploits, scams...
 For now, distinguish benign vs. malicious
facebook.com fblight.com
Information about new websites
State of the Practice
8
 Current approaches
 Blacklists [SORBS, URIBL, SURBL, Spamhaus]
 Learning on hand-tuned features [Garera et al, 2007]
 Limitations
 Cannot predict unlisted sites
 Cannot account for new features
 Arms race: Fast feedback cycle is critical
More automated approach?
URL Classification System
9
Label Example Hypothesis
Data Sets
10
 Malicious URLs
 5,000 from PhishTank (phishing)
 15,000 from Spamscatter (spam, phishing, etc)
 Benign URLs
 15,000 from Yahoo Web directory
 15,000 from DMOZ directory
 Malicious x Benign → 4 Data Sets
 30,000 – 55,000 features per data set
Algorithms
11
 Logistic regression w/ L1-norm regularization
 Other models
 Naive Bayes
 Support vector machines (linear, RBF kernels)
 Implicit feature selection
 Easier to interpret
Feature vector construction
Features to consider?
14
1) Blacklists
2) Simple heuristics
3) Domain name registration
4) Host properties
5) Lexical
(1) Blacklist Queries
15
 List of known malicious sites
 Providers: SORBS, URIBL, SURBL,
Spamhaus
http://www.bfuduuioo1fp.mobi
In blacklist?
Yes
http://fblight.com
No
In blacklist?
http://www.bfuduuioo1fp.mobi
Blacklist queries as features
........................................
........................................
(2) Manually-Selected Features
16
 Considered by previous studies
 IP address in hostname?
 Number of dots in URL
 WHOIS (domain name) registration date
stopgap.cn registered 28
June 2009
http://72.23.5.122/www.bankofamerica.com/
http://www.bankofamerica.com.qytrpbcw.stopgap.cn/
(3) WHOIS Features
17
 Domain name registration
 Date of registration, update, expiration
 Registrant: Who registered domain?
 Registrar: Who manages registration?
http://sleazysalmon.com
http://angryalbacore.com
http://mangymackerel.com
http://yammeringyellowtail.com
Registered on
29 June 2009
By SpamMedia
(4) Host-Based Features
18
 Blacklisted? (SORBS, URIBL, SURBL, Spamhaus)
 WHOIS: registrar, registrant, dates
 IP address: Which ASes/IP prefixes?
 DNS: TTL? PTR record exists/resolves?
 Geography-related: Locale? Connection speed?
75.102.60.0/2269.63.176.0/20
facebook.com fblight.com
(5) Lexical Features
19
 Tokens in URL hostname + path
 Length of URL
 Entropy of the domain name
http://www.bfuduuioo1fp.mobi/ws/ebayisapi.dll
Which feature sets?
20
Blacklist
Manual
WHOIS
Host-based
Lexical
Full
w/o WHOIS/Blacklist
4,000
# Features
13,000
4
3
17,000
30,000
26,000
Beyond Blacklists
21
Blacklist
Full features
Yahoo-PhishTank
Higher detection rate for
given false positive rate
Limitations
22
 False positives
 Sites hosted in disreputable ISP
 Guilt by association
 False negatives
 Compromised sites
 Free hosting sites
 Hosted in reputable ISP
 Future work: Web page content
Conclusion
23
 Detect malicious URLs with high accuracy
 Only using URL
 Diverse feature set helps: 86.5% w/ 18,000+
features
 Proof concept working in lab
 Future work
 Scaling up for deployment
References
 Ma, Justin, et al. "Beyond blacklists: learning
to detect malicious web sites from suspicious
URLs." Proceedings of the 15th ACM SIGKDD
international conference on Knowledge
discovery and data mining. ACM, 2009.
Q & A

More Related Content

What's hot

Classification with R
Classification with RClassification with R
Classification with RNajima Begum
 
IRJET - An Automated System for Detection of Social Engineering Phishing Atta...
IRJET - An Automated System for Detection of Social Engineering Phishing Atta...IRJET - An Automated System for Detection of Social Engineering Phishing Atta...
IRJET - An Automated System for Detection of Social Engineering Phishing Atta...IRJET Journal
 
A Deep Learning Technique for Web Phishing Detection Combined URL Features an...
A Deep Learning Technique for Web Phishing Detection Combined URL Features an...A Deep Learning Technique for Web Phishing Detection Combined URL Features an...
A Deep Learning Technique for Web Phishing Detection Combined URL Features an...IJCNCJournal
 
Presentation-Detecting Spammers on Social Networks
Presentation-Detecting Spammers on Social NetworksPresentation-Detecting Spammers on Social Networks
Presentation-Detecting Spammers on Social NetworksAshish Arora
 
Paper id 71201915
Paper id 71201915Paper id 71201915
Paper id 71201915IJRAT
 
A Comparative Analysis of Different Feature Set on the Performance of Differe...
A Comparative Analysis of Different Feature Set on the Performance of Differe...A Comparative Analysis of Different Feature Set on the Performance of Differe...
A Comparative Analysis of Different Feature Set on the Performance of Differe...gerogepatton
 
Report - Final_New_phishila
Report - Final_New_phishilaReport - Final_New_phishila
Report - Final_New_phishilaAshwin Palani
 
Rtp rsp16-distil networks-final-deck
Rtp rsp16-distil networks-final-deckRtp rsp16-distil networks-final-deck
Rtp rsp16-distil networks-final-deckG3 Communications
 
Are There Any Domains Impersonating Your Company For Phishing?
Are There Any Domains Impersonating Your Company For Phishing?Are There Any Domains Impersonating Your Company For Phishing?
Are There Any Domains Impersonating Your Company For Phishing?NormShield
 

What's hot (12)

Classification with R
Classification with RClassification with R
Classification with R
 
IRJET - An Automated System for Detection of Social Engineering Phishing Atta...
IRJET - An Automated System for Detection of Social Engineering Phishing Atta...IRJET - An Automated System for Detection of Social Engineering Phishing Atta...
IRJET - An Automated System for Detection of Social Engineering Phishing Atta...
 
A Deep Learning Technique for Web Phishing Detection Combined URL Features an...
A Deep Learning Technique for Web Phishing Detection Combined URL Features an...A Deep Learning Technique for Web Phishing Detection Combined URL Features an...
A Deep Learning Technique for Web Phishing Detection Combined URL Features an...
 
Presentation-Detecting Spammers on Social Networks
Presentation-Detecting Spammers on Social NetworksPresentation-Detecting Spammers on Social Networks
Presentation-Detecting Spammers on Social Networks
 
Paper id 71201915
Paper id 71201915Paper id 71201915
Paper id 71201915
 
A Comparative Analysis of Different Feature Set on the Performance of Differe...
A Comparative Analysis of Different Feature Set on the Performance of Differe...A Comparative Analysis of Different Feature Set on the Performance of Differe...
A Comparative Analysis of Different Feature Set on the Performance of Differe...
 
Do it-yourself-audits
Do it-yourself-auditsDo it-yourself-audits
Do it-yourself-audits
 
Iy2515891593
Iy2515891593Iy2515891593
Iy2515891593
 
Report - Final_New_phishila
Report - Final_New_phishilaReport - Final_New_phishila
Report - Final_New_phishila
 
Owasp eee 2015 csrf
Owasp eee 2015 csrfOwasp eee 2015 csrf
Owasp eee 2015 csrf
 
Rtp rsp16-distil networks-final-deck
Rtp rsp16-distil networks-final-deckRtp rsp16-distil networks-final-deck
Rtp rsp16-distil networks-final-deck
 
Are There Any Domains Impersonating Your Company For Phishing?
Are There Any Domains Impersonating Your Company For Phishing?Are There Any Domains Impersonating Your Company For Phishing?
Are There Any Domains Impersonating Your Company For Phishing?
 

Similar to Malicious url detection using machine learning

Understanding Web Bots and How They Hurt Your Business
Understanding Web Bots and How They Hurt Your BusinessUnderstanding Web Bots and How They Hurt Your Business
Understanding Web Bots and How They Hurt Your BusinessImperva Incapsula
 
Hitbkl 2012
Hitbkl 2012Hitbkl 2012
Hitbkl 2012F _
 
Case Study on Property Portal Data Security
Case Study on Property Portal Data SecurityCase Study on Property Portal Data Security
Case Study on Property Portal Data SecurityProperty Portal Watch
 
Ensuring Property Portal Listing Data Security
Ensuring Property Portal Listing Data SecurityEnsuring Property Portal Listing Data Security
Ensuring Property Portal Listing Data SecurityDistil Networks
 
How to clean up travel website traffic from bots and spammers?
How to clean up travel website traffic from bots and spammers?How to clean up travel website traffic from bots and spammers?
How to clean up travel website traffic from bots and spammers?tnooz
 
Cleaning up website traffic from bots & spammers
Cleaning up website traffic from bots & spammersCleaning up website traffic from bots & spammers
Cleaning up website traffic from bots & spammersDistil Networks
 
Are Bot Operators Eating Your Lunch?
Are Bot Operators Eating Your Lunch?Are Bot Operators Eating Your Lunch?
Are Bot Operators Eating Your Lunch?Distil Networks
 
PUMMP: PHISHING URL DETECTION USING MACHINE LEARNING WITH MONOMORPHIC AND POL...
PUMMP: PHISHING URL DETECTION USING MACHINE LEARNING WITH MONOMORPHIC AND POL...PUMMP: PHISHING URL DETECTION USING MACHINE LEARNING WITH MONOMORPHIC AND POL...
PUMMP: PHISHING URL DETECTION USING MACHINE LEARNING WITH MONOMORPHIC AND POL...IJCNCJournal
 
PUMMP: Phishing URL Detection using Machine Learning with Monomorphic and Pol...
PUMMP: Phishing URL Detection using Machine Learning with Monomorphic and Pol...PUMMP: Phishing URL Detection using Machine Learning with Monomorphic and Pol...
PUMMP: Phishing URL Detection using Machine Learning with Monomorphic and Pol...IJCNCJournal
 
Malware detection-using-machine-learning
Malware detection-using-machine-learningMalware detection-using-machine-learning
Malware detection-using-machine-learningSecurity Bootcamp
 
IRJET- Phishing Website Detection System
IRJET- Phishing Website Detection SystemIRJET- Phishing Website Detection System
IRJET- Phishing Website Detection SystemIRJET Journal
 
DETECTION OF PHISHING WEBSITES USING MACHINE LEARNING
DETECTION OF PHISHING WEBSITES USING MACHINE LEARNINGDETECTION OF PHISHING WEBSITES USING MACHINE LEARNING
DETECTION OF PHISHING WEBSITES USING MACHINE LEARNINGIRJET Journal
 
State of the Art Analysis Approach for Identification of the Malignant URLs
State of the Art Analysis Approach for Identification of the Malignant URLsState of the Art Analysis Approach for Identification of the Malignant URLs
State of the Art Analysis Approach for Identification of the Malignant URLsIOSRjournaljce
 
Scraping in 60 minutes (CIJ Summer School 2019)
Scraping in 60 minutes (CIJ Summer School 2019)Scraping in 60 minutes (CIJ Summer School 2019)
Scraping in 60 minutes (CIJ Summer School 2019)Paul Bradshaw
 
Paper Presentation - "Your Botnet is my Botnet : Analysis of a Botnet Takeover"
Paper Presentation - "Your Botnet is my Botnet : Analysis of a Botnet Takeover"Paper Presentation - "Your Botnet is my Botnet : Analysis of a Botnet Takeover"
Paper Presentation - "Your Botnet is my Botnet : Analysis of a Botnet Takeover"Jishnu Pradeep
 
Borges rprojectcs691y
Borges rprojectcs691yBorges rprojectcs691y
Borges rprojectcs691yrayborg
 
2020-03-05 Custard - SEO vs PWAs
2020-03-05 Custard - SEO vs PWAs2020-03-05 Custard - SEO vs PWAs
2020-03-05 Custard - SEO vs PWAsChris Smith
 
Phishing Website Detection using Classification Algorithms
Phishing Website Detection using Classification AlgorithmsPhishing Website Detection using Classification Algorithms
Phishing Website Detection using Classification AlgorithmsIRJET Journal
 

Similar to Malicious url detection using machine learning (20)

Understanding Web Bots and How They Hurt Your Business
Understanding Web Bots and How They Hurt Your BusinessUnderstanding Web Bots and How They Hurt Your Business
Understanding Web Bots and How They Hurt Your Business
 
Hitbkl 2012
Hitbkl 2012Hitbkl 2012
Hitbkl 2012
 
Case Study on Property Portal Data Security
Case Study on Property Portal Data SecurityCase Study on Property Portal Data Security
Case Study on Property Portal Data Security
 
Ensuring Property Portal Listing Data Security
Ensuring Property Portal Listing Data SecurityEnsuring Property Portal Listing Data Security
Ensuring Property Portal Listing Data Security
 
How to clean up travel website traffic from bots and spammers?
How to clean up travel website traffic from bots and spammers?How to clean up travel website traffic from bots and spammers?
How to clean up travel website traffic from bots and spammers?
 
Cleaning up website traffic from bots & spammers
Cleaning up website traffic from bots & spammersCleaning up website traffic from bots & spammers
Cleaning up website traffic from bots & spammers
 
Are Bot Operators Eating Your Lunch?
Are Bot Operators Eating Your Lunch?Are Bot Operators Eating Your Lunch?
Are Bot Operators Eating Your Lunch?
 
PUMMP: PHISHING URL DETECTION USING MACHINE LEARNING WITH MONOMORPHIC AND POL...
PUMMP: PHISHING URL DETECTION USING MACHINE LEARNING WITH MONOMORPHIC AND POL...PUMMP: PHISHING URL DETECTION USING MACHINE LEARNING WITH MONOMORPHIC AND POL...
PUMMP: PHISHING URL DETECTION USING MACHINE LEARNING WITH MONOMORPHIC AND POL...
 
PUMMP: Phishing URL Detection using Machine Learning with Monomorphic and Pol...
PUMMP: Phishing URL Detection using Machine Learning with Monomorphic and Pol...PUMMP: Phishing URL Detection using Machine Learning with Monomorphic and Pol...
PUMMP: Phishing URL Detection using Machine Learning with Monomorphic and Pol...
 
Malware detection-using-machine-learning
Malware detection-using-machine-learningMalware detection-using-machine-learning
Malware detection-using-machine-learning
 
IRJET- Phishing Website Detection System
IRJET- Phishing Website Detection SystemIRJET- Phishing Website Detection System
IRJET- Phishing Website Detection System
 
DETECTION OF PHISHING WEBSITES USING MACHINE LEARNING
DETECTION OF PHISHING WEBSITES USING MACHINE LEARNINGDETECTION OF PHISHING WEBSITES USING MACHINE LEARNING
DETECTION OF PHISHING WEBSITES USING MACHINE LEARNING
 
State of the Art Analysis Approach for Identification of the Malignant URLs
State of the Art Analysis Approach for Identification of the Malignant URLsState of the Art Analysis Approach for Identification of the Malignant URLs
State of the Art Analysis Approach for Identification of the Malignant URLs
 
17 00 distil rami
17 00 distil rami17 00 distil rami
17 00 distil rami
 
Scraping in 60 minutes (CIJ Summer School 2019)
Scraping in 60 minutes (CIJ Summer School 2019)Scraping in 60 minutes (CIJ Summer School 2019)
Scraping in 60 minutes (CIJ Summer School 2019)
 
Wsdm yu
Wsdm yuWsdm yu
Wsdm yu
 
Paper Presentation - "Your Botnet is my Botnet : Analysis of a Botnet Takeover"
Paper Presentation - "Your Botnet is my Botnet : Analysis of a Botnet Takeover"Paper Presentation - "Your Botnet is my Botnet : Analysis of a Botnet Takeover"
Paper Presentation - "Your Botnet is my Botnet : Analysis of a Botnet Takeover"
 
Borges rprojectcs691y
Borges rprojectcs691yBorges rprojectcs691y
Borges rprojectcs691y
 
2020-03-05 Custard - SEO vs PWAs
2020-03-05 Custard - SEO vs PWAs2020-03-05 Custard - SEO vs PWAs
2020-03-05 Custard - SEO vs PWAs
 
Phishing Website Detection using Classification Algorithms
Phishing Website Detection using Classification AlgorithmsPhishing Website Detection using Classification Algorithms
Phishing Website Detection using Classification Algorithms
 

More from Cysinfo Cyber Security Community

Understanding Malware Persistence Techniques by Monnappa K A
Understanding Malware Persistence Techniques by Monnappa K AUnderstanding Malware Persistence Techniques by Monnappa K A
Understanding Malware Persistence Techniques by Monnappa K ACysinfo Cyber Security Community
 
Understanding & analyzing obfuscated malicious web scripts by Vikram Kharvi
Understanding & analyzing obfuscated malicious web scripts by Vikram KharviUnderstanding & analyzing obfuscated malicious web scripts by Vikram Kharvi
Understanding & analyzing obfuscated malicious web scripts by Vikram KharviCysinfo Cyber Security Community
 
Getting started with cybersecurity through CTFs by Shruti Dixit & Geethna TK
Getting started with cybersecurity through CTFs by Shruti Dixit & Geethna TKGetting started with cybersecurity through CTFs by Shruti Dixit & Geethna TK
Getting started with cybersecurity through CTFs by Shruti Dixit & Geethna TKCysinfo Cyber Security Community
 
A look into the sanitizer family (ASAN & UBSAN) by Akul Pillai
A look into the sanitizer family (ASAN & UBSAN) by Akul PillaiA look into the sanitizer family (ASAN & UBSAN) by Akul Pillai
A look into the sanitizer family (ASAN & UBSAN) by Akul PillaiCysinfo Cyber Security Community
 
Reversing and Decrypting Malware Communications by Monnappa
Reversing and Decrypting Malware Communications by MonnappaReversing and Decrypting Malware Communications by Monnappa
Reversing and Decrypting Malware Communications by MonnappaCysinfo Cyber Security Community
 
Understanding evasive hollow process injection techniques monnappa k a
Understanding evasive hollow process injection techniques   	monnappa k aUnderstanding evasive hollow process injection techniques   	monnappa k a
Understanding evasive hollow process injection techniques monnappa k aCysinfo Cyber Security Community
 
Security challenges in d2d communication by ajithkumar vyasarao
Security challenges in d2d communication  by ajithkumar vyasaraoSecurity challenges in d2d communication  by ajithkumar vyasarao
Security challenges in d2d communication by ajithkumar vyasaraoCysinfo Cyber Security Community
 

More from Cysinfo Cyber Security Community (20)

Understanding Malware Persistence Techniques by Monnappa K A
Understanding Malware Persistence Techniques by Monnappa K AUnderstanding Malware Persistence Techniques by Monnappa K A
Understanding Malware Persistence Techniques by Monnappa K A
 
Understanding & analyzing obfuscated malicious web scripts by Vikram Kharvi
Understanding & analyzing obfuscated malicious web scripts by Vikram KharviUnderstanding & analyzing obfuscated malicious web scripts by Vikram Kharvi
Understanding & analyzing obfuscated malicious web scripts by Vikram Kharvi
 
Getting started with cybersecurity through CTFs by Shruti Dixit & Geethna TK
Getting started with cybersecurity through CTFs by Shruti Dixit & Geethna TKGetting started with cybersecurity through CTFs by Shruti Dixit & Geethna TK
Getting started with cybersecurity through CTFs by Shruti Dixit & Geethna TK
 
Emerging Trends in Cybersecurity by Amar Prusty
Emerging Trends in Cybersecurity by Amar PrustyEmerging Trends in Cybersecurity by Amar Prusty
Emerging Trends in Cybersecurity by Amar Prusty
 
A look into the sanitizer family (ASAN & UBSAN) by Akul Pillai
A look into the sanitizer family (ASAN & UBSAN) by Akul PillaiA look into the sanitizer family (ASAN & UBSAN) by Akul Pillai
A look into the sanitizer family (ASAN & UBSAN) by Akul Pillai
 
Closer look at PHP Unserialization by Ashwin Shenoi
Closer look at PHP Unserialization by Ashwin ShenoiCloser look at PHP Unserialization by Ashwin Shenoi
Closer look at PHP Unserialization by Ashwin Shenoi
 
Unicorn: The Ultimate CPU Emulator by Akshay Ajayan
Unicorn: The Ultimate CPU Emulator by Akshay AjayanUnicorn: The Ultimate CPU Emulator by Akshay Ajayan
Unicorn: The Ultimate CPU Emulator by Akshay Ajayan
 
The Art of Executing JavaScript by Akhil Mahendra
The Art of Executing JavaScript by Akhil MahendraThe Art of Executing JavaScript by Akhil Mahendra
The Art of Executing JavaScript by Akhil Mahendra
 
Reversing and Decrypting Malware Communications by Monnappa
Reversing and Decrypting Malware Communications by MonnappaReversing and Decrypting Malware Communications by Monnappa
Reversing and Decrypting Malware Communications by Monnappa
 
DeViL - Detect Virtual Machine in Linux by Sreelakshmi
DeViL - Detect Virtual Machine in Linux by SreelakshmiDeViL - Detect Virtual Machine in Linux by Sreelakshmi
DeViL - Detect Virtual Machine in Linux by Sreelakshmi
 
Analysis of android apk using adhrit by Abhishek J.M
 Analysis of android apk using adhrit by Abhishek J.M Analysis of android apk using adhrit by Abhishek J.M
Analysis of android apk using adhrit by Abhishek J.M
 
Understanding evasive hollow process injection techniques monnappa k a
Understanding evasive hollow process injection techniques   	monnappa k aUnderstanding evasive hollow process injection techniques   	monnappa k a
Understanding evasive hollow process injection techniques monnappa k a
 
Security challenges in d2d communication by ajithkumar vyasarao
Security challenges in d2d communication  by ajithkumar vyasaraoSecurity challenges in d2d communication  by ajithkumar vyasarao
Security challenges in d2d communication by ajithkumar vyasarao
 
S2 e (selective symbolic execution) -shivkrishna a
S2 e (selective symbolic execution) -shivkrishna aS2 e (selective symbolic execution) -shivkrishna a
S2 e (selective symbolic execution) -shivkrishna a
 
Dynamic binary analysis using angr siddharth muralee
Dynamic binary analysis using angr   siddharth muraleeDynamic binary analysis using angr   siddharth muralee
Dynamic binary analysis using angr siddharth muralee
 
Bit flipping attack on aes cbc - ashutosh ahelleya
Bit flipping attack on aes cbc -	ashutosh ahelleyaBit flipping attack on aes cbc -	ashutosh ahelleya
Bit flipping attack on aes cbc - ashutosh ahelleya
 
Security Analytics using ELK stack
Security Analytics using ELK stack	Security Analytics using ELK stack
Security Analytics using ELK stack
 
Linux Malware Analysis
Linux Malware Analysis	Linux Malware Analysis
Linux Malware Analysis
 
Introduction to Binary Exploitation
Introduction to Binary Exploitation	Introduction to Binary Exploitation
Introduction to Binary Exploitation
 
ATM Malware: Understanding the threat
ATM Malware: Understanding the threat	ATM Malware: Understanding the threat
ATM Malware: Understanding the threat
 

Recently uploaded

TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Victor Rentea
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Victor Rentea
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Orbitshub
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxRemote DBA Services
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontologyjohnbeverley2021
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWERMadyBayot
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...apidays
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Bhuvaneswari Subramani
 

Recently uploaded (20)

TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
 

Malicious url detection using machine learning

  • 1. Beyond Blacklists: Malicious Url Detection Using Machine Learning
  • 2. Who am I ? • Info security Investigator @ Cisco. • Completed Mtech from IIT Jodhpur in 2014. • Areas of interest include machine learning, computer vision and A.I. • Email : satyamiitj89@gmail.com
  • 3. Malicious websites Phishing : which one is real ??
  • 6. Problem in a Nutshell 6  URL features to identify malicious Web sites  No context, no content  Different classes of URLs  Benign, spam, phishing, exploits, scams...  For now, distinguish benign vs. malicious facebook.com fblight.com
  • 8. State of the Practice 8  Current approaches  Blacklists [SORBS, URIBL, SURBL, Spamhaus]  Learning on hand-tuned features [Garera et al, 2007]  Limitations  Cannot predict unlisted sites  Cannot account for new features  Arms race: Fast feedback cycle is critical More automated approach?
  • 10. Data Sets 10  Malicious URLs  5,000 from PhishTank (phishing)  15,000 from Spamscatter (spam, phishing, etc)  Benign URLs  15,000 from Yahoo Web directory  15,000 from DMOZ directory  Malicious x Benign → 4 Data Sets  30,000 – 55,000 features per data set
  • 11. Algorithms 11  Logistic regression w/ L1-norm regularization  Other models  Naive Bayes  Support vector machines (linear, RBF kernels)  Implicit feature selection  Easier to interpret
  • 13. Features to consider? 14 1) Blacklists 2) Simple heuristics 3) Domain name registration 4) Host properties 5) Lexical
  • 14. (1) Blacklist Queries 15  List of known malicious sites  Providers: SORBS, URIBL, SURBL, Spamhaus http://www.bfuduuioo1fp.mobi In blacklist? Yes http://fblight.com No In blacklist? http://www.bfuduuioo1fp.mobi Blacklist queries as features ........................................ ........................................
  • 15. (2) Manually-Selected Features 16  Considered by previous studies  IP address in hostname?  Number of dots in URL  WHOIS (domain name) registration date stopgap.cn registered 28 June 2009 http://72.23.5.122/www.bankofamerica.com/ http://www.bankofamerica.com.qytrpbcw.stopgap.cn/
  • 16. (3) WHOIS Features 17  Domain name registration  Date of registration, update, expiration  Registrant: Who registered domain?  Registrar: Who manages registration? http://sleazysalmon.com http://angryalbacore.com http://mangymackerel.com http://yammeringyellowtail.com Registered on 29 June 2009 By SpamMedia
  • 17. (4) Host-Based Features 18  Blacklisted? (SORBS, URIBL, SURBL, Spamhaus)  WHOIS: registrar, registrant, dates  IP address: Which ASes/IP prefixes?  DNS: TTL? PTR record exists/resolves?  Geography-related: Locale? Connection speed? 75.102.60.0/2269.63.176.0/20 facebook.com fblight.com
  • 18. (5) Lexical Features 19  Tokens in URL hostname + path  Length of URL  Entropy of the domain name http://www.bfuduuioo1fp.mobi/ws/ebayisapi.dll
  • 19. Which feature sets? 20 Blacklist Manual WHOIS Host-based Lexical Full w/o WHOIS/Blacklist 4,000 # Features 13,000 4 3 17,000 30,000 26,000
  • 20. Beyond Blacklists 21 Blacklist Full features Yahoo-PhishTank Higher detection rate for given false positive rate
  • 21. Limitations 22  False positives  Sites hosted in disreputable ISP  Guilt by association  False negatives  Compromised sites  Free hosting sites  Hosted in reputable ISP  Future work: Web page content
  • 22. Conclusion 23  Detect malicious URLs with high accuracy  Only using URL  Diverse feature set helps: 86.5% w/ 18,000+ features  Proof concept working in lab  Future work  Scaling up for deployment
  • 23. References  Ma, Justin, et al. "Beyond blacklists: learning to detect malicious web sites from suspicious URLs." Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2009.
  • 24. Q & A