Beyond Blacklists: Malicious URL Detection Using Machine Learning
Who am I?
• Information security investigator @ Cisco.
• Completed M.Tech at IIT Jodhpur in 2014.
• Areas of interest include machine learning, computer vision and A.I.
• Email : satyamiitj89@gmail.com
Malicious websites
Phishing: which one is real?
Visiting Malicious Websites
What do we want?
Problem in a Nutshell
• URL features to identify malicious Web sites
  ◦ No context, no content
• Different classes of URLs
  ◦ Benign, spam, phishing, exploits, scams...
  ◦ For now, distinguish benign vs. malicious
facebook.com fblight.com
Information about new websites
State of the Practice
• Current approaches
  ◦ Blacklists [SORBS, URIBL, SURBL, Spamhaus]
  ◦ Learning on hand-tuned features [Garera et al., 2007]
• Limitations
  ◦ Cannot predict unlisted sites
  ◦ Cannot account for new features
• Arms race: fast feedback cycle is critical
More automated approach?
URL Classification System
Label Example Hypothesis
Data Sets
• Malicious URLs
  ◦ 5,000 from PhishTank (phishing)
  ◦ 15,000 from Spamscatter (spam, phishing, etc.)
• Benign URLs
  ◦ 15,000 from Yahoo Web directory
  ◦ 15,000 from DMOZ directory
• Malicious × Benign → 4 data sets
  ◦ 30,000 – 55,000 features per data set
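The pairing of sources on this slide can be spelled out in a few lines. This is only an illustrative sketch: the source names and sizes come from the slide, while the function name and the "Benign-Malicious" naming of each data set are assumptions made for the example.

```python
from itertools import product

# Source sizes as stated on the slide.
malicious = {"PhishTank": 5_000, "Spamscatter": 15_000}
benign = {"Yahoo": 15_000, "DMOZ": 15_000}


def pair_datasets(benign_sources, malicious_sources):
    """Return {name: (n_benign, n_malicious)} for every benign x malicious pair."""
    return {
        f"{b}-{m}": (benign_sources[b], malicious_sources[m])
        for b, m in product(benign_sources, malicious_sources)
    }


datasets = pair_datasets(benign, malicious)
for name, (n_benign, n_malicious) in datasets.items():
    # Total number of URLs in each of the four data sets.
    print(name, n_benign + n_malicious)
```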
Algorithms
• Logistic regression w/ L1-norm regularization
  ◦ Implicit feature selection
  ◦ Easier to interpret
• Other models
  ◦ Naive Bayes
  ◦ Support vector machines (linear, RBF kernels)
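To make "implicit feature selection" concrete, here is a minimal pure-Python sketch of L1-regularized logistic regression trained with proximal gradient descent (soft-thresholding). It is not the implementation used in the talk: the solver, the step size, the penalty strength and the tiny ±1-encoded toy data set are all assumptions chosen for illustration. Feature 0 is a strong signal; feature 1 is a weaker, redundant one that agrees with the label on 6 of 8 rows. In this toy setup the penalty should leave the redundant feature with a weight of exactly zero.

```python
import math


def soft_threshold(value, threshold):
    """Proximal operator of the L1 norm: shrink toward zero, clip at zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_l1_logreg(rows, labels, lam=0.05, step=0.5, epochs=500):
    """Proximal gradient descent on mean log-loss + lam * ||w||_1 (bias unpenalized)."""
    n = len(rows)
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            error = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - y
            grad_b += error
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
        bias -= step * grad_b / n
        # Gradient step on the smooth loss, then soft-threshold for the L1 term.
        weights = [
            soft_threshold(w - step * g / n, step * lam)
            for w, g in zip(weights, grad_w)
        ]
    return weights, bias


def predict(weights, bias, x):
    return int(bias + sum(w * xi for w, xi in zip(weights, x)) > 0)


# Hypothetical toy data: [strong_signal, weak_redundant_signal], encoded as +1/-1.
rows = [[1, 1], [1, 1], [1, 1], [1, -1], [-1, -1], [-1, -1], [-1, -1], [-1, 1]]
labels = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = malicious, 0 = benign

weights, bias = train_l1_logreg(rows, labels)
print("weights:", weights, "bias:", bias)
```

A zero weight means the feature can be dropped from the model entirely, which is also what makes the remaining nonzero weights easier to read.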
Feature vector construction
Features to consider?
1) Blacklists
2) Simple heuristics
3) Domain name registration
4) Host properties
5) Lexical
(1) Blacklist Queries
• List of known malicious sites
• Providers: SORBS, URIBL, SURBL, Spamhaus
• Blacklist queries as features
http://www.bfuduuioo1fp.mobi → In blacklist? Yes
http://fblight.com → In blacklist? No
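A small sketch of how blacklist answers can become binary features, one per provider. The provider names are from the slide; everything else is hypothetical: real providers are queried over the network, so the in-memory sets below are made-up stand-ins, and the listing pattern shown is invented for the example.

```python
from urllib.parse import urlparse

# Hypothetical snapshots standing in for live blacklist queries.
BLACKLISTS = {
    "SORBS": {"bfuduuioo1fp.mobi"},
    "URIBL": {"bfuduuioo1fp.mobi"},
    "SURBL": set(),
    "Spamhaus": {"bfuduuioo1fp.mobi"},
}


def bare_host(url):
    """Hostname of the URL with a leading 'www.' removed."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def blacklist_features(url, blacklists=BLACKLISTS):
    """One 0/1 feature per provider: is the host on that provider's list?"""
    host = bare_host(url)
    return {
        f"blacklist:{name}": int(host in listed)
        for name, listed in blacklists.items()
    }


print(blacklist_features("http://www.bfuduuioo1fp.mobi"))
print(blacklist_features("http://fblight.com"))
```

An unlisted site yields an all-zero vector, which is exactly the case where blacklists alone cannot help.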
(2) Manually-Selected Features
• Considered by previous studies
  ◦ IP address in hostname?
  ◦ Number of dots in URL
  ◦ WHOIS (domain name) registration date
http://72.23.5.122/www.bankofamerica.com/
http://www.bankofamerica.com.qytrpbcw.stopgap.cn/
stopgap.cn registered 28 June 2009
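The first two heuristics can be computed from the URL string alone. The sketch below uses only the Python standard library; the function and feature names are assumptions for illustration, and the WHOIS registration date is left out because it needs an external lookup.

```python
import ipaddress
from urllib.parse import urlparse


def host_is_ip(host):
    """True if the hostname is a literal IP address rather than a domain name."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def heuristic_features(url):
    host = urlparse(url).hostname or ""
    return {
        "ip_in_hostname": int(host_is_ip(host)),
        "num_dots": url.count("."),
    }


for url in (
    "http://72.23.5.122/www.bankofamerica.com/",
    "http://www.bankofamerica.com.qytrpbcw.stopgap.cn/",
):
    print(url, heuristic_features(url))
```

Both example URLs embed a well-known brand name, but one hides it in the path behind a raw IP address and the other in a long chain of subdomains.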
(3) WHOIS Features
• Domain name registration
  ◦ Date of registration, update, expiration
  ◦ Registrant: Who registered domain?
  ◦ Registrar: Who manages registration?
http://sleazysalmon.com
http://angryalbacore.com
http://mangymackerel.com
http://yammeringyellowtail.com
Registered on 29 June 2009 by SpamMedia
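One way to turn a WHOIS record into features is sketched below. It assumes the record has already been fetched and parsed into a dictionary; the dictionary keys, the feature names and the observation date are hypothetical. The registration date and registrant are the ones shown on the slide.

```python
from datetime import date


def whois_features(record, observed_on):
    """Numeric age feature plus one binary feature per categorical WHOIS value."""
    features = {
        "whois:days_since_registration": (observed_on - record["registered"]).days,
        f"whois:registrant={record['registrant']}": 1,
    }
    # The registrar is optional here because the slide example does not name one.
    if record.get("registrar"):
        features[f"whois:registrar={record['registrar']}"] = 1
    return features


# Hypothetical parsed record for one of the example domains.
record = {"registered": date(2009, 6, 29), "registrant": "SpamMedia"}
# Assumed observation date, one week after registration.
features = whois_features(record, observed_on=date(2009, 7, 6))
print(features)
```

Encoding each distinct registrant or registrar as its own binary feature is one plausible reason a WHOIS feature set grows into the thousands.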
(4) Host-Based Features
• Blacklisted? (SORBS, URIBL, SURBL, Spamhaus)
• WHOIS: registrar, registrant, dates
• IP address: Which ASes/IP prefixes?
• DNS: TTL? PTR record exists/resolves?
• Geography-related: Locale? Connection speed?
75.102.60.0/22  69.63.176.0/20
facebook.com fblight.com
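The "which IP prefix?" question can be sketched with the standard `ipaddress` module. The two prefixes are the ones on the slide; the sample host addresses and the feature naming are made up for the example, and no claim is made here about which site resolves into which prefix.

```python
import ipaddress

# Prefixes shown on the slide.
PREFIXES = [
    ipaddress.ip_network("75.102.60.0/22"),
    ipaddress.ip_network("69.63.176.0/20"),
]


def prefix_features(ip, prefixes=PREFIXES):
    """One 0/1 feature per known prefix: does the host address fall inside it?"""
    addr = ipaddress.ip_address(ip)
    return {f"prefix:{net}": int(addr in net) for net in prefixes}


# Hypothetical resolved addresses.
print(prefix_features("69.63.180.12"))
print(prefix_features("75.102.61.7"))
```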
(5) Lexical Features
• Tokens in URL hostname + path
• Length of URL
• Entropy of the domain name
http://www.bfuduuioo1fp.mobi/ws/ebayisapi.dll
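A minimal sketch of the three lexical features, using only the standard library. The tokenization rule (split on non-alphanumeric characters), the feature names, and the choice to compute character-level Shannon entropy over the whole hostname are assumptions; the slide does not specify them.

```python
import math
import re
from collections import Counter
from urllib.parse import urlparse


def shannon_entropy(text):
    """Character-level Shannon entropy in bits; 0.0 for an empty string."""
    if not text:
        return 0.0
    total = len(text)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(text).values()
    )


def lexical_features(url):
    parsed = urlparse(url)
    host = parsed.hostname or ""
    features = {"url_length": len(url), "host_entropy": shannon_entropy(host)}
    # Bag of tokens, kept separate for hostname and path.
    for token in re.split(r"[^A-Za-z0-9]+", host):
        if token:
            features[f"host:{token}"] = 1
    for token in re.split(r"[^A-Za-z0-9]+", parsed.path):
        if token:
            features[f"path:{token}"] = 1
    return features


features = lexical_features("http://www.bfuduuioo1fp.mobi/ws/ebayisapi.dll")
for name, value in sorted(features.items()):
    print(name, value)
```

Keeping hostname and path tokens apart lets a model treat a brand-like token in the path differently from the same token in the hostname.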
Which feature sets?
Feature set            # Features
Blacklist              4
Manual                 3
WHOIS                  4,000
Host-based             13,000
Lexical                17,000
Full                   30,000
w/o WHOIS/Blacklist    26,000
Beyond Blacklists
[Figure: detection rate vs. false positive rate on the Yahoo-PhishTank data set, Blacklist vs. Full features]
Full features: higher detection rate for a given false positive rate
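The comparison on this slide, detection rate at a fixed false positive rate, can be computed as sketched below. The scores and labels are invented toy values, not results from the talk; they only show how a binary blacklist answer and a graded classifier score are compared on the same footing.

```python
def detection_rate_at_fpr(scores, labels, max_fpr):
    """Best true positive rate over all thresholds whose false positive rate <= max_fpr."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = 0.0
    for threshold in sorted(set(scores)):
        flagged = [score >= threshold for score in scores]
        true_pos = sum(f and y == 1 for f, y in zip(flagged, labels))
        false_pos = sum(f and y == 0 for f, y in zip(flagged, labels))
        if false_pos / n_neg <= max_fpr:
            best = max(best, true_pos / n_pos)
    return best


# Made-up toy data: 4 malicious URLs (label 1) followed by 10 benign URLs (label 0).
labels = [1, 1, 1, 1] + [0] * 10
blacklist_scores = [1, 1, 0, 0] + [0] * 10  # listed / not listed
full_scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]

blacklist_rate = detection_rate_at_fpr(blacklist_scores, labels, max_fpr=0.1)
full_rate = detection_rate_at_fpr(full_scores, labels, max_fpr=0.1)
print("blacklist:", blacklist_rate, "full:", full_rate)
```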
Limitations
• False positives
  ◦ Sites hosted by a disreputable ISP
  ◦ Guilt by association
• False negatives
  ◦ Compromised sites
  ◦ Free hosting sites
  ◦ Sites hosted by a reputable ISP
• Future work: Web page content
Conclusion
• Detect malicious URLs with high accuracy
  ◦ Only using URL
  ◦ Diverse feature set helps: 86.5% w/ 18,000+ features
  ◦ Proof of concept working in the lab
• Future work
  ◦ Scaling up for deployment
References
• Ma, Justin, et al. "Beyond blacklists: learning to detect malicious web sites from suspicious URLs." Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2009.
Q & A

Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
Generating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using SmithyGenerating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using Smithy
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 

Malicious URL detection using machine learning

  • 1. Beyond Blacklists: Malicious Url Detection Using Machine Learning
  • 2. Who am I?
      • Information security investigator @ Cisco.
      • Completed M.Tech at IIT Jodhpur in 2014.
      • Areas of interest include machine learning, computer vision and A.I.
      • Email: satyamiitj89@gmail.com
  • 3. Malicious websites. Phishing: which one is real?
  • 6. Problem in a Nutshell
      • URL features to identify malicious Web sites
          • No context, no content
      • Different classes of URLs
          • Benign, spam, phishing, exploits, scams...
          • For now, distinguish benign vs. malicious
      • Example hostnames: facebook.com, fblight.com
  • 8. State of the Practice
      • Current approaches
          • Blacklists [SORBS, URIBL, SURBL, Spamhaus]
          • Learning on hand-tuned features [Garera et al, 2007]
      • Limitations
          • Cannot predict unlisted sites
          • Cannot account for new features
      • Arms race: fast feedback cycle is critical
      • More automated approach?
  • 10. Data Sets
      • Malicious URLs
          • 5,000 from PhishTank (phishing)
          • 15,000 from Spamscatter (spam, phishing, etc.)
      • Benign URLs
          • 15,000 from Yahoo Web directory
          • 15,000 from DMOZ directory
      • Malicious × Benign → 4 data sets
      • 30,000 – 55,000 features per data set
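The pairing on this slide can be read as a cross product of the malicious and benign sources. The sketch below uses the counts listed on the slide; the dictionary names and the "Benign-Malicious" naming are illustrative assumptions, and it assumes each data set uses every URL from both of its sources.

```python
from itertools import product

# URL counts per source, as listed on the slide.
malicious = {"PhishTank": 5_000, "Spamscatter": 15_000}
benign = {"Yahoo": 15_000, "DMOZ": 15_000}

# Each (benign source, malicious source) pair forms one labeled data set.
data_sets = {
    f"{b}-{m}": {"benign": benign[b], "malicious": malicious[m]}
    for b, m in product(benign, malicious)
}

for name, counts in data_sets.items():
    total = counts["benign"] + counts["malicious"]
    print(f"{name}: {total} URLs")
```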
  • 11. Algorithms
      • Logistic regression w/ L1-norm regularization
          • Implicit feature selection
          • Easier to interpret
      • Other models
          • Naive Bayes
          • Support vector machines (linear, RBF kernels)
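To make "implicit feature selection" concrete, here is a minimal pure-Python sketch of L1-regularized logistic regression fitted by proximal gradient descent. It is not the solver used in the talk; the function names, hyperparameters, and toy data are all assumptions chosen for illustration.

```python
import math

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def soft_threshold(value, amount):
    """Shrink value toward zero by amount (the proximal step for an L1 penalty)."""
    if value > amount:
        return value - amount
    if value < -amount:
        return value + amount
    return 0.0

def fit_l1_logistic(rows, labels, l1=0.15, step=0.5, epochs=500):
    """Minimize mean log-loss + l1 * sum(|w|) by proximal gradient descent."""
    n, d = len(rows), len(rows[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(rows, labels):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) - y
            grad_b += err / n
            for j in range(d):
                grad_w[j] += err * x[j] / n
        b -= step * grad_b  # the intercept is not penalized
        # Gradient step on the log-loss, then shrink each weight toward zero.
        w = [soft_threshold(wj - step * gj, step * l1) for wj, gj in zip(w, grad_w)]
    return w, b

# Toy data (made up): column 0 = "in_blacklist", column 1 = "many_dots".
# Column 0 matches the label exactly; column 1 is only weakly related to it.
rows = [(1, 1), (1, 1), (1, 0), (1, 0), (0, 1), (0, 0), (0, 0), (0, 0)]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

weights, bias = fit_l1_logistic(rows, labels)
print(weights, bias)
```

On this toy data the weight of the weaker feature is held at exactly zero while the informative one stays positive. Reading off which weights are nonzero is what makes the model easier to interpret.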
  • 13. Features to consider?
      1) Blacklists
      2) Simple heuristics
      3) Domain name registration
      4) Host properties
      5) Lexical
  • 14. (1) Blacklist Queries
      • List of known malicious sites
      • Providers: SORBS, URIBL, SURBL, Spamhaus
      • Blacklist queries as features
          • http://www.bfuduuioo1fp.mobi: in blacklist? Yes
          • http://fblight.com: in blacklist? No
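One way to read "blacklist queries as features" is one binary feature per provider. The sketch below assumes hypothetical local snapshots of each list; real providers are typically queried over DNS, which is not attempted here, and the set contents are made up for illustration.

```python
from urllib.parse import urlsplit

# Hypothetical snapshots keyed by provider name; the contents are toy data
# and do not reflect what any provider actually lists.
BLACKLISTS = {
    "SORBS": {"www.bfuduuioo1fp.mobi"},
    "URIBL": {"www.bfuduuioo1fp.mobi"},
    "SURBL": set(),
    "Spamhaus": {"www.bfuduuioo1fp.mobi"},
}

def blacklist_features(url):
    """Return one 0/1 feature per provider: is the URL's hostname listed?"""
    host = urlsplit(url).hostname or ""
    return {f"blacklist:{name}": int(host in listed)
            for name, listed in BLACKLISTS.items()}

print(blacklist_features("http://www.bfuduuioo1fp.mobi"))
print(blacklist_features("http://fblight.com"))
```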
  • 15. (2) Manually-Selected Features
      • Considered by previous studies
          • IP address in hostname?
          • Number of dots in URL
          • WHOIS (domain name) registration date
      • Examples
          • http://72.23.5.122/www.bankofamerica.com/
          • http://www.bankofamerica.com.qytrpbcw.stopgap.cn/
          • stopgap.cn registered 28 June 2009
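The first two heuristics on this slide can be computed from the URL string alone. The following is a minimal sketch with assumed function and feature names; the WHOIS registration date is left out because it needs an external lookup.

```python
import ipaddress
from urllib.parse import urlsplit

def hostname_is_ip(host):
    """True if the hostname parses as a literal IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False

def manual_features(url):
    """Two hand-picked heuristics: IP-address hostname and dot count."""
    host = urlsplit(url).hostname or ""
    return {
        "ip_in_hostname": int(hostname_is_ip(host)),
        "num_dots": url.count("."),
    }

for url in ("http://72.23.5.122/www.bankofamerica.com/",
            "http://www.bankofamerica.com.qytrpbcw.stopgap.cn/"):
    print(url, manual_features(url))
```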
  • 16. (3) WHOIS Features
      • Domain name registration
          • Date of registration, update, expiration
          • Registrant: Who registered domain?
          • Registrar: Who manages registration?
      • Example: registered on 29 June 2009 by SpamMedia
          • http://sleazysalmon.com
          • http://angryalbacore.com
          • http://mangymackerel.com
          • http://yammeringyellowtail.com
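A WHOIS record could be turned into features roughly as follows. This is a sketch under assumptions: the record is a plain dict rather than the output of a real WHOIS client, the observation date is made up, and encoding the registrant as a one-hot style key is one possible choice rather than the talk's confirmed method.

```python
from datetime import date

def whois_features(record, observed_on):
    """Turn one WHOIS record (a plain dict here) into sparse features."""
    return {
        # Recently registered domains get a small value.
        "days_since_registration": (observed_on - record["registered"]).days,
        # Domains sharing a registrant share this feature key.
        "registrant:" + record["registrant"]: 1,
    }

# Field values mirror the slide's example; the dict layout is hypothetical.
record = {
    "domain": "sleazysalmon.com",
    "registered": date(2009, 6, 29),
    "registrant": "SpamMedia",
}
features = whois_features(record, observed_on=date(2009, 7, 6))
print(features)
```

Because the registrant becomes a feature key, a classifier can learn that one registrant's domains tend to share a label, which is the pattern the four example domains illustrate.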
  • 17. (4) Host-Based Features
      • Blacklisted? (SORBS, URIBL, SURBL, Spamhaus)
      • WHOIS: registrar, registrant, dates
      • IP address: Which ASes/IP prefixes?
      • DNS: TTL? PTR record exists/resolves?
      • Geography-related: Locale? Connection speed?
      • Example prefixes: 75.102.60.0/22, 69.63.176.0/20
      • Example hostnames: facebook.com, fblight.com
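The "which IP prefixes?" question can be sketched as a membership test against known prefixes. The two prefixes are the ones shown on the slide; the sample address is an arbitrary illustration that falls inside one of them and is not claimed to belong to any site. DNS and geography features need live lookups and are not covered.

```python
import ipaddress

# Prefixes taken from the slide; a real system would learn many more.
PREFIXES = [
    ipaddress.ip_network("75.102.60.0/22"),
    ipaddress.ip_network("69.63.176.0/20"),
]

def prefix_features(ip):
    """Return one 0/1 feature per known prefix: does it contain this IP?"""
    addr = ipaddress.ip_address(ip)
    return {f"prefix:{net}": int(addr in net) for net in PREFIXES}

# Illustrative address only.
print(prefix_features("69.63.180.1"))
```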
  • 18. (5) Lexical Features
      • Tokens in URL hostname + path
      • Length of URL
      • Entropy of the domain name
      • Example: http://www.bfuduuioo1fp.mobi/ws/ebayisapi.dll
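The three lexical features listed here can all be derived from the URL text. Below is a minimal sketch; the token delimiters, the feature names, and the use of character-level Shannon entropy over the full hostname are assumptions, since the slide does not spell them out.

```python
import math
import re
from collections import Counter
from urllib.parse import urlsplit

def shannon_entropy(text):
    """Character-level Shannon entropy in bits; 0.0 for an empty string."""
    total = len(text)
    if total == 0:
        return 0.0
    counts = Counter(text)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def lexical_features(url):
    """Bag-of-words tokens from hostname and path, plus length and entropy."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    features = {"url_length": len(url), "host_entropy": shannon_entropy(host)}
    for token in host.split("."):
        if token:
            features["host_token:" + token] = 1
    # Assumed delimiter set for path tokens.
    for token in re.split(r"[/?.=\-_]", parts.path):
        if token:
            features["path_token:" + token] = 1
    return features

url = "http://www.bfuduuioo1fp.mobi/ws/ebayisapi.dll"
for name, value in lexical_features(url).items():
    print(name, value)
```

Keeping hostname tokens and path tokens under separate prefixes lets a model weigh "ebayisapi" in a path differently from the same string in a hostname.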
  • 19. Which feature sets?
      Feature set              # Features
      Blacklist                4
      Manual                   3
      WHOIS                    4,000
      Host-based               13,000
      Lexical                  17,000
      Full                     30,000
      w/o WHOIS/Blacklist      26,000
  • 20. Beyond Blacklists
      • [Figure: Blacklist vs. Full features on the Yahoo-PhishTank data set]
      • Higher detection rate for given false positive rate
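"Detection rate for a given false positive rate" can be computed by sweeping a score threshold and keeping the best true positive rate whose false positive rate stays within budget. The sketch below uses made-up scores purely to show the calculation; they are not results from the talk.

```python
def detection_rate_at_fpr(scores, labels, max_fpr):
    """Best true positive rate over thresholds with FPR <= max_fpr."""
    positives = sum(labels)
    negatives = len(labels) - positives
    best = 0.0
    for threshold in sorted(set(scores)):
        flagged = [s >= threshold for s in scores]
        tp = sum(f and y == 1 for f, y in zip(flagged, labels))
        fp = sum(f and y == 0 for f, y in zip(flagged, labels))
        if fp / negatives <= max_fpr:
            best = max(best, tp / positives)
    return best

# Toy data (made up): 1 = malicious, 0 = benign.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
classifier_scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2, 0.1, 0.05]
blacklist_scores = [1, 1, 0, 0, 0, 0, 0, 0]  # a blacklist only says yes or no

for name, scores in (("blacklist", blacklist_scores),
                     ("classifier", classifier_scores)):
    print(name, detection_rate_at_fpr(scores, labels, max_fpr=0.0))
```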
  • 21. Limitations
      • False positives
          • Sites hosted in disreputable ISP
          • Guilt by association
      • False negatives
          • Compromised sites
          • Free hosting sites
          • Hosted in reputable ISP
      • Future work: Web page content
  • 22. Conclusion
      • Detect malicious URLs with high accuracy
          • Only using URL
          • Diverse feature set helps: 86.5% w/ 18,000+ features
      • Proof of concept working in lab
      • Future work
          • Scaling up for deployment
  • 23. References
      • Ma, Justin, et al. "Beyond blacklists: learning to detect malicious web sites from suspicious URLs." Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2009.
  • 24. Q & A