SlideShare a Scribd company logo
Beyond Blacklists: Malicious Url Detection Using
Machine Learning
Who am I ?
• Info security Investigator @ Cisco.
• Completed Mtech from IIT Jodhpur in 2014.
• Areas of interest include machine learning,
computer vision and A.I.
• Email : satyamiitj89@gmail.com
Malicious websites
Phishing : which one is real ??
Visiting Malicious Websites
What we want ?
Problem in a Nutshell
6
 URL features to identify malicious Web sites
 No context, no content
 Different classes of URLs
 Benign, spam, phishing, exploits, scams...
 For now, distinguish benign vs. malicious
facebook.com fblight.com
Information about new websites
State of the Practice
8
 Current approaches
 Blacklists [SORBS, URIBL, SURBL, Spamhaus]
 Learning on hand-tuned features [Garera et al, 2007]
 Limitations
 Cannot predict unlisted sites
 Cannot account for new features
 Arms race: Fast feedback cycle is critical
More automated approach?
URL Classification System
9
Label Example Hypothesis
Data Sets
10
 Malicious URLs
 5,000 from PhishTank (phishing)
 15,000 from Spamscatter (spam, phishing, etc)
 Benign URLs
 15,000 from Yahoo Web directory
 15,000 from DMOZ directory
 Malicious x Benign → 4 Data Sets
 30,000 – 55,000 features per data set
Algorithms
11
 Logistic regression w/ L1-norm regularization
 Other models
 Naive Bayes
 Support vector machines (linear, RBF kernels)
 Implicit feature selection
 Easier to interpret
Feature vector construction
Features to consider?
14
1) Blacklists
2) Simple heuristics
3) Domain name registration
4) Host properties
5) Lexical
(1) Blacklist Queries
15
 List of known malicious sites
 Providers: SORBS, URIBL, SURBL,
Spamhaus
http://www.bfuduuioo1fp.mobi
In blacklist?
Yes
http://fblight.com
No
In blacklist?
http://www.bfuduuioo1fp.mobi
Blacklist queries as features
........................................
........................................
(2) Manually-Selected Features
16
 Considered by previous studies
 IP address in hostname?
 Number of dots in URL
 WHOIS (domain name) registration date
stopgap.cn registered 28
June 2009
http://72.23.5.122/www.bankofamerica.com/
http://www.bankofamerica.com.qytrpbcw.stopgap.cn/
(3) WHOIS Features
17
 Domain name registration
 Date of registration, update, expiration
 Registrant: Who registered domain?
 Registrar: Who manages registration?
http://sleazysalmon.com
http://angryalbacore.com
http://mangymackerel.com
http://yammeringyellowtail.com
Registered on
29 June 2009
By SpamMedia
(4) Host-Based Features
18
 Blacklisted? (SORBS, URIBL, SURBL, Spamhaus)
 WHOIS: registrar, registrant, dates
 IP address: Which ASes/IP prefixes?
 DNS: TTL? PTR record exists/resolves?
 Geography-related: Locale? Connection speed?
75.102.60.0/2269.63.176.0/20
facebook.com fblight.com
(5) Lexical Features
19
 Tokens in URL hostname + path
 Length of URL
 Entropy of the domain name
http://www.bfuduuioo1fp.mobi/ws/ebayisapi.dll
Which feature sets?
20
Blacklist
Manual
WHOIS
Host-based
Lexical
Full
w/o WHOIS/Blacklist
4,000
# Features
13,000
4
3
17,000
30,000
26,000
Beyond Blacklists
21
Blacklist
Full features
Yahoo-PhishTank
Higher detection rate for
given false positive rate
Limitations
22
 False positives
 Sites hosted in disreputable ISP
 Guilt by association
 False negatives
 Compromised sites
 Free hosting sites
 Hosted in reputable ISP
 Future work: Web page content
Conclusion
23
 Detect malicious URLs with high accuracy
 Only using URL
 Diverse feature set helps: 86.5% w/ 18,000+
features
 Proof concept working in lab
 Future work
 Scaling up for deployment
References
 Ma, Justin, et al. "Beyond blacklists: learning
to detect malicious web sites from suspicious
URLs." Proceedings of the 15th ACM SIGKDD
international conference on Knowledge
discovery and data mining. ACM, 2009.
Q & A

More Related Content

What's hot

Malware detection-using-machine-learning
Malware detection-using-machine-learningMalware detection-using-machine-learning
Malware detection-using-machine-learning
Security Bootcamp
 
IRJET- Fake Profile Identification using Machine Learning
IRJET-  	  Fake Profile Identification using Machine LearningIRJET-  	  Fake Profile Identification using Machine Learning
IRJET- Fake Profile Identification using Machine Learning
IRJET Journal
 
Attendance system based on face recognition using python by Raihan Sikdar
Attendance system based on face recognition using python by Raihan SikdarAttendance system based on face recognition using python by Raihan Sikdar
Attendance system based on face recognition using python by Raihan Sikdar
raihansikdar
 
Rootconf_phishing_v2
Rootconf_phishing_v2Rootconf_phishing_v2
Rootconf_phishing_v2
Arjun BM
 
Automated incident response
Automated incident responseAutomated incident response
Automated incident response
Siemplify
 
Credit card fraud detection using python machine learning
Credit card fraud detection using python machine learningCredit card fraud detection using python machine learning
Credit card fraud detection using python machine learning
Sandeep Garg
 
Detecting the presence of cyberbullying using computer software
Detecting the presence of cyberbullying using computer softwareDetecting the presence of cyberbullying using computer software
Detecting the presence of cyberbullying using computer software
Ashish Arora
 
418 Automated Criminal Identification System using Face Detection and.pptx
418 Automated Criminal Identification System using Face Detection and.pptx418 Automated Criminal Identification System using Face Detection and.pptx
418 Automated Criminal Identification System using Face Detection and.pptx
Shivanig12
 
Face Recognition Attendance System
Face Recognition Attendance System Face Recognition Attendance System
Face Recognition Attendance System
Shreya Dandavate
 
Seminar on detecting fake accounts in social media using machine learning
Seminar on detecting fake accounts in social media using machine learningSeminar on detecting fake accounts in social media using machine learning
Seminar on detecting fake accounts in social media using machine learning
Parvathi Sanil Nair
 
BSc CSIT Final Year Project Report on Hamro Krishi - Nepal
BSc CSIT Final Year Project Report on Hamro Krishi - NepalBSc CSIT Final Year Project Report on Hamro Krishi - Nepal
BSc CSIT Final Year Project Report on Hamro Krishi - Nepal
Sirish Paudel
 
Project synopsis on face recognition in e attendance
Project synopsis on face recognition in e attendanceProject synopsis on face recognition in e attendance
Project synopsis on face recognition in e attendance
Nitesh Dubey
 
Detection of Fake Accounts in Instagram Using Machine Learning
Detection of Fake Accounts in Instagram Using Machine LearningDetection of Fake Accounts in Instagram Using Machine Learning
Detection of Fake Accounts in Instagram Using Machine Learning
AIRCC Publishing Corporation
 
Final spam-e-mail-detection
Final  spam-e-mail-detectionFinal  spam-e-mail-detection
Final spam-e-mail-detection
Partnered Health
 
Browser Security
Browser SecurityBrowser Security
Browser Security
Roberto Suggi Liverani
 
Fraud detection
Fraud detectionFraud detection
CYBERBULLYING DETECTION USING MACHINE LEARNING-1 (1).pdf
CYBERBULLYING DETECTION USING              MACHINE LEARNING-1 (1).pdfCYBERBULLYING DETECTION USING              MACHINE LEARNING-1 (1).pdf
CYBERBULLYING DETECTION USING MACHINE LEARNING-1 (1).pdf
KumbidiGaming
 
Malware Detection using Machine Learning
Malware Detection using Machine Learning	Malware Detection using Machine Learning
Malware Detection using Machine Learning
Cysinfo Cyber Security Community
 
Phishing Attacks: A Challenge Ahead
Phishing Attacks: A Challenge AheadPhishing Attacks: A Challenge Ahead
Phishing Attacks: A Challenge Ahead
eLearning Papers
 
Online Voting System ppt
Online Voting System pptOnline Voting System ppt

What's hot (20)

Malware detection-using-machine-learning
Malware detection-using-machine-learningMalware detection-using-machine-learning
Malware detection-using-machine-learning
 
IRJET- Fake Profile Identification using Machine Learning
IRJET-  	  Fake Profile Identification using Machine LearningIRJET-  	  Fake Profile Identification using Machine Learning
IRJET- Fake Profile Identification using Machine Learning
 
Attendance system based on face recognition using python by Raihan Sikdar
Attendance system based on face recognition using python by Raihan SikdarAttendance system based on face recognition using python by Raihan Sikdar
Attendance system based on face recognition using python by Raihan Sikdar
 
Rootconf_phishing_v2
Rootconf_phishing_v2Rootconf_phishing_v2
Rootconf_phishing_v2
 
Automated incident response
Automated incident responseAutomated incident response
Automated incident response
 
Credit card fraud detection using python machine learning
Credit card fraud detection using python machine learningCredit card fraud detection using python machine learning
Credit card fraud detection using python machine learning
 
Detecting the presence of cyberbullying using computer software
Detecting the presence of cyberbullying using computer softwareDetecting the presence of cyberbullying using computer software
Detecting the presence of cyberbullying using computer software
 
418 Automated Criminal Identification System using Face Detection and.pptx
418 Automated Criminal Identification System using Face Detection and.pptx418 Automated Criminal Identification System using Face Detection and.pptx
418 Automated Criminal Identification System using Face Detection and.pptx
 
Face Recognition Attendance System
Face Recognition Attendance System Face Recognition Attendance System
Face Recognition Attendance System
 
Seminar on detecting fake accounts in social media using machine learning
Seminar on detecting fake accounts in social media using machine learningSeminar on detecting fake accounts in social media using machine learning
Seminar on detecting fake accounts in social media using machine learning
 
BSc CSIT Final Year Project Report on Hamro Krishi - Nepal
BSc CSIT Final Year Project Report on Hamro Krishi - NepalBSc CSIT Final Year Project Report on Hamro Krishi - Nepal
BSc CSIT Final Year Project Report on Hamro Krishi - Nepal
 
Project synopsis on face recognition in e attendance
Project synopsis on face recognition in e attendanceProject synopsis on face recognition in e attendance
Project synopsis on face recognition in e attendance
 
Detection of Fake Accounts in Instagram Using Machine Learning
Detection of Fake Accounts in Instagram Using Machine LearningDetection of Fake Accounts in Instagram Using Machine Learning
Detection of Fake Accounts in Instagram Using Machine Learning
 
Final spam-e-mail-detection
Final  spam-e-mail-detectionFinal  spam-e-mail-detection
Final spam-e-mail-detection
 
Browser Security
Browser SecurityBrowser Security
Browser Security
 
Fraud detection
Fraud detectionFraud detection
Fraud detection
 
CYBERBULLYING DETECTION USING MACHINE LEARNING-1 (1).pdf
CYBERBULLYING DETECTION USING              MACHINE LEARNING-1 (1).pdfCYBERBULLYING DETECTION USING              MACHINE LEARNING-1 (1).pdf
CYBERBULLYING DETECTION USING MACHINE LEARNING-1 (1).pdf
 
Malware Detection using Machine Learning
Malware Detection using Machine Learning	Malware Detection using Machine Learning
Malware Detection using Machine Learning
 
Phishing Attacks: A Challenge Ahead
Phishing Attacks: A Challenge AheadPhishing Attacks: A Challenge Ahead
Phishing Attacks: A Challenge Ahead
 
Online Voting System ppt
Online Voting System pptOnline Voting System ppt
Online Voting System ppt
 

Similar to Malicious Url Detection Using Machine Learning

Understanding Web Bots and How They Hurt Your Business
Understanding Web Bots and How They Hurt Your BusinessUnderstanding Web Bots and How They Hurt Your Business
Understanding Web Bots and How They Hurt Your BusinessImperva Incapsula
 
Hitbkl 2012
Hitbkl 2012Hitbkl 2012
Hitbkl 2012F _
 
Case Study on Property Portal Data Security
Case Study on Property Portal Data SecurityCase Study on Property Portal Data Security
Case Study on Property Portal Data Security
Property Portal Watch
 
Ensuring Property Portal Listing Data Security
Ensuring Property Portal Listing Data SecurityEnsuring Property Portal Listing Data Security
Ensuring Property Portal Listing Data Security
Distil Networks
 
How to clean up travel website traffic from bots and spammers?
How to clean up travel website traffic from bots and spammers?How to clean up travel website traffic from bots and spammers?
How to clean up travel website traffic from bots and spammers?
tnooz
 
Cleaning up website traffic from bots & spammers
Cleaning up website traffic from bots & spammersCleaning up website traffic from bots & spammers
Cleaning up website traffic from bots & spammers
Distil Networks
 
Rtp rsp16-distil networks-final-deck
Rtp rsp16-distil networks-final-deckRtp rsp16-distil networks-final-deck
Rtp rsp16-distil networks-final-deck
G3 Communications
 
Are Bot Operators Eating Your Lunch?
Are Bot Operators Eating Your Lunch?Are Bot Operators Eating Your Lunch?
Are Bot Operators Eating Your Lunch?
Distil Networks
 
IRJET - An Automated System for Detection of Social Engineering Phishing Atta...
IRJET - An Automated System for Detection of Social Engineering Phishing Atta...IRJET - An Automated System for Detection of Social Engineering Phishing Atta...
IRJET - An Automated System for Detection of Social Engineering Phishing Atta...
IRJET Journal
 
PUMMP: PHISHING URL DETECTION USING MACHINE LEARNING WITH MONOMORPHIC AND POL...
PUMMP: PHISHING URL DETECTION USING MACHINE LEARNING WITH MONOMORPHIC AND POL...PUMMP: PHISHING URL DETECTION USING MACHINE LEARNING WITH MONOMORPHIC AND POL...
PUMMP: PHISHING URL DETECTION USING MACHINE LEARNING WITH MONOMORPHIC AND POL...
IJCNCJournal
 
PUMMP: Phishing URL Detection using Machine Learning with Monomorphic and Pol...
PUMMP: Phishing URL Detection using Machine Learning with Monomorphic and Pol...PUMMP: Phishing URL Detection using Machine Learning with Monomorphic and Pol...
PUMMP: Phishing URL Detection using Machine Learning with Monomorphic and Pol...
IJCNCJournal
 
IRJET - Detection and Prevention of Phishing Websites using Machine Learning ...
IRJET - Detection and Prevention of Phishing Websites using Machine Learning ...IRJET - Detection and Prevention of Phishing Websites using Machine Learning ...
IRJET - Detection and Prevention of Phishing Websites using Machine Learning ...
IRJET Journal
 
IRJET- Phishing Website Detection System
IRJET- Phishing Website Detection SystemIRJET- Phishing Website Detection System
IRJET- Phishing Website Detection System
IRJET Journal
 
DETECTION OF PHISHING WEBSITES USING MACHINE LEARNING
DETECTION OF PHISHING WEBSITES USING MACHINE LEARNINGDETECTION OF PHISHING WEBSITES USING MACHINE LEARNING
DETECTION OF PHISHING WEBSITES USING MACHINE LEARNING
IRJET Journal
 
State of the Art Analysis Approach for Identification of the Malignant URLs
State of the Art Analysis Approach for Identification of the Malignant URLsState of the Art Analysis Approach for Identification of the Malignant URLs
State of the Art Analysis Approach for Identification of the Malignant URLs
IOSRjournaljce
 
17 00 distil rami
17 00 distil rami17 00 distil rami
17 00 distil rami
Property Portal Watch
 
Scraping in 60 minutes (CIJ Summer School 2019)
Scraping in 60 minutes (CIJ Summer School 2019)Scraping in 60 minutes (CIJ Summer School 2019)
Scraping in 60 minutes (CIJ Summer School 2019)
Paul Bradshaw
 
Paper Presentation - "Your Botnet is my Botnet : Analysis of a Botnet Takeover"
Paper Presentation - "Your Botnet is my Botnet : Analysis of a Botnet Takeover"Paper Presentation - "Your Botnet is my Botnet : Analysis of a Botnet Takeover"
Paper Presentation - "Your Botnet is my Botnet : Analysis of a Botnet Takeover"
Jishnu Pradeep
 
Borges rprojectcs691y
Borges rprojectcs691yBorges rprojectcs691y
Borges rprojectcs691y
rayborg
 

Similar to Malicious Url Detection Using Machine Learning (20)

Understanding Web Bots and How They Hurt Your Business
Understanding Web Bots and How They Hurt Your BusinessUnderstanding Web Bots and How They Hurt Your Business
Understanding Web Bots and How They Hurt Your Business
 
Hitbkl 2012
Hitbkl 2012Hitbkl 2012
Hitbkl 2012
 
Case Study on Property Portal Data Security
Case Study on Property Portal Data SecurityCase Study on Property Portal Data Security
Case Study on Property Portal Data Security
 
Ensuring Property Portal Listing Data Security
Ensuring Property Portal Listing Data SecurityEnsuring Property Portal Listing Data Security
Ensuring Property Portal Listing Data Security
 
How to clean up travel website traffic from bots and spammers?
How to clean up travel website traffic from bots and spammers?How to clean up travel website traffic from bots and spammers?
How to clean up travel website traffic from bots and spammers?
 
Cleaning up website traffic from bots & spammers
Cleaning up website traffic from bots & spammersCleaning up website traffic from bots & spammers
Cleaning up website traffic from bots & spammers
 
Rtp rsp16-distil networks-final-deck
Rtp rsp16-distil networks-final-deckRtp rsp16-distil networks-final-deck
Rtp rsp16-distil networks-final-deck
 
Are Bot Operators Eating Your Lunch?
Are Bot Operators Eating Your Lunch?Are Bot Operators Eating Your Lunch?
Are Bot Operators Eating Your Lunch?
 
IRJET - An Automated System for Detection of Social Engineering Phishing Atta...
IRJET - An Automated System for Detection of Social Engineering Phishing Atta...IRJET - An Automated System for Detection of Social Engineering Phishing Atta...
IRJET - An Automated System for Detection of Social Engineering Phishing Atta...
 
PUMMP: PHISHING URL DETECTION USING MACHINE LEARNING WITH MONOMORPHIC AND POL...
PUMMP: PHISHING URL DETECTION USING MACHINE LEARNING WITH MONOMORPHIC AND POL...PUMMP: PHISHING URL DETECTION USING MACHINE LEARNING WITH MONOMORPHIC AND POL...
PUMMP: PHISHING URL DETECTION USING MACHINE LEARNING WITH MONOMORPHIC AND POL...
 
PUMMP: Phishing URL Detection using Machine Learning with Monomorphic and Pol...
PUMMP: Phishing URL Detection using Machine Learning with Monomorphic and Pol...PUMMP: Phishing URL Detection using Machine Learning with Monomorphic and Pol...
PUMMP: Phishing URL Detection using Machine Learning with Monomorphic and Pol...
 
IRJET - Detection and Prevention of Phishing Websites using Machine Learning ...
IRJET - Detection and Prevention of Phishing Websites using Machine Learning ...IRJET - Detection and Prevention of Phishing Websites using Machine Learning ...
IRJET - Detection and Prevention of Phishing Websites using Machine Learning ...
 
IRJET- Phishing Website Detection System
IRJET- Phishing Website Detection SystemIRJET- Phishing Website Detection System
IRJET- Phishing Website Detection System
 
DETECTION OF PHISHING WEBSITES USING MACHINE LEARNING
DETECTION OF PHISHING WEBSITES USING MACHINE LEARNINGDETECTION OF PHISHING WEBSITES USING MACHINE LEARNING
DETECTION OF PHISHING WEBSITES USING MACHINE LEARNING
 
State of the Art Analysis Approach for Identification of the Malignant URLs
State of the Art Analysis Approach for Identification of the Malignant URLsState of the Art Analysis Approach for Identification of the Malignant URLs
State of the Art Analysis Approach for Identification of the Malignant URLs
 
17 00 distil rami
17 00 distil rami17 00 distil rami
17 00 distil rami
 
Scraping in 60 minutes (CIJ Summer School 2019)
Scraping in 60 minutes (CIJ Summer School 2019)Scraping in 60 minutes (CIJ Summer School 2019)
Scraping in 60 minutes (CIJ Summer School 2019)
 
Wsdm yu
Wsdm yuWsdm yu
Wsdm yu
 
Paper Presentation - "Your Botnet is my Botnet : Analysis of a Botnet Takeover"
Paper Presentation - "Your Botnet is my Botnet : Analysis of a Botnet Takeover"Paper Presentation - "Your Botnet is my Botnet : Analysis of a Botnet Takeover"
Paper Presentation - "Your Botnet is my Botnet : Analysis of a Botnet Takeover"
 
Borges rprojectcs691y
Borges rprojectcs691yBorges rprojectcs691y
Borges rprojectcs691y
 

More from securityxploded

Fingerprinting healthcare institutions
Fingerprinting healthcare institutionsFingerprinting healthcare institutions
Fingerprinting healthcare institutions
securityxploded
 
Hollow Process Injection - Reversing and Investigating Malware Evasive Tactics
Hollow Process Injection - Reversing and Investigating Malware Evasive TacticsHollow Process Injection - Reversing and Investigating Malware Evasive Tactics
Hollow Process Injection - Reversing and Investigating Malware Evasive Tactics
securityxploded
 
Buffer Overflow Attacks
Buffer Overflow AttacksBuffer Overflow Attacks
Buffer Overflow Attacks
securityxploded
 
Malicious Client Detection Using Machine Learning
Malicious Client Detection Using Machine LearningMalicious Client Detection Using Machine Learning
Malicious Client Detection Using Machine Learning
securityxploded
 
Understanding CryptoLocker (Ransomware) with a Case Study
Understanding CryptoLocker (Ransomware) with a Case StudyUnderstanding CryptoLocker (Ransomware) with a Case Study
Understanding CryptoLocker (Ransomware) with a Case Study
securityxploded
 
Linux Malware Analysis using Limon Sandbox
Linux Malware Analysis using Limon SandboxLinux Malware Analysis using Limon Sandbox
Linux Malware Analysis using Limon Sandbox
securityxploded
 
Introduction to SMPC
Introduction to SMPCIntroduction to SMPC
Introduction to SMPC
securityxploded
 
Breaking into hospitals
Breaking into hospitalsBreaking into hospitals
Breaking into hospitals
securityxploded
 
Bluetooth [in]security
Bluetooth [in]securityBluetooth [in]security
Bluetooth [in]security
securityxploded
 
Basic malware analysis
Basic malware analysisBasic malware analysis
Basic malware analysis
securityxploded
 
Automating Malware Analysis
Automating Malware AnalysisAutomating Malware Analysis
Automating Malware Analysis
securityxploded
 
Reverse Engineering Malware
Reverse Engineering MalwareReverse Engineering Malware
Reverse Engineering Malware
securityxploded
 
DLL Preloading Attack
DLL Preloading AttackDLL Preloading Attack
DLL Preloading Attack
securityxploded
 
Partial Homomorphic Encryption
Partial Homomorphic EncryptionPartial Homomorphic Encryption
Partial Homomorphic Encryption
securityxploded
 
Hunting Rootkit From the Dark Corners Of Memory
Hunting Rootkit From the Dark Corners Of MemoryHunting Rootkit From the Dark Corners Of Memory
Hunting Rootkit From the Dark Corners Of Memory
securityxploded
 
Return Address – The Silver Bullet
Return Address – The Silver BulletReturn Address – The Silver Bullet
Return Address – The Silver Bullet
securityxploded
 
Defeating public exploit protections (EMET v5.2 and more)
Defeating public exploit protections (EMET v5.2 and more)Defeating public exploit protections (EMET v5.2 and more)
Defeating public exploit protections (EMET v5.2 and more)
securityxploded
 
Hunting Ghost RAT Using Memory Forensics
Hunting Ghost RAT Using Memory ForensicsHunting Ghost RAT Using Memory Forensics
Hunting Ghost RAT Using Memory Forensics
securityxploded
 
Anatomy of Exploit Kits
Anatomy of Exploit KitsAnatomy of Exploit Kits
Anatomy of Exploit Kits
securityxploded
 
MalwareNet Project
MalwareNet ProjectMalwareNet Project
MalwareNet Project
securityxploded
 

More from securityxploded (20)

Fingerprinting healthcare institutions
Fingerprinting healthcare institutionsFingerprinting healthcare institutions
Fingerprinting healthcare institutions
 
Hollow Process Injection - Reversing and Investigating Malware Evasive Tactics
Hollow Process Injection - Reversing and Investigating Malware Evasive TacticsHollow Process Injection - Reversing and Investigating Malware Evasive Tactics
Hollow Process Injection - Reversing and Investigating Malware Evasive Tactics
 
Buffer Overflow Attacks
Buffer Overflow AttacksBuffer Overflow Attacks
Buffer Overflow Attacks
 
Malicious Client Detection Using Machine Learning
Malicious Client Detection Using Machine LearningMalicious Client Detection Using Machine Learning
Malicious Client Detection Using Machine Learning
 
Understanding CryptoLocker (Ransomware) with a Case Study
Understanding CryptoLocker (Ransomware) with a Case StudyUnderstanding CryptoLocker (Ransomware) with a Case Study
Understanding CryptoLocker (Ransomware) with a Case Study
 
Linux Malware Analysis using Limon Sandbox
Linux Malware Analysis using Limon SandboxLinux Malware Analysis using Limon Sandbox
Linux Malware Analysis using Limon Sandbox
 
Introduction to SMPC
Introduction to SMPCIntroduction to SMPC
Introduction to SMPC
 
Breaking into hospitals
Breaking into hospitalsBreaking into hospitals
Breaking into hospitals
 
Bluetooth [in]security
Bluetooth [in]securityBluetooth [in]security
Bluetooth [in]security
 
Basic malware analysis
Basic malware analysisBasic malware analysis
Basic malware analysis
 
Automating Malware Analysis
Automating Malware AnalysisAutomating Malware Analysis
Automating Malware Analysis
 
Reverse Engineering Malware
Reverse Engineering MalwareReverse Engineering Malware
Reverse Engineering Malware
 
DLL Preloading Attack
DLL Preloading AttackDLL Preloading Attack
DLL Preloading Attack
 
Partial Homomorphic Encryption
Partial Homomorphic EncryptionPartial Homomorphic Encryption
Partial Homomorphic Encryption
 
Hunting Rootkit From the Dark Corners Of Memory
Hunting Rootkit From the Dark Corners Of MemoryHunting Rootkit From the Dark Corners Of Memory
Hunting Rootkit From the Dark Corners Of Memory
 
Return Address – The Silver Bullet
Return Address – The Silver BulletReturn Address – The Silver Bullet
Return Address – The Silver Bullet
 
Defeating public exploit protections (EMET v5.2 and more)
Defeating public exploit protections (EMET v5.2 and more)Defeating public exploit protections (EMET v5.2 and more)
Defeating public exploit protections (EMET v5.2 and more)
 
Hunting Ghost RAT Using Memory Forensics
Hunting Ghost RAT Using Memory ForensicsHunting Ghost RAT Using Memory Forensics
Hunting Ghost RAT Using Memory Forensics
 
Anatomy of Exploit Kits
Anatomy of Exploit KitsAnatomy of Exploit Kits
Anatomy of Exploit Kits
 
MalwareNet Project
MalwareNet ProjectMalwareNet Project
MalwareNet Project
 

Recently uploaded

Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
Dorra BARTAGUIZ
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
Sri Ambati
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
RTTS
 
Generating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using SmithyGenerating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using Smithy
g2nightmarescribd
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
DianaGray10
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Inflectra
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
Product School
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Ramesh Iyer
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
Product School
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
Cheryl Hung
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
DianaGray10
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Product School
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Jeffrey Haguewood
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 

Recently uploaded (20)

Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
Generating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using SmithyGenerating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using Smithy
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 

Malicious Url Detection Using Machine Learning

  • 1. Beyond Blacklists: Malicious Url Detection Using Machine Learning
  • 2. Who am I ? • Info security Investigator @ Cisco. • Completed Mtech from IIT Jodhpur in 2014. • Areas of interest include machine learning, computer vision and A.I. • Email : satyamiitj89@gmail.com
  • 3. Malicious websites Phishing : which one is real ??
  • 6. Problem in a Nutshell 6  URL features to identify malicious Web sites  No context, no content  Different classes of URLs  Benign, spam, phishing, exploits, scams...  For now, distinguish benign vs. malicious facebook.com fblight.com
  • 8. State of the Practice 8  Current approaches  Blacklists [SORBS, URIBL, SURBL, Spamhaus]  Learning on hand-tuned features [Garera et al, 2007]  Limitations  Cannot predict unlisted sites  Cannot account for new features  Arms race: Fast feedback cycle is critical More automated approach?
  • 10. Data Sets 10  Malicious URLs  5,000 from PhishTank (phishing)  15,000 from Spamscatter (spam, phishing, etc)  Benign URLs  15,000 from Yahoo Web directory  15,000 from DMOZ directory  Malicious x Benign → 4 Data Sets  30,000 – 55,000 features per data set
  • 11. Algorithms 11  Logistic regression w/ L1-norm regularization  Other models  Naive Bayes  Support vector machines (linear, RBF kernels)  Implicit feature selection  Easier to interpret
  • 13. Features to consider? 14 1) Blacklists 2) Simple heuristics 3) Domain name registration 4) Host properties 5) Lexical
  • 14. (1) Blacklist Queries 15  List of known malicious sites  Providers: SORBS, URIBL, SURBL, Spamhaus http://www.bfuduuioo1fp.mobi In blacklist? Yes http://fblight.com No In blacklist? http://www.bfuduuioo1fp.mobi Blacklist queries as features ........................................ ........................................
  • 15. (2) Manually-Selected Features 16  Considered by previous studies  IP address in hostname?  Number of dots in URL  WHOIS (domain name) registration date stopgap.cn registered 28 June 2009 http://72.23.5.122/www.bankofamerica.com/ http://www.bankofamerica.com.qytrpbcw.stopgap.cn/
  • 16. (3) WHOIS Features 17  Domain name registration  Date of registration, update, expiration  Registrant: Who registered domain?  Registrar: Who manages registration? http://sleazysalmon.com http://angryalbacore.com http://mangymackerel.com http://yammeringyellowtail.com Registered on 29 June 2009 By SpamMedia
  • 17. (4) Host-Based Features 18  Blacklisted? (SORBS, URIBL, SURBL, Spamhaus)  WHOIS: registrar, registrant, dates  IP address: Which ASes/IP prefixes?  DNS: TTL? PTR record exists/resolves?  Geography-related: Locale? Connection speed? 75.102.60.0/2269.63.176.0/20 facebook.com fblight.com
  • 18. (5) Lexical Features 19  Tokens in URL hostname + path  Length of URL  Entropy of the domain name http://www.bfuduuioo1fp.mobi/ws/ebayisapi.dll
  • 19. Which feature sets? 20 Blacklist Manual WHOIS Host-based Lexical Full w/o WHOIS/Blacklist 4,000 # Features 13,000 4 3 17,000 30,000 26,000
  • 20. Beyond Blacklists 21 Blacklist Full features Yahoo-PhishTank Higher detection rate for given false positive rate
  • 21. Limitations 22  False positives  Sites hosted in disreputable ISP  Guilt by association  False negatives  Compromised sites  Free hosting sites  Hosted in reputable ISP  Future work: Web page content
  • 22. Conclusion 23  Detect malicious URLs with high accuracy  Only using URL  Diverse feature set helps: 86.5% w/ 18,000+ features  Proof concept working in lab  Future work  Scaling up for deployment
  • 23. References  Ma, Justin, et al. "Beyond blacklists: learning to detect malicious web sites from suspicious URLs." Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2009.
  • 24. Q & A