SlideShare a Scribd company logo
1 of 33
Classifying Phishing URLs Using
Recurrent Neural Networks
Sergio Villegas
Javier Vargas
*Alejandro Correa Bahnsen
Easy Solutions Research
Eduardo Contreras Bohorquez
Fabio A. Gonzalez
MindLab Research Group,
Universidad Nacional de
Colombia
Industry recognition
A leading global provider of electronic
fraud prevention for financial institutions
and enterprise customers
385 customers
In 30 countries
100 million
Users protected
27+ billion
Online connections monitored
About Easy Solutions®
Easy Solutions to be Acquired by New Joint Venture Creating Global, Secure Infrastructure Company
Phishing
3
Phishing is the act of defrauding an online
user in order to obtain personal information
by posing as a trustworthy institution or
entity.
Typical Phishing Example
4
Why Phishing Detection is Hard
5
Original Website Only Using Images Subtle Changes
Is It Phishing?
Ideal Phishing Detection System
7
Machine
Learning
Algorithm
Ideal Phishing Detection System - Issues
8
Issues with full content
analysis:
• Time consuming
• Impractical to process
millions of websites per day
• Hard to implement for
small devices
There is always the need for an URL
9
Database of URLs
1,000,000 Phishing URLs from PhishTank
10
http://moviesjingle.com/auto/163.com/index.php
1,000,000 Legitimate URLs from Common Crawl
http://paypal.com.update.account.toughbook.cl/8a30e847925afc597516
1aeabe8930f1/?cmd=_home&dispatch=d09b78f5812945a73610edf38
http://msystemtech.ru/components/com_users/Italy/zz/Login.php?run=
_login-submit&session=68bbd43c854147324d77872062349924
https://www.sanfordhealth.org/ChildrensHealth/Article/73980
http://www.grahamleader.com/ci_25029538/these-are-5-worst-super-
bowl-halftime-shows&defid=1634182
http://www.carolinaguesthouse.co.uk/onlinebooking/?industrytype=1&
startdate=2013-09-05&nights=2&location&productid=25d47a24-6b74
CLASSIFYING PHISHING USING
URL LEXICAL AND
STATISTICAL FREQUENCIES
11
URL Lexical and Statistical Frequencies
12
http://www.papaya.com/secure_login.php
URL length Alexa
Ranking
Path length
URL Entropy
# of .com
Punctuation
count
TLD count
Is IP?
Euclidean
distance
KS & KL
distance
URL Lexical and Statistical Frequencies
13
http://www.papaya.com/secure_login.php
URL length Alexa
Ranking
Path length
URL Entropy
# of .com
Punctuation
count
TLD count
Is IP?
Euclidean
distance
KS & KL
distance
Is It Phishing?
URL Lexical and Statistical Frequencies
14
3-Fold CV Accuracy Recall Precision
Average 93.47% 93.28% 93.64%
Deviation 0.01% 0.02% 0.03%
Results:
URL Lexical and Statistical Frequencies
15
Feature
Importance
MODELING PHISHING URLS
WITH RECURRENT
NEURAL NETWORKS
16
Normal Neural Network
17
Source: https://en.wikipedia.org/wiki/Artificial_neural_network
Recurrent Neural Networks RNN
Have loops!
19
The Problem of Long-Term Dependencies
20
Short term dependencies are easy
long term …
Long-Short Term Memory Networks LSTM
21
RNN contains
a single layer
LSTM contains
four interacting
layers
Source: http://colah.github.io/posts/2015-08-Understanding-LSTMs/
Long-Short Term Memory Networks LSTM
22
Key idea: Cell State
LSTM Step-by-Step
23
Step 1. Decide what information is going to be used
LSTM Step-by-Step
24
Step 2. Which new information is stored
LSTM Step-by-Step
25
Step 3. Update old cell state
LSTM Step-by-Step
26
Step 4. Make prediction
Modeling Architecture for URL Classification
27
URL
h
t
t
p
:
/
/
w
w
w
.
p
a
p
a
y
a
.
c
o
m
One hot
Encoding
…
…
…
…
…
…
…
…
…
…
…
…
…
…
…
…
…
…
…
…
…
Embedding
3.2 1.2 … 1.7
6.4 2.3 … 2.6
6.4 3.0 … 1.7
3.4 2.6 … 3.4
2.6 3.8 … 2.6
3.5 3.2 … 6.4
1.7 4.2 … 6.4
8.6 2.4 … 6.4
4.3 2.9 … 6.4
2.2 3.4 … 3.4
3.2 2.6 … 2.6
4.2 2.2 … 3.5
2.4 3.2 … 1.7
2.9 1.7 … 8.6
3.0 6.4 … 2.6
2.6 6.4 … 3.8
3.8 3.4 … 3.2
3.3 2.6 … 2.2
3.1 2.2 … 2.9
1.8 3.2 … 3.0
2.5 6.4 … 2.6
LSTM
LSTM
LSTM
LSTM
Sigmoid
…
Long-Short Term Memory Networks
28
3-Fold CV Accuracy Recall Precision
Average 98.76% 98.93% 98.60%
Deviation 0.04% 0.02% 0.02%
Results:
Models Comparison
29
90%
91%
92%
93%
94%
95%
96%
97%
98%
99%
100%
Accuracy Recall Precision
Long-Short Term Memory Network Random Forest
Models Comparison
30
Model
Random Forest
Long-Short Term
Memory Network
Memory
Consumption (MB)
289
0.56
Evaluation Time
(URLs per sec)
942
281
Training Time
(minutes)
2.95
238.7
What we learned
• Discerning URLs by their patterns is a good predictor of
phishing websites
• LSTM model shows an overall higher prediction
performance without the need of expert knowledge to
create the features
31
Free to use
32
Thank you!
Any questions or comments, please let me know.
Alejandro Correa Bahnsen, PhD
Chief Data Scientist
acorrea@easysol.net

More Related Content

What's hot

[CB20] Operation Chimera - APT Operation Targets Semiconductor Vendors by CK ...
[CB20] Operation Chimera - APT Operation Targets Semiconductor Vendors by CK ...[CB20] Operation Chimera - APT Operation Targets Semiconductor Vendors by CK ...
[CB20] Operation Chimera - APT Operation Targets Semiconductor Vendors by CK ...
CODE BLUE
 
BlueHat v18 || The law of unintended consequences - gdpr impact on cybersecur...
BlueHat v18 || The law of unintended consequences - gdpr impact on cybersecur...BlueHat v18 || The law of unintended consequences - gdpr impact on cybersecur...
BlueHat v18 || The law of unintended consequences - gdpr impact on cybersecur...
BlueHat Security Conference
 
BlueHat v18 || Modern day entomology - examining the inner workings of the bu...
BlueHat v18 || Modern day entomology - examining the inner workings of the bu...BlueHat v18 || Modern day entomology - examining the inner workings of the bu...
BlueHat v18 || Modern day entomology - examining the inner workings of the bu...
BlueHat Security Conference
 
PHISHING DETECTION
PHISHING DETECTIONPHISHING DETECTION
PHISHING DETECTION
umme ayesha
 

What's hot (20)

Blueliv Corporate Brochure 2017
Blueliv Corporate Brochure 2017Blueliv Corporate Brochure 2017
Blueliv Corporate Brochure 2017
 
[CB20] Operation Chimera - APT Operation Targets Semiconductor Vendors by CK ...
[CB20] Operation Chimera - APT Operation Targets Semiconductor Vendors by CK ...[CB20] Operation Chimera - APT Operation Targets Semiconductor Vendors by CK ...
[CB20] Operation Chimera - APT Operation Targets Semiconductor Vendors by CK ...
 
Data Analytics in Cyber Security - Intellisys 2015 Keynote
Data Analytics in Cyber Security - Intellisys 2015 KeynoteData Analytics in Cyber Security - Intellisys 2015 Keynote
Data Analytics in Cyber Security - Intellisys 2015 Keynote
 
BlueHat v18 || The law of unintended consequences - gdpr impact on cybersecur...
BlueHat v18 || The law of unintended consequences - gdpr impact on cybersecur...BlueHat v18 || The law of unintended consequences - gdpr impact on cybersecur...
BlueHat v18 || The law of unintended consequences - gdpr impact on cybersecur...
 
The Anatomy of a Data Breach
The Anatomy of a Data BreachThe Anatomy of a Data Breach
The Anatomy of a Data Breach
 
InfoSec Monthly News Recap: April 2017
InfoSec Monthly News Recap: April 2017InfoSec Monthly News Recap: April 2017
InfoSec Monthly News Recap: April 2017
 
CONFidence 2017: Hackers vs SOC - 12 hours to break in, 250 days to detect (G...
CONFidence 2017: Hackers vs SOC - 12 hours to break in, 250 days to detect (G...CONFidence 2017: Hackers vs SOC - 12 hours to break in, 250 days to detect (G...
CONFidence 2017: Hackers vs SOC - 12 hours to break in, 250 days to detect (G...
 
Intelligent Application Security
Intelligent Application SecurityIntelligent Application Security
Intelligent Application Security
 
Symantec Webinar | How to Detect Targeted Ransomware with MITRE ATT&CK
Symantec Webinar | How to Detect Targeted Ransomware with MITRE ATT&CKSymantec Webinar | How to Detect Targeted Ransomware with MITRE ATT&CK
Symantec Webinar | How to Detect Targeted Ransomware with MITRE ATT&CK
 
(SACON) Nilanjan, Jitendra chauhan & Abhisek Datta - How does an attacker kno...
(SACON) Nilanjan, Jitendra chauhan & Abhisek Datta - How does an attacker kno...(SACON) Nilanjan, Jitendra chauhan & Abhisek Datta - How does an attacker kno...
(SACON) Nilanjan, Jitendra chauhan & Abhisek Datta - How does an attacker kno...
 
BlueHat v18 || Modern day entomology - examining the inner workings of the bu...
BlueHat v18 || Modern day entomology - examining the inner workings of the bu...BlueHat v18 || Modern day entomology - examining the inner workings of the bu...
BlueHat v18 || Modern day entomology - examining the inner workings of the bu...
 
Cybersecurity: How to Use What We Already Know
Cybersecurity: How to Use What We Already KnowCybersecurity: How to Use What We Already Know
Cybersecurity: How to Use What We Already Know
 
PHISHING DETECTION
PHISHING DETECTIONPHISHING DETECTION
PHISHING DETECTION
 
How Machine Learning & AI Will Improve Cyber Security
How Machine Learning & AI Will Improve Cyber SecurityHow Machine Learning & AI Will Improve Cyber Security
How Machine Learning & AI Will Improve Cyber Security
 
"Inter- application vulnerabilities. hunting for bugs in secure applications"...
"Inter- application vulnerabilities. hunting for bugs in secure applications"..."Inter- application vulnerabilities. hunting for bugs in secure applications"...
"Inter- application vulnerabilities. hunting for bugs in secure applications"...
 
Netpluz - Managed Firewall & Endpoint Protection
Netpluz - Managed Firewall & Endpoint Protection Netpluz - Managed Firewall & Endpoint Protection
Netpluz - Managed Firewall & Endpoint Protection
 
Why Organisations Need_Barac
Why Organisations Need_BaracWhy Organisations Need_Barac
Why Organisations Need_Barac
 
Insider theft detection
Insider theft detection Insider theft detection
Insider theft detection
 
Phishing Attacks: A Challenge Ahead
Phishing Attacks: A Challenge AheadPhishing Attacks: A Challenge Ahead
Phishing Attacks: A Challenge Ahead
 
Introduction to MITRE ATT&CK
Introduction to MITRE ATT&CKIntroduction to MITRE ATT&CK
Introduction to MITRE ATT&CK
 

Viewers also liked

Fraud Detection with Cost-Sensitive Predictive Analytics
Fraud Detection with Cost-Sensitive Predictive AnalyticsFraud Detection with Cost-Sensitive Predictive Analytics
Fraud Detection with Cost-Sensitive Predictive Analytics
Alejandro Correa Bahnsen, PhD
 
Analytics - compitiendo en la era de la informacion
Analytics - compitiendo en la era de la informacionAnalytics - compitiendo en la era de la informacion
Analytics - compitiendo en la era de la informacion
Alejandro Correa Bahnsen, PhD
 
Maximizing a churn campaigns profitability with cost sensitive machine learning
Maximizing a churn campaigns profitability with cost sensitive machine learningMaximizing a churn campaigns profitability with cost sensitive machine learning
Maximizing a churn campaigns profitability with cost sensitive machine learning
Alejandro Correa Bahnsen, PhD
 
Fraud analytics detección y prevención de fraudes en la era del big data sl...
Fraud analytics detección y prevención de fraudes en la era del big data   sl...Fraud analytics detección y prevención de fraudes en la era del big data   sl...
Fraud analytics detección y prevención de fraudes en la era del big data sl...
Alejandro Correa Bahnsen, PhD
 
1609 Fraud Data Science
1609 Fraud Data Science1609 Fraud Data Science
1609 Fraud Data Science
Alejandro Correa Bahnsen, PhD
 
Ensembles of example dependent cost-sensitive decision trees slides
Ensembles of example dependent cost-sensitive decision trees slidesEnsembles of example dependent cost-sensitive decision trees slides
Ensembles of example dependent cost-sensitive decision trees slides
Alejandro Correa Bahnsen, PhD
 

Viewers also liked (13)

Fraud Detection with Cost-Sensitive Predictive Analytics
Fraud Detection with Cost-Sensitive Predictive AnalyticsFraud Detection with Cost-Sensitive Predictive Analytics
Fraud Detection with Cost-Sensitive Predictive Analytics
 
Example-Dependent Cost-Sensitive Credit Card Fraud Detection
Example-Dependent Cost-Sensitive Credit Card Fraud DetectionExample-Dependent Cost-Sensitive Credit Card Fraud Detection
Example-Dependent Cost-Sensitive Credit Card Fraud Detection
 
PhD Defense - Example-Dependent Cost-Sensitive Classification
PhD Defense - Example-Dependent Cost-Sensitive ClassificationPhD Defense - Example-Dependent Cost-Sensitive Classification
PhD Defense - Example-Dependent Cost-Sensitive Classification
 
Analytics - compitiendo en la era de la informacion
Analytics - compitiendo en la era de la informacionAnalytics - compitiendo en la era de la informacion
Analytics - compitiendo en la era de la informacion
 
Modern Data Science
Modern Data ScienceModern Data Science
Modern Data Science
 
Maximizing a churn campaigns profitability with cost sensitive machine learning
Maximizing a churn campaigns profitability with cost sensitive machine learningMaximizing a churn campaigns profitability with cost sensitive machine learning
Maximizing a churn campaigns profitability with cost sensitive machine learning
 
Fraud analytics detección y prevención de fraudes en la era del big data sl...
Fraud analytics detección y prevención de fraudes en la era del big data   sl...Fraud analytics detección y prevención de fraudes en la era del big data   sl...
Fraud analytics detección y prevención de fraudes en la era del big data sl...
 
1609 Fraud Data Science
1609 Fraud Data Science1609 Fraud Data Science
1609 Fraud Data Science
 
2013 credit card fraud detection why theory dosent adjust to practice
2013 credit card fraud detection why theory dosent adjust to practice2013 credit card fraud detection why theory dosent adjust to practice
2013 credit card fraud detection why theory dosent adjust to practice
 
2011 advanced analytics through the credit cycle
2011 advanced analytics through the credit cycle2011 advanced analytics through the credit cycle
2011 advanced analytics through the credit cycle
 
Maximizing a churn campaign’s profitability with cost sensitive predictive an...
Maximizing a churn campaign’s profitability with cost sensitive predictive an...Maximizing a churn campaign’s profitability with cost sensitive predictive an...
Maximizing a churn campaign’s profitability with cost sensitive predictive an...
 
Demystifying machine learning using lime
Demystifying machine learning using limeDemystifying machine learning using lime
Demystifying machine learning using lime
 
Ensembles of example dependent cost-sensitive decision trees slides
Ensembles of example dependent cost-sensitive decision trees slidesEnsembles of example dependent cost-sensitive decision trees slides
Ensembles of example dependent cost-sensitive decision trees slides
 

Similar to Classifying Phishing URLs Using Recurrent Neural Networks

CyberCrime in the Cloud and How to defend Yourself
CyberCrime in the Cloud and How to defend Yourself CyberCrime in the Cloud and How to defend Yourself
CyberCrime in the Cloud and How to defend Yourself
Alert Logic
 

Similar to Classifying Phishing URLs Using Recurrent Neural Networks (20)

BDAS-2017 | Deep Neural Networks Para la Detección de Phishing
BDAS-2017 | Deep Neural Networks Para la Detección de PhishingBDAS-2017 | Deep Neural Networks Para la Detección de Phishing
BDAS-2017 | Deep Neural Networks Para la Detección de Phishing
 
Introduction to Ion – a layer 2 network for Decentralized Identifiers with Bi...
Introduction to Ion – a layer 2 network for Decentralized Identifiers with Bi...Introduction to Ion – a layer 2 network for Decentralized Identifiers with Bi...
Introduction to Ion – a layer 2 network for Decentralized Identifiers with Bi...
 
BLOCKHUNTER.pptx
BLOCKHUNTER.pptxBLOCKHUNTER.pptx
BLOCKHUNTER.pptx
 
SCADA Security: The Five Stages of Cyber Grief
SCADA Security: The Five Stages of Cyber GriefSCADA Security: The Five Stages of Cyber Grief
SCADA Security: The Five Stages of Cyber Grief
 
CyberCrime in the Cloud and How to defend Yourself
CyberCrime in the Cloud and How to defend Yourself CyberCrime in the Cloud and How to defend Yourself
CyberCrime in the Cloud and How to defend Yourself
 
2016 - 10 questions you should answer before building a new microservice
2016 - 10 questions you should answer before building a new microservice2016 - 10 questions you should answer before building a new microservice
2016 - 10 questions you should answer before building a new microservice
 
CLASS 2018 - Palestra de Edgard Capdevielle (Presidente e CEO – Nozomi)
CLASS 2018 - Palestra de Edgard Capdevielle (Presidente e CEO – Nozomi)CLASS 2018 - Palestra de Edgard Capdevielle (Presidente e CEO – Nozomi)
CLASS 2018 - Palestra de Edgard Capdevielle (Presidente e CEO – Nozomi)
 
SCADA Security: The Five Stages of Cyber Grief
SCADA Security: The Five Stages of Cyber GriefSCADA Security: The Five Stages of Cyber Grief
SCADA Security: The Five Stages of Cyber Grief
 
IRJET - Improving Password System using Blockchain
IRJET - Improving Password System using BlockchainIRJET - Improving Password System using Blockchain
IRJET - Improving Password System using Blockchain
 
CLASS 2022 - Marty Edwards (Tenable) - O perigo crescente de ransomware crimi...
CLASS 2022 - Marty Edwards (Tenable) - O perigo crescente de ransomware crimi...CLASS 2022 - Marty Edwards (Tenable) - O perigo crescente de ransomware crimi...
CLASS 2022 - Marty Edwards (Tenable) - O perigo crescente de ransomware crimi...
 
Blockchain on Azure
Blockchain on AzureBlockchain on Azure
Blockchain on Azure
 
IoT meets Big Data
IoT meets Big DataIoT meets Big Data
IoT meets Big Data
 
Mris network architecture proposal r1
Mris network architecture proposal r1Mris network architecture proposal r1
Mris network architecture proposal r1
 
Blockchains and Adult Education
Blockchains and Adult EducationBlockchains and Adult Education
Blockchains and Adult Education
 
Industrial Control System Network Cyber Security Monitoring Solution (SCAB)
Industrial Control System Network Cyber Security Monitoring Solution (SCAB)Industrial Control System Network Cyber Security Monitoring Solution (SCAB)
Industrial Control System Network Cyber Security Monitoring Solution (SCAB)
 
Core intel
Core intelCore intel
Core intel
 
Cyber security
Cyber securityCyber security
Cyber security
 
ING CoreIntel - collect and process network logs across data centers in near ...
ING CoreIntel - collect and process network logs across data centers in near ...ING CoreIntel - collect and process network logs across data centers in near ...
ING CoreIntel - collect and process network logs across data centers in near ...
 
How to measure your security response readiness?
How to measure your security response readiness?How to measure your security response readiness?
How to measure your security response readiness?
 
Blockchain, Finance & Regulatory Development
Blockchain, Finance & Regulatory DevelopmentBlockchain, Finance & Regulatory Development
Blockchain, Finance & Regulatory Development
 

Recently uploaded

Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
amitlee9823
 
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
amitlee9823
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
amitlee9823
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter Lessons
JoseMangaJr1
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
amitlee9823
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
amitlee9823
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 

Recently uploaded (20)

Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter Lessons
 
Detecting Credit Card Fraud: A Machine Learning Approach
Detecting Credit Card Fraud: A Machine Learning ApproachDetecting Credit Card Fraud: A Machine Learning Approach
Detecting Credit Card Fraud: A Machine Learning Approach
 
Predicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science ProjectPredicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science Project
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics Program
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
 
hybrid Seed Production In Chilli & Capsicum.pptx
hybrid Seed Production In Chilli & Capsicum.pptxhybrid Seed Production In Chilli & Capsicum.pptx
hybrid Seed Production In Chilli & Capsicum.pptx
 
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
 

Classifying Phishing URLs Using Recurrent Neural Networks

  • 1. Classifying Phishing URLs Using Recurrent Neural Networks Sergio Villegas Javier Vargas *Alejandro Correa Bahnsen Easy Solutions Research Eduardo Contreras Bohorquez Fabio A. Gonzalez MindLab Research Group, Universidad Nacional de Colombia
  • 2. Industry recognition A leading global provider of electronic fraud prevention for financial institutions and enterprise customers 385 customers In 30 countries 100 million Users protected 27+ billion Online connections monitored About Easy Solutions® Easy Solutions to be Acquired by New Joint Venture Creating Global, Secure Infrastructure Company
  • 3. Phishing 3 Phishing is the act of defrauding an online user in order to obtain personal information by posing as a trustworthy institution or entity.
  • 5. Why Phishing Detection is Hard 5 Original Website Only Using Images Subtle Changes
  • 6.
  • 7. Is It Phishing? Ideal Phishing Detection System 7 Machine Learning Algorithm
  • 8. Ideal Phishing Detection System - Issues 8 Issues with full content analysis: • Time consuming • Impractical to process millions of websites per day • Hard to implement for small devices
  • 9. There is always the need for an URL 9
  • 10. Database of URLs 1,000,000 Phishing URLs from PhishTank 10 http://moviesjingle.com/auto/163.com/index.php 1,000,000 Legitimate URLs from Common Crawl http://paypal.com.update.account.toughbook.cl/8a30e847925afc597516 1aeabe8930f1/?cmd=_home&dispatch=d09b78f5812945a73610edf38 http://msystemtech.ru/components/com_users/Italy/zz/Login.php?run= _login-submit&session=68bbd43c854147324d77872062349924 https://www.sanfordhealth.org/ChildrensHealth/Article/73980 http://www.grahamleader.com/ci_25029538/these-are-5-worst-super- bowl-halftime-shows&defid=1634182 http://www.carolinaguesthouse.co.uk/onlinebooking/?industrytype=1& startdate=2013-09-05&nights=2&location&productid=25d47a24-6b74
  • 11. CLASSIFYING PHISHING USING URL LEXICAL AND STATISTICAL FREQUENCIES 11
  • 12. URL Lexical and Statistical Frequencies 12 http://www.papaya.com/secure_login.php URL length Alexa Ranking Path length URL Entropy # of .com Punctuation count TLD count Is IP? Euclidean distance KS & KL distance
  • 13. URL Lexical and Statistical Frequencies 13 http://www.papaya.com/secure_login.php URL length Alexa Ranking Path length URL Entropy # of .com Punctuation count TLD count Is IP? Euclidean distance KS & KL distance Is It Phishing?
  • 14. URL Lexical and Statistical Frequencies 14 3-Fold CV Accuracy Recall Precision Average 93.47% 93.28% 93.64% Deviation 0.01% 0.02% 0.03% Results:
  • 15. URL Lexical and Statistical Frequencies 15 Feature Importance
  • 16. MODELING PHISHING URLS WITH RECURRENT NEURAL NETWORKS 16
  • 17. Normal Neural Network 17 Source: https://en.wikipedia.org/wiki/Artificial_neural_network
  • 18.
  • 19. Recurrent Neural Networks RNN Have loops! 19
  • 20. The Problem of Long-Term Dependencies 20 Short term dependencies are easy long term …
  • 21. Long-Short Term Memory Networks LSTM 21 RNN contains a single layer LSTM contains four interacting layers Source: http://colah.github.io/posts/2015-08-Understanding-LSTMs/
  • 22. Long-Short Term Memory Networks LSTM 22 Key idea: Cell State
  • 23. LSTM Step-by-Step 23 Step 1. Decide what information is going to be used
  • 24. LSTM Step-by-Step 24 Step 2. Which new information is stored
  • 25. LSTM Step-by-Step 25 Step 3. Update old cell state
  • 26. LSTM Step-by-Step 26 Step 4. Make prediction
  • 27. Modeling Architecture for URL Classification 27 URL h t t p : / / w w w . p a p a y a . c o m One hot Encoding … … … … … … … … … … … … … … … … … … … … … Embedding 3.2 1.2 … 1.7 6.4 2.3 … 2.6 6.4 3.0 … 1.7 3.4 2.6 … 3.4 2.6 3.8 … 2.6 3.5 3.2 … 6.4 1.7 4.2 … 6.4 8.6 2.4 … 6.4 4.3 2.9 … 6.4 2.2 3.4 … 3.4 3.2 2.6 … 2.6 4.2 2.2 … 3.5 2.4 3.2 … 1.7 2.9 1.7 … 8.6 3.0 6.4 … 2.6 2.6 6.4 … 3.8 3.8 3.4 … 3.2 3.3 2.6 … 2.2 3.1 2.2 … 2.9 1.8 3.2 … 3.0 2.5 6.4 … 2.6 LSTM LSTM LSTM LSTM Sigmoid …
  • 28. Long-Short Term Memory Networks 28 3-Fold CV Accuracy Recall Precision Average 98.76% 98.93% 98.60% Deviation 0.04% 0.02% 0.02% Results:
  • 29. Models Comparison 29 90% 91% 92% 93% 94% 95% 96% 97% 98% 99% 100% Accuracy Recall Precision Long-Short Term Memory Network Random Forest
  • 30. Models Comparison 30 Model Random Forest Long-Short Term Memory Network Memory Consumption (MB) 289 0.56 Evaluation Time (URLs per sec) 942 281 Training Time (minutes) 2.95 238.7
  • 31. What we learned • Discerning URLs by their patterns is a good predictor of phishing websites • LSTM model shows an overall higher prediction performance without the need of expert knowledge to create the features 31
  • 33. Thank you! Any questions or comments, please let me know. Alejandro Correa Bahnsen, PhD Chief Data Scientist acorrea@easysol.net