Phishing Attacks:
Trends, Detection Systems and
Computer Vision as a
Promising Approach
DR. AHMET SELMAN BOZKIR
DEPT. OF COMPUTER ENGINEERING - HACETTEPE UNIVERSITY
SEMINAR
Today
• What is phishing?
• Facts and current trends
• Types of phishing
• Examples of attack types
• Why has the problem of phishing not been solved yet?
• Phishing detection methods in the literature
• Vision-based analysis and various studies, challenges
• What have we done so far? Our vision
• Conclusion
What is Phishing?
• Phishing is a criminal mechanism employing both social
engineering and technical subterfuge to steal consumers’
personal identity data and financial account credentials.
• Social engineering schemes prey on unwary victims by
fooling them into believing they are dealing with a
trusted, legitimate party, such as by using deceptive email
addresses and email messages.
(APWG – Anti-Phishing Working Group)
• Phone phreaking + fishing -> «phishing»
Underlying Truth
• In 350 BC, Aristotle noted that “our senses can be trusted but they can be easily fooled”.
• A study by Richard Gregory claims that only 20% of our visual perception comes through our eyes; the remainder relies on our inferences.
• Actual reason: careless operations
Phish or not?
Phish or not?
A typical life cycle of a phishing campaign
Facts and Current Trends
Facts and Current Trends
• SAAS/Webmail (36%)
• Payment (22%)
• Financial Inst. (18%)
• Other (9%)
• E-commerce, Retail (3%)
• Social Media (3%)
• Cloud Storage, Hosting (3%)
• Telecommunications (3%)
Financial Loss
• BEC: Business Email Compromise
Use of SSL in Phishing Websites
Types of phishing attacks
• Typical phishing
• Spear phishing
• Whaling
(From typical phishing to whaling: more quantity / less profit → less quantity / more profit)
Example e-mails for “typical phishing”
Example e-mails for “typical phishing”
Example e-mails for “spear phishing”
An example e-mail of “whaling phishing”
Why has the problem of phishing not been solved yet?
• Even Highly Trained Users Are Clicking
When reading a hundred emails in the middle of a stressful workday, even the most well-trained and observant employee will click on a malicious email.
• Phishing Attacks Are Increasingly Sophisticated
- Employees are taught to look for typos and poor grammar to identify a text lure, but over the last year, attackers have improved their spelling and learned to mimic legitimate messages.
- More phishing sites use HTTPS certificates to fool users with the green “secure” indicator in the browser, which, ironically, users interpret as ‘safe’.
- Domain spoofing and domain impersonation are more sophisticated: the attacker can send from an authentic Microsoft address.
Why has the problem of phishing not been solved yet?
• Phishing Has Become Too Targeted for Traditional Spam-Type Filters
Broad, spam-like phishing attacks are easily caught. Targeted, customized phishing attacks are hard to catch and on the rise: spear-phishing attacks, especially business email compromise (BEC), have almost doubled since the beginning of the year, made easier by last year's large-scale data breaches.
• Targeted Attacks Have Become Psychologically More Sophisticated
- Attackers have learned to combine personalized information with a number of effective motivators.
- Fear, urgency, and curiosity were the top motivators in previous years, but they have been replaced by entertainment, social, and reward/recognition motivators.
Combating Methods against Phishing
• The URL
• The source code (DOM)
• The image/screenshot
• Domain knowledge (web information)
Classification of Anti-Phishing Methods
• Blacklist: Google Safe Browsing API; Rosiello et al. (2007); Han et al. (2008); PDA – Jain & Gupta (2016)
• URL: Sahingoz et al. (2017); CatchPhish – Rao et al. (2018); URLNet – Le et al. (2018); PDRCNN – Wang et al. (2019)
• DOM: CANTINA+ (2011); Marchal et al. (2016); Buber et al. (2017); Jain & Gupta (2018)
• Visual similarity: Maurer et al. (2013); Verilogo (2010); DeltaPhish (2017); PhishIRIS – Dalgic et al. (2018)
(From blacklist to visual similarity: fewer resources → more time-consuming and more resources, but robust to “zero-hour” attacks)
The URL
• The Uniform Resource Locator (URL) is the address of a resource on the World Wide Web; in this case, the resource is a web page.
• Many researchers use this source of information to extract key features for identifying phishing web pages.
• While some propose solutions based on hand-crafted (lexical) features, others apply machine-learning-based features.
Some Phishing URLs
http://www.cnhedge.cn/js/index.htm?http://us.battle.net/login/en/?ref=http://spdfozrus.battle.net/d3/en/index
http://www.arvindudyog.com/bright/bright/drake/bright/45886564bea8a9f07a8055347163a4a3/
http://amcnamibia.com/wp-admin/file/files/db/file.dropbox/
http://www.arvindudyog.com/papa/
http://www.iowasaferoutes.org/wp-content/plugins/wpsecone/dhl/
http://www.imanaforums.com/neomodules/accesst/
http://ausbuildblog.com.au/wp-content/heaven/index.php
http://fengshuireview.com/upload/free.mobile.fr/facturtion/finale/free/
http://searchenginetricks.ca/cam/config/webmail/
http://www.i-robot.kiev.ua/self/dropbox/dropbox/dropbox/
http://www.justaskaron.com/octapharma.com.ca/
http://i-robot.kiev.ua/self/dropbox/dropbox/dropbox/index.php
http://kiltonmotor.com/others/m.i.php?n=1774256418&rand.13inboxlight.aspxn.1774256418&rand=13inboxlightaspxn.1774256418&username1=&username
http://www.sindhuratna.com/new2015/document.php
http://www.sindhuratna.com/new2015/document.php
http://justaskaron.com/octapharma.com.ca/index.php
http://www.alexsandroleiloes.com.br/admin/beats/verification-folder.php
http://www.vantaiduccuong.com/soutdoc/es/
http://www.pt-tkbi.com/providernet/provider/provider/webmail/securenow/webnet/
http://www.alhadbaa.org/googledrive/
http://www.parfumwangimurah.com/g9/
http://proseind.cl/new/index.php
http://annstringer.com/storagechecker/domain/ii.php
Lexical URL Features
• #dots
• #special characters
• #suffixes
• Length of URL
• Length of the query string
• Subdomain name
• Suspicious characters / Punycode
• TLD name and its length
• Domain name
• The depth of the subdomain
• Having an SSL certificate (HTTPS)
• ….
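A minimal sketch, using only the Python standard library, of how a subset of these lexical features could be computed. The set of "special" characters and the naive subdomain-depth rule are assumptions for illustration; a real system would consult the Public Suffix List instead of assuming a single-label TLD.

```python
from urllib.parse import urlsplit

# Illustrative choice of "special" characters; not the set used in any cited study.
SPECIAL_CHARS = set("@-_~%=&?")


def lexical_features(url: str) -> dict:
    """Extract a subset of the hand-crafted lexical URL features listed above."""
    parts = urlsplit(url)
    host = parts.hostname or ""           # lower-cased host without port
    labels = host.split(".") if host else []
    tld = labels[-1] if labels else ""
    return {
        "num_dots": url.count("."),
        "num_special_chars": sum(ch in SPECIAL_CHARS for ch in url),
        "url_length": len(url),
        "query_length": len(parts.query),
        "tld": tld,
        "tld_length": len(tld),
        "domain": labels[-2] if len(labels) >= 2 else host,
        # Naive subdomain depth: labels in front of "domain.tld".
        # Assumption: ignores multi-label suffixes such as ".com.au".
        "subdomain_depth": max(len(labels) - 2, 0),
        # Punycode (IDNA) labels carry the ACE prefix "xn--".
        "has_punycode": any(label.startswith("xn--") for label in labels),
        "uses_https": parts.scheme == "https",
    }


for u in ["http://www.alhadbaa.org/googledrive/",
          "http://ausbuildblog.com.au/wp-content/heaven/index.php"]:
    print(u, lexical_features(u))
```

The second sample URL exposes the limitation of the naive rule: it reports "com" as the domain of a ".com.au" host.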
Most discriminative 4-grams: chi-square
• “%20(”: 99.35901350685741
• “.log”: 155.82961566651434
• “logi”: 1947.7954010788872
• “ogin”: 2096.632706999275
• “secu”: 895.0781029132113
• “/wp-”: 1629.5131963112008
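The scores above come from the author's dataset. The following is a hedged sketch of one common way to obtain such scores: extract the distinct character 4-grams of each URL, then compute the chi-square statistic of a 2×2 table (n-gram present/absent vs. phishing/legitimate). This document-frequency formulation is an assumption, since the slide does not say which variant was used, and the toy URLs and labels are invented, so the numbers will not match the slide.

```python
from collections import Counter


def char_ngrams(text, n=4):
    """Return the set of distinct character n-grams in text."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def chi_square_scores(urls, labels, n=4):
    """Chi-square score per n-gram from a 2x2 presence/class contingency table."""
    total = len(urls)
    n_pos = sum(labels)                     # number of phishing URLs
    present = Counter()                     # URLs containing the n-gram
    present_pos = Counter()                 # phishing URLs containing the n-gram
    for url, y in zip(urls, labels):
        for g in char_ngrams(url.lower(), n):
            present[g] += 1
            if y:
                present_pos[g] += 1
    scores = {}
    for g, df in present.items():
        a = present_pos[g]                  # present, phishing
        b = df - a                          # present, legitimate
        c = n_pos - a                       # absent, phishing
        d = total - n_pos - b               # absent, legitimate
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        scores[g] = total * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


urls = ["http://a.com/wp-login", "http://b.net/wp-admin",
        "http://c.org/home", "http://d.io/about"]
labels = [1, 1, 0, 0]  # toy labels: 1 = phishing
top = sorted(chi_square_scores(urls, labels).items(), key=lambda kv: -kv[1])[:3]
print(top)
```

N-grams that occur in every URL (such as "http") get a zero score because they carry no class information.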
The Source Code
• Consists of HTML DOM, JavaScript and CSS components.
• Used as the main markup directives for layout information.
The source code is no longer applicable!
• Thanks to the capabilities of JavaScript and large libraries such as React.js and Angular.js, web page implementation is shifting from static rendering to dynamic rendering.
• Ajax and dynamic content loading
• Misuse of HTML tags
• Countless markup combinations for the same rendering!
• Thus, HTML, CSS or tag similarity is not guaranteed to be a source of evidence!
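A small illustration of the "countless markup combinations" point, using Python's standard `html.parser`. The two login widgets below are hypothetical. They carry the same visible text, and with suitable CSS they could be styled to look alike, yet their tag sets do not overlap at all. A tag-based similarity measure (Jaccard similarity here, chosen for illustration) therefore scores them as completely different.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Record start-tag names and non-blank text nodes of an HTML snippet."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_data(self, data):
        if data.strip():
            self.text.append(data.strip())


def parse(html):
    collector = TagCollector()
    collector.feed(html)
    collector.close()
    return collector


def jaccard(a, b):
    """Jaccard similarity of two tag-name collections (treated as sets)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0


# Two hypothetical login widgets with the same visible text but different markup.
VARIANT_A = '<form><label>Email</label><input name="e"><button>Sign in</button></form>'
VARIANT_B = ('<div class="f"><span>Email</span><div contenteditable></div>'
             '<a href="#" role="button">Sign in</a></div>')

a, b = parse(VARIANT_A), parse(VARIANT_B)
print(a.text == b.text, jaccard(a.tags, b.tags))
```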
Phish-Sense
• Introduces fusion of information extracted
from lexical features and various n-gram
models to capture phishing URL patterns
• The chi-square method is used for feature selection.
• 71,250 samples were provided.
• Outperforms all traditional methods with a TP rate of 98.24%, but is outperformed by deep learning methods!
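The TP rate quoted above is the true positive rate, TP / (TP + FN), computed over phishing samples. It is usually reported together with the false positive rate, FP / (FP + TN), computed over legitimate samples. Below is a minimal sketch, assuming binary labels where 1 means phishing:

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FN, FP, TN for binary labels where 1 = phishing."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fn, fp, tn


def tpr_fpr(y_true, y_pred):
    """True positive rate TP/(TP+FN) and false positive rate FP/(FP+TN)."""
    tp, fn, fp, tn = confusion_counts(y_true, y_pred)
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return tpr, fpr


print(tpr_fpr([1, 1, 1, 1, 0, 0, 0, 0], [1, 1, 1, 0, 0, 0, 0, 1]))
```

A 98.24% TP rate means roughly 1.8 of every 100 phishing samples are missed. It says nothing about how many legitimate pages get flagged, which is what the FPR captures.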
URLNet - 2018
• One of the first published works based on deep learning methods.
Le, Hung, et al. "URLNet: learning a URL representation with deep learning for malicious URL detection." arXiv preprint arXiv:1802.03162 (2018).
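URLNet applies CNNs to both the characters and the words of a URL string. The following sketch covers only the input preparation for those two branches, not the network itself. The fixed length `MAX_CHARS = 200`, the PAD/UNK ids and the rule of splitting words on non-alphanumeric delimiters are illustrative assumptions, not URLNet's published configuration.

```python
import re

MAX_CHARS = 200          # hypothetical fixed input length
PAD_ID, UNK_ID = 0, 1    # reserved ids for padding and unseen characters


def build_char_vocab(urls):
    """Map each character seen in training URLs to an integer id (>= 2)."""
    chars = sorted({ch for url in urls for ch in url})
    return {ch: i + 2 for i, ch in enumerate(chars)}


def encode_chars(url, vocab, max_len=MAX_CHARS):
    """Truncate/pad a URL to max_len character ids; unseen chars map to UNK_ID."""
    ids = [vocab.get(ch, UNK_ID) for ch in url[:max_len]]
    return ids + [PAD_ID] * (max_len - len(ids))


def word_tokens(url):
    """Split a URL into 'words' on non-alphanumeric delimiters, keeping the delimiters."""
    return [tok for tok in re.split(r"([^A-Za-z0-9])", url) if tok]


vocab = build_char_vocab(["http://www.alhadbaa.org/googledrive/"])
print(encode_chars("http://www.parfumwangimurah.com/g9/", vocab)[:10])
print(word_tokens("http://www.parfumwangimurah.com/g9/"))
```

Each resulting id sequence would then be fed to an embedding layer followed by convolution and pooling in the respective branch.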
Visual similarity or vision based analysis?
• Logo
• Screenshot of the whole page
• Image with layout
• DOM tree similarity
• Visual features
• CSS similarity
• Layout similarity via VIPS (block and overall layout)
Can computer vision help us?
• Only 47%–83% of newly found phishing pages are added to blacklists within 12 hours. Zero-day attacks need proactive solutions!
• Predefined or hand-crafted heuristics are evaded by attackers.
• 23% of users do not even look at the address bar! (Dhamija et al.)
• Substitution of textual HTML elements with <IMG> or applet-like rich internet application (RIA) content such as Flash, ActiveX and Silverlight
• Loading of dynamic/AJAX-based content and IFRAMEs
• Different DOM organizations between the legitimate page and its phishing counterpart
• Robustness against complex backgrounds or page layouts
• Brand recognition can be done in a holistic manner
• Language and source code independence
• Most importantly, vision-based solutions are in concordance with human perception
Challenges related to vision based anti-phishing
• Lack of a well curated dataset
• Vast amount of brands
• High intra-class variations among the phishing samples of brands
• Inconsistent layouts
• Unrelated layouts and color schemes
• Data leakage, which biases the results
Phish-Iris Dataset
Publicly available at https://web.cs.hacettepe.edu.tr/~selman/phish-iris-dataset
HOG and MPEG-7-like compact visual descriptors (2016, 2018)
• Based on global image similarity via descriptors
• Process the whole web page screenshot
• 92% accuracy
- Bozkir, Ahmet Selman, and Ebru Akcapinar Sezer. "Use of HOG descriptors in phishing detection." 2016 4th International Symposium on Digital Forensic and Security (ISDFS). IEEE, 2016.
- F. C. Dalgic, A. S. Bozkir, and M. Aydos, “Phish-iris: A new approach for vision based brand prediction of phishing web pages via compact visual descriptors,” in Proceedings of the IEEE International Symposium on Multidisciplinary Studies and Innovative Technologies (ISMSIT), 2018.
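To make "global image similarity via descriptors" concrete, here is a simplified, pure-Python HOG-style descriptor combined with nearest-neighbour brand prediction. It is not the configuration used in the cited papers. Standard HOG also applies block normalisation, which is omitted here. The MPEG-7 descriptors are not shown, and the 8×8 toy "screenshots" and brand names are invented.

```python
import math


def hog_like(img, cell=4, bins=9):
    """Simplified HOG descriptor for a grayscale image given as a 2-D list.

    Per cell: histogram of unsigned gradient orientations (0-180 degrees)
    weighted by gradient magnitude. Cells are concatenated and the whole
    vector is L2-normalised; standard HOG block normalisation is omitted.
    """
    h, w = len(img), len(img[0])
    assert h % cell == 0 and w % cell == 0, "size must be a multiple of cell"
    cols = w // cell
    hist = [[0.0] * bins for _ in range((h // cell) * cols)]
    for y in range(h):
        for x in range(w):
            # Central differences, clamped at the image border.
            gx = img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]
            gy = img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]
            mag = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            b = int(angle // (180.0 / bins)) % bins
            hist[(y // cell) * cols + x // cell][b] += mag
    vec = [v for cell_hist in hist for v in cell_hist]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cosine(a, b):
    """Cosine similarity of two unit-length descriptors."""
    return sum(x * y for x, y in zip(a, b))


def predict_brand(query, gallery):
    """1-NN brand prediction: gallery is a list of (brand, descriptor) pairs."""
    return max(gallery, key=lambda item: cosine(query, item[1]))[0]


# Toy 8x8 "screenshots": a vertical edge and a horizontal edge.
vertical = [[0] * 4 + [255] * 4 for _ in range(8)]
horizontal = [[0] * 8 for _ in range(4)] + [[255] * 8 for _ in range(4)]
gallery = [("brand_v", hog_like(vertical)), ("brand_h", hog_like(horizontal))]
print(predict_brand(hog_like(vertical), gallery))
```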
White-Net (Phishing Website Detection by Visual Whitelists)
• Consists of three weight-sharing CNNs structured as a Siamese (triplet) network.
• Two-step training stage (81% top-1 match)
• Based on FaceNet.
- Sahar Abdelnabi, Katharina Krombholz and Mario Fritz, WhiteNet: Phishing Website Detection by Visual Whitelists, https://arxiv.org/pdf/1909.00300.pdf, 2019
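Since the approach builds on FaceNet, its core training signal is a triplet loss. Each triplet contains an anchor screenshot embedding, a positive from the same brand and a negative from a different brand. The loss is zero once the anchor is closer to the positive than to the negative by a margin. Below is a minimal sketch on plain embedding vectors. The margin of 0.2 is an assumption, matching the value commonly cited from FaceNet. The CNNs that produce the embeddings and the paper's two-step training schedule are not reproduced.

```python
import math


def l2_normalize(v):
    """Scale an embedding to unit length, as FaceNet does before the loss."""
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]


def squared_distance(a, b):
    """Squared Euclidean distance between two embeddings."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge on the gap between anchor-positive and anchor-negative distances."""
    return max(0.0, squared_distance(anchor, positive)
               - squared_distance(anchor, negative) + margin)


a = l2_normalize([1.0, 0.1])
p = l2_normalize([0.9, 0.2])    # same brand: close to the anchor
n = l2_normalize([-0.2, 1.0])   # different brand: far from the anchor
print(triplet_loss(a, p, n))
```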
Verilogo: proactive phishing detection via logo recognition
• SIFT-based keypoint matching over 400/200 px stripes
• Pairwise comparison (not scalable)
• 6 seconds/image
• 352-image dataset
G. Wang et al., Verilogo: Proactive Phishing Detection via Logo Recognition, 2010
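SIFT keypoint matching is commonly filtered with Lowe's ratio test: a query descriptor is kept only if its nearest reference descriptor is clearly closer than the second-nearest. The sketch below uses toy 2-D vectors in place of real 128-D SIFT descriptors, which a library such as OpenCV would normally extract. The 0.8 threshold is Lowe's suggested value, not a confirmed Verilogo parameter. Scoring each query against every reference logo in turn is the pairwise comparison noted above, and its cost grows with the number of brands and keypoints.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two descriptors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def ratio_test_matches(query_desc, ref_desc, ratio=0.8):
    """Match each query descriptor to its nearest reference descriptor,
    keeping only matches whose nearest distance is < ratio * second-nearest."""
    matches = []
    if len(ref_desc) < 2:
        return matches
    for qi, q in enumerate(query_desc):
        dists = sorted((euclidean(q, r), ri) for ri, r in enumerate(ref_desc))
        (d1, r1), (d2, _) = dists[0], dists[1]
        if d1 < ratio * d2:
            matches.append((qi, r1))
    return matches


def logo_score(query_desc, ref_desc, ratio=0.8):
    """Fraction of query keypoints with an unambiguous match in one reference logo."""
    if not query_desc:
        return 0.0
    return len(ratio_test_matches(query_desc, ref_desc, ratio)) / len(query_desc)


ref = [[0, 0], [10, 0], [0, 10]]      # toy reference-logo descriptors
query = [[1, 0], [5, 0]]              # second point is ambiguous
print(ratio_test_matches(query, ref), logo_score(query, ref))
```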
LogoSENSE
• Object detection strategy with a max-margin loss, SVM and HOG
• 0.04 seconds to analyze a ~1024×1024 px image on CPU
• A dedicated dataset covering 15 brands, with 1,530 training and 1,979 testing images (1,000 samples for the legitimate class)
Bozkir and Aydos, LogoSENSE: A Companion HOG based Logo Detection Scheme for Phishing Web Page and E-mail Brand Recognition (under revision)
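Sliding-window HOG+SVM detectors score many overlapping windows around the same logo, so a standard post-processing step is greedy non-maximum suppression (NMS) based on intersection over union (IoU). The sketch below shows that generic step. Whether LogoSENSE uses this exact greedy variant, and which overlap threshold it uses, is not stated in the slide. The 0.5 threshold and the boxes are illustrative assumptions.

```python
def iou(a, b):
    """Intersection over union of boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0


def non_max_suppression(detections, iou_threshold=0.5):
    """Greedy NMS over (score, box) pairs: keep the highest-scoring box and drop
    any later box that overlaps an already kept box by more than iou_threshold."""
    kept = []
    for score, box in sorted(detections, key=lambda d: d[0], reverse=True):
        if all(iou(box, k) <= iou_threshold for _, k in kept):
            kept.append((score, box))
    return kept


dets = [(0.9, (0, 0, 10, 10)), (0.8, (1, 1, 11, 11)), (0.7, (20, 20, 30, 30))]
print(non_max_suppression(dets))
```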
Scr2Seg: Screenshot to Segments by deep learning
• A deep learning semantic segmentation approach that understands the page layout by looking only at the screenshot, without needing anything else
• 197 pixel-wise annotated screenshots were collected
• Up to 85% mIoU has been achieved
• Data collection is ongoing
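Mean intersection over union (mIoU) compares a predicted label map with the ground-truth annotation. For each class, it divides the pixels both maps assign to that class by the pixels either map assigns to it, then averages over classes. Below is a minimal sketch for small label maps given as 2-D lists of class ids. Skipping classes absent from both maps is one common convention. Whether Scr2Seg averages per image or over a dataset-level confusion matrix is not stated, and the class ids are hypothetical.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union over classes for two equally sized label maps.
    Classes that appear in neither map are skipped."""
    ious = []
    for c in range(num_classes):
        inter = union = 0
        for prow, trow in zip(pred, target):
            for p, t in zip(prow, trow):
                inter += (p == c and t == c)
                union += (p == c or t == c)
        if union:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Hypothetical classes: 0 = background, 1 = login form.
target = [[0, 0], [1, 1]]
pred = [[0, 1], [1, 1]]
print(mean_iou(pred, target, num_classes=2))
```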
Conclusion
• Due to the capabilities of JavaScript and large libraries such as React.js and Angular.js, web page building is shifting from static rendering to dynamic rendering. Therefore, using HTML and CSS content in future solutions may not be as feasible as it used to be.
• Combined with the compromise of legitimate domains, SSL is no longer effective evidence of trust.
• Computer vision based approaches work similarly to human perception and are gaining popularity for both phishing e-mail and web page identification and brand recognition.
• Low FPR is crucial!
• A standard, well-curated benchmark dataset is required!
• Incorporating online learning could be beneficial.
• Image understanding combined with URL-based features is promising.
12.2.2020
THANKS FOR LISTENING

More Related Content

Similar to Phishing Attacks: Trends, Detection Systems and Computer Vision as a Promising Approach

AI, ML and Graph Algorithms: Real Life Use Cases with Neo4j
AI, ML and Graph Algorithms: Real Life Use Cases with Neo4jAI, ML and Graph Algorithms: Real Life Use Cases with Neo4j
AI, ML and Graph Algorithms: Real Life Use Cases with Neo4jIvan Zoratti
 
How would AI shape Future Integrations?
How would AI shape Future Integrations?How would AI shape Future Integrations?
How would AI shape Future Integrations?Srinath Perera
 
Phishing Attacks: A Challenge Ahead
Phishing Attacks: A Challenge AheadPhishing Attacks: A Challenge Ahead
Phishing Attacks: A Challenge AheadeLearning Papers
 
Basic SEO by Andrea H. Berberich @webpresenceopti
Basic SEO by Andrea H. Berberich @webpresenceoptiBasic SEO by Andrea H. Berberich @webpresenceopti
Basic SEO by Andrea H. Berberich @webpresenceoptiAndrea Berberich
 
Information Security Risk Management
Information Security Risk ManagementInformation Security Risk Management
Information Security Risk Managementipspat
 
API World 2019 Presentation on Securing sensitive data through APIs and AI pa...
API World 2019 Presentation on Securing sensitive data through APIs and AI pa...API World 2019 Presentation on Securing sensitive data through APIs and AI pa...
API World 2019 Presentation on Securing sensitive data through APIs and AI pa...dsapps
 
From semantic platforms to semantic apps
From semantic platforms to semantic appsFrom semantic platforms to semantic apps
From semantic platforms to semantic appsscroisier
 
Splunk workshop-Threat Hunting
Splunk workshop-Threat HuntingSplunk workshop-Threat Hunting
Splunk workshop-Threat HuntingSplunk
 
Eliminate the 49% of Documents that Contain Data Breaches Webinar
Eliminate the 49% of Documents that Contain Data Breaches WebinarEliminate the 49% of Documents that Contain Data Breaches Webinar
Eliminate the 49% of Documents that Contain Data Breaches WebinarConcept Searching, Inc
 
Network Security and Spoofing Attacks
Network Security and Spoofing AttacksNetwork Security and Spoofing Attacks
Network Security and Spoofing AttacksPECB
 
A Hybrid Approach For Phishing Website Detection Using Machine Learning.
A Hybrid Approach For Phishing Website Detection Using Machine Learning.A Hybrid Approach For Phishing Website Detection Using Machine Learning.
A Hybrid Approach For Phishing Website Detection Using Machine Learning.vivatechijri
 
Threat Hunting
Threat HuntingThreat Hunting
Threat HuntingSplunk
 
Phishing Detection using Decision Tree Model
Phishing Detection using Decision Tree ModelPhishing Detection using Decision Tree Model
Phishing Detection using Decision Tree ModelIRJET Journal
 
Splunk Threat Hunting Workshop
Splunk Threat Hunting WorkshopSplunk Threat Hunting Workshop
Splunk Threat Hunting WorkshopSplunk
 
Hacking hired [Forecasting 2021] Jan 2021
Hacking hired [Forecasting 2021] Jan 2021Hacking hired [Forecasting 2021] Jan 2021
Hacking hired [Forecasting 2021] Jan 2021Rachel Harpley
 
Artificial intelligence presentation slides.pptx
Artificial intelligence presentation slides.pptxArtificial intelligence presentation slides.pptx
Artificial intelligence presentation slides.pptxrakhicse
 
Artificial Intelligence disruption: How technologies are predicted to change ...
Artificial Intelligence disruption: How technologies are predicted to change ...Artificial Intelligence disruption: How technologies are predicted to change ...
Artificial Intelligence disruption: How technologies are predicted to change ...LinkedIn Talent Solutions
 
AI in Talent Acquisition - Talent Connect 2017
AI in Talent Acquisition - Talent Connect 2017AI in Talent Acquisition - Talent Connect 2017
AI in Talent Acquisition - Talent Connect 2017Przemek Berendt
 
Threat Hunting Workshop
Threat Hunting WorkshopThreat Hunting Workshop
Threat Hunting WorkshopSplunk
 
Next Generation Internet
Next Generation InternetNext Generation Internet
Next Generation InternetSabiha M
 

Similar to Phishing Attacks: Trends, Detection Systems and Computer Vision as a Promising Approach (20)

AI, ML and Graph Algorithms: Real Life Use Cases with Neo4j
AI, ML and Graph Algorithms: Real Life Use Cases with Neo4jAI, ML and Graph Algorithms: Real Life Use Cases with Neo4j
AI, ML and Graph Algorithms: Real Life Use Cases with Neo4j
 
How would AI shape Future Integrations?
How would AI shape Future Integrations?How would AI shape Future Integrations?
How would AI shape Future Integrations?
 
Phishing Attacks: A Challenge Ahead
Phishing Attacks: A Challenge AheadPhishing Attacks: A Challenge Ahead
Phishing Attacks: A Challenge Ahead
 
Basic SEO by Andrea H. Berberich @webpresenceopti
Basic SEO by Andrea H. Berberich @webpresenceoptiBasic SEO by Andrea H. Berberich @webpresenceopti
Basic SEO by Andrea H. Berberich @webpresenceopti
 
Information Security Risk Management
Information Security Risk ManagementInformation Security Risk Management
Information Security Risk Management
 
API World 2019 Presentation on Securing sensitive data through APIs and AI pa...
API World 2019 Presentation on Securing sensitive data through APIs and AI pa...API World 2019 Presentation on Securing sensitive data through APIs and AI pa...
API World 2019 Presentation on Securing sensitive data through APIs and AI pa...
 
From semantic platforms to semantic apps
From semantic platforms to semantic appsFrom semantic platforms to semantic apps
From semantic platforms to semantic apps
 
Splunk workshop-Threat Hunting
Splunk workshop-Threat HuntingSplunk workshop-Threat Hunting
Splunk workshop-Threat Hunting
 
Eliminate the 49% of Documents that Contain Data Breaches Webinar
Eliminate the 49% of Documents that Contain Data Breaches WebinarEliminate the 49% of Documents that Contain Data Breaches Webinar
Eliminate the 49% of Documents that Contain Data Breaches Webinar
 
Network Security and Spoofing Attacks
Network Security and Spoofing AttacksNetwork Security and Spoofing Attacks
Network Security and Spoofing Attacks
 
A Hybrid Approach For Phishing Website Detection Using Machine Learning.
A Hybrid Approach For Phishing Website Detection Using Machine Learning.A Hybrid Approach For Phishing Website Detection Using Machine Learning.
A Hybrid Approach For Phishing Website Detection Using Machine Learning.
 
Threat Hunting
Threat HuntingThreat Hunting
Threat Hunting
 
Phishing Detection using Decision Tree Model
Phishing Detection using Decision Tree ModelPhishing Detection using Decision Tree Model
Phishing Detection using Decision Tree Model
 
Splunk Threat Hunting Workshop
Splunk Threat Hunting WorkshopSplunk Threat Hunting Workshop
Splunk Threat Hunting Workshop
 
Hacking hired [Forecasting 2021] Jan 2021
Hacking hired [Forecasting 2021] Jan 2021Hacking hired [Forecasting 2021] Jan 2021
Hacking hired [Forecasting 2021] Jan 2021
 
Artificial intelligence presentation slides.pptx
Artificial intelligence presentation slides.pptxArtificial intelligence presentation slides.pptx
Artificial intelligence presentation slides.pptx
 
Artificial Intelligence disruption: How technologies are predicted to change ...
Artificial Intelligence disruption: How technologies are predicted to change ...Artificial Intelligence disruption: How technologies are predicted to change ...
Artificial Intelligence disruption: How technologies are predicted to change ...
 
AI in Talent Acquisition - Talent Connect 2017
AI in Talent Acquisition - Talent Connect 2017AI in Talent Acquisition - Talent Connect 2017
AI in Talent Acquisition - Talent Connect 2017
 
Threat Hunting Workshop
Threat Hunting WorkshopThreat Hunting Workshop
Threat Hunting Workshop
 
Next Generation Internet
Next Generation InternetNext Generation Internet
Next Generation Internet
 

More from Selman Bozkır

23--Web-Design-Principles
23--Web-Design-Principles23--Web-Design-Principles
23--Web-Design-PrinciplesSelman Bozkır
 
Kötücül Yazılımların Tanınmasında Evrişimsel Sinir Ağlarının Kullanımı ve Kar...
Kötücül Yazılımların Tanınmasında Evrişimsel Sinir Ağlarının Kullanımı ve Kar...Kötücül Yazılımların Tanınmasında Evrişimsel Sinir Ağlarının Kullanımı ve Kar...
Kötücül Yazılımların Tanınmasında Evrişimsel Sinir Ağlarının Kullanımı ve Kar...Selman Bozkır
 
ADEM: An Online Decision Tree Based Menu Demand Prediction Tool for Food Courts
ADEM: An Online Decision Tree Based Menu Demand Prediction Tool for Food CourtsADEM: An Online Decision Tree Based Menu Demand Prediction Tool for Food Courts
ADEM: An Online Decision Tree Based Menu Demand Prediction Tool for Food CourtsSelman Bozkır
 
Measurement and metrics in model driven software development
Measurement and metrics in model driven software developmentMeasurement and metrics in model driven software development
Measurement and metrics in model driven software developmentSelman Bozkır
 
Probabilistic information retrieval models & systems
Probabilistic information retrieval models & systemsProbabilistic information retrieval models & systems
Probabilistic information retrieval models & systemsSelman Bozkır
 
SHOE (simple html ontology extensions)
SHOE (simple html ontology extensions)SHOE (simple html ontology extensions)
SHOE (simple html ontology extensions)Selman Bozkır
 
Predicting food demand in food courts by decision tree approaches
Predicting food demand in food courts by decision tree approachesPredicting food demand in food courts by decision tree approaches
Predicting food demand in food courts by decision tree approachesSelman Bozkır
 
Identification of User Patterns in Social Networks by Data Mining Techniques:...
Identification of User Patterns in Social Networks by Data Mining Techniques:...Identification of User Patterns in Social Networks by Data Mining Techniques:...
Identification of User Patterns in Social Networks by Data Mining Techniques:...Selman Bozkır
 
FUAT – A Fuzzy Clustering Analysis Tool
FUAT – A Fuzzy Clustering Analysis ToolFUAT – A Fuzzy Clustering Analysis Tool
FUAT – A Fuzzy Clustering Analysis ToolSelman Bozkır
 
Data mining & Decison Trees
Data mining & Decison TreesData mining & Decison Trees
Data mining & Decison TreesSelman Bozkır
 

More from Selman Bozkır (13)

lecture_07.pptx
lecture_07.pptxlecture_07.pptx
lecture_07.pptx
 
23--Web-Design-Principles
23--Web-Design-Principles23--Web-Design-Principles
23--Web-Design-Principles
 
Kötücül Yazılımların Tanınmasında Evrişimsel Sinir Ağlarının Kullanımı ve Kar...
Kötücül Yazılımların Tanınmasında Evrişimsel Sinir Ağlarının Kullanımı ve Kar...Kötücül Yazılımların Tanınmasında Evrişimsel Sinir Ağlarının Kullanımı ve Kar...
Kötücül Yazılımların Tanınmasında Evrişimsel Sinir Ağlarının Kullanımı ve Kar...
 
ADEM: An Online Decision Tree Based Menu Demand Prediction Tool for Food Courts
ADEM: An Online Decision Tree Based Menu Demand Prediction Tool for Food CourtsADEM: An Online Decision Tree Based Menu Demand Prediction Tool for Food Courts
ADEM: An Online Decision Tree Based Menu Demand Prediction Tool for Food Courts
 
Measurement and metrics in model driven software development
Measurement and metrics in model driven software developmentMeasurement and metrics in model driven software development
Measurement and metrics in model driven software development
 
UML ile Modelleme
UML ile ModellemeUML ile Modelleme
UML ile Modelleme
 
Hopfield Ağı
Hopfield AğıHopfield Ağı
Hopfield Ağı
 
Probabilistic information retrieval models & systems
Probabilistic information retrieval models & systemsProbabilistic information retrieval models & systems
Probabilistic information retrieval models & systems
 
SHOE (simple html ontology extensions)
SHOE (simple html ontology extensions)SHOE (simple html ontology extensions)
SHOE (simple html ontology extensions)
 
Predicting food demand in food courts by decision tree approaches
Predicting food demand in food courts by decision tree approachesPredicting food demand in food courts by decision tree approaches
Predicting food demand in food courts by decision tree approaches
 
Identification of User Patterns in Social Networks by Data Mining Techniques:...
Identification of User Patterns in Social Networks by Data Mining Techniques:...Identification of User Patterns in Social Networks by Data Mining Techniques:...
Identification of User Patterns in Social Networks by Data Mining Techniques:...
 
FUAT – A Fuzzy Clustering Analysis Tool
FUAT – A Fuzzy Clustering Analysis ToolFUAT – A Fuzzy Clustering Analysis Tool
FUAT – A Fuzzy Clustering Analysis Tool
 
Data mining & Decison Trees
Data mining & Decison TreesData mining & Decison Trees
Data mining & Decison Trees
 

Recently uploaded

Heart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptxHeart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptxPoojaBan
 
Electronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdfElectronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdfme23b1001
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130Suhani Kapoor
 
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfCCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfAsst.prof M.Gokilavani
 
Artificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptxArtificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptxbritheesh05
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionDr.Costas Sachpazis
 
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort serviceGurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort servicejennyeacort
 
Introduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptxIntroduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptxk795866
 
Application of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptxApplication of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptx959SahilShah
 
Current Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCLCurrent Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCLDeelipZope
 
Introduction to Microprocesso programming and interfacing.pptx
Introduction to Microprocesso programming and interfacing.pptxIntroduction to Microprocesso programming and interfacing.pptx
Introduction to Microprocesso programming and interfacing.pptxvipinkmenon1
 
complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...asadnawaz62
 
Churning of Butter, Factors affecting .
Churning of Butter, Factors affecting  .Churning of Butter, Factors affecting  .
Churning of Butter, Factors affecting .Satyam Kumar
 

Recently uploaded (20)

Heart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptxHeart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptx
 
Electronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdfElectronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdf
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
 
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfCCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
 
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptxExploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
 
Artificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptxArtificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptx
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
 
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort serviceGurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
 
Design and analysis of solar grass cutter.pdf
Design and analysis of solar grass cutter.pdfDesign and analysis of solar grass cutter.pdf
Design and analysis of solar grass cutter.pdf
 
Introduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptxIntroduction-To-Agricultural-Surveillance-Rover.pptx

Phishing Attacks: Trends, Detection Systems and Computer Vision as a Promising Approach

  • 1. Phishing Attacks: Trends, Detection Systems and Computer Vision as a Promising Approach DR. AHMET SELMAN BOZKIR DEPT. OF COMPUTER ENGINEERING - HACETTEPE UNIVERSITY SEMINAR
  • 2. Today • What is phishing? • Facts and current trends • Types of phishing • Examples of attack types • Why has the problem of phishing not been solved yet? • Phishing detection methods in the literature • Vision-based analysis: various studies and challenges • What we have done so far? Our vision • Conclusion
  • 3. What is Phishing? • Phishing is a criminal mechanism employing both social engineering and technical subterfuge to steal consumers’ personal identity data and financial account credentials. • Social engineering schemes prey on unwary victims by fooling them into believing they are dealing with a trusted, legitimate party, such as by using deceptive email addresses and email messages. (APWG, Anti-Phishing Working Group) • Phone phreaking + fishing → «phishing»
  • 4. Underlying Truth • In 350 BC, Aristotle noted that “our senses can be trusted but they can be easily fooled”. • A study by Richard Gregory claims that only 20% of our visual perception comes through our eyes; the remainder relies on our inferences. • Actual reason: careless operations
  • 7. A typical life cycle of a phishing campaign
  • 9. Facts and Current Trends • SaaS/Webmail (36%) • Payment (22%) • Financial Institutions (18%) • Other (9%) • E-commerce, Retail (3%) • Social Media (3%) • Cloud Storage, Hosting (3%) • Telecommunications (3%)
  • 10. Financial Loss • BEC: Business Email Compromise
  • 11. Use of SSL in Phishing Websites
  • 12. Types of phishing attacks • Typical phishing: more quantity / less profit • Spear phishing • Whaling: less quantity / more profit
  • 13. Example e-mails for “typical phishing” [1]
  • 14. Example e-mails for “typical phishing” [1]
  • 15. Example e-mails for “spear phishing” [1]
  • 16. An example e-mail of “whaling” phishing [1]
  • 17. Why has the problem of phishing not been solved yet? • Even Highly Trained Users Are Clicking: when reading a hundred emails in the middle of a stressful workday, even the most well-trained and observant employee will click on a malicious email. • Phishing Attacks Are Increasingly Sophisticated - Employees are taught to look for typos and poor grammar to identify a text lure, but over the last year attackers have improved their spelling and learned to match legitimate messages. - More phishing sites are using HTTPS certificates to fool users with the green “secure” icon in the browser, which, ironically, users interpret as ‘safe’. - Domain spoofing and domain impersonation are more sophisticated: the attacker can send from an authentic Microsoft address.
  • 18. Why has the problem of phishing not been solved yet? • Phishing Has Become Too Targeted for Traditional Spam-Type Filters: broad, spam-like phishing attacks are easily caught, whereas targeted, customized phishing attacks are hard to catch and on the rise. Spear-phishing attacks, especially business email compromise (BEC), have almost doubled since the beginning of the year, made easier by the large-scale data breaches of the previous year. • Targeted Attacks Have Become Psychologically More Sophisticated - Attackers have learned to combine personalized information with a number of effective motivators. - Fear, urgency, and curiosity were the top motivators in previous years, but they have been replaced by entertainment, social, and reward/recognition motivators.
  • 19. Methods for Combatting Phishing • The URL • The source code (DOM) • The image/screenshot • Domain knowledge (web information)
  • 20. Classification of Anti-Phishing Methods (ordered from the least resource-intensive to the most time-consuming and resource-intensive, but most robust to “zero-hour” attacks)
    - Blacklist: Google Safe Browsing API, Rosiello et al. (2007), Han et al. (2008), PDA – Jain & Gupta (2016)
    - URL: Sahingoz et al. (2017), CatchPhish – Rao et al. (2018), URLNet – Le et al. (2018), PDRCNN – Wang et al. (2019)
    - DOM: CANTINA+ (2011), Marchal et al. (2016), Buber et al. (2017), Jain & Gupta (2018)
    - Visual similarity: Maurer et al. (2013), Verilogo (2015), DeltaPhish (2017), PhishIRIS – Dalgic et al. (2018)
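The blacklist category is the cheapest to run but the least robust to zero-hour attacks. A minimal sketch of why, assuming a small local blacklist (the `BLACKLIST` entries are taken from the example phishing URLs listed in this seminar, and `normalize` is a simplified canonicalization assumed for illustration, not the rule set of Google Safe Browsing or any other service): an exact lookup catches trivial variants of a listed URL but misses a new path on the same compromised host.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical local blacklist built from example phishing URLs.
BLACKLIST = [
    "http://www.arvindudyog.com/papa/",
    "http://proseind.cl/new/index.php",
]

def normalize(url: str) -> str:
    """Simplified canonicalization: lower-case scheme and host,
    default an empty path to '/', and drop the fragment."""
    parts = urlsplit(url.strip())
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(),
                       parts.path or "/", parts.query, ""))

# Normalize the list once so each lookup is a single set-membership test.
NORMALIZED_BLACKLIST = {normalize(u) for u in BLACKLIST}

def is_blacklisted(url: str) -> bool:
    """Exact match only: any unseen URL, even on a listed host, passes."""
    return normalize(url) in NORMALIZED_BLACKLIST
```

Here `is_blacklisted("HTTP://WWW.ArvindUdyog.com/papa/#top")` is `True`, while the unseen `http://www.arvindudyog.com/papa2/` is `False` until someone reports and lists it.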
  • 21. The URL • The Uniform Resource Locator (URL) is the address of a resource on the World Wide Web; in this case, the resource is a web page. • Many researchers extract key features from this source of information to identify phishing web pages. • Some propose solutions based on hand-crafted (lexical) features, while others apply machine-learning-based features.
  • 22. Some Phishing URLs
    - http://www.cnhedge.cn/js/index.htm?http://us.battle.net/login/en/?ref=http://spdfozrus.battle.net/d3/en/index
    - http://www.arvindudyog.com/bright/bright/drake/bright/45886564bea8a9f07a8055347163a4a3/
    - http://amcnamibia.com/wp-admin/file/files/db/file.dropbox/
    - http://www.arvindudyog.com/papa/
    - http://www.iowasaferoutes.org/wp-content/plugins/wpsecone/dhl/
    - http://www.imanaforums.com/neomodules/accesst/
    - http://ausbuildblog.com.au/wp-content/heaven/index.php
    - http://fengshuireview.com/upload/free.mobile.fr/facturtion/finale/free/
    - http://searchenginetricks.ca/cam/config/webmail/
    - http://www.i-robot.kiev.ua/self/dropbox/dropbox/dropbox/
    - http://www.justaskaron.com/octapharma.com.ca/
    - http://i-robot.kiev.ua/self/dropbox/dropbox/dropbox/index.php
    - http://kiltonmotor.com/others/m.i.php?n=1774256418&rand.13inboxlight.aspxn.1774256418&rand=13inboxlightaspxn.1774256418&username1=&username
    - http://www.sindhuratna.com/new2015/document.php
    - http://justaskaron.com/octapharma.com.ca/index.php
    - http://www.alexsandroleiloes.com.br/admin/beats/verification-folder.php
    - http://www.vantaiduccuong.com/soutdoc/es/
    - http://www.pt-tkbi.com/providernet/provider/provider/webmail/securenow/webnet/
    - http://www.alhadbaa.org/googledrive/
    - http://www.parfumwangimurah.com/g9/
    - http://proseind.cl/new/index.php
    - http://annstringer.com/storagechecker/domain/ii.php
  • 23. Lexical URL Features • Number of dots • Number of special characters • Number of suffixes • Length of the URL • Length of the query string • Subdomain name • Suspicious characters / Punycode • TLD name and its length • Domain name • Depth of the subdomain • Having an SSL certificate (HTTPS) • …
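Most of the lexical features above can be computed with nothing more than a URL parser. A minimal sketch using Python's `urllib.parse`, covering only a subset of the listed features; the set of "special" characters, the naive registered-domain rule (last two host labels, which is wrong for suffixes such as `.com.au`) and the feature names are illustrative assumptions, and SSL is approximated by the `https` scheme rather than by inspecting a certificate:

```python
from urllib.parse import urlsplit

# Illustrative set of "special" characters; studies differ in what they count.
SPECIAL_CHARS = set("@-_~%=&?")

def lexical_features(url: str) -> dict:
    """Hand-crafted lexical features of a URL (a subset of those on the slide)."""
    parts = urlsplit(url)
    host = parts.hostname or ""  # lower-cased, without port or credentials
    labels = host.split(".") if host else []
    return {
        "num_dots": url.count("."),
        "num_special_chars": sum(ch in SPECIAL_CHARS for ch in url),
        "url_length": len(url),
        "query_length": len(parts.query),
        "tld": labels[-1] if labels else "",
        "tld_length": len(labels[-1]) if labels else 0,
        # Naive registered domain: last two labels (wrong for e.g. .com.au).
        "domain": ".".join(labels[-2:]),
        "subdomain_depth": max(len(labels) - 2, 0),
        # Internationalized (Punycode) labels start with "xn--".
        "has_punycode": any(label.startswith("xn--") for label in labels),
        # Proxy for "has an SSL certificate": only the scheme is checked.
        "uses_https": parts.scheme == "https",
        "path_depth": len([seg for seg in parts.path.split("/") if seg]),
    }
```

For `http://www.iowasaferoutes.org/wp-content/plugins/wpsecone/dhl/` from the list of phishing URLs, this yields 2 dots, TLD `org`, subdomain depth 1 and path depth 4; such vectors would then feed a classifier.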
  • 24. Most discriminative 4-grams (chi-square scores) • “%20(”: 99.36 • “.log”: 155.83 • “logi”: 1947.80 • “ogin”: 2096.63 • “secu”: 895.08 • “/wp-”: 1629.51
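Scores like these come from ranking character n-grams by how strongly their presence depends on the class. A minimal sketch of one common chi-square formulation, computed from a 2×2 presence/absence table per n-gram over phishing and legitimate URLs; the slide does not state which chi-square variant, corpus or preprocessing produced the values above, so this sketch will not reproduce those numbers:

```python
from collections import Counter

def char_ngrams(text: str, n: int = 4) -> set:
    """Distinct character n-grams of a string."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}

def chi_square_scores(phish_urls, legit_urls, n: int = 4) -> dict:
    """Chi-square statistic of each n-gram from its 2x2 presence/absence
    contingency table over the two classes."""
    # Document frequency: URLs in each class containing the n-gram at least once.
    phish_df = Counter(g for u in phish_urls for g in char_ngrams(u.lower(), n))
    legit_df = Counter(g for u in legit_urls for g in char_ngrams(u.lower(), n))
    n_phish, n_legit = len(phish_urls), len(legit_urls)
    total = n_phish + n_legit
    scores = {}
    for gram in set(phish_df) | set(legit_df):
        a = phish_df[gram]   # phishing URLs containing the n-gram
        b = legit_df[gram]   # legitimate URLs containing it
        c = n_phish - a      # phishing URLs without it
        d = n_legit - b      # legitimate URLs without it
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        scores[gram] = total * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores
```

Sorting the result, e.g. `sorted(scores, key=scores.get, reverse=True)[:k]`, gives the k most discriminative n-grams; an n-gram present in every URL of both classes (such as `http`) scores 0.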
  • 25. The Source Code • Consists of HTML DOM, JavaScript and CSS components • Provides the main markup directives for layout information
  • 26. The source code is no longer applicable! • Thanks to the capabilities of JavaScript and large libraries such as React.js and Angular.js, web page implementation is shifting from static rendering to dynamic rendering. • Ajax and dynamic content loading • Misuse of HTML tags • Countless markup combinations can produce the same rendering! • Thus, HTML, CSS or tag similarity is not guaranteed to be a source of evidence!
  • 27. Phish-Sense • Fuses information extracted from lexical features and various n-gram models to capture phishing URL patterns • The chi-square method is used for feature selection • 71,250 samples were used • Outperforms all traditional methods with a TP rate of 98.24%, but is outperformed by deep learning methods
  • 28. URLNet (2018) • One of the first published works based on deep learning methods. Le, Hung, et al. "URLNet: Learning a URL representation with deep learning for malicious URL detection." arXiv preprint arXiv:1802.03162 (2018).
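URLNet feeds both a character-level and a word-level view of the URL into convolutional networks. A minimal sketch of that input preparation only, assuming a printable-ASCII character vocabulary, a fixed length of 200 characters and word splitting on non-alphanumeric delimiters; these are illustrative choices rather than URLNet's exact preprocessing, and the CNNs themselves are omitted:

```python
import re
import string

PAD, UNK = 0, 1
# Illustrative vocabulary: printable ASCII, with indices 0/1 reserved for padding/unknown.
CHAR_VOCAB = {ch: i + 2 for i, ch in enumerate(string.printable)}

def encode_chars(url: str, max_len: int = 200) -> list:
    """Fixed-length character-index sequence: truncate long URLs,
    right-pad short ones with PAD."""
    ids = [CHAR_VOCAB.get(ch, UNK) for ch in url[:max_len]]
    return ids + [PAD] * (max_len - len(ids))

def tokenize_words(url: str) -> list:
    """Split a URL into word tokens on non-alphanumeric characters,
    keeping each delimiter as its own token."""
    return [tok for tok in re.split(r"([^A-Za-z0-9])", url) if tok]
```

For example, `tokenize_words("http://a.com/x")` returns `['http', ':', '/', '/', 'a', '.', 'com', '/', 'x']`; keeping the delimiters preserves structural cues such as repeated slashes.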
  • 29. Visual similarity or vision-based analysis? • Logo • Screenshot of the whole page • Image with layout • DOM tree similarity • Visual features • CSS similarity • Layout similarity via VIPS (block and overall layout)
  • 30. Can computer vision help us? • 47%–83% of newly found phishing pages are added to blacklists within 12 hours, so zero-day attacks need proactive solutions! • Predefined or hand-crafted heuristics are evaded by attackers • 23% of users do not even look at the address bar! (Dhamija et al.) • Substitution of textual HTML elements with <IMG> or applet-like rich internet application (RIA) content such as Flash, ActiveX, Silverlight • Loading of dynamic/AJAX-based content and IFRAMEs • Different DOM organizations between the legitimate page and its phishing counterpart • Robustness against complex backgrounds or page layouts • Brand recognition can be done in a holistic manner • Language and source code independence • Most importantly, vision-based solutions are in concordance with human perception
  • 31. Challenges related to vision-based anti-phishing • Lack of a well-curated dataset • Vast number of brands • High intra-class variation among the phishing samples of a brand • Inconsistent layouts • Unrelated layouts and color schemes • Data leakage, which biases evaluation results
  • 32. Phish-IRIS Dataset • Publicly available at https://web.cs.hacettepe.edu.tr/~selman/phish-iris-dataset
  • 33. HOG and MPEG-7-like compact visual descriptors (2016, 2018) • Based on global image similarity via descriptors • Process the whole web page screenshot • 92% accuracy - Bozkir, Ahmet Selman, and Ebru Akcapinar Sezer. "Use of HOG descriptors in phishing detection." 2016 4th International Symposium on Digital Forensic and Security (ISDFS). IEEE, 2016 - F. C. Dalgic, A. S. Bozkir, and M. Aydos, “Phish-iris: A new approach for vision based brand prediction of phishing web pages via compact visual descriptors,” in Proceedings of the IEEE International Symposium on Multidisciplinary Studies and Innovative Technologies (ISMSIT), 2018
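The descriptor-based idea above can be illustrated with a simplified HOG: histograms of unsigned gradient orientations over fixed cells of a grayscale screenshot, compared by distance to descriptors of known brands. A minimal sketch, assuming screenshots already resized to a common size and given as 2-D lists of gray values; it omits HOG's block normalization and the MPEG-7-like descriptors, and the nearest-neighbour `predict_brand` is a hypothetical stand-in for the classifiers used in the cited papers:

```python
import math

def hog_descriptor(img, cell=8, bins=9):
    """Simplified HOG: per-cell histograms of unsigned gradient orientations,
    L2-normalized over the whole vector (no block normalization)."""
    h, w = len(img), len(img[0])
    cells_y, cells_x = h // cell, w // cell
    hist = [[[0.0] * bins for _ in range(cells_x)] for _ in range(cells_y)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Central-difference gradients.
            gx = img[y][x + 1] - img[y][x - 1]
            gy = img[y + 1][x] - img[y - 1][x]
            mag = math.hypot(gx, gy)
            if mag == 0:
                continue
            angle = math.degrees(math.atan2(gy, gx)) % 180.0  # unsigned, [0, 180)
            b = int(angle // (180.0 / bins)) % bins
            cy, cx = y // cell, x // cell
            if cy < cells_y and cx < cells_x:
                hist[cy][cx][b] += mag  # magnitude-weighted vote
    vec = [v for row in hist for c in row for v in c]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def predict_brand(query_img, gallery):
    """Nearest-neighbour brand prediction; gallery maps brand -> list of descriptors."""
    q = hog_descriptor(query_img)
    best_brand, best_dist = None, float("inf")
    for brand, descriptors in gallery.items():
        for d in descriptors:
            dist = math.dist(q, d)
            if dist < best_dist:
                best_brand, best_dist = brand, dist
    return best_brand, best_dist
```

Because the whole screenshot is summarized by one fixed-length vector, comparison is cheap, but the descriptor is sensitive to layout changes, which relates to the intra-class variation challenge listed earlier.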
  • 34. WhiteNet (Phishing Website Detection by Visual Whitelists) • Consists of three CNNs structured as a Siamese network • Two-step training stage (81% top-1 match) • Based on FaceNet - Sahar Abdelnabi, Katharina Krombholz and Mario Fritz, WhiteNet: Phishing Website Detection by Visual Whitelists, https://arxiv.org/pdf/1909.00300.pdf, 2019
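The whitelist idea can be sketched in two parts: the FaceNet-style triplet loss that pulls screenshots of the same site together in embedding space, and a distance-threshold lookup against whitelisted embeddings. A minimal sketch, assuming embeddings are already produced by some trained network; the margin, the `threshold`, the `brand_domains` mapping and the `is_suspicious` domain rule are hypothetical illustrations, not WhiteNet's exact procedure:

```python
import math

def triplet_loss(anchor, positive, negative, margin=0.2):
    """FaceNet-style triplet loss: the positive should be closer to the anchor
    than the negative by at least `margin` (squared Euclidean distances)."""
    d_pos = sum((a - p) ** 2 for a, p in zip(anchor, positive))
    d_neg = sum((a - n) ** 2 for a, n in zip(anchor, negative))
    return max(d_pos - d_neg + margin, 0.0)

def match_whitelist(query_emb, whitelist, threshold):
    """Closest whitelisted embedding as (brand, distance); brand is None when it
    is farther than `threshold`. `whitelist` maps brand -> list of embeddings
    and must not be empty."""
    brand, dist = min(
        ((b, math.dist(query_emb, e)) for b, embs in whitelist.items() for e in embs),
        key=lambda pair: pair[1],
    )
    return (brand if dist <= threshold else None), dist

def is_suspicious(query_emb, page_domain, whitelist, brand_domains, threshold=0.5):
    """Hypothetical rule: the page looks like a whitelisted brand
    but is not served from one of that brand's domains."""
    brand, _ = match_whitelist(query_emb, whitelist, threshold)
    return brand is not None and page_domain not in brand_domains[brand]
```

A page that is visually close to a whitelisted brand but hosted elsewhere is flagged, while pages resembling no whitelisted brand are left to other checks.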
  • 35. Verilogo: proactive phishing detection via logo recognition • SIFT-based keypoint matching over 400/200 px stripes • Pairwise comparison (not scalable) • 6 seconds/image • 352-image dataset - G. Wang et al., Verilogo: Proactive Phishing Detection via Logo Recognition, 2010
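The pairwise keypoint matching above can be sketched with Lowe's ratio test, which accepts a match only when the nearest reference descriptor is clearly closer than the second nearest. A minimal sketch, assuming SIFT descriptors have already been extracted as numeric vectors; the 0.75 ratio and the `min_matches` threshold are hypothetical, and Verilogo's exact matching rules may differ. Because `logo_present` must run once per (page, logo) pair, the cost grows with the number of logos, which is the scalability problem noted on the slide:

```python
import math

def ratio_test_matches(query_desc, ref_desc, ratio=0.75):
    """Match each query descriptor to its nearest reference descriptor, keeping
    it only if clearly closer than the second nearest (Lowe's ratio test)."""
    matches = []
    if len(ref_desc) < 2:
        return matches
    for qi, q in enumerate(query_desc):
        dists = sorted((math.dist(q, r), ri) for ri, r in enumerate(ref_desc))
        (d1, r1), (d2, _) = dists[0], dists[1]
        if d1 < ratio * d2:
            matches.append((qi, r1))
    return matches

def logo_present(query_desc, logo_desc, min_matches=10):
    """Declare the logo present when enough keypoints survive the ratio test."""
    return len(ratio_test_matches(query_desc, logo_desc)) >= min_matches
```

An ambiguous keypoint that lies equally close to two reference keypoints is discarded, which keeps repetitive page textures from producing spurious logo matches.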
  • 36. LogoSENSE • Object detection strategy with a max-margin-loss SVM and HOG • 0.04 seconds to analyze a ~1024×1024 px image on CPU • A dedicated dataset covering 15 brands, with 1,530 training and 1,979 testing images (1,000 samples for the legitimate class) - Bozkir and Aydos, LogoSENSE: A Companion HOG based Logo Detection Scheme for Phishing Web Page and E-mail Brand Recognition (under revision)
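LogoSENSE frames logo recognition as object detection with HOG features and a max-margin SVM. A generic sliding-window skeleton of that kind of detector, with greedy non-maximum suppression, could look as follows; the `extract` callable stands in for a HOG feature extractor, `weights`/`bias` for a trained linear SVM, and the window size, stride and thresholds are hypothetical rather than LogoSENSE's actual settings:

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(ix2 - ix1, 0) * max(iy2 - iy1, 0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def sliding_windows(width, height, win, stride):
    """Yield every window box of size `win` placed on a `stride` grid."""
    for y in range(0, height - win[1] + 1, stride):
        for x in range(0, width - win[0] + 1, stride):
            yield (x, y, x + win[0], y + win[1])

def linear_score(features, weights, bias):
    """Decision value w.x + b of a trained linear (max-margin) classifier."""
    return sum(f * w for f, w in zip(features, weights)) + bias

def detect(width, height, extract, weights, bias,
           win=(64, 64), stride=16, score_thr=0.0, iou_thr=0.3):
    """Score every window, keep positives, then apply greedy non-maximum suppression."""
    scored = [(linear_score(extract(box), weights, bias), box)
              for box in sliding_windows(width, height, win, stride)]
    candidates = sorted(((s, b) for s, b in scored if s > score_thr), reverse=True)
    kept = []
    for s, box in candidates:
        # Keep a window only if it does not overlap a higher-scoring kept window too much.
        if all(iou(box, k) < iou_thr for _, k in kept):
            kept.append((s, box))
    return kept
```

Non-maximum suppression is what turns the cluster of overlapping positive windows around one logo into a single detection.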
  • 37. Scr2Seg: Screenshot to Segments by Deep Learning • A deep-learning semantic segmentation approach that understands the page layout from the screenshot alone, without needing anything else • 197 pixel-wise annotated screenshots were collected • Up to 85% mIoU has been achieved • Data collection is ongoing
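The mIoU figure above is the per-class intersection-over-union averaged over classes. A minimal sketch over flattened per-pixel label lists; skipping classes that appear in neither the prediction nor the ground truth is an assumption, since conventions differ between benchmarks:

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union over classes for flattened per-pixel labels.
    Classes absent from both prediction and ground truth are skipped."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0
```

For `pred=[0, 0, 1, 1]` and `target=[0, 1, 1, 1]`, class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mIoU is 7/12 (about 0.583).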
  • 39. Conclusion • Due to the capabilities of JavaScript and large libraries such as React.js and Angular.js, web page construction is shifting from static to dynamic rendering. Therefore, HTML and CSS content may no longer be as useful for future solutions as it used to be. • Combined with the compromise of legitimate domains, SSL is no longer effective evidence of trust. • Computer-vision-based approaches work similarly to human perception and are gaining popularity for both phishing e-mail and web page identification and brand recognition. • Low FPR is crucial! • A standard, well-curated benchmark dataset is required! • Incorporating online learning could be beneficial. • Combining image understanding with URL-based features is promising.
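The emphasis on a low FPR (and the TP rate reported for Phish-Sense) refers to the standard confusion-matrix rates. A minimal sketch, assuming label 1 means phishing and 0 means legitimate:

```python
def rates(y_true, y_pred):
    """True-positive rate and false-positive rate (1 = phishing, 0 = legitimate)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return tpr, fpr
```

Because legitimate pages vastly outnumber phishing pages in everyday browsing, even a small FPR produces many false alarms: a hypothetical FPR of 1% over 10,000 legitimate page visits would flag about 100 of them.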