THE FIFTH ELEPHANT
- ARJUN B.M.
MUDPIPE
Malicious URL Detection for
Phishing Identification and Prevention
PHISHING INTRODUCTION
The fraudulent practice of sending emails purporting to be from reputable
companies in order to induce individuals to reveal personal information, such
as passwords and credit card numbers
MOTIVES: Financial gain, damage reputation, identity theft, fame & notoriety
Phishing website indicators:
• Visually resembles the original website
• Email creates a sense of urgency to force user action
• Fake HTTPS certificate & domain name
• Provides attractive offers that tempt the user to respond
PROBLEM STATEMENT
For Employees:
• Accessing malicious sites by being victims of phishing emails
• No self-service mechanism for employees to check bad sites
• Lack of awareness and training for employees
For Security Teams:
• Manual time & effort spent by the Security Operations team to block sites
• Lack of internal ML-based insights on phishing data; current solutions may be rule-based
• Different teams/networks may have different requirements for site access, which cannot be served by external commercial solutions
For Business:
• 91% of all cyber-attacks are via phishing and they have devastating consequences
• Licensing cost of commercial solutions to detect phishing sites
• Dependency on external solution/product
APPROACH & METHODOLOGY
MACHINE LEARNING APPROACH
• Data Collection & Validation
• Parameter Determination (Address Bar based Features, HTML and JavaScript based Features, Domain based Features, Abnormal Based Features, URL Blacklist Features)
• Feature Extraction from unknown, incoming data (test data)
• Create baseline model with initial dataset
• Evaluate model performance and fine-tune
• Apply test data to the pre-trained baseline model & make prediction
• Compare with known data sources & further fine-tune results
• Retrain model at frequent intervals for better accuracy, context and relevancy
• Classification model pickled and exposed as REST API
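The last step above (a pickled classification model exposed as a REST API) could be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the implementation from the talk: the linear model stored as a dict, its weights, the `handle_predict` helper and the JSON shape are all assumptions made for the example.

```python
import json
import pickle
from http.server import BaseHTTPRequestHandler

# Hypothetical stand-in for a trained classifier: a tiny linear model kept as a
# plain dict. The weights and bias are invented for illustration only.
BASELINE_MODEL = {"weights": [1.0, 1.0, 1.0], "bias": -1.5}

# "Pickle" the model; in practice these bytes would be written to a file and
# loaded again by the API process.
MODEL_BYTES = pickle.dumps(BASELINE_MODEL)
MODEL = pickle.loads(MODEL_BYTES)


def predict(features: list) -> int:
    """Return 1 (phishing) or 0 (legitimate), matching the slides' labels."""
    score = MODEL["bias"] + sum(w * x for w, x in zip(MODEL["weights"], features))
    return 1 if score > 0 else 0


def handle_predict(body: bytes) -> tuple:
    """Turn a JSON request body into (HTTP status, JSON-ready reply)."""
    try:
        features = json.loads(body)["features"]
        return 200, {"classification": predict(features)}
    except (ValueError, KeyError, TypeError):
        return 400, {"error": "expected JSON with a 'features' list"}


class PredictHandler(BaseHTTPRequestHandler):
    """POST endpoint; it could be served with http.server.HTTPServer."""

    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        status, reply = handle_predict(self.rfile.read(length))
        payload = json.dumps(reply).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)
```

Keeping the request logic in `handle_predict` rather than in the handler class means it can be tested without starting a server. Note that unpickling is only safe for files the organisation produced itself.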
WHITELIST / BLACKLIST APPROACH
• Identify data sources which provide info on phishing sites
• Scrape data from data sources
• Create whitelist / blacklist and compare URLs
CONS
• Lack of updated data sources
• Lack of real-time intelligence
• Data not comprehensive enough
• Extensive effort for data scraping
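As a rough sketch of the list comparison described above, the snippet below matches a URL's hostname against two sets. The list contents and the function names are hypothetical; a real deployment would populate the sets from the scraped data sources.

```python
from urllib.parse import urlsplit

# Hypothetical lists for illustration; real ones would come from scraped sources.
WHITELIST = {"example.com"}
BLACKLIST = {"login-example.test"}


def hostname_of(url: str) -> str:
    """Return the lower-cased hostname, or "" if the URL has none or is malformed."""
    try:
        return (urlsplit(url).hostname or "").lower()
    except ValueError:
        return ""


def list_verdict(url: str) -> str:
    """Classify a URL as "deny", "allow" or "unknown" by exact hostname match."""
    host = hostname_of(url)
    if host in BLACKLIST:
        return "deny"
    if host in WHITELIST:
        return "allow"
    return "unknown"
```

Any hostname that appears in neither list comes back as "unknown", which is the coverage gap the cons above point to.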
RULE BASED APPROACH
• Determine phishing indicators
• Define rules using combination of indicators
• Compare & match URLs against rules to deny/allow
CONS
• Complex rule set definitions
• Overhead in managing and updating rules
• High False Positive and False Negative rates
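A rule-based check along the lines above might look like this sketch. The four indicators, their names and the two-indicator threshold are assumptions chosen for illustration, not rules taken from the talk.

```python
from urllib.parse import urlsplit

# Each rule pairs a hypothetical name with a predicate over the parsed URL.
RULES = [
    ("at_symbol", lambda parts: "@" in parts.netloc),
    ("hyphen_in_host", lambda parts: "-" in (parts.hostname or "")),
    ("many_subdomains", lambda parts: (parts.hostname or "").count(".") >= 3),
    ("not_https", lambda parts: parts.scheme != "https"),
]

DENY_THRESHOLD = 2  # assumed: deny when at least two indicators fire


def matched_rules(url: str) -> list:
    """Return the names of all rules that fire for this URL, in rule order."""
    parts = urlsplit(url)
    return [name for name, rule in RULES if rule(parts)]


def rule_verdict(url: str) -> str:
    """Combine the indicators into a deny/allow decision."""
    return "deny" if len(matched_rules(url)) >= DENY_THRESHOLD else "allow"
```

With these particular rules, a harmless site served over plain HTTP with a hyphen in its hostname is denied, which shows how easily hand-written rules produce false positives.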
ARCHITECTURE / WORKFLOW
(Workflow diagram; the labels that survive from the figure are listed below.)
• TRAIN: BASELINE DATA SET -> BASELINE ML MODEL -> EXPOSE AS REST API
• INPUT: WEB TRAFFIC (UNKNOWN DATA) -> FEATURE EXTRACTION -> TEST DATA
• FEATURE EXTRACTION (total of 30 features): Address Bar based Features, HTML and JavaScript based Features, Domain based Features, Abnormal Based Features, URL Blacklist Features
• PREDICTION: TEST DATA applied to the BASELINE ML MODEL
• OUTPUT: CLASSIFICATION = 0: LEGITIMATE; CLASSIFICATION = 1: PHISHING; COMPARE WITH KNOWN SOURCES; PROBABILITY OF PREDICTION
• SECURITY ACTION (BLOCK / ALLOW)
• RETRAIN
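The output stage of the workflow (classification, probability of prediction, comparison with known sources, then a block/allow action) could be combined as in the sketch below. The function name, the parameter names and the 0.8 threshold are assumptions; the slides do not say how the three signals are weighed.

```python
def security_action(classification: int, probability: float,
                    in_known_phishing_source: bool,
                    block_threshold: float = 0.8) -> str:
    """Map the model output to "BLOCK" or "ALLOW".

    classification: 0 = legitimate, 1 = phishing (as in the slides).
    probability: the model's confidence in its own prediction.
    in_known_phishing_source: result of comparing with known data sources.
    block_threshold: assumed cut-off, not a value from the talk.
    """
    if in_known_phishing_source:
        return "BLOCK"
    if classification == 1 and probability >= block_threshold:
        return "BLOCK"
    return "ALLOW"
```

In this sketch a low-confidence phishing prediction is allowed through; an organisation that prefers the opposite trade-off would lower `block_threshold`.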
FEATURE EXTRACTION
1. having_IP_Address
2. URL_Length
3. Shortening_Service
4. having_At_Symbol
5. double_slash_redirecting
6. Prefix_Suffix
7. having_Sub_Domain
8. SSL_State
9. Domain_registeration_length
10. Favicon
11. Open_ports
12. HTTPS_token_in_URL
13. Request_URL
14. URL_of_Anchor
15. Links_in_tags
16. Server_Form_Handler
17. Submitting_to_email
18. Abnormal_URL
19. Site_Redirect
20. on_mouseover_changes
21. RightClick_Disabled
22. popUpWindow
23. Iframe_redirection
24. age_of_domain
25. DNS_Record
26. web_traffic_rank
27. Page_Rank
28. Google_Index
29. Links_pointing_to_page
30. Statistical_report - top phishing domains
Classification output: 0 = legitimate, 1 = phishing
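To make the list concrete, here is a minimal sketch that derives seven of the address-bar features from a URL string alone. The 0/1 encoding, the length threshold, the shortener list and the sub-domain rule are assumptions for illustration; the slides name the features but do not define how each is computed.

```python
import ipaddress
from urllib.parse import urlsplit

# Assumed values: the slides give neither a threshold nor a shortener list.
SHORTENERS = {"bit.ly", "goo.gl", "tinyurl.com", "t.co"}
MAX_URL_LENGTH = 54


def _is_ip(host: str) -> bool:
    """True if the host is a literal IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def address_bar_features(url: str) -> dict:
    """Return 0/1 flags for a few address-bar features (1 = suspicious)."""
    host = urlsplit(url).hostname or ""
    return {
        "having_IP_Address": int(_is_ip(host)),
        "URL_Length": int(len(url) > MAX_URL_LENGTH),
        "Shortening_Service": int(host in SHORTENERS),
        "having_At_Symbol": int("@" in url),
        # A "//" after the scheme separator hints at a redirect inside the path.
        "double_slash_redirecting": int("//" in url.split("://", 1)[-1]),
        "Prefix_Suffix": int("-" in host),
        "having_Sub_Domain": int(host.count(".") >= 3 and not _is_ip(host)),
    }
```

The remaining features (SSL state, domain age, page rank and so on) need network lookups or page content, so they are left out of this sketch.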
DEPLOYING TO PRODUCTION
• Context-specific use-cases:
• Certain sub-nets within the org might require access to certain websites to support business functionality
• Org might want to block access to sites even though they are classified as “suspicious” by commercial software
• Infrastructure & capacity planning considerations: client, load balancer, web server, queues, etc.
• REST-API approach: train, retrain & predictions
• Develop automated test cases for your model (especially on the feature engineering side)
• Automate evaluation of the production model, so that changes to the model can be back-tested efficiently on historical data to determine whether improvements have been made
• Possibly have different ML models / end-points exposed for different sections of the network or for different departments
• Have a fall-back or default value for parameters which fail to get processed by the Feature Engineering module (exception handling)
• Decouple the input and the output for the model; model should still work if parameters are added, modified or deleted in feature engineering
• Single egress point for web traffic, where the ML model can be plugged in with the REST API
• Have a fail-open or kill-switch mechanism for traffic to flow through if model processing fails
• Place model operation in “monitoring” or “non-blocking” mode initially, which allows the ML model to get additional data, allows for fine-tuning and prevents errors
• Supplement with existing controls like spam filtering, black-listing, etc.
• Model should refer to other data sources as well for fine-tuning in the initial stages
• Baselining and retraining the model at frequent intervals; also maintaining model versions
• Provide security analysts with an option to tweak/edit input data for contextual representation
• Deploying the MODEL client-side versus server-side
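Three of the points above (a default value when a feature fails to compute, fail-open when the model fails, and an initial monitoring mode) can be sketched together as follows. All names here, including the logger name, `safe_features` and `decide`, are hypothetical, and the default value of 0 is an assumption.

```python
import logging

logger = logging.getLogger("mudpipe_sketch")  # hypothetical logger name

DEFAULT_FEATURE_VALUE = 0  # assumed fall-back for a feature that fails to compute


def safe_features(url, extractors):
    """Run each extractor; substitute the default value if one raises."""
    features = {}
    for name, extractor in extractors.items():
        try:
            features[name] = extractor(url)
        except Exception:
            logger.warning("feature %s failed; using default", name)
            features[name] = DEFAULT_FEATURE_VALUE
    return features


def decide(url, extractors, predict, monitoring=True):
    """Return "ALLOW" or "BLOCK" for a URL.

    Fails open (ALLOW) if the model raises. In monitoring mode a phishing
    verdict is only logged, so traffic is never blocked.
    """
    try:
        classification = predict(safe_features(url, extractors))
    except Exception:
        logger.error("model failed for %s; failing open", url)
        return "ALLOW"
    if classification == 1 and monitoring:
        logger.info("monitoring mode: would block %s", url)
        return "ALLOW"
    return "BLOCK" if classification == 1 else "ALLOW"
```

Here `extractors` is a dict mapping feature names to functions of the URL, and `predict` is any callable returning 0 or 1. Passing `monitoring=False` once the model has been tuned switches the same code path to enforcement.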
PROS & CONS
PROS
• Reduce dependency on, and the cost & licensing of, third-party external software
• Re-use of the org’s in-house data rather than contributing towards improving commercial software
• Better insights into online behavior of employees
• Real-time protection for employees who access malicious websites or click on phishing links
• Detect and prevent unknown phishing attacks, as new patterns are created by attackers
• Next level of intelligence on top of signature-based prevention techniques & blacklists
• Email filtering solutions help in filtering phishing/spam emails, but this provides holistic protection for all outgoing internet traffic
• Centralized solution implemented org-wide, with no dependency on client-side agents/software
• Anti-phishing: move from offline to real-time; move from reactive to proactive
CONS
• Data collection & building data repository
• Initial baseline dataset has too few records
• Cost / Maintenance of solution/product
• Fine-tuning of rules & predictions to meet changing threat vectors
• False positive rate could cause bad user experience
• Needs to be supplemented with Cyber Threat Intel
• Solution works only when users are connected to org network, since there is no client-side agent
AUDIENCE TAKE-AWAYS
• Opportunity for engineers and analysts to collaborate on building tailored, intelligent security solutions / products
• Learn the various considerations in designing and deploying an ML solution in the InfoSec domain
EMAIL : arjun.job14@gmail.com
LINKEDIN: https://www.linkedin.com/in/arjunbm
FURTHER READING
LINKS & REFERENCES
• https://www.researchgate.net/publication/226420039_Detection_of_Phishing_Attacks_A_Machine_Learning_Approach
• https://ieeexplore.ieee.org/document/8004877
• https://pdfs.semanticscholar.org/188f/3bde688d5a47ce86bc0a8eca03aeb1bb9dfc.pdf

Rootconf_phishing_v2

  • 1. THE FIFTH ELEPHANT - ARJUN B.M. MUDPIPE MaliciousURLDetectionfor PhishingIdentificationandPrevention
  • 2. PHISHING INTRODUCTION The fraudulent practice of sending emails purporting to be from reputable companies in order to induce individuals to reveal personal information, such as passwords and credit card numbers MOTIVES: Financial gain, reputational damage, identity theft, fame & notoriety Phishing website indicators: • Visually resembles the original website • Email creates a sense of urgency to force user action • Fake HTTPS certificate & domain name • Provides attractive offers that tempt the user to respond
  • 3. PROBLEM STATEMENT For Employees: • Accessing malicious sites after falling victim to phishing emails • No self-service mechanism for employees to check for bad sites • Lack of awareness and training for employees For Security Teams: • Manual time & effort spent by the Security Operations team to block sites • Lack of internal ML solution insights on phishing data; current solutions may be rule-based • Different teams/networks may have different requirements for site access, which cannot be served by external commercial solutions For Business: • 91% of all cyber-attacks are via phishing and they have devastating consequences • Licensing cost of commercial solutions to detect phishing sites • Dependency on external solution/product
  • 4. APPROACH & METHODOLOGY MACHINE LEARNING APPROACH • Data Collection & Validation • Parameter Determination (Address Bar based Features, HTML and JavaScript based Features, Domain based Features, Abnormal Based Features, URL Blacklist Features) • Feature Extraction from unknown, incoming data (test data) • Create baseline model with initial dataset • Evaluate performance of model and fine-tune it • Apply test data to pre-trained baseline model & make prediction • Compare with known data sources & further fine-tune results • Retrain model at frequent intervals for better accuracy, context and relevancy • Classification model pickled and exposed as REST API WHITELIST / BLACKLIST APPROACH • Identify data sources which provide info on phishing sites • Scrape data from data sources • Create whitelist / blacklist and compare URLs CONS • Lack of updated data sources • Lack of real-time intelligence • Data not comprehensive enough • Extensive effort for data scraping RULE BASED APPROACH • Determine phishing indicators • Define rules using combination of indicators • Compare & match URLs against rules to deny/allow CONS • Complex rule set definitions • Overhead in managing and updating rules • High False Positive and False Negative rates
  • 5. ARCHITECTURE / WORKFLOW [Workflow diagram; box labels: BASELINE DATA SET, TRAIN, BASELINE ML MODEL, EXPOSE AS REST API, WEB TRAFFIC (UNKNOWN DATA), INPUT, FEATURE EXTRACTION, TEST DATA, PREDICTION, OUTPUT, RETRAIN] FEATURE EXTRACTION: • Address Bar based Features • HTML and JavaScript based Features • Domain based Features • Abnormal Based Features • URL Blacklist Features • Total of 30 features OUTPUT: • CLASSIFICATION = 0: LEGITIMATE • CLASSIFICATION = 1: PHISHING • COMPARE WITH KNOWN SOURCES • PROBABILITY OF PREDICTION SECURITY ACTION (BLOCK / ALLOW)
  • 6. FEATURE EXTRACTION 1. having_IP_Address 2. URL_Length 3. Shortening_Service 4. having_At_Symbol 5. double_slash_redirecting 6. Prefix_Suffix 7. having_Sub_Domain 8. SSL_State 9. Domain_registeration_length 10. Favicon 11. Open_ports 12. HTTPS_token_in_URL 13. Request_URL 14. URL_of_Anchor 15. Links_in_tags 16. Server_Form_Handler 17. Submitting_to_email 18. Abnormal_URL 19. Site_Redirect 20. on_mouseover_changes 21. RightClick_Disabled 22. popUpWindow 23. Iframe_redirection 24. age_of_domain 25. DNS_Record 26. web_traffic_rank 27. Page_Rank 28. Google_Index 29. Links_pointing_to_page 30. Statistical_report - top phishing domains Classification output: 0 = legitimate, 1 = phishing
  • 7. DEPLOYING TO PRODUCTION • Context-specific use-cases: • Certain sub-nets within the org might require access to certain websites to support business functionality • Org might want to block access to sites even though they are classified as “suspicious” by commercial software • Infrastructure & capacity planning considerations: client, load balancer, web server, queues, etc. • REST API approach: train, retrain & predictions • Develop automated test cases for your model (especially on the feature engineering side) • Automate evaluation of the production model, so that changes to the model can be back-tested efficiently on historical data to determine whether improvements have been made • Possibly have different ML models / end-points exposed for different sections of the network or for different departments • Have a fall-back or default value for parameters which fail to get processed by the Feature Engineering module (exception handling) • Decouple the input and the output for the model; model should still work if parameters are added, modified or deleted in feature engineering • Single egress point for web traffic, where the ML model can be plugged in with the REST API • Have a fail-open or kill-switch mechanism for traffic to flow through if model processing fails • Place model operation in “monitoring” or “non-blocking” mode initially, which allows the ML model to get additional data, allows for fine-tuning and prevents errors • Supplement with existing controls like spam filtering, black-listing, etc. • Model should refer to other data sources as well for fine-tuning in the initial stages • Baselining and retraining the model at frequent intervals; also maintaining model versions • Provide security analysts with an option to tweak/edit input data for contextual representation • Deploying the model client-side versus server-side
  • 8. PROS & CONS PROS • Reduce dependency on, and the cost and licensing of, third-party external software • Re-use of the org’s in-house data rather than contributing towards improving commercial software • Better insights into online behavior of employees • Real-time protection for employees who access malicious websites or click on phishing links • Detect and prevent unknown phishing attacks, as new patterns are created by attackers • Next level of intelligence on top of signature-based prevention techniques & blacklists • Email filtering solutions help in filtering phishing/spam emails, but this provides holistic protection for all outgoing internet traffic • Centralized solution implemented org-wide with no dependency on client-side agents/software • Anti-phishing: move from offline to real-time; move from reactive to proactive CONS • Data collection & building data repository • Initial baseline dataset has too few records • Cost / Maintenance of solution/product • Fine-tuning of rules & predictions to meet changing threat vectors • False positive rate could cause bad user experience • Needs to be supplemented with Cyber Threat Intel • Solution works only when users are connected to org network, since there is no client-side agent
  • 9. AUDIENCE TAKE-AWAYS • Opportunity for engineers and analysts to collaborate on building tailored, intelligent security solutions / products • Learn the various considerations in designing and deploying an ML solution in the InfoSec domain
  • 10. EMAIL: arjun.job14@gmail.com LINKEDIN: https://www.linkedin.com/in/arjunbm FURTHER READING: LINKS & REFERENCES • https://www.researchgate.net/publication/226420039_Detection_of_Phishing_Attacks_A_Machine_Learning_Approach • https://ieeexplore.ieee.org/document/8004877 • https://pdfs.semanticscholar.org/188f/3bde688d5a47ce86bc0a8eca03aeb1bb9dfc.pdf
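
As a rough illustration of the address bar based features listed on slide 6, the sketch below derives six of them from a raw URL using only the Python standard library. The slides do not give the extraction rules, so everything here is an assumption: the 0/1 encoding (1 = suspicious), the `MAX_URL_LENGTH` threshold, the `SHORTENERS` set and the helper names are all hypothetical.

```python
import ipaddress
from urllib.parse import urlparse

# Hypothetical, deliberately incomplete list of URL shortener hosts.
SHORTENERS = {"bit.ly", "tinyurl.com", "goo.gl", "t.co"}

# Assumed cut-off; the slides do not state a URL length threshold.
MAX_URL_LENGTH = 75


def host_is_ip(host: str) -> bool:
    """Return True if the host is a literal IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def address_bar_features(url: str) -> dict:
    """Compute a few address bar features; 1 = suspicious, 0 = not (assumed encoding)."""
    host = (urlparse(url).hostname or "").lower()
    return {
        "having_IP_Address": int(host_is_ip(host)),
        "URL_Length": int(len(url) > MAX_URL_LENGTH),
        "Shortening_Service": int(host in SHORTENERS),
        "having_At_Symbol": int("@" in url),
        # A "//" appearing after the scheme separator suggests a redirect.
        "double_slash_redirecting": int(url.rfind("//") > 7),
        # A hyphen in the host name ("secure-bank.example.com").
        "Prefix_Suffix": int("-" in host),
    }


if __name__ == "__main__":
    print(address_bar_features("http://192.168.0.1/login"))
```

The remaining features on the slide (SSL state, domain age, page rank and so on) need network lookups or page content and are left out of this sketch.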
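
Three of the deployment points on slide 7 (default values for features that fail to process, a fail-open path when the model fails, and an initial monitoring mode) can be sketched together as below. This is a minimal sketch, not the presenter's implementation: `safe_feature`, `decide`, the `FEATURE_DEFAULT` value and the `predict` callable are hypothetical names, and the model is assumed to return 0 for legitimate and 1 for phishing, as on the slides.

```python
from typing import Callable, Dict, Mapping

FEATURE_DEFAULT = 0  # assumed neutral fallback for a feature that cannot be computed
ALLOW, BLOCK = "ALLOW", "BLOCK"


def safe_feature(extractor: Callable[[str], int], url: str,
                 default: int = FEATURE_DEFAULT) -> int:
    """Run one feature extractor; fall back to a default value if it raises."""
    try:
        return extractor(url)
    except Exception:  # broad on purpose: one bad feature must not stop the pipeline
        return default


def decide(url: str,
           extractors: Mapping[str, Callable[[str], int]],
           predict: Callable[[Dict[str, int]], int],
           monitoring: bool = True) -> str:
    """Return ALLOW or BLOCK for a URL, failing open if the model errors out."""
    features = {name: safe_feature(fn, url) for name, fn in extractors.items()}
    try:
        label = predict(features)  # assumed: 0 = legitimate, 1 = phishing
    except Exception:
        return ALLOW  # fail-open: traffic keeps flowing when the model fails
    if label == 1 and not monitoring:
        return BLOCK
    return ALLOW  # in monitoring mode a phishing verdict would only be logged


if __name__ == "__main__":
    extractors = {"having_At_Symbol": lambda u: int("@" in u)}
    model = lambda feats: feats["having_At_Symbol"]  # stand-in for a real classifier
    print(decide("http://user@example.com", extractors, model, monitoring=False))
```

Passing `predict` in as a callable keeps the decision logic independent of how the model is served, so the same wrapper could sit in front of a local pickled model or a call to the REST API.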