MUDPIPE
Malicious URL Detection for Phishing
Identification and Prevention
EMAIL : arjun.job14@gmail.com
LINKEDIN: https://www.linkedin.com/in/arjunbm
PHISHING INTRODUCTION
The fraudulent practice of sending emails purporting to be from reputable
companies in order to induce individuals to reveal personal information, such
as passwords and credit card numbers
MOTIVES: Financial gain, reputational damage, identity theft, fame & notoriety
Phishing website indicators:
• Looks visually like the original website
• Email creates a sense of urgency to force user action
• Fake HTTPS certificate & domain name
• Offers attractive deals that tempt the user to respond
APPROACH & METHODOLOGY
• Data Collection & Validation
• Parameter Determination
  • Address Bar based Features
  • HTML and JavaScript based Features
  • Domain based Features
  • Abnormal Based Features
  • URL Blacklist Features
• Feature Extraction
• Algorithm selection and output classification
• Create baseline model with initial dataset
• Evaluate performance of model and fine-tuning
• Apply test data on pre-trained baseline model & make prediction
• Compare with known data sources & further fine-tune results
• Retrain model at frequent intervals for better accuracy, context and relevancy
• Classification model pickled and exposed as REST API
• Pre-trained classification model used to classify and predict incoming URLs
• In-line integration with outgoing web traffic at egress points for centralized monitoring and control
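The last steps above (pickle the model, expose it as a REST API, classify incoming URLs) could look roughly like the sketch below. It uses only the Python standard library. The stand-in linear model stored as a plain dict, its weights, the `/predict` path and the JSON field names are illustrative assumptions, not the MUDPIPE implementation; a real deployment would load a trained classifier and validate request bodies.

```python
import json
import math
import pickle
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical stand-in for a trained classifier: a linear model kept as a
# plain dict so that it pickles without third-party dependencies.
BASELINE_MODEL = {
    "version": "baseline-1",
    "bias": -1.5,
    "weights": {"having_IP_Address": 2.0, "having_At_Symbol": 1.0, "Prefix_Suffix": 1.0},
}


def predict(model, features):
    """Return (classification, probability of phishing); 1 = phishing, 0 = legitimate."""
    score = model["bias"] + sum(
        weight * features.get(name, 0) for name, weight in model["weights"].items()
    )
    probability = 1.0 / (1.0 + math.exp(-score))
    return (1 if probability >= 0.5 else 0), probability


# Pickle the model and load it back, as a serving process would at start-up.
MODEL = pickle.loads(pickle.dumps(BASELINE_MODEL))


def handle_predict(body: bytes) -> dict:
    """Turn a JSON body of extracted features into a JSON-ready response."""
    features = json.loads(body)
    classification, probability = predict(MODEL, features)
    return {
        "classification": classification,
        "probability": round(probability, 3),
        "model_version": MODEL["version"],
    }


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/predict":  # assumed endpoint name
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        payload = json.dumps(handle_predict(self.rfile.read(length))).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(host="127.0.0.1", port=8080):
    """Start the blocking HTTP server; call this from a deployment entry point."""
    HTTPServer((host, port), PredictHandler).serve_forever()
```

Keeping `handle_predict` separate from the HTTP handler makes the prediction path testable without starting a server.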
ARCHITECTURE / WORKFLOW
[Workflow diagram. Recoverable labels: BASELINE DATA SET, FEATURE EXTRACTION, TRAIN, BASELINE ML MODEL, EXPOSE AS REST API, TEST DATA, WEB TRAFFIC (UNKNOWN DATA), PREDICTION, RETRAIN, SECURITY ACTION (BLOCK / ALLOW)]
INPUT
• Address Bar based Features
• HTML and JavaScript based Features
• Domain based Features
• Abnormal Based Features
• URL Blacklist Features
OUTPUT
• CLASSIFICATION = 0: LEGITIMATE
• CLASSIFICATION = 1: PHISHING
• COMPARE WITH KNOWN SOURCES
• PROBABILITY OF PREDICTION
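As a rough illustration of the final step in the workflow, the sketch below maps the model output (classification, prediction probability, comparison with known sources) to a security action. The function name, the 0.8 threshold and the rule that a known-source hit always blocks are assumptions made for this example, not details given in the slides.

```python
def security_action(classification, probability, on_known_blacklist, threshold=0.8):
    """Map model output to "BLOCK" or "ALLOW".

    classification: 0 = legitimate, 1 = phishing
    probability: the model's confidence in that classification (0.0 to 1.0)
    on_known_blacklist: True if a known data source already lists the URL
    threshold: assumed minimum confidence before a phishing verdict blocks
    """
    if on_known_blacklist:
        return "BLOCK"  # assumption: known sources override the model
    if classification == 1 and probability >= threshold:
        return "BLOCK"
    return "ALLOW"
```

A low-confidence phishing verdict is allowed through here; an organization could instead route such cases to an analyst queue.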
FEATURE EXTRACTION
1. having_IP_Address
2. URL_Length
3. Shortening_Service
4. having_At_Symbol
5. double_slash_redirecting
6. Prefix_Suffix
7. having_Sub_Domain
8. SSL_State
9. Domain_registeration_length
10. Favicon
11. Open_ports
12. HTTPS_token_in_URL
13. Request_URL
14. URL_of_Anchor
15. Links_in_tags
16. Server_Form_Handler
17. Submitting_to_email
18. Abnormal_URL
19. Site_Redirect
20. on_mouseover_changes
21. RightClick_Disabled
22. popUpWindow
23. Iframe_redirection
24. age_of_domain
25. DNS_Record
26. web_traffic_rank
27. Page_Rank
28. Google_Index
29. Links_pointing_to_page
30. Statistical_report - top phishing domains
Classification output: 0 = legitimate, 1 = phishing
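Several of the address-bar features (1 to 7) can be computed from the URL string alone. The sketch below is one possible way to do so with the Python standard library. The encoding (1 = indicator present, 0 = absent), the 54-character length cut-off, the sub-domain rule and the short list of shortening services are assumptions for illustration; the slides do not specify them.

```python
import ipaddress
from urllib.parse import urlsplit

# Illustrative, far from exhaustive, list of URL-shortening hosts (assumption).
SHORTENERS = {"bit.ly", "tinyurl.com", "t.co", "goo.gl"}


def _is_ip(host: str) -> bool:
    """True if the host is a literal IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def address_bar_features(url: str) -> dict:
    """Extract a few address-bar features; 1 = suspicious indicator present."""
    host = (urlsplit(url).hostname or "").lower()
    after_scheme = url.split("://", 1)[-1]  # URL without the leading "scheme://"
    is_ip = _is_ip(host)
    return {
        "having_IP_Address": int(is_ip),
        "URL_Length": int(len(url) > 54),  # assumed cut-off
        "Shortening_Service": int(host in SHORTENERS),
        "having_At_Symbol": int("@" in url),
        "double_slash_redirecting": int("//" in after_scheme),
        "Prefix_Suffix": int("-" in host),
        # Simplified rule: more than two dots in a non-IP host name.
        "having_Sub_Domain": int(not is_ip and host.count(".") > 2),
    }
```

The remaining features need page content or external lookups (WHOIS, DNS, ranking services) and are not covered by this sketch.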
DEPLOYING TO PRODUCTION
• Context-specific use cases:
  • Certain sub-nets within the org might require access to certain websites to support business functionality
  • Org might want to block access to sites even when commercial software classifies them only as “suspicious”
• Infrastructural & capacity planning considerations: client, load balancer, web server, queues, etc.
• REST-API approach: train, retrain & predictions
• Develop automation test cases for your model (especially on feature engineering side)
• Automate evaluation of the production model, so that changes to the model can be back-tested efficiently on historical data to determine whether they are improvements
• Possibly have different end-points exposed for different sections of the network or for different departments
• Have a fall-back or default value for parameters that fail to get processed by the Feature Engineering module
• Decouple the input and the output for the model; model should still work if parameters are added, modified or deleted in feature engineering
• Single egress point for web traffic, where the ML model can be plugged-in with the REST API
• Have a fail-open or kill-switch mechanism for traffic to flow through if model processing fails
• Place the model in “monitoring” or “non-blocking” mode initially, which lets it gather additional data, allows for fine-tuning and prevents errors
• Supplement with existing controls like spam filtering, black-listing, etc
• Model should refer to other data sources as well for fine-tuning in the initial stages
• Baseline and retrain the model at frequent intervals; also maintain model versions
• Provide security analysts with an option to tweak/edit input data for contextual representation
• Decide whether to deploy the model client-side or server-side
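Three of the points above (default values for features that fail, a fail-open or kill-switch path, and an initial monitoring mode) can be combined in a small gate in front of the model. This is a minimal sketch under assumed names: `FEATURE_DEFAULTS`, `safe_features`, `gate` and the chosen default values are hypothetical, and `classify` stands for any callable that returns 0 (legitimate) or 1 (phishing).

```python
import logging

logger = logging.getLogger("mudpipe.sketch")

# Hypothetical neutral defaults used when a feature cannot be computed.
FEATURE_DEFAULTS = {"having_IP_Address": 0, "URL_Length": 0, "age_of_domain": 0}


def safe_features(url, extractors):
    """Run each extractor; fall back to the default if it is missing or fails."""
    features = {}
    for name, default in FEATURE_DEFAULTS.items():
        try:
            features[name] = extractors[name](url)
        except Exception:
            logger.warning("feature %s failed for %s; using default", name, url)
            features[name] = default
    return features


def gate(url, extractors, classify, monitoring=True, kill_switch=False):
    """Return "ALLOW" or "BLOCK" for a URL, failing open on any model error."""
    if kill_switch:
        return "ALLOW"  # operator bypass: traffic flows without the model
    try:
        verdict = classify(safe_features(url, extractors))
    except Exception:
        logger.exception("model failed for %s; failing open", url)
        return "ALLOW"
    if verdict == 1 and monitoring:
        logger.info("monitoring mode: would have blocked %s", url)
        return "ALLOW"  # non-blocking mode only records the verdict
    return "BLOCK" if verdict == 1 else "ALLOW"
```

Because features are looked up by name and a missing extractor falls back to its default, removing an extractor from the feature engineering side does not break the prediction path.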
BENEFITS
• Reduce dependency on third-party external software, along with its cost and licensing
• Reuse the org’s in-house data rather than contributing it towards improving commercial software
• Better insights into online behavior of employees
• Real-time protection for employees who access malicious websites or click on phishing links
• Detect and prevent unknown phishing attacks as attackers create new patterns
• Next level of intelligence on top of signature-based prevention techniques & blacklists
• Email filtering solutions help in filtering phishing/spam emails, but this provides holistic protection for all outgoing internet traffic
• Centralized solution implemented org-wide and no dependency on client-side agents/software
• Anti-phishing: move from offline to real-time; move from reactive to proactive
AUDIENCE TAKE-AWAYS
• Provide insights into building an ML pipeline, data engineering & feature extraction
• Learn how to solve a “Classification” problem using ML
• Cyber Security Analysts can use the feature extraction component to quickly analyze indicators and hence expedite incident response
• Helps security engineers to build more intelligent products, tailored to their own org requirements
• Helps understand the constituents/factors to identify malicious URLs
• Learn how to fingerprint a URL for phishing indicators using various data sources and components
• Learn how to create or obtain a baseline dataset for training the baseline ML model
• Learn how to deploy an ML model in production
• Learn how to retrain the model for better accuracy and relevancy
• Learn how to identify the most influential variables that determine model output

Detect Phishing URLs with MUDPIPE Machine Learning Model
