SlideShare a Scribd company logo
1 of 12
Download to read offline
SHREYA GOPAL SUNDARI
PHISHING WEBSITE DETECTION
by MACHINE LEARNING
TECHNIQUES
INTRODUCTION
• Phishing is the most commonly used social engineering and cyber attack.
• Through such attacks, the phisher targets naïve online users by tricking them into
revealing confidential information, with the purpose of using it fraudulently.
• In order to avoid getting phished,
• users should have awareness of phishing websites.
• have a blacklist of phishing websites which requires the knowledge of website being detected
as phishing.
• detect them in their early appearance, using machine learning and deep neural network
algorithms.
• Of the above three, the machine learning based method is proven to be most
effective than the other methods.
• Even then, online users are still being trapped into revealing sensitive information in
phishing websites.
OBJECTIVES
A phishing website is a common social engineering method that mimics
trustful uniform resource locators (URLs) and webpages. The objective of this
project is to train machine learning models and deep neural nets on the dataset
created to predict phishing websites. Both phishing and benign URLs of websites
are gathered to form a dataset and from them required URL and website
content-based features are extracted. The performance level of each model is
measures and compared.
APPROACH
Below mentioned are the steps involved in the completion of this project:
• Collect dataset containing phishing and legitimate websites from the open source platforms.
• Write a code to extract the required features from the URL database.
• Analyze and preprocess the dataset by using EDA techniques.
• Divide the dataset into training and testing sets.
• Run selected machine learning and deep neural network algorithms like SVM, Random Forest,
Autoencoder on the dataset.
• Write a code for displaying the evaluation result considering accuracy metrics.
• Compare the obtained results for trained models and specify which is better.
DATA COLLECTION
• Legitimate URLs are collected from the dataset provided by University of
New Brunswick, https://www.unb.ca/cic/datasets/url-2016.html.
• From the collection, 5000 URLs are randomly picked.
• Phishing URLs are collected from opensource service called PhishTank . This
service provide a set of phishing URLs in multiple formats like csv, json etc.
that gets updated hourly.
• Form the obtained collection, 5000 URLs are randomly picked.
FEATURE SELECTION
• The following category of features are selected:
• Address Bar based Features
• Domain based Features
• HTML & Javascript based Feature
• Address Bar based Features considered are:
• Domian of URL • Redirection ‘//’ in URL
• IP Address in URL • ‘http/https’ in Domain name
• ‘@’ Symbol in URL • Using URL Shortening Service
• Length of URL • Prefix or Suffix "-" in Domain
• Depth of URL
FEATURE SELECTION (CONT.)
• Domain based Features considered are:
• HTML and JavaScript based Features considered are:
• All together 17 features are extracted from the dataset.
• DNS Record • Age of Domain
• Website Traffic • End Period of Domain
• Iframe Redirection • Disabling Right Click
• Status Bar Customization • Website Forwarding
FEATURES DISTRIBUTION
MACHINE LEARNING MODELS
• This is a supervised machine learning task. There are two major types of supervised
machine learning problems, called classification and regression.
• This data set comes under classification problem, as the input URL is classified as
phishing (1) or legitimate (0). The machine learning models (classification) considered
to train the dataset in this notebook are:
• Decision Tree
• Random Forest
• Multilayer Perceptrons
• XGBoost
• Autoencoder Neural Network
• Support Vector Machines
MODEL EVALUATION
• The models are evaluated, and the considered metric is accuracy.
• Below Figure shows the training and test dataset accuracy by the respective models:
• For the above it is clear that the XGBoost model gives better performance. The model is
saved for further usage.
NEXT STEPS
• Working on this project is very knowledgeable and worth the effort.
• Through this project, one can know a lot about the phishing websites and how
they are differentiated from legitimate ones.
• This project can be taken further by creating a browser extensions of
developing a GUI.
• These should classify the inputted URL to legitimate or phishing with the use of
the saved model.
Thank You…..

More Related Content

What's hot

Understanding Cross-site Request Forgery
Understanding Cross-site Request ForgeryUnderstanding Cross-site Request Forgery
Understanding Cross-site Request ForgeryDaniel Miessler
 
Intro to Web Application Security
Intro to Web Application SecurityIntro to Web Application Security
Intro to Web Application SecurityRob Ragan
 
Thick Client Penetration Testing.pdf
Thick Client Penetration Testing.pdfThick Client Penetration Testing.pdf
Thick Client Penetration Testing.pdfSouvikRoy114738
 
Web security ppt sniper corporation
Web security ppt   sniper corporationWeb security ppt   sniper corporation
Web security ppt sniper corporationsharmaakash1881
 
Vulnerabilities in modern web applications
Vulnerabilities in modern web applicationsVulnerabilities in modern web applications
Vulnerabilities in modern web applicationsNiyas Nazar
 
CSRF Attack and Its Prevention technique in ASP.NET MVC
CSRF Attack and Its Prevention technique in ASP.NET MVCCSRF Attack and Its Prevention technique in ASP.NET MVC
CSRF Attack and Its Prevention technique in ASP.NET MVCSuvash Shah
 
Thick Application Penetration Testing: Crash Course
Thick Application Penetration Testing: Crash CourseThick Application Penetration Testing: Crash Course
Thick Application Penetration Testing: Crash CourseScott Sutherland
 
Web Application Penetration Tests - Information Gathering Stage
Web Application Penetration Tests - Information Gathering StageWeb Application Penetration Tests - Information Gathering Stage
Web Application Penetration Tests - Information Gathering StageNetsparker
 
DVWA(Damn Vulnerabilities Web Application)
DVWA(Damn Vulnerabilities Web Application)DVWA(Damn Vulnerabilities Web Application)
DVWA(Damn Vulnerabilities Web Application)Soham Kansodaria
 
Phishing Attacks
Phishing AttacksPhishing Attacks
Phishing AttacksJagan Mohan
 
MITRE ATT&CK framework
MITRE ATT&CK frameworkMITRE ATT&CK framework
MITRE ATT&CK frameworkBhushan Gurav
 

What's hot (20)

Understanding Cross-site Request Forgery
Understanding Cross-site Request ForgeryUnderstanding Cross-site Request Forgery
Understanding Cross-site Request Forgery
 
Phishing ppt
Phishing pptPhishing ppt
Phishing ppt
 
Web Application
Web ApplicationWeb Application
Web Application
 
Intro to Web Application Security
Intro to Web Application SecurityIntro to Web Application Security
Intro to Web Application Security
 
Thick Client Penetration Testing.pdf
Thick Client Penetration Testing.pdfThick Client Penetration Testing.pdf
Thick Client Penetration Testing.pdf
 
Cross site scripting
Cross site scriptingCross site scripting
Cross site scripting
 
Web browsers and web document
Web browsers and web documentWeb browsers and web document
Web browsers and web document
 
Xss attack
Xss attackXss attack
Xss attack
 
Web security ppt sniper corporation
Web security ppt   sniper corporationWeb security ppt   sniper corporation
Web security ppt sniper corporation
 
Web security
Web securityWeb security
Web security
 
Web Application Security 101
Web Application Security 101Web Application Security 101
Web Application Security 101
 
Vulnerabilities in modern web applications
Vulnerabilities in modern web applicationsVulnerabilities in modern web applications
Vulnerabilities in modern web applications
 
Packet sniffers
Packet sniffersPacket sniffers
Packet sniffers
 
IDS and IPS
IDS and IPSIDS and IPS
IDS and IPS
 
CSRF Attack and Its Prevention technique in ASP.NET MVC
CSRF Attack and Its Prevention technique in ASP.NET MVCCSRF Attack and Its Prevention technique in ASP.NET MVC
CSRF Attack and Its Prevention technique in ASP.NET MVC
 
Thick Application Penetration Testing: Crash Course
Thick Application Penetration Testing: Crash CourseThick Application Penetration Testing: Crash Course
Thick Application Penetration Testing: Crash Course
 
Web Application Penetration Tests - Information Gathering Stage
Web Application Penetration Tests - Information Gathering StageWeb Application Penetration Tests - Information Gathering Stage
Web Application Penetration Tests - Information Gathering Stage
 
DVWA(Damn Vulnerabilities Web Application)
DVWA(Damn Vulnerabilities Web Application)DVWA(Damn Vulnerabilities Web Application)
DVWA(Damn Vulnerabilities Web Application)
 
Phishing Attacks
Phishing AttacksPhishing Attacks
Phishing Attacks
 
MITRE ATT&CK framework
MITRE ATT&CK frameworkMITRE ATT&CK framework
MITRE ATT&CK framework
 

Similar to Phishing Website Detection by Machine Learning Techniques Presentation.pdf

PHISHING URL - Review 1.pptx
PHISHING URL - Review 1.pptxPHISHING URL - Review 1.pptx
PHISHING URL - Review 1.pptxArulvincent3
 
Phishing Detection using Machine Learning
Phishing Detection using Machine LearningPhishing Detection using Machine Learning
Phishing Detection using Machine LearningArjun BM
 
Detection of Phishing Websites
Detection of Phishing Websites Detection of Phishing Websites
Detection of Phishing Websites Nikhil Soni
 
Rootconf_phishing_v2
Rootconf_phishing_v2Rootconf_phishing_v2
Rootconf_phishing_v2Arjun BM
 
CYBER THREAT DETECTION PLATFORM USING MACHINE LEARNING.pptx
CYBER THREAT DETECTION PLATFORM USING MACHINE LEARNING.pptxCYBER THREAT DETECTION PLATFORM USING MACHINE LEARNING.pptx
CYBER THREAT DETECTION PLATFORM USING MACHINE LEARNING.pptxTAMILMANIP6
 
Building a Real-Time Security Application Using Log Data and Machine Learning...
Building a Real-Time Security Application Using Log Data and Machine Learning...Building a Real-Time Security Application Using Log Data and Machine Learning...
Building a Real-Time Security Application Using Log Data and Machine Learning...Sri Ambati
 
Detecting Phishing Websites Using Machine Learning
Detecting Phishing Websites Using Machine LearningDetecting Phishing Websites Using Machine Learning
Detecting Phishing Websites Using Machine LearningIRJET Journal
 
Defcon 22-wesley-mc grew-instrumenting-point-of-sale-malware
Defcon 22-wesley-mc grew-instrumenting-point-of-sale-malwareDefcon 22-wesley-mc grew-instrumenting-point-of-sale-malware
Defcon 22-wesley-mc grew-instrumenting-point-of-sale-malwareDaveEdwards12
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine LearningYogesh Sharma
 
Static Analysis Primer
Static Analysis PrimerStatic Analysis Primer
Static Analysis PrimerCoverity
 
Avtar's ppt
Avtar's pptAvtar's ppt
Avtar's pptmak57
 
Using Data Science for Cybersecurity
Using Data Science for CybersecurityUsing Data Science for Cybersecurity
Using Data Science for CybersecurityVMware Tanzu
 
Online talent sourcing - a future essentia
Online talent sourcing - a future essentiaOnline talent sourcing - a future essentia
Online talent sourcing - a future essentiaHSE Guru
 
How to build corporate size fraud prevention
How to build corporate size fraud preventionHow to build corporate size fraud prevention
How to build corporate size fraud preventionYury Leonychev
 
MOBILE CLOUD COMPUTING USING CRYPTOGRAPHIC HASH FUNCTION
MOBILE CLOUD COMPUTING USING CRYPTOGRAPHIC HASH FUNCTIONMOBILE CLOUD COMPUTING USING CRYPTOGRAPHIC HASH FUNCTION
MOBILE CLOUD COMPUTING USING CRYPTOGRAPHIC HASH FUNCTIONI Ruby
 
J2ee project lists:-Mumbai Academics
J2ee project lists:-Mumbai AcademicsJ2ee project lists:-Mumbai Academics
J2ee project lists:-Mumbai AcademicsMumbai Academisc
 

Similar to Phishing Website Detection by Machine Learning Techniques Presentation.pdf (20)

PHISHING URL - Review 1.pptx
PHISHING URL - Review 1.pptxPHISHING URL - Review 1.pptx
PHISHING URL - Review 1.pptx
 
dasdweda PPT.pptx
dasdweda PPT.pptxdasdweda PPT.pptx
dasdweda PPT.pptx
 
Phishing Detection using Machine Learning
Phishing Detection using Machine LearningPhishing Detection using Machine Learning
Phishing Detection using Machine Learning
 
Detection of Phishing Websites
Detection of Phishing Websites Detection of Phishing Websites
Detection of Phishing Websites
 
Rootconf_phishing_v2
Rootconf_phishing_v2Rootconf_phishing_v2
Rootconf_phishing_v2
 
CYBER THREAT DETECTION PLATFORM USING MACHINE LEARNING.pptx
CYBER THREAT DETECTION PLATFORM USING MACHINE LEARNING.pptxCYBER THREAT DETECTION PLATFORM USING MACHINE LEARNING.pptx
CYBER THREAT DETECTION PLATFORM USING MACHINE LEARNING.pptx
 
Building a Real-Time Security Application Using Log Data and Machine Learning...
Building a Real-Time Security Application Using Log Data and Machine Learning...Building a Real-Time Security Application Using Log Data and Machine Learning...
Building a Real-Time Security Application Using Log Data and Machine Learning...
 
Fuzzy Rough Set Feature Selection to Enhance Phishing Attack Detection
Fuzzy Rough Set Feature Selection to Enhance Phishing Attack Detection Fuzzy Rough Set Feature Selection to Enhance Phishing Attack Detection
Fuzzy Rough Set Feature Selection to Enhance Phishing Attack Detection
 
Detecting Phishing Websites Using Machine Learning
Detecting Phishing Websites Using Machine LearningDetecting Phishing Websites Using Machine Learning
Detecting Phishing Websites Using Machine Learning
 
Defcon 22-wesley-mc grew-instrumenting-point-of-sale-malware
Defcon 22-wesley-mc grew-instrumenting-point-of-sale-malwareDefcon 22-wesley-mc grew-instrumenting-point-of-sale-malware
Defcon 22-wesley-mc grew-instrumenting-point-of-sale-malware
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
 
Static Analysis Primer
Static Analysis PrimerStatic Analysis Primer
Static Analysis Primer
 
Avtar's ppt
Avtar's pptAvtar's ppt
Avtar's ppt
 
Using Data Science for Cybersecurity
Using Data Science for CybersecurityUsing Data Science for Cybersecurity
Using Data Science for Cybersecurity
 
Final .pptx
Final .pptxFinal .pptx
Final .pptx
 
Online talent sourcing - a future essentia
Online talent sourcing - a future essentiaOnline talent sourcing - a future essentia
Online talent sourcing - a future essentia
 
How to build corporate size fraud prevention
How to build corporate size fraud preventionHow to build corporate size fraud prevention
How to build corporate size fraud prevention
 
MOBILE CLOUD COMPUTING USING CRYPTOGRAPHIC HASH FUNCTION
MOBILE CLOUD COMPUTING USING CRYPTOGRAPHIC HASH FUNCTIONMOBILE CLOUD COMPUTING USING CRYPTOGRAPHIC HASH FUNCTION
MOBILE CLOUD COMPUTING USING CRYPTOGRAPHIC HASH FUNCTION
 
What is web scraping?
What is web scraping?What is web scraping?
What is web scraping?
 
J2ee project lists:-Mumbai Academics
J2ee project lists:-Mumbai AcademicsJ2ee project lists:-Mumbai Academics
J2ee project lists:-Mumbai Academics
 

Recently uploaded

Full Stack Web Development Course for Beginners
Full Stack Web Development Course  for BeginnersFull Stack Web Development Course  for Beginners
Full Stack Web Development Course for BeginnersSabitha Banu
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon AUnboundStockton
 
Keynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designKeynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designMIPLM
 
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfLike-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfMr Bounab Samir
 
Roles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceRoles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceSamikshaHamane
 
ENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choomENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choomnelietumpap1
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatYousafMalik24
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxNirmalaLoungPoorunde1
 
DATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersDATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersSabitha Banu
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxAnupkumar Sharma
 
Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Celine George
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxthorishapillay1
 
Judging the Relevance and worth of ideas part 2.pptx
Judging the Relevance  and worth of ideas part 2.pptxJudging the Relevance  and worth of ideas part 2.pptx
Judging the Relevance and worth of ideas part 2.pptxSherlyMaeNeri
 
Quarter 4 Peace-education.pptx Catch Up Friday
Quarter 4 Peace-education.pptx Catch Up FridayQuarter 4 Peace-education.pptx Catch Up Friday
Quarter 4 Peace-education.pptx Catch Up FridayMakMakNepo
 
Planning a health career 4th Quarter.pptx
Planning a health career 4th Quarter.pptxPlanning a health career 4th Quarter.pptx
Planning a health career 4th Quarter.pptxLigayaBacuel1
 
ACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdfACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdfSpandanaRallapalli
 
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfAMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfphamnguyenenglishnb
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentInMediaRes1
 

Recently uploaded (20)

Full Stack Web Development Course for Beginners
Full Stack Web Development Course  for BeginnersFull Stack Web Development Course  for Beginners
Full Stack Web Development Course for Beginners
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon A
 
Keynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designKeynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-design
 
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfLike-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
 
Roles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceRoles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in Pharmacovigilance
 
ENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choomENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choom
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice great
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptx
 
DATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersDATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginners
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
 
Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptx
 
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
 
Judging the Relevance and worth of ideas part 2.pptx
Judging the Relevance  and worth of ideas part 2.pptxJudging the Relevance  and worth of ideas part 2.pptx
Judging the Relevance and worth of ideas part 2.pptx
 
Quarter 4 Peace-education.pptx Catch Up Friday
Quarter 4 Peace-education.pptx Catch Up FridayQuarter 4 Peace-education.pptx Catch Up Friday
Quarter 4 Peace-education.pptx Catch Up Friday
 
Planning a health career 4th Quarter.pptx
Planning a health career 4th Quarter.pptxPlanning a health career 4th Quarter.pptx
Planning a health career 4th Quarter.pptx
 
ACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdfACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdf
 
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfAMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media Component
 

Phishing Website Detection by Machine Learning Techniques Presentation.pdf

  • 1. SHREYA GOPAL SUNDARI PHISHING WEBSITE DETECTION by MACHINE LEARNING TECHNIQUES
  • 2. INTRODUCTION • Phishing is the most commonly used social engineering and cyber attack. • Through such attacks, the phisher targets naïve online users by tricking them into revealing confidential information, with the purpose of using it fraudulently. • In order to avoid getting phished, • users should have awareness of phishing websites. • have a blacklist of phishing websites which requires the knowledge of website being detected as phishing. • detect them in their early appearance, using machine learning and deep neural network algorithms. • Of the above three, the machine learning based method is proven to be most effective than the other methods. • Even then, online users are still being trapped into revealing sensitive information in phishing websites.
  • 3. OBJECTIVES A phishing website is a common social engineering method that mimics trustful uniform resource locators (URLs) and webpages. The objective of this project is to train machine learning models and deep neural nets on the dataset created to predict phishing websites. Both phishing and benign URLs of websites are gathered to form a dataset and from them required URL and website content-based features are extracted. The performance level of each model is measures and compared.
  • 4. APPROACH Below mentioned are the steps involved in the completion of this project: • Collect dataset containing phishing and legitimate websites from the open source platforms. • Write a code to extract the required features from the URL database. • Analyze and preprocess the dataset by using EDA techniques. • Divide the dataset into training and testing sets. • Run selected machine learning and deep neural network algorithms like SVM, Random Forest, Autoencoder on the dataset. • Write a code for displaying the evaluation result considering accuracy metrics. • Compare the obtained results for trained models and specify which is better.
  • 5. DATA COLLECTION • Legitimate URLs are collected from the dataset provided by University of New Brunswick, https://www.unb.ca/cic/datasets/url-2016.html. • From the collection, 5000 URLs are randomly picked. • Phishing URLs are collected from opensource service called PhishTank . This service provide a set of phishing URLs in multiple formats like csv, json etc. that gets updated hourly. • Form the obtained collection, 5000 URLs are randomly picked.
  • 6. FEATURE SELECTION • The following category of features are selected: • Address Bar based Features • Domain based Features • HTML & Javascript based Feature • Address Bar based Features considered are: • Domian of URL • Redirection ‘//’ in URL • IP Address in URL • ‘http/https’ in Domain name • ‘@’ Symbol in URL • Using URL Shortening Service • Length of URL • Prefix or Suffix "-" in Domain • Depth of URL
  • 7. FEATURE SELECTION (CONT.) • Domain based Features considered are: • HTML and JavaScript based Features considered are: • All together 17 features are extracted from the dataset. • DNS Record • Age of Domain • Website Traffic • End Period of Domain • Iframe Redirection • Disabling Right Click • Status Bar Customization • Website Forwarding
  • 9. MACHINE LEARNING MODELS • This is a supervised machine learning task. There are two major types of supervised machine learning problems, called classification and regression. • This data set comes under classification problem, as the input URL is classified as phishing (1) or legitimate (0). The machine learning models (classification) considered to train the dataset in this notebook are: • Decision Tree • Random Forest • Multilayer Perceptrons • XGBoost • Autoencoder Neural Network • Support Vector Machines
  • 10. MODEL EVALUATION • The models are evaluated, and the considered metric is accuracy. • Below Figure shows the training and test dataset accuracy by the respective models: • For the above it is clear that the XGBoost model gives better performance. The model is saved for further usage.
  • 11. NEXT STEPS • Working on this project is very knowledgeable and worth the effort. • Through this project, one can know a lot about the phishing websites and how they are differentiated from legitimate ones. • This project can be taken further by creating a browser extensions of developing a GUI. • These should classify the inputted URL to legitimate or phishing with the use of the saved model.