SlideShare a Scribd company logo
SHREYA GOPAL SUNDARI
PHISHING WEBSITE DETECTION
by MACHINE LEARNING
TECHNIQUES
INTRODUCTION
• Phishing is the most commonly used social engineering and cyber attack.
• Through such attacks, the phisher targets naïve online users by tricking them into
revealing confidential information, with the purpose of using it fraudulently.
• In order to avoid getting phished,
• users should have awareness of phishing websites.
• have a blacklist of phishing websites which requires the knowledge of website being detected
as phishing.
• detect them in their early appearance, using machine learning and deep neural network
algorithms.
• Of the above three, the machine learning based method is proven to be most
effective than the other methods.
• Even then, online users are still being trapped into revealing sensitive information in
phishing websites.
OBJECTIVES
A phishing website is a common social engineering method that mimics
trustful uniform resource locators (URLs) and webpages. The objective of this
project is to train machine learning models and deep neural nets on the dataset
created to predict phishing websites. Both phishing and benign URLs of websites
are gathered to form a dataset and from them required URL and website
content-based features are extracted. The performance level of each model is
measures and compared.
APPROACH
Below mentioned are the steps involved in the completion of this project:
• Collect dataset containing phishing and legitimate websites from the open source platforms.
• Write a code to extract the required features from the URL database.
• Analyze and preprocess the dataset by using EDA techniques.
• Divide the dataset into training and testing sets.
• Run selected machine learning and deep neural network algorithms like SVM, Random Forest,
Autoencoder on the dataset.
• Write a code for displaying the evaluation result considering accuracy metrics.
• Compare the obtained results for trained models and specify which is better.
DATA COLLECTION
• Legitimate URLs are collected from the dataset provided by University of
New Brunswick, https://www.unb.ca/cic/datasets/url-2016.html.
• From the collection, 5000 URLs are randomly picked.
• Phishing URLs are collected from opensource service called PhishTank . This
service provide a set of phishing URLs in multiple formats like csv, json etc.
that gets updated hourly.
• Form the obtained collection, 5000 URLs are randomly picked.
FEATURE SELECTION
• The following category of features are selected:
• Address Bar based Features
• Domain based Features
• HTML & Javascript based Feature
• Address Bar based Features considered are:
• Domian of URL • Redirection ‘//’ in URL
• IP Address in URL • ‘http/https’ in Domain name
• ‘@’ Symbol in URL • Using URL Shortening Service
• Length of URL • Prefix or Suffix "-" in Domain
• Depth of URL
FEATURE SELECTION (CONT.)
• Domain based Features considered are:
• HTML and JavaScript based Features considered are:
• All together 17 features are extracted from the dataset.
• DNS Record • Age of Domain
• Website Traffic • End Period of Domain
• Iframe Redirection • Disabling Right Click
• Status Bar Customization • Website Forwarding
FEATURES DISTRIBUTION
MACHINE LEARNING MODELS
• This is a supervised machine learning task. There are two major types of supervised
machine learning problems, called classification and regression.
• This data set comes under classification problem, as the input URL is classified as
phishing (1) or legitimate (0). The machine learning models (classification) considered
to train the dataset in this notebook are:
• Decision Tree
• Random Forest
• Multilayer Perceptrons
• XGBoost
• Autoencoder Neural Network
• Support Vector Machines
MODEL EVALUATION
• The models are evaluated, and the considered metric is accuracy.
• Below Figure shows the training and test dataset accuracy by the respective models:
• For the above it is clear that the XGBoost model gives better performance. The model is
saved for further usage.
NEXT STEPS
• Working on this project is very knowledgeable and worth the effort.
• Through this project, one can know a lot about the phishing websites and how
they are differentiated from legitimate ones.
• This project can be taken further by creating a browser extensions of
developing a GUI.
• These should classify the inputted URL to legitimate or phishing with the use of
the saved model.
Thank You…..

More Related Content

What's hot

Web usage mining
Web usage miningWeb usage mining
Web usage mining
Monu Chaudhary
 
Phishing attack seminar presentation
Phishing attack seminar presentation Phishing attack seminar presentation
Phishing attack seminar presentation
AniketPandit18
 
Intrusion detection system
Intrusion detection systemIntrusion detection system
Intrusion detection system
Aparna Bhadran
 
Malware Classification and Analysis
Malware Classification and AnalysisMalware Classification and Analysis
Malware Classification and Analysis
Prashant Chopra
 
Machine Learning in Cyber Security
Machine Learning in Cyber SecurityMachine Learning in Cyber Security
Machine Learning in Cyber Security
Rishi Kant
 
Fake News detection.pptx
Fake News detection.pptxFake News detection.pptx
Fake News detection.pptx
Sanad Bhowmik
 
Application of Machine Learning in Cybersecurity
Application of Machine Learning in CybersecurityApplication of Machine Learning in Cybersecurity
Application of Machine Learning in Cybersecurity
Pratap Dangeti
 
detection of malicious URLs.pptx
detection of malicious URLs.pptxdetection of malicious URLs.pptx
detection of malicious URLs.pptx
manash40
 
Malware Dectection Using Machine learning
Malware Dectection Using Machine learningMalware Dectection Using Machine learning
Malware Dectection Using Machine learning
Shubham Dubey
 
Phishing attack, with SSL Encryption and HTTPS Working
Phishing attack, with SSL Encryption and HTTPS WorkingPhishing attack, with SSL Encryption and HTTPS Working
Phishing attack, with SSL Encryption and HTTPS Working
Sachin Saini
 
Attendance system based on face recognition using python by Raihan Sikdar
Attendance system based on face recognition using python by Raihan SikdarAttendance system based on face recognition using python by Raihan Sikdar
Attendance system based on face recognition using python by Raihan Sikdar
raihansikdar
 
CYBERBULLYING DETECTION USING MACHINE LEARNING-1 (1).pdf
CYBERBULLYING DETECTION USING              MACHINE LEARNING-1 (1).pdfCYBERBULLYING DETECTION USING              MACHINE LEARNING-1 (1).pdf
CYBERBULLYING DETECTION USING MACHINE LEARNING-1 (1).pdf
KumbidiGaming
 
VULNERABILITY ( CYBER SECURITY )
VULNERABILITY ( CYBER SECURITY )VULNERABILITY ( CYBER SECURITY )
VULNERABILITY ( CYBER SECURITY )
Kashyap Mandaliya
 
WEB Scraping.pptx
WEB Scraping.pptxWEB Scraping.pptx
WEB Scraping.pptx
Shubham Jaybhaye
 
Link analysis .. Data Mining
Link analysis .. Data MiningLink analysis .. Data Mining
Link analysis .. Data Mining
Mustafa Salam
 
Web scraping in python
Web scraping in python Web scraping in python
Web scraping in python
Viren Rajput
 
Cross Site Scripting ( XSS)
Cross Site Scripting ( XSS)Cross Site Scripting ( XSS)
Cross Site Scripting ( XSS)
Amit Tyagi
 
Graphical password authentication
Graphical password authenticationGraphical password authentication
Graphical password authenticationAsim Kumar Pathak
 

What's hot (20)

Web usage mining
Web usage miningWeb usage mining
Web usage mining
 
Phishing attack seminar presentation
Phishing attack seminar presentation Phishing attack seminar presentation
Phishing attack seminar presentation
 
Intrusion detection system
Intrusion detection systemIntrusion detection system
Intrusion detection system
 
Malware Classification and Analysis
Malware Classification and AnalysisMalware Classification and Analysis
Malware Classification and Analysis
 
Machine Learning in Cyber Security
Machine Learning in Cyber SecurityMachine Learning in Cyber Security
Machine Learning in Cyber Security
 
Fake News detection.pptx
Fake News detection.pptxFake News detection.pptx
Fake News detection.pptx
 
Application of Machine Learning in Cybersecurity
Application of Machine Learning in CybersecurityApplication of Machine Learning in Cybersecurity
Application of Machine Learning in Cybersecurity
 
detection of malicious URLs.pptx
detection of malicious URLs.pptxdetection of malicious URLs.pptx
detection of malicious URLs.pptx
 
Malware Dectection Using Machine learning
Malware Dectection Using Machine learningMalware Dectection Using Machine learning
Malware Dectection Using Machine learning
 
Phishing attack, with SSL Encryption and HTTPS Working
Phishing attack, with SSL Encryption and HTTPS WorkingPhishing attack, with SSL Encryption and HTTPS Working
Phishing attack, with SSL Encryption and HTTPS Working
 
Attendance system based on face recognition using python by Raihan Sikdar
Attendance system based on face recognition using python by Raihan SikdarAttendance system based on face recognition using python by Raihan Sikdar
Attendance system based on face recognition using python by Raihan Sikdar
 
CYBERBULLYING DETECTION USING MACHINE LEARNING-1 (1).pdf
CYBERBULLYING DETECTION USING              MACHINE LEARNING-1 (1).pdfCYBERBULLYING DETECTION USING              MACHINE LEARNING-1 (1).pdf
CYBERBULLYING DETECTION USING MACHINE LEARNING-1 (1).pdf
 
VULNERABILITY ( CYBER SECURITY )
VULNERABILITY ( CYBER SECURITY )VULNERABILITY ( CYBER SECURITY )
VULNERABILITY ( CYBER SECURITY )
 
WEB Scraping.pptx
WEB Scraping.pptxWEB Scraping.pptx
WEB Scraping.pptx
 
Link analysis .. Data Mining
Link analysis .. Data MiningLink analysis .. Data Mining
Link analysis .. Data Mining
 
Web Content Mining
Web Content MiningWeb Content Mining
Web Content Mining
 
Web scraping in python
Web scraping in python Web scraping in python
Web scraping in python
 
Cross Site Scripting ( XSS)
Cross Site Scripting ( XSS)Cross Site Scripting ( XSS)
Cross Site Scripting ( XSS)
 
Web spam
Web spamWeb spam
Web spam
 
Graphical password authentication
Graphical password authenticationGraphical password authentication
Graphical password authentication
 

Similar to Phishing Website Detection by Machine Learning Techniques Presentation.pdf

PHISHING URL - Review 1.pptx
PHISHING URL - Review 1.pptxPHISHING URL - Review 1.pptx
PHISHING URL - Review 1.pptx
Arulvincent3
 
dasdweda PPT.pptx
dasdweda PPT.pptxdasdweda PPT.pptx
dasdweda PPT.pptx
UditanshuPandey5
 
Rootconf_phishing_v2
Rootconf_phishing_v2Rootconf_phishing_v2
Rootconf_phishing_v2
Arjun BM
 
CYBER THREAT DETECTION PLATFORM USING MACHINE LEARNING.pptx
CYBER THREAT DETECTION PLATFORM USING MACHINE LEARNING.pptxCYBER THREAT DETECTION PLATFORM USING MACHINE LEARNING.pptx
CYBER THREAT DETECTION PLATFORM USING MACHINE LEARNING.pptx
TAMILMANIP6
 
Building a Real-Time Security Application Using Log Data and Machine Learning...
Building a Real-Time Security Application Using Log Data and Machine Learning...Building a Real-Time Security Application Using Log Data and Machine Learning...
Building a Real-Time Security Application Using Log Data and Machine Learning...
Sri Ambati
 
Fuzzy Rough Set Feature Selection to Enhance Phishing Attack Detection
Fuzzy Rough Set Feature Selection to Enhance Phishing Attack Detection Fuzzy Rough Set Feature Selection to Enhance Phishing Attack Detection
Fuzzy Rough Set Feature Selection to Enhance Phishing Attack Detection
Wright State University, Dayton, OH, USA
 
Detecting Phishing Websites Using Machine Learning
Detecting Phishing Websites Using Machine LearningDetecting Phishing Websites Using Machine Learning
Detecting Phishing Websites Using Machine Learning
IRJET Journal
 
Defcon 22-wesley-mc grew-instrumenting-point-of-sale-malware
Defcon 22-wesley-mc grew-instrumenting-point-of-sale-malwareDefcon 22-wesley-mc grew-instrumenting-point-of-sale-malware
Defcon 22-wesley-mc grew-instrumenting-point-of-sale-malware
DaveEdwards12
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
Yogesh Sharma
 
Static Analysis Primer
Static Analysis PrimerStatic Analysis Primer
Static Analysis PrimerCoverity
 
Avtar's ppt
Avtar's pptAvtar's ppt
Avtar's pptmak57
 
Using Data Science for Cybersecurity
Using Data Science for CybersecurityUsing Data Science for Cybersecurity
Using Data Science for Cybersecurity
VMware Tanzu
 
Final .pptx
Final .pptxFinal .pptx
Final .pptx
MDTAHA059
 
Online talent sourcing - a future essentia
Online talent sourcing - a future essentiaOnline talent sourcing - a future essentia
Online talent sourcing - a future essentia
HSE Guru
 
How to build corporate size fraud prevention
How to build corporate size fraud preventionHow to build corporate size fraud prevention
How to build corporate size fraud prevention
Yury Leonychev
 
MOBILE CLOUD COMPUTING USING CRYPTOGRAPHIC HASH FUNCTION
MOBILE CLOUD COMPUTING USING CRYPTOGRAPHIC HASH FUNCTIONMOBILE CLOUD COMPUTING USING CRYPTOGRAPHIC HASH FUNCTION
MOBILE CLOUD COMPUTING USING CRYPTOGRAPHIC HASH FUNCTION
I Ruby
 
What is web scraping?
What is web scraping?What is web scraping?
What is web scraping?
Brijesh Prajapati
 
J2ee project lists:-Mumbai Academics
J2ee project lists:-Mumbai AcademicsJ2ee project lists:-Mumbai Academics
J2ee project lists:-Mumbai Academics
Mumbai Academisc
 
crawl technology saves money and time
crawl technology saves money and timecrawl technology saves money and time
crawl technology saves money and time
HashScraper Inc.
 
A density based clustering approach for web robot detection
A density based clustering approach for web robot detectionA density based clustering approach for web robot detection
A density based clustering approach for web robot detection
Wright State University, Dayton, OH, USA
 

Similar to Phishing Website Detection by Machine Learning Techniques Presentation.pdf (20)

PHISHING URL - Review 1.pptx
PHISHING URL - Review 1.pptxPHISHING URL - Review 1.pptx
PHISHING URL - Review 1.pptx
 
dasdweda PPT.pptx
dasdweda PPT.pptxdasdweda PPT.pptx
dasdweda PPT.pptx
 
Rootconf_phishing_v2
Rootconf_phishing_v2Rootconf_phishing_v2
Rootconf_phishing_v2
 
CYBER THREAT DETECTION PLATFORM USING MACHINE LEARNING.pptx
CYBER THREAT DETECTION PLATFORM USING MACHINE LEARNING.pptxCYBER THREAT DETECTION PLATFORM USING MACHINE LEARNING.pptx
CYBER THREAT DETECTION PLATFORM USING MACHINE LEARNING.pptx
 
Building a Real-Time Security Application Using Log Data and Machine Learning...
Building a Real-Time Security Application Using Log Data and Machine Learning...Building a Real-Time Security Application Using Log Data and Machine Learning...
Building a Real-Time Security Application Using Log Data and Machine Learning...
 
Fuzzy Rough Set Feature Selection to Enhance Phishing Attack Detection
Fuzzy Rough Set Feature Selection to Enhance Phishing Attack Detection Fuzzy Rough Set Feature Selection to Enhance Phishing Attack Detection
Fuzzy Rough Set Feature Selection to Enhance Phishing Attack Detection
 
Detecting Phishing Websites Using Machine Learning
Detecting Phishing Websites Using Machine LearningDetecting Phishing Websites Using Machine Learning
Detecting Phishing Websites Using Machine Learning
 
Defcon 22-wesley-mc grew-instrumenting-point-of-sale-malware
Defcon 22-wesley-mc grew-instrumenting-point-of-sale-malwareDefcon 22-wesley-mc grew-instrumenting-point-of-sale-malware
Defcon 22-wesley-mc grew-instrumenting-point-of-sale-malware
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
 
Static Analysis Primer
Static Analysis PrimerStatic Analysis Primer
Static Analysis Primer
 
Avtar's ppt
Avtar's pptAvtar's ppt
Avtar's ppt
 
Using Data Science for Cybersecurity
Using Data Science for CybersecurityUsing Data Science for Cybersecurity
Using Data Science for Cybersecurity
 
Final .pptx
Final .pptxFinal .pptx
Final .pptx
 
Online talent sourcing - a future essentia
Online talent sourcing - a future essentiaOnline talent sourcing - a future essentia
Online talent sourcing - a future essentia
 
How to build corporate size fraud prevention
How to build corporate size fraud preventionHow to build corporate size fraud prevention
How to build corporate size fraud prevention
 
MOBILE CLOUD COMPUTING USING CRYPTOGRAPHIC HASH FUNCTION
MOBILE CLOUD COMPUTING USING CRYPTOGRAPHIC HASH FUNCTIONMOBILE CLOUD COMPUTING USING CRYPTOGRAPHIC HASH FUNCTION
MOBILE CLOUD COMPUTING USING CRYPTOGRAPHIC HASH FUNCTION
 
What is web scraping?
What is web scraping?What is web scraping?
What is web scraping?
 
J2ee project lists:-Mumbai Academics
J2ee project lists:-Mumbai AcademicsJ2ee project lists:-Mumbai Academics
J2ee project lists:-Mumbai Academics
 
crawl technology saves money and time
crawl technology saves money and timecrawl technology saves money and time
crawl technology saves money and time
 
A density based clustering approach for web robot detection
A density based clustering approach for web robot detectionA density based clustering approach for web robot detection
A density based clustering approach for web robot detection
 

Recently uploaded

Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
MysoreMuleSoftMeetup
 
Biological Screening of Herbal Drugs in detailed.
Biological Screening of Herbal Drugs in detailed.Biological Screening of Herbal Drugs in detailed.
Biological Screening of Herbal Drugs in detailed.
Ashokrao Mane college of Pharmacy Peth-Vadgaon
 
The basics of sentences session 5pptx.pptx
The basics of sentences session 5pptx.pptxThe basics of sentences session 5pptx.pptx
The basics of sentences session 5pptx.pptx
heathfieldcps1
 
2024.06.01 Introducing a competency framework for languag learning materials ...
2024.06.01 Introducing a competency framework for languag learning materials ...2024.06.01 Introducing a competency framework for languag learning materials ...
2024.06.01 Introducing a competency framework for languag learning materials ...
Sandy Millin
 
Chapter 4 - Islamic Financial Institutions in Malaysia.pptx
Chapter 4 - Islamic Financial Institutions in Malaysia.pptxChapter 4 - Islamic Financial Institutions in Malaysia.pptx
Chapter 4 - Islamic Financial Institutions in Malaysia.pptx
Mohd Adib Abd Muin, Senior Lecturer at Universiti Utara Malaysia
 
How to Add Chatter in the odoo 17 ERP Module
How to Add Chatter in the odoo 17 ERP ModuleHow to Add Chatter in the odoo 17 ERP Module
How to Add Chatter in the odoo 17 ERP Module
Celine George
 
A Survey of Techniques for Maximizing LLM Performance.pptx
A Survey of Techniques for Maximizing LLM Performance.pptxA Survey of Techniques for Maximizing LLM Performance.pptx
A Survey of Techniques for Maximizing LLM Performance.pptx
thanhdowork
 
South African Journal of Science: Writing with integrity workshop (2024)
South African Journal of Science: Writing with integrity workshop (2024)South African Journal of Science: Writing with integrity workshop (2024)
South African Journal of Science: Writing with integrity workshop (2024)
Academy of Science of South Africa
 
MASS MEDIA STUDIES-835-CLASS XI Resource Material.pdf
MASS MEDIA STUDIES-835-CLASS XI Resource Material.pdfMASS MEDIA STUDIES-835-CLASS XI Resource Material.pdf
MASS MEDIA STUDIES-835-CLASS XI Resource Material.pdf
goswamiyash170123
 
Unit 8 - Information and Communication Technology (Paper I).pdf
Unit 8 - Information and Communication Technology (Paper I).pdfUnit 8 - Information and Communication Technology (Paper I).pdf
Unit 8 - Information and Communication Technology (Paper I).pdf
Thiyagu K
 
Azure Interview Questions and Answers PDF By ScholarHat
Azure Interview Questions and Answers PDF By ScholarHatAzure Interview Questions and Answers PDF By ScholarHat
Azure Interview Questions and Answers PDF By ScholarHat
Scholarhat
 
Introduction to AI for Nonprofits with Tapp Network
Introduction to AI for Nonprofits with Tapp NetworkIntroduction to AI for Nonprofits with Tapp Network
Introduction to AI for Nonprofits with Tapp Network
TechSoup
 
The Diamonds of 2023-2024 in the IGRA collection
The Diamonds of 2023-2024 in the IGRA collectionThe Diamonds of 2023-2024 in the IGRA collection
The Diamonds of 2023-2024 in the IGRA collection
Israel Genealogy Research Association
 
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...
Levi Shapiro
 
PCOS corelations and management through Ayurveda.
PCOS corelations and management through Ayurveda.PCOS corelations and management through Ayurveda.
PCOS corelations and management through Ayurveda.
Dr. Shivangi Singh Parihar
 
Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...
Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...
Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...
Dr. Vinod Kumar Kanvaria
 
DRUGS AND ITS classification slide share
DRUGS AND ITS classification slide shareDRUGS AND ITS classification slide share
DRUGS AND ITS classification slide share
taiba qazi
 
The Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official PublicationThe Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official Publication
Delapenabediema
 
Digital Artefact 1 - Tiny Home Environmental Design
Digital Artefact 1 - Tiny Home Environmental DesignDigital Artefact 1 - Tiny Home Environmental Design
Digital Artefact 1 - Tiny Home Environmental Design
amberjdewit93
 
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
EugeneSaldivar
 

Recently uploaded (20)

Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
 
Biological Screening of Herbal Drugs in detailed.
Biological Screening of Herbal Drugs in detailed.Biological Screening of Herbal Drugs in detailed.
Biological Screening of Herbal Drugs in detailed.
 
The basics of sentences session 5pptx.pptx
The basics of sentences session 5pptx.pptxThe basics of sentences session 5pptx.pptx
The basics of sentences session 5pptx.pptx
 
2024.06.01 Introducing a competency framework for languag learning materials ...
2024.06.01 Introducing a competency framework for languag learning materials ...2024.06.01 Introducing a competency framework for languag learning materials ...
2024.06.01 Introducing a competency framework for languag learning materials ...
 
Chapter 4 - Islamic Financial Institutions in Malaysia.pptx
Chapter 4 - Islamic Financial Institutions in Malaysia.pptxChapter 4 - Islamic Financial Institutions in Malaysia.pptx
Chapter 4 - Islamic Financial Institutions in Malaysia.pptx
 
How to Add Chatter in the odoo 17 ERP Module
How to Add Chatter in the odoo 17 ERP ModuleHow to Add Chatter in the odoo 17 ERP Module
How to Add Chatter in the odoo 17 ERP Module
 
A Survey of Techniques for Maximizing LLM Performance.pptx
A Survey of Techniques for Maximizing LLM Performance.pptxA Survey of Techniques for Maximizing LLM Performance.pptx
A Survey of Techniques for Maximizing LLM Performance.pptx
 
South African Journal of Science: Writing with integrity workshop (2024)
South African Journal of Science: Writing with integrity workshop (2024)South African Journal of Science: Writing with integrity workshop (2024)
South African Journal of Science: Writing with integrity workshop (2024)
 
MASS MEDIA STUDIES-835-CLASS XI Resource Material.pdf
MASS MEDIA STUDIES-835-CLASS XI Resource Material.pdfMASS MEDIA STUDIES-835-CLASS XI Resource Material.pdf
MASS MEDIA STUDIES-835-CLASS XI Resource Material.pdf
 
Unit 8 - Information and Communication Technology (Paper I).pdf
Unit 8 - Information and Communication Technology (Paper I).pdfUnit 8 - Information and Communication Technology (Paper I).pdf
Unit 8 - Information and Communication Technology (Paper I).pdf
 
Azure Interview Questions and Answers PDF By ScholarHat
Azure Interview Questions and Answers PDF By ScholarHatAzure Interview Questions and Answers PDF By ScholarHat
Azure Interview Questions and Answers PDF By ScholarHat
 
Introduction to AI for Nonprofits with Tapp Network
Introduction to AI for Nonprofits with Tapp NetworkIntroduction to AI for Nonprofits with Tapp Network
Introduction to AI for Nonprofits with Tapp Network
 
The Diamonds of 2023-2024 in the IGRA collection
The Diamonds of 2023-2024 in the IGRA collectionThe Diamonds of 2023-2024 in the IGRA collection
The Diamonds of 2023-2024 in the IGRA collection
 
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...
 
PCOS corelations and management through Ayurveda.
PCOS corelations and management through Ayurveda.PCOS corelations and management through Ayurveda.
PCOS corelations and management through Ayurveda.
 
Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...
Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...
Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...
 
DRUGS AND ITS classification slide share
DRUGS AND ITS classification slide shareDRUGS AND ITS classification slide share
DRUGS AND ITS classification slide share
 
The Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official PublicationThe Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official Publication
 
Digital Artefact 1 - Tiny Home Environmental Design
Digital Artefact 1 - Tiny Home Environmental DesignDigital Artefact 1 - Tiny Home Environmental Design
Digital Artefact 1 - Tiny Home Environmental Design
 
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
 

Phishing Website Detection by Machine Learning Techniques Presentation.pdf

  • 1. SHREYA GOPAL SUNDARI PHISHING WEBSITE DETECTION by MACHINE LEARNING TECHNIQUES
  • 2. INTRODUCTION • Phishing is the most commonly used social engineering and cyber attack. • Through such attacks, the phisher targets naïve online users by tricking them into revealing confidential information, with the purpose of using it fraudulently. • In order to avoid getting phished, • users should have awareness of phishing websites. • have a blacklist of phishing websites which requires the knowledge of website being detected as phishing. • detect them in their early appearance, using machine learning and deep neural network algorithms. • Of the above three, the machine learning based method is proven to be most effective than the other methods. • Even then, online users are still being trapped into revealing sensitive information in phishing websites.
  • 3. OBJECTIVES A phishing website is a common social engineering method that mimics trustful uniform resource locators (URLs) and webpages. The objective of this project is to train machine learning models and deep neural nets on the dataset created to predict phishing websites. Both phishing and benign URLs of websites are gathered to form a dataset and from them required URL and website content-based features are extracted. The performance level of each model is measures and compared.
  • 4. APPROACH Below mentioned are the steps involved in the completion of this project: • Collect dataset containing phishing and legitimate websites from the open source platforms. • Write a code to extract the required features from the URL database. • Analyze and preprocess the dataset by using EDA techniques. • Divide the dataset into training and testing sets. • Run selected machine learning and deep neural network algorithms like SVM, Random Forest, Autoencoder on the dataset. • Write a code for displaying the evaluation result considering accuracy metrics. • Compare the obtained results for trained models and specify which is better.
  • 5. DATA COLLECTION • Legitimate URLs are collected from the dataset provided by University of New Brunswick, https://www.unb.ca/cic/datasets/url-2016.html. • From the collection, 5000 URLs are randomly picked. • Phishing URLs are collected from opensource service called PhishTank . This service provide a set of phishing URLs in multiple formats like csv, json etc. that gets updated hourly. • Form the obtained collection, 5000 URLs are randomly picked.
  • 6. FEATURE SELECTION • The following category of features are selected: • Address Bar based Features • Domain based Features • HTML & Javascript based Feature • Address Bar based Features considered are: • Domian of URL • Redirection ‘//’ in URL • IP Address in URL • ‘http/https’ in Domain name • ‘@’ Symbol in URL • Using URL Shortening Service • Length of URL • Prefix or Suffix "-" in Domain • Depth of URL
  • 7. FEATURE SELECTION (CONT.) • Domain based Features considered are: • HTML and JavaScript based Features considered are: • All together 17 features are extracted from the dataset. • DNS Record • Age of Domain • Website Traffic • End Period of Domain • Iframe Redirection • Disabling Right Click • Status Bar Customization • Website Forwarding
  • 9. MACHINE LEARNING MODELS • This is a supervised machine learning task. There are two major types of supervised machine learning problems, called classification and regression. • This data set comes under classification problem, as the input URL is classified as phishing (1) or legitimate (0). The machine learning models (classification) considered to train the dataset in this notebook are: • Decision Tree • Random Forest • Multilayer Perceptrons • XGBoost • Autoencoder Neural Network • Support Vector Machines
  • 10. MODEL EVALUATION • The models are evaluated, and the considered metric is accuracy. • Below Figure shows the training and test dataset accuracy by the respective models: • For the above it is clear that the XGBoost model gives better performance. The model is saved for further usage.
  • 11. NEXT STEPS • Working on this project is very knowledgeable and worth the effort. • Through this project, one can know a lot about the phishing websites and how they are differentiated from legitimate ones. • This project can be taken further by creating a browser extensions of developing a GUI. • These should classify the inputted URL to legitimate or phishing with the use of the saved model.