1. Fake Website Detection using Machine
Learning Algorithms
Presented By
MD SAJADUL ISLAM (193002138)
MST. NUSRAT JAHAN JYOTI (192002022)
Supervised By
MD. SOLAIMAN MIA
Assistant Professor
(Green University of Bangladesh)
Green University of Bangladesh
Department of CSE
Co-Supervised By
MD. GULZAR HUSSAIN
Lecturer, GUB
2. Contents
Introduction
Motivation
Objectives
Literature Review
Problem Description
Draft Plan
Gantt Chart
Proposed Budget Components
Conclusion
References
3. Introduction
Phishing attacks are increasing by 200% every year
Fake websites are the most common way to conduct a phishing attack
Multinational companies lose 100 billion dollars per year to phishing attacks
4. Motivation
To protect internet users from phishing attacks
To reduce the percentage of phishing attacks
To identify phishing websites from their URLs
To protect people's personal information
5. Objectives
To detect fake website URLs using machine learning algorithms
To train machine learning models on the dataset and predict phishing websites
To use efficient machine learning models
To maintain an accuracy rate of more than 90%
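The accuracy objective can be checked with a small calculation. The sketch below is illustrative only: it assumes labels are encoded as 1 for phishing and 0 for legitimate, and the helper names (`confusion_counts`, `accuracy`, `meets_target`) are hypothetical, not taken from the project.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives (1 = phishing, 0 = legitimate)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return {"tp": tp, "fp": fp, "tn": tn, "fn": fn}


def accuracy(y_true, y_pred):
    """Accuracy = correctly classified URLs / all URLs."""
    counts = confusion_counts(y_true, y_pred)
    return (counts["tp"] + counts["tn"]) / len(y_true)


def meets_target(y_true, y_pred, target=0.90):
    """True only when accuracy is strictly above the target ("more than 90%")."""
    return accuracy(y_true, y_pred) > target


# Toy labels for illustration only; these are not experimental results.
y_true = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0, 1, 1]
print(confusion_counts(y_true, y_pred), accuracy(y_true, y_pred))
```

Because the objective says "more than 90%", the check uses a strict comparison, so a model scoring exactly 90% does not pass.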
6. Literature Review
A Novel Approach to Detect Phishing Attacks using Binary Visualization and Machine Learning [1]
Working Process: Uses a neural network to detect phishing URLs; uses binary visualization.
Usefulness: The system can be applied to phishing and non-phishing website classification.
Drawbacks: A limited dataset was used to conduct the experiment; the limited dataset affects the prediction and efficiency of the model.
Scope of Improvement: Add more datasets for both training and testing; use different models in making predictions.
7. Literature Review (Cont.)
Phishing attacks in Qatar: A literature review of the problems and solutions [2]
Working Process: Phishing detection and mitigation in emails and websites.
Usefulness: Phishing detection and mitigation in emails and websites.
Drawbacks: Many phishing detection methods, such as rule-based methods, decision trees, associative classification, SVM, and NN, were listed, but none was demonstrated in the research.
Scope of Improvement: Incorporate other phishing detection techniques, for example Random Forest, LightGBM, and XGBoost; improve the accuracy level.
8. Literature Review (Cont.)
Phishing detection: A literature survey [3]
Working Process: Uses binary visualization; user training to increase awareness of phishing.
Usefulness: Useful for phishing detection; offensive defense.
Drawbacks: No metrics for end-user evaluation of each website; no features for user involvement.
Scope of Improvement: Add user metrics for end-user evaluation of each website; increase the accuracy level.
9. Literature Review (Cont.)
Phishing Website Detection using Machine Learning Techniques and CNN [4]
Working Process: Uses Support Vector Machine (SVM), Random Forest (RF), and CNN; combines machine learning and deep learning.
Usefulness: CNN gave higher accuracy than SVM and RF; identifies fake URLs.
Drawbacks: Accuracy was lower for RF and SVM (67% and 64%, respectively); user involvement is missing.
Scope of Improvement: Use Random Forest, LightGBM, and XGBoost; expected accuracy of up to 90%.
10. Literature Review (Cont.)
Phishing Website Detection using Machine Learning Techniques and CNN [4]
Working Process: Uses a decision tree and Support Vector Machine.
Usefulness: Proposes a deep learning-based URL detector. The authors argued that the method can produce insights from URLs.
Drawbacks: Deep learning methods demand more time to produce an output. In addition, the method processes the URL and matches it against a library to generate an output.
Scope of Improvement: Use Random Forest, LightGBM, and XGBoost to produce an output in a short time; increase the accuracy.
12. Problem Description (cont.)
3rd Stage (Three Machine Learning Classifiers): Random Forest, LightGBM, XGBoost
Performs well even if the data contains null/missing values
Lower memory usage
Effective for large data sets
Expected accuracy of up to 90%
High efficiency is desired
Does not need normalized features
Random Forest performs both regression and classification tasks.
LightGBM has faster training speed and higher efficiency.
XGBoost has a built-in capability to handle missing values.
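A minimal sketch of how the three classifiers could be compared on the same split is given below. It is an illustration under stated assumptions, not the project's implementation: the data is a synthetic placeholder from scikit-learn's `make_classification` (not phishing data), the hyperparameters are library defaults, and the LightGBM and XGBoost models are included only if those packages are installed. The function names `build_models` and `compare_models` are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def build_models(seed=0):
    """Return the classifiers that are available in this environment."""
    models = {"Random Forest": RandomForestClassifier(n_estimators=100, random_state=seed)}
    try:
        from lightgbm import LGBMClassifier  # optional dependency
        models["LightGBM"] = LGBMClassifier(random_state=seed)
    except (ImportError, OSError):
        pass
    try:
        from xgboost import XGBClassifier  # optional dependency
        models["XGBoost"] = XGBClassifier(random_state=seed)
    except (ImportError, OSError):
        pass
    return models


def compare_models(X, y, seed=0):
    """Train each classifier on the same split and report test accuracy."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed, stratify=y
    )
    scores = {}
    for name, model in build_models(seed).items():
        model.fit(X_train, y_train)
        scores[name] = accuracy_score(y_test, model.predict(X_test))
    return scores


# Synthetic placeholder data: 500 samples with 9 numeric features.
X, y = make_classification(n_samples=500, n_features=9, random_state=0)
scores = compare_models(X, y)
for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

All three libraries expose the same `fit`/`predict` interface, so swapping the placeholder data for real URL features would not change the comparison loop.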
15. Draft Plan (cont.)
Dataset: Collect the URLs
Lexical Feature Extraction: Convert each URL into lexical numeric features
Classifier (Random Forest, LightGBM, XGBoost): Evaluation using machine learning
Performance Analysis: Evaluate the accuracy
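The lexical feature extraction step can be sketched as follows. The slides do not list the exact features, so the feature set here (lengths, character counts, subdomain count, IP-address host, HTTPS flag) is an assumption chosen for illustration, and `lexical_features` is a hypothetical helper name.

```python
from urllib.parse import urlparse

# Assumed set of characters to count; the slides do not specify one.
SPECIAL_CHARS = "@-_?=&%"


def lexical_features(url: str) -> dict:
    """Convert a URL string into a dictionary of numeric lexical features."""
    # urlparse only recognizes the host when a scheme is present.
    parsed = urlparse(url if "://" in url else "http://" + url)
    host = parsed.hostname or ""
    looks_like_ipv4 = host.count(".") == 3 and host.replace(".", "").isdigit()
    return {
        "url_length": len(url),
        "host_length": len(host),
        "path_length": len(parsed.path),
        "num_dots": url.count("."),
        "num_digits": sum(ch.isdigit() for ch in url),
        "num_special": sum(url.count(ch) for ch in SPECIAL_CHARS),
        "num_subdomains": max(host.count(".") - 1, 0),
        "has_ip_host": int(looks_like_ipv4),
        "uses_https": int(parsed.scheme == "https"),
    }


# Example URLs are placeholders, not entries from the project's dataset.
legit = lexical_features("https://www.example.com/login")
suspicious = lexical_features("http://192.168.0.1/secure-update?id=1")
print(legit)
print(suspicious)
```

Each URL becomes a fixed-length numeric vector, which is the input format expected by tree-based classifiers such as Random Forest, LightGBM, and XGBoost.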
16. Gantt Chart
Months: Jul Aug Sep Oct Nov Dec Jan Feb Mar Apr May Jun
SL. No. | Thesis Activities
1 | Read Existing Papers
2 | Finding Limitation
3 | Planning
4 | Fixed Objectives
5 | Optimization Formulation
6 | Design Learning Algorithm
7 | Data Collection
8 | Implementation
9 | Comparison
10 | Paper Writing & Publication
17. Proposed Budget Components
SL. No. | Budget Title | No. of Unit | Per Unit Cost | Total Cost (BDT)
1 | Research Supervision | 3 | 6000 | 18000
2 | Data Collection | - | 12000 | 12000
3 | Access Researching Website | - | 1200 | 3000
4 | Environment Setup | 2 | 5000 | 10000
5 | Implementation and Testing | - | 15000 | 15000
6 | Paper Publication Cost | 2 | 7000 | 14000
7 | Others | - | 4000 | 4000
Total: 76000 (BDT)
Table 1: Budget Plan for Research
18. Conclusion
Create a fake website detection system
Identify fake website URLs with the best accuracy
In the future, the system can be upgraded to detect web pages automatically and to make the application compatible with web browsers.
Additional work can also be done by adding other characteristics to distinguish fake web pages from legitimate ones.
19. References
[1] L. Barlow, G. Bendiab, S. Shiaeles, and N. Savage, “A Novel Approach to Detect Phishing Attacks using Binary Visualization and Machine Learning,” in Proceedings - 2020 IEEE World Congress on Services, SERVICES 2020, Oct. 2020, pp. 177–182. doi: 10.1109/SERVICES48979.2020.00046.
[2] Y. Al-Hamar, H. Kolivand, and A. Al-Hamar, “Phishing attacks in Qatar: A literature review of the problems and solutions,” in Proceedings - International Conference on Developments in eSystems Engineering, DeSE, Oct. 2021, vol. October-2021, pp. 837–842. doi: 10.1109/DeSE.2020.00155.
[3] A. Basit, M. Zafar, A. R. Javed, and Z. Jalil, “A Novel Ensemble Machine Learning Method to Detect Phishing Attack,” Nov. 2020. doi: 10.1109/INMIC50486.2020.9318210.
[4] D. M. Vargheese and N. R. Sreelakshmi, “Phishing Website Detection using Machine Learning Techniques and CNN,” International Journal of Engineering Research & Technology (IJERT), ISSN: 2278-0181, ICCIDT - 2022 Conference Proceedings, www.ijert.org.
[5] E. Gandotra and D. Gupta, “An Efficient Approach for Phishing Detection using Machine Learning,” Algorithms for Intelligent Systems, Springer, Singapore, 2021, https://doi.org/10.1007/978-981-15-8711-5_12.