This document describes a mini project using an XBNet classifier to detect phishing attacks. XBNet is a combination of tree-based models and neural networks that is highly effective for tabular data classification. The project aims to help individuals identify phishing URLs to provide safer online practices. It extracts features from URLs and uses XBNet to classify them as legitimate or phishing. The model achieves good performance but is limited to tabular data and requires significant resources for training. Future work could focus on extending XBNet for unstructured data.
1. MINI PROJECT
XBNET CLASSIFIER IN PHISHING ATTACK DETECTION
GUIDED BY:
DR. CHANDRA MOULI P.V.S.S.R.
ASSOCIATE PROFESSOR
HEAD OF THE DEPARTMENT
DEPARTMENT OF COMPUTER SCIENCE
PRESENTED BY:
KAVITA – P211307
M.Sc. COMPUTER SCIENCE,
DEPARTMENT OF COMPUTER SCIENCE
DEPARTMENT OF COMPUTER SCIENCE
2. INTRODUCTION
• Phishing is a form of cybercrime in which a target is contacted via
email, telephone, or text message by an attacker disguised as a
reputable entity or person.
• XBNet (Extremely Boosted Neural Network) combines
tree-based models with neural networks to create a robust
architecture.
• It is trained using a novel optimization technique, Boosted
Gradient Descent for Tabular Data, which increases its
interpretability and performance.
3. AIM OF THE PROJECT
The purpose of this project is to help individuals identify
phishing URLs in order to promote safer practices online.
A popular model for tabular data is boosted trees, a highly
efficacious and extensively used method that also
provides good interpretability compared to neural networks.
The XBNet classifier combines boosted trees with a neural network.
4. Types of Phishing Tactics
96% of phishing attacks arrive by email.
3% of phishing attacks are carried out over the telephone.
This is also known as vishing.
1% of phishing attacks are carried out via text message.
This is known as smishing.
6. 2022 Phishing Statistics
30% of phishing emails are opened by users.
12% of these targeted users click on the malicious link or attachment.
97% of users are unable to recognize a sophisticated phishing email.
10. Feature Extraction
The length of the URL, domain, path, query, and fragment
is extracted.
The quantity of specific characters in the URL, domain,
path, query, and fragment is extracted. These
characters include:
- = . ? @ ~ & ! + * , # $ % space
50 Total Features Used in Model
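The length and character-count features described above could be computed along the following lines. This is a minimal sketch, not the project's actual code: the function name `component_features` and the feature-name scheme are hypothetical, and the sketch does not try to reproduce the project's exact set of 50 features.

```python
# Characters counted in each URL component, as listed on the slide
# (the final character is a space).
SPECIAL_CHARS = "-=.?@~&!+*,#$% "


def component_features(name, text):
    """Return the length of `text` and the count of each special character.

    `name` is a label such as "url" or "domain" used to build feature names.
    The naming scheme is an assumption made for this sketch.
    """
    features = {f"{name}_length": len(text)}
    for ch in SPECIAL_CHARS:
        label = "space" if ch == " " else ch
        features[f"{name}_qty_{label}"] = text.count(ch)
    return features


# Example: features for a single (made-up) domain string.
domain_features = component_features("domain", "secure-login.example.com")
```

Calling the function once per component (URL, domain, path, query, fragment) and merging the resulting dictionaries gives one feature row per URL.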
11. Feature Extraction
Using a function from the urllib library, the protocol, domain, path, query, and
fragment were extracted from each URL and a column was created for each.
The protocol column was dropped because more sophisticated phishing URLs
also use https:// and therefore appear secure.
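The splitting step described above can be sketched as follows. The slides do not say which urllib function was used; this sketch assumes `urllib.parse.urlsplit`, and the helper name `split_url` and the example URL are hypothetical.

```python
from urllib.parse import urlsplit


def split_url(url):
    """Split a URL into the components used as columns.

    The protocol (scheme) is parsed by urlsplit but deliberately left out,
    mirroring the decision to drop the protocol column.
    """
    parts = urlsplit(url)
    return {
        "domain": parts.netloc,
        "path": parts.path,
        "query": parts.query,
        "fragment": parts.fragment,
    }


# Example with a made-up URL.
row = split_url("https://example.com/login/index.php?user=1&next=home#top")
```

Each value in `row` can then be passed to the length and character-count extraction to build the tabular input for the classifier.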
14. ARCHITECTURE OF XBNET
This architecture consists of two parts.
First, instead of randomly initializing the weights for gradient
descent in the first iteration, a tree is trained first and the feature
importances given by the tree are used as the weights in the first
iteration.
Second, the magnitude of the feature importances is decreased,
as given in the algorithm, during backward propagation
so that instead of making a big change to the weights, the update only
nudges them slightly and does not disrupt the training process.
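The two parts described above can be illustrated with a small sketch. It is not the XBNet implementation: the feature importances are hard-coded hypothetical values standing in for the output of a trained gradient-boosted tree, and the function names, the additive form of the nudge, and the `scale` constant are assumptions made for illustration.

```python
# Hypothetical feature importances, standing in for the output of a trained
# gradient-boosted tree (illustrative values, not from the project).
importances = [0.5, 0.3, 0.2]


def init_first_layer(importances, n_hidden):
    """Part 1: start from tree feature importances instead of random weights.

    Every hidden unit starts from the same importance vector
    (an assumption of this sketch).
    """
    return [list(importances) for _ in range(n_hidden)]


def nudged_update(weights, grads, importances, lr=0.1, scale=0.01):
    """Part 2: ordinary gradient step plus a scaled-down importance term.

    `scale` is a hypothetical constant that shrinks the importances so they
    only nudge the weights instead of dominating the update.
    """
    return [
        [w - lr * g + scale * f for w, g, f in zip(w_row, g_row, importances)]
        for w_row, g_row in zip(weights, grads)
    ]


weights = init_first_layer(importances, n_hidden=2)
grads = [[0.1, -0.2, 0.0], [0.0, 0.1, 0.3]]  # made-up gradients
new_weights = nudged_update(weights, grads, importances)
```

Because `scale` is small, the importance term changes each weight far less than the initial importance values themselves, which is the "slight nudge" behaviour the slide describes.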
17. How to Avoid Phishing Attacks
STAY INFORMED
To avoid falling prey to one, learn about new phishing techniques
as they are developed.
UTILIZE ‘FISHING FOR PHISHERS’
When in doubt, use the ‘Fishing for
Phishers’ app to verify the
authenticity of a website.
THINK BEFORE YOU CLICK
Never click on hyperlinks
without examining the hidden
URL.
18. LIMITATIONS AND FUTURE WORK
XBNet requires more time and resources for training because a gradient-boosted
tree is trained in every layer.
Currently, XBNet only works on tabular data and is unable to process unstructured
data. If it is extended to unstructured data, the number of parameters will
increase drastically, so care has to be taken to keep it manageable.