Phishing is currently among the most common ways to deceive online users. Consequently, more effective web-page phishing detection mechanisms are needed to strengthen cybersecurity. In this paper, we propose an approach that relies on website images and URLs and treats phishing website recognition as a classification problem. Our model uses webpage URLs and images, applying convolutional neural networks (CNNs) to extract the most important features of both and then classifying pages as benign or phishing. The experiment achieved an accuracy of 99.67%, demonstrating the effectiveness of the proposed model in detecting web phishing attacks.
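A character-level CNN of the kind this abstract describes consumes URLs as fixed-width integer sequences. The sketch below shows only that input encoding step; the vocabulary and maximum length are illustrative assumptions, not values from the paper.

```python
# Minimal sketch of the character-to-integer encoding a URL-based CNN
# typically consumes. VOCAB and MAX_LEN are illustrative assumptions.
VOCAB = {c: i + 1 for i, c in enumerate(
    "abcdefghijklmnopqrstuvwxyz0123456789-._~:/?#[]@!$&'()*+,;=%")}
MAX_LEN = 64  # assumed fixed input width for the convolution layers

def encode_url(url: str) -> list:
    """Map each character to an integer id, then truncate/pad to MAX_LEN.

    Unknown characters map to 0, which doubles as the padding id, so an
    embedding layer can treat id 0 as 'no information'.
    """
    ids = [VOCAB.get(c, 0) for c in url.lower()[:MAX_LEN]]
    return ids + [0] * (MAX_LEN - len(ids))

vec = encode_url("http://paypal-login.example.com/verify")
```

The resulting fixed-length vectors can then be fed to an embedding layer followed by 1-D convolutions; the convolutional part is omitted here since the paper's architecture details are not given in the summary.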
A Comparative Analysis of Different Feature Set on the Performance of Differe... (gerogepatton)
This document summarizes a research paper that analyzed the performance of different machine learning algorithms (Random Forest, Probabilistic Neural Network, XGBoost) using different feature sets for detecting phishing websites. The researchers conducted experiments on a benchmark phishing website dataset using 6 categories of features: address bar-based, abnormal-based, HTML/JavaScript-based, domain-based, feature selection, and full dataset. The results showed that XGBoost achieved the best performance (>97% accuracy) when using the full dataset of features, outperforming the other algorithms and feature sets. Therefore, the combination of all features provided the most important information for accurately detecting phishing websites.
A COMPARATIVE ANALYSIS OF DIFFERENT FEATURE SET ON THE PERFORMANCE OF DIFFERE... (ijaia)
Reducing the risk posed by phishers and other cybercriminals in cyberspace requires a robust and automatic means of detecting phishing websites, since the culprits come up with new techniques for achieving their goals almost daily. Phishers are constantly evolving the methods they use to lure users into revealing their sensitive information. Many methods have been proposed in the past for phishing detection, but the quest for a better solution is still on. This research covers the development of a phishing website detection model based on different algorithms with different sets of features, in order to investigate the most significant features in the dataset.
IRJET - An Automated System for Detection of Social Engineering Phishing Atta... (IRJET Journal)
1) The document presents a machine learning approach to detect phishing URLs using logistic regression. It trains a logistic regression model on a dataset of 420,467 URLs that have been classified as either phishing or legitimate.
2) It preprocesses the URLs using tokenization before training the logistic regression model. The trained model is able to classify new URLs with 96% accuracy as either phishing or legitimate based on the URL features.
3) The proposed approach provides an automated way to detect phishing URLs in real-time and help prevent phishing attacks. Future work could involve developing a browser extension using this approach and increasing the dataset size for higher accuracy.
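The tokenization step this summary mentions can be sketched with the standard library alone. The delimiter set and stop-token list below are assumptions about typical URL preprocessing, not details taken from the paper.

```python
import re

def tokenize_url(url: str) -> list:
    """Split a URL into word-like tokens on common delimiters, the kind
    of preprocessing applied before training a logistic regression
    model on URL text. Delimiters and dropped tokens are assumptions."""
    tokens = re.split(r"[/\-._?=&:@]+", url.lower())
    # drop empty fragments and uninformative scheme/prefix tokens
    return [t for t in tokens if t and t not in ("http", "https", "www")]

tokens = tokenize_url("http://secure-paypal.com.verify-login.ru/account?id=1")
```

The token lists would then be vectorized (e.g. with a bag-of-words or TF-IDF representation) before being passed to the classifier.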
IJRET : International Journal of Research in Engineering and Technology is an international, peer-reviewed online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academicians, Field Engineers, Scholars and Students of related fields of Engineering and Technology.
MALICIOUS URL DETECTION USING CONVOLUTIONAL NEURAL NETWORK (ijcseit)
The World Wide Web has become an important part of our everyday life for information communication and knowledge dissemination. It helps us exchange information quickly, easily, and in a timely manner. Identity theft and identity fraud are two sides of cybercrime in which hackers and malicious users obtain the personal data of legitimate users to attempt fraud or deception for financial gain. Malicious URLs host unsolicited content (spam, phishing, drive-by exploits, etc.) and lure unsuspecting users into becoming victims of scams (monetary loss, theft of private information, and malware installation), causing losses of billions of dollars every year. Systems that detect such crimes should be fast and precise, with the ability to detect new malicious content. Traditionally, this detection is done mostly through the use of blacklists. However, blacklists cannot be exhaustive and lack the ability to detect newly generated malicious URLs. To improve the generality of malicious URL detectors, machine learning techniques have been explored with increasing attention in recent years. In this paper, I use a simple algorithm to detect and predict whether URLs are good or bad, and compare it with two other algorithms (SVM and LR).
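Machine-learning URL detectors of the kind described above are usually trained on lexical features extracted from the URL string. The abstract does not specify the feature set, so the features below are common illustrative choices, not the paper's own.

```python
from urllib.parse import urlparse

def lexical_features(url: str) -> dict:
    """Extract a few lexical features commonly used by ML-based
    malicious-URL detectors. Illustrative only; the paper's exact
    feature set is not specified in the summary."""
    parsed = urlparse(url)
    host = parsed.netloc
    return {
        "url_length": len(url),                      # long URLs are suspicious
        "host_length": len(host),
        "num_digits": sum(c.isdigit() for c in url),
        "num_dots": host.count("."),                 # many subdomains
        "has_at_sign": "@" in url,                   # classic obfuscation trick
        "has_ip_host": host.replace(".", "").isdigit(),  # raw-IP hostname
    }

features = lexical_features("http://192.168.0.1/login")
```

A feature dictionary like this would be vectorized and fed to the classifiers the paper compares (the simple algorithm, SVM, and LR).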
In spite of the development of prevention strategies, phishing remains a significant risk even after the primary countermeasures, given the reactive nature of URL blacklisting. That strategy is insufficient because of the short lifetime of phishing websites. To overcome this problem, developing a real-time phishing website detection method is an effective solution. This research introduces the PrePhish algorithm, an automated machine learning approach that analyzes phishing and non-phishing URLs to produce reliable results. It observes that phishing URLs typically have few connections between the registered-domain part and the path or query part of the URL. Using these connections, each URL is characterized by its inter-relatedness, estimated using features mined from its attributes. These features are then used in a machine learning technique to detect phishing URLs in a real dataset. The classification of phishing and non-phishing websites is implemented by finding the range and threshold values for each attribute using decision-making classification. The method is also evaluated in MATLAB using three major classifiers, SVM, Random Forest, and Naive Bayes, to assess how it performs on the dataset.
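The per-attribute range/threshold classification mentioned above can be sketched as a simple voting rule. The threshold values and the majority-vote decision below are illustrative assumptions standing in for the ranges the paper derives.

```python
def classify_by_thresholds(sample: dict, thresholds: dict) -> str:
    """Flag a site as phishing when a majority of attributes exceed
    their learned thresholds. Both the thresholds and the majority
    vote are illustrative stand-ins for the paper's derived ranges."""
    votes = sum(1 for attr, limit in thresholds.items()
                if sample.get(attr, 0) > limit)
    return "phishing" if votes > len(thresholds) / 2 else "legitimate"

# Hypothetical thresholds learned from training data
thresholds = {"url_length": 75, "num_subdomains": 3, "num_redirects": 1}

label = classify_by_thresholds(
    {"url_length": 120, "num_subdomains": 5, "num_redirects": 0},
    thresholds)
```

In the paper, the analogous decision is made by the trained classifiers (SVM, Random Forest, Naive Bayes) rather than a fixed vote.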
IRJET- Detecting Phishing Websites using Machine Learning (IRJET Journal)
This document describes a research project that aims to implement machine learning techniques to detect phishing websites. The researchers plan to test algorithms like logistic regression, SVM, decision trees and neural networks on a dataset of phishing links. They will evaluate the performance of these algorithms and develop a browser plugin using the best model. This plugin will detect malicious URLs and protect users from phishing attacks. The document provides background on phishing and outlines the proposed approach, dataset, algorithms to be tested, planned Chrome extension implementation, and expected results sections of the project.
IRJET- Fake Profile Identification using Machine Learning (IRJET Journal)
This document proposes a framework for identifying fake profiles on social media sites using machine learning. It involves selecting profile attributes from a dataset, training a random forest classifier model on 80% of the labeled real and fake profiles, and testing it on the remaining 20%. The random forest classifier is able to accurately classify profiles as real or fake with an efficiency of 95% based on attributes like number of friends, followers, and status updates. This automatic fake profile detection method could help social media sites manage the large volume of profiles that can't be manually reviewed.
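The 80/20 train/test protocol this summary describes can be sketched with the standard library; the shuffling, seed, and split ratio below mirror the described evaluation setup but are otherwise generic assumptions.

```python
import random

def split_80_20(profiles: list, seed: int = 0):
    """Shuffle labelled profiles and split them 80% for training and
    20% for testing, mirroring the evaluation protocol described in
    the summary. The fixed seed is an assumption for reproducibility."""
    rng = random.Random(seed)
    shuffled = profiles[:]          # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * 0.8)
    return shuffled[:cut], shuffled[cut:]

train_set, test_set = split_80_20(list(range(100)))
```

The training portion would then be used to fit the random forest classifier, with accuracy measured on the held-out 20%.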
A SURVEY ON WEB SPAM DETECTION METHODS: TAXONOMY (IJNSA Journal)
Web spam refers to techniques that try to manipulate search engine ranking algorithms in order to raise a web page's position in search results. In the best case, spammers encourage viewers to visit their sites and provide undeserved advertising gains to the page owner. In the worst case, they use malicious content in their pages and try to install malware on the victim's machine. Spammers use three kinds of spamming techniques to achieve a higher ranking score: link-based techniques, hiding techniques, and content-based techniques. Existing spam pages cause distrust of search engine results. This not only wastes visitors' time but also wastes substantial search engine resources. Hence, spam detection methods have been proposed as a solution to reduce the negative effects of spam pages. Experimental results show that some of these techniques work well and can find spam pages more accurately than others. This paper classifies web spam techniques and the related detection methods.
COMPARATIVE ANALYSIS OF ANOMALY BASED WEB ATTACK DETECTION METHODS (IJCI JOURNAL)
In the present scenario, protecting websites from web-based attacks is a great challenge due to the bad intentions of malicious users on the Internet. Researchers are trying to find the optimal solution to prevent these web attack activities. Several techniques are available to prevent web attacks, such as firewalls, but most firewalls are not designed to prevent attacks against websites. Moreover, firewalls mostly work on a signature-based detection method. In this paper, we have analyzed different anomaly-based detection methods for the detection of web-based attacks initiated by malicious users. These methods work in a different direction from the signature-based detection method, which only detects the web-based attacks for which a signature has previously been created. In this paper, we have introduced two methods based on attribute values: the Attribute Length Method (ALM) and the Attribute Character Distribution Method (ACDM). Further, we have done a mathematical analysis of three different web attacks and compared their False Accept Rate (FAR) results for both methods. The analysis of the results reveals that ALM is a more efficient method than ACDM in the detection of web-based attacks.
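The Attribute Length Method described above can be sketched as a mean/deviation model over request-parameter lengths. The training data and the deviation multiplier `k` below are illustrative assumptions; the paper's actual mathematical analysis is more detailed.

```python
import statistics

def train_alm(lengths: list):
    """Learn the mean and standard deviation of an attribute's length
    from normal traffic: the ALM training step as the abstract
    describes it."""
    return statistics.mean(lengths), statistics.pstdev(lengths)

def is_anomalous(length: int, mean: float, std: float, k: float = 3.0) -> bool:
    """Flag a value more than k standard deviations from the mean.
    k is an assumed tuning constant, not a value from the paper."""
    return abs(length - mean) > k * std

# Lengths of a form field observed in normal traffic (hypothetical)
mean, std = train_alm([8, 10, 9, 11, 10, 9, 8, 11])
is_anomalous(250, mean, std)  # an oversized payload, e.g. an injection
```

A long SQL-injection or XSS payload in a field that is normally short then stands out as a length anomaly without any attack signature.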
Seminar on detecting fake accounts in social media using machine learning (Parvathi Sanil Nair)
The document discusses research on detecting fake identities using machine learning on social media platforms. It begins with an abstract stating that little research has been done to detect fake human accounts specifically. The research applies machine learning models to attributes of fake accounts in hopes of advancing detection. It then outlines the contents which include introduction, related work on spam detection, research on detecting fake identities, research results, and conclusions. Key points are that identity deception is a major problem on social media, and while models can detect bot accounts, they are less successful at detecting fake human accounts using only profile attributes without behavioral data. The random forest model achieved the best results with an F1 score of 49.75% for detecting fake human accounts.
IRJET- Identification of Clone Attacks in Social Networking Sites (IRJET Journal)
This document discusses identifying clone attacks in social networking sites. It proposes a two-level approach using fuzzy-sim algorithm for profile matching and three algorithms (Predictive FP Growth, Eclat, and Apriori) for user activity matching. An experiment compares the execution times of the three algorithms, finding that Predictive FP Growth has the fastest time of 181 milliseconds, making it best for user activity matching.
This document provides an overview of CAPTCHAs (Completely Automated Public Turing test to tell Computers and Humans Apart). It discusses why CAPTCHAs are used, defines them, describes the different types including text-based, graphic-based and audio-based CAPTCHAs. It also outlines some of the major applications of CAPTCHAs, such as preventing comment spam in blogs, protecting website registration, and protecting email addresses from scrapers.
DETECTION OF FAKE ACCOUNTS IN INSTAGRAM USING MACHINE LEARNING (ijcsit)
With the advent of the Internet and social media, while hundreds of people have benefitted from the vast sources of information available, there has been an enormous increase in cyber-crimes, particularly those targeted towards women. According to a 2019 report in the [4] Economic Times, India witnessed a 457% rise in cybercrime in the five-year span between 2011 and 2016. Most speculate that this is due to the impact of social media such as Facebook, Instagram and Twitter on our daily lives. While these definitely help in creating a sound social network, the creation of user accounts on these sites usually requires just an email ID. A real-life person can create multiple fake IDs, and hence impostor accounts can easily be created. Unlike the real-world scenario, where multiple rules and regulations are imposed to identify oneself in a unique manner (for example, while issuing one's passport or driver's license), in the virtual world of social media, admission does not require any such checks. In this paper, we study different Instagram accounts in particular and try to assess an account as fake or real using machine learning techniques, namely Logistic Regression and the Random Forest algorithm.
IRJET - Detection and Prevention of Phishing Websites using Machine Learning ... (IRJET Journal)
This document discusses approaches for detecting phishing websites using machine learning. It describes three main approaches: 1) analyzing features of the URL, 2) checking the legitimacy of the website by examining the hosting and management details, and 3) using visual appearance analysis to check the genuineness of the website. It then proposes a hybrid approach that uses blacklist/whitelist screening, heuristic analysis of website features, and visual similarity comparisons to flag potential phishing sites.
Research of usability of Mashup Tools done for Kent County Council as part of the Pic and Mix Pilot (2009), opening up Kent related datasets for all to use and exploit.
AN INTELLIGENT CLASSIFICATION MODEL FOR PHISHING EMAIL DETECTION (IJNSA Journal)
Phishing attacks are among the trending cyber-attacks; they apply socially engineered messages, crafted by professional hackers, that aim to fool users into revealing their sensitive information, with email as the most popular communication channel for such messages. This paper presents an intelligent classification model for detecting phishing emails using knowledge discovery, data mining, and text processing techniques. It introduces the concept of phishing term weighting, which evaluates the weight of phishing terms in each email. The pre-processing phase is enhanced by applying text stemming and the WordNet ontology to enrich the model with word synonyms. The model applied knowledge discovery procedures using five popular classification algorithms and achieved a notable enhancement in classification accuracy: 99.1% using the Random Forest algorithm and 98.4% using J48, which is, to our knowledge, the highest accuracy rate for an accredited dataset. The paper also presents a comparative study with similar proposed classification techniques.
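The "phishing term weighting" idea can be sketched as scoring an email by the relative frequency of known phishing terms. The term list below is an illustrative assumption, and the sketch omits the stemming and WordNet synonym-expansion steps the paper applies.

```python
def phishing_term_weight(email_tokens: list, phishing_terms: set) -> float:
    """Score an email by the fraction of its tokens that are known
    phishing terms: a simplified version of phishing term weighting.
    Stemming and WordNet synonym expansion are omitted here."""
    if not email_tokens:
        return 0.0
    hits = sum(1 for t in email_tokens if t in phishing_terms)
    return hits / len(email_tokens)

# Hypothetical phishing vocabulary; the paper mines its own term list
terms = {"verify", "account", "suspended", "urgent", "password"}
tokens = "please verify your account now your account is suspended".split()
score = phishing_term_weight(tokens, terms)
```

Such a weight would serve as one feature, alongside others, for the five classification algorithms the paper compares.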
An adaptive anomaly request detection framework based on dynamic web applica... (IJECEIAES)
Web application firewalls are highly effective at protecting the application and database layers of websites from attack access. This paper proposes a new web application firewall deployment method based on a dynamic web application profiling (DWAP) analysis technique: a method of deploying a firewall based on analyzing website access data. DWAP is improved to integrate deeply into the structure of the website to increase the compatibility of the anomaly detection system with each website, thereby improving its ability to detect abnormal requests. To improve the compatibility of the web application firewall with protected objects, the proposed system consists of two parts whose main tasks are: i) detecting abnormal access in web application (WA) access; and ii) semi-automatically updating the attack data in the abnormal-access detection system during WA access. This new method is applicable in real-time detection systems, where updating new attack data is essential, since web attacks are increasingly complex and sophisticated.
This document discusses using machine learning and deep learning techniques for classification tasks in R. It first uses decision trees to identify the minimal effective parameters for detecting phishing websites, finding that server form handler (SFH) and pop-up windows were most important. It then trains a convolutional neural network on the CIFAR-10 dataset for image classification. Some challenges included limited phishing website data and hardware constraints for deep learning models. Future work involves executing deep learning models on a GPU.
This document evaluates the performance of various classification techniques for detecting phishing websites. It uses a dataset from the UCI Machine Learning Repository containing attributes of phishing websites and a target variable to classify websites as phishing or legitimate. Several classification algorithms from the Weka tool are applied to the dataset, including Naive Bayes, decision trees, and k-nearest neighbor. Their performances are compared to determine the most accurate technique for phishing detection. Feature selection and data preprocessing are also discussed as important steps for building an effective classification model.
An IAC approach for detecting profile cloning (IJNSA Journal)
Nowadays, Online Social Networks (OSNs) are popular websites on the internet, on which millions of users register and share their personal information with others. Privacy threats and the disclosure of personal information are the most important concerns of OSN users. Recently, a new attack named the Identity Cloned Attack has been detected on OSNs. In this attack, the attacker tries to create a fake identity of a real user in order to access the private information of that user's friends, which they do not publish on their public profiles. Today's OSNs offer some verification services, but these are not active services and are useful only for users who are familiar with online identity issues. In this paper, identity cloned attacks are explained in more detail, and a new and precise method to detect profile cloning in online social networks is proposed. In this method, the social network is first represented as a graph; then, according to similarities among users, this graph is divided into smaller communities. Afterwards, all profiles similar to the real profile are gathered (from the same community), the strength of the relationship between each selected profile and the real profile is calculated, and those with the weakest relationships are verified by a mutual-friend system. To evaluate the effectiveness of the proposed method, all steps are applied to a Facebook dataset, and the work is compared with two previous works by applying them to the same dataset.
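The profile-similarity step in the method above can be sketched as set overlap between profile attributes. Jaccard similarity is an illustrative choice here, the example profiles are hypothetical, and the sketch omits the community-partitioning and relationship-strength stages the paper describes.

```python
def jaccard(a: set, b: set) -> float:
    """Attribute-overlap similarity used to shortlist candidate clone
    profiles. The full method also partitions the graph into
    communities and weighs relationship strength, omitted here."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical attribute sets for a real profile and a suspected clone
real_profile = {"alice", "oslo", "photographer", "cats"}
clone_candidate = {"alice", "oslo", "photographer", "dogs"}
similarity = jaccard(real_profile, clone_candidate)
```

Profiles scoring above some similarity threshold would then be passed to the relationship-strength check and, ultimately, the mutual-friend verification step.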
This document describes a proposed system for detecting identity crime and credit application fraud. The system uses a communal detection (CD) algorithm to find real social relationships and reduce suspicion scores to combat synthetic identities. It includes modules for credit applications, whitelist creation, and verification. The system is designed to address limitations of existing rules-based detection systems and better handle changing criminal behaviors.
This document discusses information security as it relates to e-commerce applications. It covers several technical security attack methods that e-commerce applications can be vulnerable to, including financial frauds, spam, phishing, bots, DDoS attacks, brute force attacks, SQL injections, XSS, and Trojan horses. It also discusses vulnerability assessments, penetration testing stages and methods, and ISO/IEC 27001:2013 as an international standard for managing information security risks.
Social networks have become very popular, with an overwhelmingly high rate of growth. Due to this popularity, online social networks face the issue of spamming, and the menace of spam and spammer activity has led to substantial economic loss. It has led to the uncontrollable dissemination of viruses and malware, promotional ads, phishing, and scams. Spam activity has entered a new, dangerous dimension: spammers have stepped up their games and tactics on online social networks, consuming large amounts of network bandwidth and leading to reduced revenue and significant economic loss for both the private and public sectors. Building on previous work on spammer classification taxonomies, various machine learning techniques have been extensively used to detect spam activities and spammers in online social networks. Various classifiers are learned over content-based features extracted from users' interactions and profiles to label them as spam/spammers or legitimate. Recently, however, new network-structural benchmark features have been proposed for the spammer detection task, but their importance under structural benchmark learning methods has not yet been extensively evaluated. In this research work, we evaluate the metric performance of some structural benchmark learning methods, using attributes extracted from an interaction network through a scientific and strategic approach, for the task of spammer detection in online social networks.
This document outlines a presentation on detecting phishing websites using machine learning. The goals of the project are to develop a machine learning model to identify phishing URLs and integrate it into a web application. It will discuss collecting and preprocessing data, selecting machine learning algorithms, developing the web app, and addressing challenges with existing phishing detection techniques. The project aims to help reduce online theft and educate users on phishing risks and countermeasures.
Online Attack Types of Data Breach and Cyberattack Prevention Methods (BRNSSPublicationHubI)
This document summarizes research on online attack types and cyberattack prevention methods. It discusses how phishing attacks are commonly used by cybercriminals to steal private information by tricking users. The document then reviews several related works on detecting different cyber threat types using data analysis techniques like machine learning. It surveys recent methods for strengthening the identification of phishing websites using algorithms like oppositional cuckoo search and fuzzy logic classification.
Phishing Website Detection using Classification Algorithms (IRJET Journal)
This document discusses using machine learning algorithms to classify phishing websites. It begins with background on phishing and then discusses prior research applying algorithms like random forest, decision trees, SVM and KNN to detect phishing websites. The paper aims to address phishing website classification using various classifiers and ensemble learning approaches. It tests classifiers like random forest, decision tree, KNN, AdaBoost and GradientBoost on a phishing testing dataset and evaluates performance using metrics like accuracy, f1-score, precision and recall. The proposed approach achieves 97% accuracy in classifying phishing websites according to experimental results.
A SURVEY ON WEB SPAM DETECTION METHODS: TAXONOMYIJNSA Journal
Web spam refers to techniques that try to manipulate search engine ranking algorithms in order to raise a web page's position in search engine results. In the best case, spammers encourage viewers to visit their sites and gain undeserved advertising revenue for the page owner. In the worst case, they place malicious content in their pages and try to install malware on the victim's machine. Spammers use three kinds of spamming techniques to get a higher ranking score: link-based techniques, hiding techniques, and content-based techniques. Existing spam pages cause distrust in search engine results; this not only wastes visitors' time but also wastes considerable search engine resources. Hence, spam detection methods have been proposed as a solution to reduce the negative effects of spam pages. Experimental results show that some of these techniques work well and can find spam pages more accurately than others. This paper classifies web spam techniques and the related detection methods.
COMPARATIVE ANALYSIS OF ANOMALY BASED WEB ATTACK DETECTION METHODSIJCI JOURNAL
In the present scenario, protecting websites from web-based attacks is a great challenge due to the bad intentions of malicious users on the Internet. Researchers are trying to find the optimum solution to prevent these web attack activities. Several techniques are available to prevent web attacks, such as firewalls, but most firewalls are not designed to prevent attacks against websites; moreover, firewalls mostly rely on signature-based detection. In this paper, we have analyzed different anomaly-based detection methods for detecting web-based attacks initiated by malicious users. These methods work in a different direction from signature-based detection, which only detects web-based attacks for which a signature has previously been created. We introduce two methods based on attribute values: the Attribute Length Method (ALM) and the Attribute Character Distribution Method (ACDM). Further, we have performed a mathematical analysis of three different web attacks and compared their False Accept Rate (FAR) results for both methods. The analysis reveals that ALM is a more efficient method than ACDM for detecting web-based attacks.
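An attribute-length model of the kind ALM describes can be sketched in a few lines. The following is a minimal illustration (the data, helper names, and thresholds are hypothetical, not the paper's): it learns the mean and variance of a request parameter's length from benign traffic and uses a Chebyshev-style bound to score how anomalous a new value's length is.

```python
import statistics

def train_length_model(benign_values):
    """Learn the mean and variance of attribute lengths from benign traffic."""
    lengths = [len(v) for v in benign_values]
    return statistics.mean(lengths), statistics.pvariance(lengths)

def anomaly_score(value, mean, var):
    """Chebyshev-style upper bound on the probability that a benign length
    deviates this far from the mean; low scores suggest an attack."""
    deviation = abs(len(value) - mean)
    if deviation == 0 or var == 0:
        return 1.0
    return min(1.0, var / (deviation * deviation))

# Benign values observed for a 'username' parameter (toy data).
benign = ["alice", "bob42", "carol_w", "dave99", "erin_x"]
mean, var = train_length_model(benign)
# An injected payload is far longer than any benign value and scores low.
assert anomaly_score("admin' OR '1'='1' --", mean, var) < 0.1
assert anomaly_score("frank", mean, var) > 0.5
```

A real ALM deployment would train one such model per attribute of each server-side program, but the scoring idea is the same.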
Seminar on detecting fake accounts in social media using machine learningParvathi Sanil Nair
The document discusses research on detecting fake identities using machine learning on social media platforms. It begins with an abstract stating that little research has been done to detect fake human accounts specifically. The research applies machine learning models to attributes of fake accounts in hopes of advancing detection. It then outlines the contents which include introduction, related work on spam detection, research on detecting fake identities, research results, and conclusions. Key points are that identity deception is a major problem on social media, and while models can detect bot accounts, they are less successful at detecting fake human accounts using only profile attributes without behavioral data. The random forest model achieved the best results with an F1 score of 49.75% for detecting fake human accounts.
IRJET- Identification of Clone Attacks in Social Networking SitesIRJET Journal
This document discusses identifying clone attacks in social networking sites. It proposes a two-level approach using the fuzzy-sim algorithm for profile matching and three algorithms (Predictive FP Growth, Eclat, and Apriori) for user activity matching. An experiment compares the execution times of the three algorithms, finding that Predictive FP Growth is fastest at 181 milliseconds, making it the best choice for user activity matching.
This document provides an overview of CAPTCHAs (Completely Automated Public Turing test to tell Computers and Humans Apart). It discusses why CAPTCHAs are used, defines them, describes the different types including text-based, graphic-based and audio-based CAPTCHAs. It also outlines some of the major applications of CAPTCHAs, such as preventing comment spam in blogs, protecting website registration, and protecting email addresses from scrapers.
DETECTION OF FAKE ACCOUNTS IN INSTAGRAM USING MACHINE LEARNINGijcsit
With the advent of the Internet and social media, while hundreds of people have benefitted from the vast sources of information available, there has been an enormous increase in cyber-crime, particularly targeted towards women. According to a 2019 report in the [4] Economic Times, India witnessed a 457% rise in cybercrime in the five-year span between 2011 and 2016. Most speculate that this is due to the impact of social media such as Facebook, Instagram and Twitter on our daily lives. While these definitely help in creating a sound social network, creating a user account on these sites usually requires just an email-id. A real-life person can create multiple fake IDs, and hence imposture is easy. Unlike the real world, where multiple rules and regulations are imposed to identify oneself uniquely (for example, when issuing a passport or driver's license), in the virtual world of social media admission does not require any such checks. In this paper, we study Instagram accounts in particular and try to assess whether an account is fake or real using machine learning techniques, namely logistic regression and the random forest algorithm.
IRJET - Detection and Prevention of Phishing Websites using Machine Learning ...IRJET Journal
This document discusses approaches for detecting phishing websites using machine learning. It describes three main approaches: 1) analyzing features of the URL, 2) checking the legitimacy of the website by examining the hosting and management details, and 3) using visual appearance analysis to check the genuineness of the website. It then proposes a hybrid approach that uses blacklist/whitelist screening, heuristic analysis of website features, and visual similarity comparisons to flag potential phishing sites.
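The layered decision the hybrid approach describes can be sketched as follows. This is a minimal illustration, not the paper's implementation; the thresholds and the helper callables (`heuristic_score`, `visual_similarity`) are hypothetical stand-ins for the heuristic analysis and visual comparison stages.

```python
# Thresholds and helper signatures here are illustrative, not the paper's.
def classify(url, blacklist, whitelist, heuristic_score, visual_similarity):
    """Layered decision: exact list lookups first, then URL heuristics,
    then visual comparison against known brand pages."""
    if url in whitelist:
        return "legitimate"
    if url in blacklist:
        return "phishing"
    if heuristic_score(url) > 0.5:      # fraction of suspicious URL features
        return "phishing"
    if visual_similarity(url) > 0.9:    # looks like a brand page it is not
        return "phishing"
    return "legitimate"

bl = {"http://paypa1-login.example"}
wl = {"https://www.paypal.com"}
assert classify("https://www.paypal.com", bl, wl, lambda u: 0.0, lambda u: 0.0) == "legitimate"
assert classify("http://paypa1-login.example", bl, wl, lambda u: 0.0, lambda u: 0.0) == "phishing"
assert classify("http://fresh-phish.example", bl, wl, lambda u: 0.8, lambda u: 0.0) == "phishing"
```

Ordering the cheap list lookups before the heuristic and visual stages keeps the common case fast, which is why hybrid systems typically arrange the checks this way.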
Research on the usability of Mashup Tools, done for Kent County Council as part of the Pic and Mix Pilot (2009), opening up Kent-related datasets for all to use and exploit.
AN INTELLIGENT CLASSIFICATION MODEL FOR PHISHING EMAIL DETECTIONIJNSA Journal
Phishing attacks are among the trending cyber-attacks: socially engineered messages crafted by professional hackers are communicated to people with the aim of fooling users into revealing their sensitive information, and the most popular communication channel for those messages is users' email. This paper presents an intelligent classification model for detecting phishing emails using knowledge discovery, data mining and text processing techniques. It introduces the concept of phishing-term weighting, which evaluates the weight of phishing terms in each email. The pre-processing phase is enhanced by applying text stemming and the WordNet ontology to enrich the model with word synonyms. The model applies knowledge discovery procedures using five popular classification algorithms and achieves a notable enhancement in classification accuracy: 99.1% accuracy was achieved using the Random Forest algorithm and 98.4% using J48, which is, to our knowledge, the highest accuracy rate for an accredited data set. The paper also presents a comparative study with similar proposed classification techniques.
An adaptive anomaly request detection framework based on dynamic web applica...IJECEIAES
A web application firewall is highly effective at protecting the application layer and database layer of websites from attack traffic. This paper proposes a new web application firewall deployment method based on a dynamic web application profiling (DWAP) analysis technique, i.e., deploying the firewall based on analysis of website access data. DWAP is improved to integrate deeply into the structure of the website, increasing the compatibility of the anomaly detection system with each website and thereby improving its ability to detect abnormal requests. To improve the compatibility of the web application firewall with protected objects, the proposed system consists of two parts whose main tasks are to: i) detect abnormal access in web application (WA) traffic; and ii) semi-automatically update the attack data in the anomaly detection system during WA access. This new method is applicable in real-time detection systems, where updating with new attack data is essential since web attacks are increasingly complex and sophisticated.
This document discusses using machine learning and deep learning techniques for classification tasks in R. It first uses decision trees to identify the minimal effective parameters for detecting phishing websites, finding that server form handler (SFH) and pop-up windows were most important. It then trains a convolutional neural network on the CIFAR-10 dataset for image classification. Some challenges included limited phishing website data and hardware constraints for deep learning models. Future work involves executing deep learning models on a GPU.
This document evaluates the performance of various classification techniques for detecting phishing websites. It uses a dataset from the UCI Machine Learning Repository containing attributes of phishing websites and a target variable to classify websites as phishing or legitimate. Several classification algorithms from the Weka tool are applied to the dataset, including Naive Bayes, decision trees, and k-nearest neighbor. Their performances are compared to determine the most accurate technique for phishing detection. Feature selection and data preprocessing are also discussed as important steps for building an effective classification model.
An iac approach for detecting profile cloningIJNSA Journal
Nowadays, Online Social Networks (OSNs) are popular websites on the Internet, on which millions of users register and share their personal information with others. Privacy threats and the disclosure of personal information are the most important concerns of OSN users. Recently, a new attack named the Identity Clone Attack has been detected on OSNs, in which the attacker creates a fake identity of a real user in order to access private information of that user's friends which they do not publish on their public profiles. Today's OSNs offer some verification services, but they are not active services and are useful only for users who are familiar with online identity issues. In this paper, identity clone attacks are explained in more detail and a new, precise method to detect profile cloning in online social networks is proposed. In this method, the social network is first represented as a graph; then, according to similarities among users, this graph is divided into smaller communities. Afterwards, all profiles similar to the real profile are gathered (from the same community), the strength of relationship between each selected profile and the real profile is calculated, and those with the lowest strength of relationship are verified through a mutual-friend system. To evaluate the effectiveness of the proposed method, all steps are applied to a Facebook dataset, and the work is compared with two previous works applied to the same dataset.
This document describes a proposed system for detecting identity crime and credit application fraud. The system uses a communal detection (CD) algorithm to find real social relationships and reduce suspicion scores to combat synthetic identities. It includes modules for credit applications, whitelist creation, and verification. The system is designed to address limitations of existing rules-based detection systems and better handle changing criminal behaviors.
This document discusses information security as it relates to e-commerce applications. It covers several technical security attack methods that e-commerce applications can be vulnerable to, including financial frauds, spam, phishing, bots, DDoS attacks, brute force attacks, SQL injections, XSS, and Trojan horses. It also discusses vulnerability assessments, penetration testing stages and methods, and ISO/IEC 27001:2013 as an international standard for managing information security risks.
Social networks have become extremely popular, with an overwhelmingly high rate of growth. Due to this popularity, online social networks face the issue of spamming, which has led to substantial economic loss, the uncontrolled dissemination of viruses and malware, promotional ads, phishing, and scams. Spam activity has entered a new, dangerous dimension: spammers have stepped up their games and tactics on online social networks, consuming large amounts of network bandwidth and causing significant economic loss to both the private and public sectors. Building on previous work on spammer classification taxonomies, various machine learning techniques have been used extensively to detect spam activity and spammers in online social networks. Various classifiers are learned over content-based features extracted from users' interactions and profiles to label them as spam/spammers or legitimate. Recently, new network structural benchmark features have been proposed for the spammer detection task, but their importance with structural benchmark learning methods has not yet been extensively evaluated. In this research work, we evaluate the performance of several structural benchmark learning methods using attributes extracted from an interaction network for the task of spammer detection in online social networks.
This document outlines a presentation on detecting phishing websites using machine learning. The goals of the project are to develop a machine learning model to identify phishing URLs and integrate it into a web application. It will discuss collecting and preprocessing data, selecting machine learning algorithms, developing the web app, and addressing challenges with existing phishing detection techniques. The project aims to help reduce online theft and educate users on phishing risks and countermeasures.
MALICIOUS URL DETECTION USING CONVOLUTIONAL NEURAL NETWORKijcseit
The World Wide Web has become an important part of our everyday life for information communication and knowledge dissemination, helping to transmit information timely, rapidly and easily. Identity theft and identity fraud are referred to as two sides of cyber-crime, in which hackers and malicious users obtain the personal data of existing legitimate users to attempt fraud or deception for financial gain. Malicious URLs host unsolicited content (spam, phishing, drive-by exploits, etc.) and lure unsuspecting users into becoming victims of scams (monetary loss, theft of private information, and malware installation), causing losses of billions of dollars every year. To detect such crimes, systems should be fast and precise, with the ability to detect new malicious content. Traditionally, this detection is done mostly through the use of blacklists. However, blacklists cannot be exhaustive and lack the ability to detect newly generated malicious URLs. To improve the generality of malicious URL detectors, machine learning techniques have been explored with increasing attention in recent years. In this paper, I use a simple algorithm to detect and predict whether a URL is good or bad, and compare it with two other algorithms (SVM, LR).
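Machine-learning URL detectors of this kind typically start from lexical features of the URL string itself, since these are cheap to compute and need no page download. A minimal sketch (the feature set is illustrative, not the paper's exact one):

```python
from urllib.parse import urlparse

def lexical_features(url):
    """A few lexical features commonly used by URL classifiers; the exact
    feature set here is illustrative, not the paper's."""
    parsed = urlparse(url)
    host = parsed.netloc
    return {
        "url_length": len(url),
        "num_dots": host.count("."),
        "has_ip_host": host.replace(".", "").isdigit(),
        "has_at_symbol": "@" in url,
        "num_hyphens": url.count("-"),
        "uses_https": parsed.scheme == "https",
    }

f = lexical_features("http://192.168.10.5/paypal-login/verify?id=1")
assert f["has_ip_host"] and not f["uses_https"] and f["num_hyphens"] == 1
```

Vectors like this can then be fed to any of the classifiers the surrounding abstracts compare (SVM, logistic regression, tree ensembles).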
Malicious-URL Detection using Logistic Regression TechniqueDr. Amarjeet Singh
Over the last few years, the Web has seen massive growth in the number and kinds of web services. Web facilities such as online banking, gaming, and social networking have evolved rapidly, as has people's reliance upon them to perform daily tasks. As a result, a large amount of information is uploaded to the Web daily. As these web services create new opportunities for people to interact, they also create new opportunities for criminals. URLs are launch pads for web attacks: a malicious user can steal the identity of a legitimate person by sending a malicious URL. Malicious URLs are a keystone of illegitimate Internet activity, and the dangers of these sites have created a mandate for defences that protect end-users from visiting them. The proposed approach classifies URLs automatically using a machine learning algorithm, logistic regression, applied to binary classification. The classifier achieves 97% accuracy by learning from phishing URLs.
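At prediction time, logistic regression reduces to a sigmoid over a weighted sum of URL features. A minimal sketch of that decision function (the weights and feature names below are illustrative toys, not the paper's learned parameters):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Illustrative weights, not the paper's learned parameters: positive
# weights push the score toward the phishing class.
WEIGHTS = {"url_length": 0.02, "num_hyphens": 0.6, "has_at_symbol": 2.0}
BIAS = -2.5

def phishing_probability(features):
    """P(phishing | features) under a trained logistic regression model."""
    z = BIAS + sum(WEIGHTS[k] * float(v) for k, v in features.items())
    return sigmoid(z)

benign = {"url_length": 20, "num_hyphens": 0, "has_at_symbol": 0}
phish = {"url_length": 80, "num_hyphens": 4, "has_at_symbol": 1}
assert phishing_probability(benign) < 0.5 < phishing_probability(phish)
```

Training fits `WEIGHTS` and `BIAS` by minimizing log-loss over labelled URLs; only the forward pass is shown here.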
This document discusses the development of a machine learning model to accurately detect phishing websites in real-time. It begins with an introduction to the problem of phishing and the need for reliable phishing detection systems. It then discusses using supervised machine learning, specifically logistic regression, to classify websites as legitimate or phishing based on discriminatory features. The goal is to determine an optimal feature combination to train a classifier with high performance at detecting phishing sites. Previous literature on phishing detection is reviewed, including techniques using fuzzy rough set theory and a three-tiered approach using web crawler traffic data, content analysis, and URL analysis.
A Hybrid Approach For Phishing Website Detection Using Machine Learning.vivatechijri
In this technical age there are many ways an attacker can illegitimately gain access to people's sensitive information. One of these is phishing: the activity of misleading people into giving their sensitive information on fraudulent websites that look like the real ones. The phisher's aim is to steal personal information, bank details, etc. Day by day it is getting riskier to enter your personal information on websites, for fear that a phishing attack might steal your sensitive information. That is why phishing website detection is necessary to alert the user and block the website. Automated detection of phishing attacks is necessary, and machine learning is one of the most efficient techniques for it, as it removes the drawbacks of existing approaches. An efficient machine learning model with a content-based approach proves very effective in detecting phishing websites.
Our proposed system uses a hybrid approach which combines a machine learning-based method and a content-based method. URL-based features are extracted and passed to the machine learning model, while in the content-based approach the TF-IDF algorithm detects a phishing website using the top keywords of a web page. This hybrid approach is used to achieve a highly efficient result. Finally, our system notifies and alerts the user if the website is phishing or legitimate.
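The TF-IDF keyword step can be sketched briefly: term frequency within the page times inverse document frequency against a background corpus surfaces the terms (often brand names) that identify which site the page is imitating. A minimal illustration with toy data, using a common smoothed IDF rather than any formula claimed by the paper:

```python
import math
from collections import Counter

def tfidf_top_keywords(page_tokens, corpus, k=3):
    """Rank a page's terms by TF-IDF against a background corpus of pages
    (each given as a set of tokens); smoothed IDF as in common practice."""
    tf = Counter(page_tokens)
    n_docs = len(corpus)
    def idf(term):
        df = sum(1 for doc in corpus if term in doc)
        return math.log((1 + n_docs) / (1 + df)) + 1
    scores = {t: (c / len(page_tokens)) * idf(t) for t, c in tf.items()}
    return [t for t, _ in sorted(scores.items(), key=lambda kv: -kv[1])[:k]]

page = ["paypal", "login", "verify", "account", "login", "paypal", "paypal"]
corpus = [{"news", "sports"}, {"login", "account"}, {"weather"}]
assert tfidf_top_keywords(page, corpus)[0] == "paypal"
```

The extracted top keywords can then be searched for or compared against the domain name to decide whether the page legitimately belongs to the brand it talks about.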
Deep learning in phishing mitigation: a uniform resource locator-based predic...IJECEIAES
To mitigate the evolution of phishing websites, various phishing prediction schemes are continually being optimized. However, the optimized methods produce gratuitous performance overhead due to limited exploration of advanced phishing cues. Thus, this work enhances a phishing uniform resource locator-based predictive model to overcome this deficiency using deep learning algorithms. The model's architecture encompasses preprocessing of an effective feature space made up of 60 mutual uniform resource locator (URL) phishing features, and a dual deep learning-based model of a convolutional neural network with bi-directional long short-term memory (CNN-BiLSTM). The proposed predictive model is trained and tested on a dataset of 14,000 phishing URLs and 28,074 legitimate URLs. Experimentally, the performance outputs are remarked with a 0.01% false positive rate (FPR) and 99.27% testing accuracy.
PUMMP: PHISHING URL DETECTION USING MACHINE LEARNING WITH MONOMORPHIC AND POL...IJCNCJournal
Phishing scams are increasing drastically, compromising the personal credentials of Internet users. This paper proposes a novel feature utilization method for phishing URL detection called the polymorphic property of features. In the initial stage, the URL-related features (46 features) were extracted. Later, a subset of features (19 out of 46) with the polymorphic property was identified, and these were extracted from different parts of the URL (the domain and the path). After extracting the features, various machine learning classification algorithms were applied to build models using monomorphic treatment of features, polymorphic treatment of features, and both monomorphic and polymorphic treatment of features. By the polymorphic property of features, we mean that the same feature provides different interpretations when considered in different parts of the URL. The machine learning models were built on two different datasets. A comparison of the models derived from the two datasets reveals that the model built with both monomorphic and polymorphic treatment of features yielded higher accuracy in phishing URL detection than existing works.
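The monomorphic/polymorphic distinction can be illustrated concretely: the same feature function applied once to the whole URL versus separately to the domain and the path. A minimal sketch (the hyphen-count feature is just an example, not necessarily one of the paper's 19 polymorphic features):

```python
from urllib.parse import urlparse

def monomorphic(url, feature):
    """Monomorphic treatment: the feature is computed once over the whole URL."""
    return feature(url)

def polymorphic(url, feature):
    """Polymorphic treatment: the same feature is computed separately on the
    domain and the path, so each part gives its own interpretation."""
    parsed = urlparse(url)
    return {"domain": feature(parsed.netloc), "path": feature(parsed.path)}

count_hyphens = lambda s: s.count("-")
url = "http://secure-login-example.com/reset-password"
assert monomorphic(url, count_hyphens) == 3
assert polymorphic(url, count_hyphens) == {"domain": 2, "path": 1}
```

Hyphens in the domain are a stronger phishing cue than hyphens in the path, so splitting the count gives the classifier information that the single whole-URL count blurs away.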
Detecting Phishing Websites Using Machine LearningIRJET Journal
1. The document proposes a system to detect phishing websites in real-time using machine learning. It trains a random forest classifier on a dataset of URLs and associated features to classify websites as legitimate or phishing.
2. The system collects URL and page content features from a user's browser and sends them to a cloud-based random forest model. The model was trained on over 40,000 URLs and achieved 97.36% accuracy at detecting phishing sites.
3. The proposed system provides advantages like real-time detection, using a large dataset for training, detecting new phishing sites, and independence from third-party services. It aims to protect users by identifying phishing sites before sensitive information is entered.
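At classification time a random forest simply takes a majority vote over its trees. A minimal sketch of that final step, with illustrative decision stumps standing in for trained trees (the thresholds are toys, not the model's):

```python
from collections import Counter

def forest_predict(trees, features):
    """Random-forest-style decision: majority vote over the trees."""
    votes = Counter(tree(features) for tree in trees)
    return votes.most_common(1)[0][0]

# Stump thresholds below are illustrative, not the trained model's.
trees = [
    lambda f: "phishing" if f["url_length"] > 54 else "legitimate",
    lambda f: "phishing" if f["has_ip_host"] else "legitimate",
    lambda f: "phishing" if f["num_dots"] > 3 else "legitimate",
]
sample = {"url_length": 80, "has_ip_host": False, "num_dots": 5}
assert forest_predict(trees, sample) == "phishing"  # 2 of 3 trees agree
```

In the deployed system, the browser-collected features would be sent to the cloud-hosted model and this vote would decide whether to warn the user.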
PDMLP: PHISHING DETECTION USING MULTILAYER PERCEPTRONIJNSA Journal
A phishing website is a significant problem on the Internet. It is one of the cyber-attack types in which attackers try to obtain sensitive information such as usernames and passwords or credit card information. The recent growth in deploying phishing-URL detection systems on many websites has resulted in a massive amount of data available for predicting phishing websites. In this paper, we propose a new method to develop a phishing detection system, called phishing detection based on a multilayer perceptron (PDMLP), which is used on two types of datasets. The performance of these mechanisms is evaluated in terms of accuracy, precision, recall, and F-measure. Results showed that PDMLP provides better performance in comparison to the KNN, SVM, C4.5 decision tree, RF, and RoF classifiers.
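A multilayer perceptron's forward pass is just alternating affine layers and nonlinearities. A minimal sketch with one hidden layer and a sigmoid output read as P(phishing); the toy weights are illustrative, where PDMLP would learn them from its datasets:

```python
import math

def dense(x, W, b):
    """One fully connected layer: W.x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def relu(x):
    return [max(0.0, v) for v in x]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def mlp_forward(features, W1, b1, W2, b2):
    """One hidden ReLU layer, sigmoid output interpreted as P(phishing)."""
    hidden = relu(dense(features, W1, b1))
    return sigmoid(dense(hidden, W2, b2)[0])

# Toy weights for two input features; a real PDMLP would learn these.
W1, b1 = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]
W2, b2 = [[1.0, 1.0]], [-1.0]
assert mlp_forward([2.0, 0.0], W1, b1, W2, b2) > 0.5
assert mlp_forward([0.0, 0.0], W1, b1, W2, b2) < 0.5
```

Training would fit the weight matrices by backpropagation on labelled URLs; only inference is sketched here.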
Detecting malicious URLs using binary classification through ada boost algori...IJECEIAES
A malicious Uniform Resource Locator (URL) is a frequent and severe menace to cybersecurity. Malicious URLs are used to extract unsolicited information and trick inexperienced end users into becoming victims of scams, creating losses of billions of dollars each year. It is crucial to identify and appropriately respond to such URLs. Usually, this detection is done through the use of blacklists in the cyber world. However, blacklists cannot be exhaustive and cannot recognize zero-day malicious URLs, so machine learning procedures should be incorporated to improve the detection of malicious URL indicators. In this study, we have developed a complete prototype of malicious URL detection using machine learning methods. In particular, we have attempted an exact formulation of malicious URL exposure from a machine learning perspective and proposed an approach using the AdaBoost algorithm; the proposed approach has achieved higher accuracy than other existing algorithms.
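AdaBoost's final classifier is the sign of an alpha-weighted vote over weak learners, where each learner's weight comes from its training error. A minimal sketch of that combination rule (the stumps and their error rates below are illustrative, not this study's trained model):

```python
import math

def stump_weight(error):
    """AdaBoost weight for a weak learner with the given training error."""
    return 0.5 * math.log((1.0 - error) / error)

def adaboost_predict(stumps, alphas, x):
    """Final classifier: sign of the alpha-weighted votes (+1 malicious, -1 benign)."""
    score = sum(a * h(x) for h, a in zip(stumps, alphas))
    return 1 if score > 0 else -1

# Two weak stumps over URL features; the error rates are illustrative.
stumps = [
    lambda x: 1 if x["url_length"] > 54 else -1,
    lambda x: 1 if x["has_ip_host"] else -1,
]
alphas = [stump_weight(0.2), stump_weight(0.4)]  # better stump gets more say
assert adaboost_predict(stumps, alphas, {"url_length": 80, "has_ip_host": False}) == 1
assert adaboost_predict(stumps, alphas, {"url_length": 20, "has_ip_host": False}) == -1
```

The weight formula `0.5 * ln((1 - e) / e)` is the standard discrete AdaBoost rule: learners barely better than chance get near-zero say, accurate ones dominate the vote.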
The document discusses the development of a browser extension to detect phishing URLs. It proposes using machine learning models trained on datasets of legitimate and phishing websites to classify URLs in real time. The extension will analyze features of URLs like those in the address bar, for abnormalities, and in HTML/JavaScript to identify phishing sites. The project uses an iterative development process over 10 weeks to create a Chrome extension that complies with guidelines and notifies users of detected phishing URLs. It concludes the system can handle new attack methods and recommends expanding it to be cross-platform and take online user reports to improve accuracy.
Phishing URL Detection using LSTM Based Ensemble Learning ApproachesIJCNCJournal
Increasing incidents of phishing attacks pose a significant challenge for cybersecurity personnel. Phishing is a deceitful venture intended to steal the confidential information of an organization or an individual. Many works have been performed to build anti-phishing solutions over the years, but attackers come up with new manoeuvres from time to time. Many of the existing techniques have been experimented with on a limited set of URLs and depend on other software to collect domain-related information about the URLs. In this paper, with the aim of building a more accurate and effective phishing attack detection system, we use the concept of ensemble learning with Long Short-Term Memory (LSTM) models. We propose ensembles of LSTM models using a bagging approach and a stacking approach. For classification with the LSTM method, no separate feature extraction is done. Ensemble models are built by integrating the predictions of multiple LSTM models. The performance of the proposed ensemble LSTM methods is compared with five different machine learning classification methods; to implement these machine learning algorithms, different URL-based lexical features are extracted, and a mutual information-based feature selection algorithm is used to select the most relevant features. Both the bagging and the stacking approaches of ensemble learning with LSTM models outperform the other machine learning techniques. The results are also compared with other anti-phishing solutions implemented using deep learning methods; our approaches prove to be more accurate, with a low false positive rate of less than 0.15% on a comparatively larger dataset.
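"No separate feature extraction" for LSTM models typically means the raw URL is fed in as a character sequence. A minimal sketch of that preprocessing step (the alphabet and sequence length are illustrative choices, not the paper's):

```python
def encode_url(url, alphabet, max_len=60):
    """Map URL characters to integer ids and pad/truncate to a fixed length;
    id 0 is reserved for padding and unknown characters."""
    ids = [alphabet.get(ch, 0) for ch in url.lower()[:max_len]]
    return ids + [0] * (max_len - len(ids))

# Illustrative alphabet; a real model would fix this from its training data.
alphabet = {ch: i + 1 for i, ch in enumerate("abcdefghijklmnopqrstuvwxyz0123456789:/.-_?=")}
seq = encode_url("http://a-b.example/Login?id=1", alphabet, max_len=40)
assert len(seq) == 40 and seq[0] == alphabet["h"] and seq[-1] == 0
```

Each fixed-length integer sequence goes through an embedding layer into the LSTM; bagging then trains several such models on bootstrap samples and combines their predictions, while stacking feeds their outputs to a meta-classifier.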
This document summarizes a research paper that proposes a phishing detector plugin called PHISCAN that uses machine learning to detect phishing websites. The plugin is developed for the Chrome browser using JavaScript and HTML. It extracts features from URLs to train classifiers like random forest that can accurately classify URLs as phishing or benign in less than a second while maintaining user privacy. The paper conducts a literature review of existing phishing detection systems and techniques using blacklists, heuristics, or machine learning. It motivates the need for the proposed plugin by discussing the increasing prevalence and sophistication of phishing attacks.
This document summarizes a research paper that proposes a machine learning approach for detecting phishing websites. It discusses using heuristic features from CANTINA to train machine learning models. A new domain top-page similarity feature is introduced to improve accuracy. Various modules are described, including site training, site capturing, a phishing dictionary, and image correlation to measure similarity. Experimental results show the approach achieves up to 92.5% f-measure and a 7.5% error rate for phishing detection.
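Image correlation as a similarity measure can be sketched with a plain Pearson correlation over pixel intensities; this is a generic illustration of the idea, not the paper's exact correlation module:

```python
import math

def image_correlation(a, b):
    """Pearson correlation between two equal-size grayscale images
    (flattened pixel lists); values near 1.0 mean near-identical layouts."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a)
                    * sum((y - mean_b) ** 2 for y in b))
    return num / den if den else 0.0

page = [10, 200, 10, 200, 10, 200]    # toy 2x3 screenshot, flattened
clone = [12, 198, 11, 201, 9, 202]    # near-pixel-identical phishing copy
assert image_correlation(page, page) == 1.0
assert image_correlation(page, clone) > 0.95
```

A high correlation between a suspect page's screenshot and a legitimate site's top page, combined with a mismatched domain, is the signal the domain top-page similarity feature captures.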
This document presents a proposed system for detecting phishing websites using a Chrome extension. The system compares URLs to entries in two databases - the Phishtank database of known phishing sites, and a local IndexedDB of frequently visited sites. If a match is found in either database, the Chrome extension will flag the site as potentially malicious by changing color. The system was tested on 53 URLs, achieving an accuracy of 92.45% at detecting phishing sites. The proposed system aims to alert users to phishing sites and protect them from disclosing sensitive information to attackers.
Similar to A Deep Learning Technique for Web Phishing Detection Combined URL Features and Visual Similarity
Rendezvous Sequence Generation Algorithm for Cognitive Radio Networks in Post... (IJCNCJournal)
Recent natural disasters have inflicted tremendous damage on humanity, with their scale progressively increasing and leading to numerous casualties. Events such as earthquakes can trigger secondary disasters, such as tsunamis, further complicating the situation by destroying communication infrastructures. This destruction impedes the dissemination of information about secondary disasters and complicates post-disaster rescue efforts. Consequently, there is an urgent demand for technologies capable of substituting for these destroyed communication infrastructures. This paper proposes a technique for generating rendezvous sequences to swiftly reconnect communication infrastructures in post-disaster scenarios. We compare the time required for rendezvous using the proposed technique against existing methods and analyze the average time taken to establish links with the rendezvous technique, discussing its significance. This research presents a novel approach enabling rapid recovery of destroyed communication infrastructures in disaster environments through Cognitive Radio Network (CRN) technology, showcasing the potential to significantly improve disaster response and recovery efforts. The proposed method reduces the time for the rendezvous compared to existing methods, suggesting that it can enhance the efficiency of rescue operations in post-disaster scenarios and contribute to life-saving efforts.
Blockchain Enforced Attribute based Access Control with ZKP for Healthcare Se... (IJCNCJournal)
The relationship between doctors and patients is reinforced through the expanded communication channels provided by remote healthcare services, resulting in heightened patient satisfaction and loyalty. Nonetheless, the growth of these services is hampered by the security and privacy challenges they confront. Additionally, patient electronic health record (EHR) information is dispersed across multiple hospitals in different formats, undermining data sovereignty: any service can assert authority over a patient's EHR and effectively control its usage. This paper proposes a blockchain-enforced attribute-based access control scheme for healthcare services. To enhance privacy and data sovereignty, the proposed system employs attribute-based access control, zero-knowledge proofs (ZKP), and blockchain. Data plays a pivotal role in defining attributes, which in turn form the basis of the access control criteria. Blockchain is used to keep hospital information in a public chain and EHR-related data in a private chain. Furthermore, EHRs are protected with an attribute-based cryptosystem before they are stored on the blockchain. Analysis shows that the proposed system provides data sovereignty and privacy through attribute-based access control.
EECRPSID: Energy-Efficient Cluster-Based Routing Protocol with a Secure Intru... (IJCNCJournal)
The "Energy-Efficient Cluster-Based Routing Protocol with a Secure Intrusion Detection" (EECRPSID) is a revolutionary idea that has gained significance for Internet of Things (IoT) networks backed by WSNs. The hardware foundation of a WSN-powered IoT infrastructure consists of devices with autonomous sensing capabilities. The significant features of the proposed technology are intelligent environment sensing, independent data collection, and information transfer to connected devices. However, hardware flaws and energy-consumption issues may cause device failures in WSN-assisted IoT networks, potentially obstructing the transfer of data. A reliable route significantly reduces data retransmissions, which reduces traffic and conserves energy. Sensor hardware is often widely dispersed in IoT networks that rely on WSNs, and data duplication can occur when numerous sensor devices monitor the same location. Clustering offers a solution to this issue: compared with the multipath technique, it lessens network traffic while retaining path dependability. In EECRPSID, we apply the clustering technique to relieve duplicate data, while the multipath strategy makes the protocol more dependable. The EECRPSID algorithm reduces overall energy consumption, minimizes end-to-end delay to 0.14 s, achieves a 99.8% packet delivery ratio, and increases the network's lifespan. The full set of simulations is run in the NS2 simulator, and the simulated results indicate that EECRPSID improves the performance measures compared with three other techniques.
Analysis and Evolution of SHA-1 Algorithm - Analytical Technique (IJCNCJournal)
A 160-bit (20-byte) hash value, sometimes called a message digest, is generated by the SHA-1 (Secure Hash Algorithm 1) cryptographic hash function. This value is commonly represented as 40 hexadecimal digits. SHA-1 was developed by the National Security Agency and is a U.S. Federal Information Processing Standard. Although it has been cryptographically broken, the algorithm is still in widespread use. In this work, we conduct a detailed and practical analysis of the theoretical elements of the SHA-1 algorithm and show how they are implemented across several different hash configurations.
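The headline numbers in this abstract — a 160-bit (20-byte) digest shown as 40 hexadecimal digits — are easy to verify with Python's standard library, whose `hashlib.sha1` implements exactly this function:

```python
import hashlib

# SHA-1 maps arbitrary input to a 160-bit (20-byte) digest,
# conventionally rendered as 40 hexadecimal digits.
digest = hashlib.sha1(b"hello").hexdigest()

print(len(digest))                  # 40 hex characters
print(len(bytes.fromhex(digest)))  # 20 bytes == 160 bits
```

Note that despite its availability in every major crypto library, SHA-1 should not be used for new security-sensitive designs, for the reason the abstract states: practical collision attacks exist.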
Optimizing CNN-BiGRU Performance: Mish Activation and Comparative Analysis (IJCNCJournal)
Deep learning is currently extensively employed across a range of research domains. The continuous advancements in deep learning techniques contribute to solving intricate challenges. Activation functions (AF) are fundamental components within neural networks, enabling them to capture complex patterns and relationships in the data. By introducing non-linearities, AF empowers neural networks to model and adapt to the diverse and nuanced nature of real-world data, enhancing their ability to make accurate predictions across various tasks. In the context of intrusion detection, the Mish, a recent AF, was implemented in the CNN-BiGRU model, using three datasets: ASNM-TUN, ASNM-CDX, and HOGZILLA. The comparison with Rectified Linear Unit (ReLU), a widely used AF, revealed that Mish outperforms ReLU, showcasing superior performance across the evaluated datasets. This study illuminates the effectiveness of AF in elevating the performance of intrusion detection systems.
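For reference, Mish is defined as mish(x) = x * tanh(softplus(x)), where softplus(x) = ln(1 + e^x). A dependency-free sketch makes the smooth, non-monotonic shape that distinguishes it from ReLU concrete:

```python
import math

def softplus(x):
    # softplus(x) = ln(1 + e^x), a smooth approximation of max(0, x)
    return math.log1p(math.exp(x))

def mish(x):
    # Mish activation: x * tanh(softplus(x))
    return x * math.tanh(softplus(x))

# Unlike ReLU, Mish is smooth everywhere and lets small negative
# values pass through instead of clamping them to zero.
values = [mish(x) for x in (-5.0, 0.0, 1.0)]
```

This non-zero response to negative inputs is one of the properties often credited for Mish outperforming ReLU in studies like the one summarized above.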
An Hybrid Framework OTFS-OFDM Based on Mobile Speed Estimation (IJCNCJournal)
Future wireless communication systems face the challenging task of simultaneously providing high quality of service (QoS) and broadband data transmission while minimizing power consumption, latency, and system complexity. Although Orthogonal Frequency Division Multiplexing (OFDM) has been widely adopted in 4G and 5G systems, it struggles to cope with significant delay and Doppler spread in high-mobility scenarios. To address these challenges, a novel waveform named Orthogonal Time Frequency Space (OTFS) has been proposed, designed to outperform OFDM by closely aligning signals with the channel behaviour. In this paper, we propose a switching strategy that empowers operators to select the most appropriate waveform based on an estimate of the mobile user's speed, enabling the base station to dynamically choose the waveform that best suits that speed. Additionally, we suggest retaining an Integrated Sensing and Communication (ISAC) radar approach for accurate Doppler estimation, which provides precise information to facilitate the waveform selection procedure. By leveraging the switching strategy and the Doppler estimation capabilities of an ISAC radar, our proposed approach aims to enhance the performance of wireless communication systems in high-mobility cases. Considering the complexity of waveform processing, we introduce an optimized hybrid system that combines OTFS and OFDM, reducing complexity while retaining the performance benefits. This hybrid system presents a promising solution for improving the performance of wireless communication systems under higher mobility, and the simulation results validate the effectiveness of the proposed approach.
Enhanced Traffic Congestion Management with Fog Computing - A Simulation-Base... (IJCNCJournal)
Accurate latency computation is essential for the Internet of Things (IoT), since connected devices generate a vast amount of data that is processed on cloud infrastructure. However, the cloud alone is not an optimal solution; fog computing is used to enable processing at the edge while still allowing communication with the cloud. Many applications rely on fog computing, including traffic management. In this paper, an Intelligent Traffic Congestion Mitigation System (ITCMS) is proposed to address traffic congestion in heavily populated smart cities. The proposed system is implemented using fog computing and tested in the crowded city of Cairo. The results indicate that the execution time of the simulation is 4,538 seconds and the delay in the application loop is 49.67 seconds. The paper addresses various issues, including CPU usage, heap memory usage, throughput, and the total average delay, which are essential for evaluating the performance of the ITCMS. Our system model is also compared with other models to assess its performance, using two parameters, throughput and total average delay, against IOV (Internet of Vehicles) and STL (Seasonal-Trend decomposition based on LOESS). The results confirm that the proposed system outperforms the others in terms of higher accuracy, lower latency, and improved traffic efficiency.
Vehicular Ad Hoc Networks (VANETs) have become a viable technology to improve traffic flow and safety on the roads. Due to its effectiveness and scalability, the Wingsuit Search-based Optimised Link State Routing Protocol (WS-OLSR) is frequently used for data distribution in VANETs. However, the selection of MultiPoint Relays (MPRs) plays a pivotal role in WS-OLSR's performance. This paper presents an improved MPR selection algorithm tailored to WS-OLSR, designed to enhance overall routing efficiency and reduce overhead. Our analysis found that the current OLSR protocol suffers from problems such as redundant HELLO and TC message packets and failure to update routing information in time, so a WS-OLSR routing protocol based on an improved MPR selection algorithm is proposed. First, factors such as node mobility and link changes are comprehensively considered to reflect network topology changes, and the broadcast cycle of node HELLO messages is controlled through those changes. Second, a new MPR selection algorithm is proposed that takes link stability and node properties into account. Finally, we evaluate its effectiveness in terms of packet delivery ratio, end-to-end delay, and control message overhead. Simulation results demonstrate the superior performance of our improved MPR selection algorithm compared to traditional approaches.
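For context, the baseline that improved MPR selection schemes compete against is OLSR's classic greedy heuristic: keep adding 1-hop neighbors to the relay set until every 2-hop neighbor is covered. A minimal sketch of that baseline (not the paper's improved, stability-aware algorithm) looks like this:

```python
def select_mprs(one_hop, two_hop_of):
    """Greedy MPR selection: repeatedly pick the 1-hop neighbor that
    covers the most still-uncovered 2-hop neighbors (classic OLSR
    heuristic, a form of greedy set cover)."""
    uncovered = set().union(*two_hop_of.values()) - set(one_hop)
    mprs = set()
    while uncovered:
        best = max(one_hop, key=lambda n: len(two_hop_of[n] & uncovered))
        if not two_hop_of[best] & uncovered:
            break  # remaining 2-hop nodes are unreachable via 1-hop set
        mprs.add(best)
        uncovered -= two_hop_of[best]
    return mprs

# Node "a" reaches {x, y}, "b" reaches {y}, "c" reaches {z}:
# the greedy pick is {a, c}, and "b" is left out as redundant.
mprs = select_mprs(["a", "b", "c"], {"a": {"x", "y"}, "b": {"y"}, "c": {"z"}})
```

Improved schemes like the one summarized above change the selection criterion (e.g. weighting link stability and mobility) rather than this overall covering structure.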
May 2024, Volume 16, Number 3 - The International Journal of Computer Network... (IJCNCJournal)
The International Journal of Computer Networks & Communications (IJCNC) is a bimonthly open-access peer-reviewed journal that publishes articles contributing new results in all areas of computer networks and communications. The journal focuses on all technical and practical aspects of computer networks and data communications. Its goal is to bring together researchers and practitioners from academia and industry to focus on advanced networking concepts and to establish new collaborations in these areas.
A Novel Medium Access Control Strategy for Heterogeneous Traffic in Wireless ... (IJCNCJournal)
So far, Wireless Body Area Networks (WBANs) have played a pivotal role in driving the development of intelligent healthcare systems with broad applicability across various domains. Each WBAN consists of one or more types of sensors that can be embedded in clothing, attached directly to the body, or even implanted beneath an individual's skin. These sensors typically serve a single application. However, the traffic generated by each sensor may have distinct requirements. This diversity necessitates a dual approach: tailored treatment based on the specific needs of each traffic type, and the fulfillment of application requirements such as reliability and timeliness. Nevertheless, energy constraints and the unreliable nature of wireless communications make QoS provisioning in such networks a non-trivial task. In this context, the current paper introduces a novel Medium Access Control (MAC) strategy for the regular traffic applications of WBANs, designed to significantly enhance efficiency compared to the established MAC protocols IEEE 802.15.4 and IEEE 802.15.6, with a particular focus on improving reliability, timeliness, and energy efficiency.
May_2024 Top 10 Read Articles in Computer Networks & Communications.pdf (IJCNCJournal)
A Topology Control Algorithm Taking into Account Energy and Quality of Transm... (IJCNCJournal)
The efficient use of energy in wireless sensor networks is critical for extending node lifetime. The network topology is one of the factors with a significant impact on energy usage at the nodes and the quality of transmission (QoT) in the network. We propose a topology control algorithm for software-defined wireless sensor networks (SDWSNs) in this paper. Our method formulates topology control as a nonlinear programming (NP) problem with the objective of optimizing two metrics: maximum communication range and desired degree. This NP problem is solved at the SDWSN controller by employing a genetic algorithm (GA) to determine the best topology. The simulation results show that the proposed algorithm outperforms the MaxPower algorithm in terms of average node degree and energy expansion ratio.
Multi-Server user Authentication Scheme for Privacy Preservation with Fuzzy C... (IJCNCJournal)
The integration of artificial intelligence technology with a scalable Internet of Things (IoT) platform facilitates diverse smart communication services, allowing remote users to access services from anywhere at any time. The multi-server environment within IoT introduces a flexible security service model, enabling users to interact with any server through a single registration. To ensure secure and privacy-preserving access to resources, an authentication scheme is essential. Zhao et al. recently introduced a user authentication scheme for the multi-server environment, utilizing passwords and smart cards and claiming resilience against well-known attacks. This paper conducts cryptanalysis on Zhao et al.'s scheme, focusing on denial-of-service and privacy attacks, and reveals a lack of user-friendliness. Subsequently, we propose a new multi-server user authentication scheme for privacy preservation with fuzzy commitment over the IoT environment, addressing the shortcomings of Zhao et al.'s scheme. Formal security verification of the proposed scheme is conducted using the ProVerif simulation tool. Through both formal and informal security analyses, we demonstrate that the proposed scheme is resilient against various known attacks, including those identified in Zhao et al.'s scheme.
Advanced Privacy Scheme to Improve Road Safety in Smart Transportation Systems (IJCNCJournal)
In a Vehicular Ad Hoc Network (VANET), vehicles continuously transmit and receive spatiotemporal data with neighboring vehicles, thereby establishing a comprehensive 360-degree traffic awareness system. Vehicular network safety applications facilitate the transmission of messages between nearby vehicles at regular intervals, enhancing drivers' contextual understanding of the driving environment and significantly improving traffic safety. Privacy schemes in VANETs are vital to safeguard vehicles' identities and their associated owners or drivers. They prevent unauthorized parties from linking a vehicle's communications to a specific real-world identity by employing techniques such as pseudonyms, randomization, or cryptographic protocols. Nevertheless, these communications frequently contain important vehicle information that malevolent parties could use to monitor the vehicle over a long period. The acquisition of this shared data can facilitate the reconstruction of vehicle trajectories, posing a potential risk to the privacy of the driver. Addressing the critical challenge of developing effective and scalable privacy-preserving protocols for communication in vehicle networks is of the highest priority: such protocols aim to reduce the transmission of confidential data while ensuring the required level of communication. This paper proposes an Advanced Privacy Vehicle Scheme (APV) that periodically changes pseudonyms to protect vehicle identities and improve privacy. The APV scheme utilizes a concept called the silent period, changing a vehicle's pseudonym periodically based on the tracking of neighboring vehicles. The pseudonym is a temporary identifier that vehicles use to communicate with each other in a VANET; by changing it regularly, the APV scheme makes it difficult for unauthorized entities to link a vehicle's communications to its real-world identity.
The proposed APV is compared to the SLOW, RSP, CAPS, and CPN techniques. The results indicate that APV achieves a clear improvement in privacy metrics, offering enhanced safety for vehicles during transportation in the smart city.
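The pseudonym-rotation idea underlying such schemes can be sketched abstractly. The rotation trigger below is a simple message-count interval, which is only a placeholder for APV's neighbor-tracking-based silent-period logic; the class and its parameters are hypothetical:

```python
import secrets

class Vehicle:
    """Sketch of periodic pseudonym rotation (interval trigger is a
    stand-in for APV's silent-period / neighbor-tracking logic)."""

    def __init__(self, rotation_interval=10):
        self.rotation_interval = rotation_interval
        self.messages_sent = 0
        self.pseudonym = self._new_pseudonym()

    def _new_pseudonym(self):
        # A fresh random temporary identifier, unlinkable to the last one.
        return secrets.token_hex(8)

    def send_beacon(self):
        """Send one safety beacon; rotate the pseudonym on schedule."""
        self.messages_sent += 1
        if self.messages_sent % self.rotation_interval == 0:
            self.pseudonym = self._new_pseudonym()
        return self.pseudonym

v = Vehicle(rotation_interval=3)
pseudonyms = [v.send_beacon() for _ in range(6)]
```

The design point the abstract makes is about *when* to rotate: naive fixed intervals are linkable by trajectory analysis, which is why APV conditions the change on neighboring vehicles.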
April 2024 - Top 10 Read Articles in Computer Networks & Communications (IJCNCJournal)
DEF: Deep Ensemble Neural Network Classifier for Android Malware Detection (IJCNCJournal)
Malware is one of the principal threats to the security of computer networks and information systems. Since malware instances are sufficiently available, there is increased interest among researchers in the use of Artificial Intelligence (AI). Of late, AI-enabled methods such as machine learning (ML) and deep learning have paved the way for solving many real-world problems. As these are learning-based approaches, accumulated training samples improve the quality of training and thus malware detection accuracy. Existing deep learning methods focus on learning-based malware detection systems; however, there is a need to improve the state of the art through an ensemble approach. Towards this end, in this paper we propose a framework known as the Deep Ensemble Framework (DEF) for automatic malware detection. The framework obtains features from training samples: from a given malware instance a grayscale image is generated, and a separate process extracts the opcode sequences. Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) techniques are used to process the grayscale images and opcode sequences, respectively. Afterwards, a stacking ensemble is employed to achieve efficient malware detection and classification. Malware samples collected from Internet sources and Microsoft are used for the empirical study. An algorithm known as Ensemble Learning for Automatic Malware Detection (EL-AML) is proposed to realize the framework, and another algorithm named Pre-Process assists EL-AML by obtaining the intermediate features required by the CNN and LSTM. The empirical study reveals that our framework outperforms many existing methods in terms of speed-up and accuracy.
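The grayscale-image step in such frameworks commonly follows one convention: treat each byte of the binary as one 0-255 pixel and fold the byte stream into fixed-width rows. The sketch below uses that convention with an arbitrary row width, since the paper's exact parameters are not given here:

```python
def bytes_to_grayscale(data, width=16):
    """Interpret each byte as a 0-255 grayscale pixel and fold the
    stream into rows of a fixed width, zero-padding the last row."""
    pixels = list(data)
    if len(pixels) % width:
        pixels += [0] * (width - len(pixels) % width)
    return [pixels[i:i + width] for i in range(0, len(pixels), width)]

# 40 alternating bytes fold into three 16-pixel rows (last row padded).
image = bytes_to_grayscale(b"\x00\xff" * 20, width=16)
```

The resulting 2-D array is what a CNN branch consumes, while a disassembler-derived opcode sequence would feed the LSTM branch; the stacking ensemble then combines the two predictions.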
High Performance NMF Based Intrusion Detection System for Big Data IOT Traffic (IJCNCJournal)
With the emergence of smart devices and the Internet of Things (IoT), millions of users connected to the network produce massive network traffic datasets. These vast datasets of network traffic (Big Data) are challenging to store, handle, and analyse on a single computer. In this paper we developed a parallel implementation on a High Performance Computing (HPC) cluster of the Non-negative Matrix Factorization technique as the engine of an Intrusion Detection System (HPC-NMF-IDS). The large IoT traffic datasets, on the order of millions of samples, are distributed evenly over all the computing cores for both storage and speedup purposes, and the distribution of the computing tasks involved in the matrix factorization takes into account the reduction of communication cost between the cores. The experiments we conducted on the proposed HPC-NMF-IDS give better results than traditional ML-based intrusion detection systems: we could train the model on datasets of one million samples in only 31 seconds instead of 40 minutes on a single processor, a speedup of 87 times. Moreover, we obtained an excellent detection accuracy rate of 98% on the KDD dataset.
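At the core of this system is the approximation V ≈ WH of a nonnegative traffic matrix, which the paper distributes across HPC cores. A serial, dependency-free sketch of the standard Lee-Seung multiplicative-update algorithm (the textbook NMF method, not the paper's parallel implementation) looks like this:

```python
import random

def matmul(A, B):
    # Plain-Python matrix product.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def transpose(A):
    return [list(r) for r in zip(*A)]

def nmf(V, rank, iters=500, eps=1e-9):
    """Factor a nonnegative m x n matrix V into W (m x rank) and
    H (rank x n) with multiplicative updates for the Frobenius loss."""
    random.seed(0)
    m, n = len(V), len(V[0])
    W = [[random.random() + 0.1 for _ in range(rank)] for _ in range(m)]
    H = [[random.random() + 0.1 for _ in range(n)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H)
        WtV = matmul(transpose(W), V)
        WtWH = matmul(transpose(W), matmul(W, H))
        H = [[H[i][j] * WtV[i][j] / (WtWH[i][j] + eps) for j in range(n)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T)
        VHt = matmul(V, transpose(H))
        WHHt = matmul(W, matmul(H, transpose(H)))
        W = [[W[i][j] * VHt[i][j] / (WHHt[i][j] + eps) for j in range(rank)]
             for i in range(m)]
    return W, H
```

The matrix products inside each update are exactly the operations the paper distributes across cores to reach its reported 87x speedup; the update rule itself keeps every entry nonnegative, which is what makes the learned factors interpretable as traffic patterns.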
This presentation was provided by Steph Pollock of The American Psychological Association’s Journals Program, and Damita Snow, of The American Society of Civil Engineers (ASCE), for the initial session of NISO's 2024 Training Series "DEIA in the Scholarly Landscape." Session One: 'Setting Expectations: a DEIA Primer,' was held June 6, 2024.
This presentation includes basic of PCOS their pathology and treatment and also Ayurveda correlation of PCOS and Ayurvedic line of treatment mentioned in classics.
Strategies for Effective Upskilling is a presentation by Chinwendu Peace in a Your Skill Boost Masterclass organisation by the Excellence Foundation for South Sudan on 08th and 09th June 2024 from 1 PM to 3 PM on each day.
Liberal Approach to the Study of Indian Politics.pdf
A Deep Learning Technique for Web Phishing Detection Combined URL Features and Visual Similarity
International Journal of Computer Networks & Communications (IJCNC) Vol.12, No.5, September 2020
DOI: 10.5121/ijcnc.2020.12503
A DEEP LEARNING TECHNIQUE FOR WEB
PHISHING DETECTION COMBINED URL
FEATURES AND VISUAL SIMILARITY
Saad Al-Ahmadi¹ and Yasser Alharbi²
¹ College of Computer and Information Science, Computer Science Department, King Saud University, Riyadh, Saudi Arabia
² College of Computer and Information Science, Computer Engineering Department, King Saud University, Riyadh, Saudi Arabia
ABSTRACT
Phishing is currently the most popular way to deceive online users. Consequently, more efficient
web page phishing detection mechanisms are needed to increase cybersecurity. In this paper, we
propose an approach that relies on website images and URLs to treat phishing website
recognition as a classification problem. Our model uses webpage URLs and images to detect a phishing
attack, applying convolution neural networks (CNNs) to extract the most important features of website images
and URLs and then classifying the pages as benign or phishing. The experiment achieved an
accuracy rate of 99.67%, demonstrating the effectiveness of the proposed model in detecting web phishing
attacks.
KEYWORDS
Phishing detection, URL, visual similarity, deep learning, convolution neural network.
1. INTRODUCTION
The deployment and use of website services has increased rapidly in recent years and this growth
has created a high level of risk for internet users globally. Web phishing attacks are the most
severe and the most common cybersecurity threat over the internet. Web phishing can be
described as a way to deceive and steal users’ sensitive personal information through a phished
webpage that looks similar to a valid web page. Phishers use a diversity of techniques in order to
execute an effective deception.
The lack of awareness of phishing attacks among internet users and the emergence of innovative
phishing methods highlight the need for efficient technology-based detection techniques. Setting up an
appropriate mechanism to detect spoofed websites would be vital in reducing the potentially
enormous damage caused to internet users [1].
In this research, we propose a technique for recognizing web phishing using two convolution
neural networks (CNNs) based on website images and URLs. The first CNN is used to extract the
website URL features and identify whether the webpage is a phishing webpage or a benign
webpage. The second CNN is used to simultaneously extract the visual features of the webpage
and classify the website as either legitimate or malicious. The results of the first and the second
CNN are combined and on the basis of the outcome it is decided whether the webpage should be
reported as a phishing webpage. The motivation to use the URL based approach with visual
similarity based method is that attackers generally use a fake URL that looks similar to a
legitimate URL and deploy webpages that look visually similar to legitimate websites with the
goal of deceiving internet users to obtain sensitive information.
This paper is structured as follows: Section 2 presents some background information; Section 3
reviews research on detecting phishing attacks; Section 4 describes the method used; Section 5
presents the results of the experiment and a comparison with other research; and Section 6
presents the conclusion.
2. BACKGROUND
The impact of a phishing attack on an internet user is significant, causing serious damage and
financial loss when their confidential information is stolen. Hence, deploying an appropriate
anti-phishing technique is crucial. Web phishing detection mechanisms can be categorised under five
approaches. Firstly, the whitelist-based approach, which classifies phishing websites
depending on a predefined list of legitimate websites. This approach is ineffective since it is
impossible to have a whitelist that contains all the legitimate websites in the world. Secondly, the
blacklist-based approach, which uses a predefined list of known malicious websites. This list needs to
be updated very frequently to make it possible to detect all newly created phishing pages, which
makes the zero-hour phishing attack a major issue in this approach [2],[3]. Thirdly, the content-based
approach, which relies on webpage components that are extracted and used to
retrieve the corresponding legitimate site and detect phishing. This method requires more
run-time overhead to extract contents, the use of search engines to locate the domains and, finally,
a system through which judgements are made on whether the website is legitimate [4]. Fourthly,
the URL-based approach, which targets links in which attackers embed sensitive words or characters
to mimic legitimate URLs (spelling mistakes, reliable keywords, redirection to other websites or
shortened URLs). This method relies on the lexical and host-based features of the URLs to detect
a website phishing attack. The lexical features include properties that malicious URLs use to
look like legitimate URLs, whereas the host-based features are properties that belong to website
hosts. Fifthly, the Visual similarity-based approach, which detects phishing by comparing the
visual characteristics of suspicious websites and legitimate websites to identify phishing and non-
phishing websites [2],[3].
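To make the first two approaches concrete, a list-based check reduces to a set lookup on the normalized host of the URL. The sketch below uses hypothetical example domains and a hypothetical helper name:

```python
from urllib.parse import urlparse

# Hypothetical example lists; real systems load these from maintained feeds.
WHITELIST = {"example.com", "bank.example"}
BLACKLIST = {"bank-login.phish.example"}

def list_based_verdict(url: str) -> str:
    """Classify a URL by whitelist/blacklist membership of its host."""
    host = (urlparse(url).hostname or "").lower()
    host = host.removeprefix("www.")
    if host in WHITELIST:
        return "benign"
    if host in BLACKLIST:
        return "phishing"
    # Zero-hour pages fall through both lists -- the key weakness noted above.
    return "unknown"
```

Note how a newly registered phishing domain yields "unknown", which is exactly the zero-hour limitation described above.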
The phishing detection approach that is based on visual similarity can be categorised into six
main types. The first type depends on the document object model (DOM) tree which defines the
structure of webpages. This technique judges the phishing attack on the basis of a comparison of
the DOM tree of unreliable websites and of reliable websites; if there is a match, the phishing
attack is reported. The second type is the visual feature-based technique that focuses on
comparing text features such as the font size, background colours and image features such as
height, width and position of images on the website, in order to detect a phished webpage. The
third type relies on Cascading Style Sheets (CSS), which are used to create a uniform
appearance (text styles, fonts and colours) across several webpages. In this mechanism, if the
CSS layouts of a suspicious website and a legitimate website match, then the suspicious page is
reported as a phishing page. The fourth type is based on images of websites and works by
comparing the similarity of unreliable website images with the images of a legitimate website. If
the similarity is low, then the unreliable site is considered benign; otherwise, it is considered
to be a phishing attack. The fifth type is the visual
perception method which relies on the theory that defines how people perceive visual
components. This method views websites in a holistic manner rather than as a collection of
distinct features, as in the other methods. This method determines the extent of similarity
between websites from a reader's point of view. The sixth type is the hybrid method that
combines two or more kinds of phishing detection mechanisms with the goal of increasing the
accuracy rate [5].
Recently, several machine learning approaches have been used by researchers as assistive tools
to detect web phishing attacks. Naive Bayes, support vector machines, random forests and
convolution neural networks are commonly used algorithms [3].
In this research, we used the convolution neural network (CNN) to detect web phishing attacks
based on the URLs and the screenshots of websites. The CNN is one of the most popular types of
deep learning mechanisms, in particular for high-dimensional data such as videos and images. It
is well suited to extracting the local features of images, which it uses to distinguish
between different images. It has also been tremendously successful in extracting textual features
and recognizing sentences. A general CNN architecture typically consists of alternating
convolution and pooling layers, followed by one or more fully connected layers. These layers
automatically represent high-level features of the inputs and enable the CNN to carry out
the classification task. The three basic layers of a CNN can be described as follows. First there is
the convolution layer, which is responsible for extracting features from the input and consists of a series
of convolution kernels that divide the input into small blocks, referred to as receptive fields.
Convolving the input with a kernel creates a feature map which highlights the existence of a
specific feature in the input. Second there is the pooling layer, which usually helps to reduce the
feature map's dimensionality. Several types of pooling functions can be performed, such as max
pooling, average pooling and sum pooling. Third is the fully connected layer (FCL), a
typical neural network layer used mostly for classification. It receives input from the preceding
feature-extraction steps and analyses the output of all the preceding layers in order to create a
non-linear combination of selected features. Consequently, it can calculate the class scores and
produce a 1-D array whose size matches the number of classes [6].
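The three layers just described can be illustrated with a minimal NumPy sketch. The input values and kernel below are made up purely to trace the computation; no deep learning framework is assumed:

```python
import numpy as np

def conv2d(x, k):
    """Valid 2-D convolution (cross-correlation, as in most CNN libraries)."""
    h, w = x.shape[0] - k.shape[0] + 1, x.shape[1] - k.shape[1] + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(x[i:i + k.shape[0], j:j + k.shape[1]] * k)
    return out

def max_pool(x, size=2):
    """Non-overlapping max pooling: keep the maximum of each size x size block."""
    h, w = x.shape[0] // size, x.shape[1] // size
    return x[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

relu = lambda x: np.maximum(x, 0)              # common activation between layers

x = np.arange(36, dtype=float).reshape(6, 6)   # toy 6x6 "image"
k = np.array([[1.0, 0.0], [0.0, -1.0]])        # toy 2x2 kernel
feat = relu(conv2d(x, k))   # convolution layer -> 5x5 feature map
pooled = max_pool(feat)     # pooling layer     -> 2x2 map
flat = pooled.ravel()       # 1-D vector handed to the fully connected layer
```

The same shape logic applies to the character matrix of a URL, with the kernel sliding over character positions instead of pixels.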
3. LITERATURE REVIEW
The need to secure web applications has increased significantly given the fact that web
applications are commonly used as a key interface between users and platforms. However, users
have an issue in remembering all reliable URL pages on the web which puts them at risk for a
wide variety of attacks. The phishing attack is one of the most common threats that users
experience while browsing online websites. The attacker deceives users by using a phished
website to acquire sensitive user credentials, for example, usernames and passwords. A variety of
anti-phishing strategies have been introduced in an effort to address this problem based on the
URL, the blacklist, the DNS, the whitelist or the visual appearance. The visual similarity
anti-phishing technique relies on the fact that the attacker always uses web pages that look like
genuine websites to trick internet users into inputting their sensitive information [7].
A number of researchers have investigated the use of visual similarity as an approach to detect
phishing web pages. The authors in [8] analysed images in webpages by measuring similarity
scores using an image processing mechanism. The snapshots of legitimate and suspected pages
were divided into blocks and matched using the Earth Mover's Distance algorithm. Their results show
a high detection rate (99.6%) and they concluded that the detection of a phishing attack is more
robust when the image-based approach is used than when the HTML-based technique is used.
The authors in [9] used a compression algorithm called Normalised Compression Distance
(NCD) to compute similarities based on distance between the image of a requested website and
the image of a cached benign website. The results were submitted to a classifier (C4.5, Ripper,
Logistic and SVM) which set off an alarm if the webpage was marked as a phish. The
researchers’ results showed a high true positive rate of 99.99%; however, their method of
calculating visual similarity is complex and impractical and cannot be implemented with real-time
browsing.
Recently, computer vision based mechanisms have been used to analyse and evaluate website
similarities and detect phishing attacks. The authors in [10] used a Histogram of Oriented
Gradients (HOG) descriptor to extract website features and to compute the similarity metric
between legitimate and suspicious webpages. The results highlighted the effectiveness of using
the HOG descriptor for webpage phishing detection, especially for zero-hour attacks. An
extension to the research study in [10] was presented by the authors of [11]. They conducted a
comparative study on the performance of five compact visual descriptors (SCD, FCTH, JCD,
CEDD and CLD) with two machine learning techniques (random forest and SVM) to detect web
phishing attacks from screenshots of legitimate and phished websites. According to their results,
the SCD with RF delivered the highest F1 score at 0.895 for the analysis of the whole website
image.
Instead of investigating the similarity of the whole webpage, the authors in [12] proposed
focusing on the logo of a website to detect phishing. Their study encompassed two processes:
logo-extraction and web-site identity verification. They used a machine learning algorithm
(SVM) to detect and extract the correct logo image from all downloaded web site images.
Furthermore, to obtain the corresponding domains of the right logo image, they used the Google
image search engine to find the domains of reliable websites and compare with the domains of
the suspected webpages. The results revealed the effectiveness of using a website logo to detect
phishing websites. Moreover, the authors in [13] used a similar method, analysing a favicon image
instead of the whole webpage image to detect phishing websites. They extracted favicon
images from the webpage and used the Google engine to acquire features that can be used to
distinguish between suspect and legal websites. However, since both studies [12] and [13] rely
on a search engine, the threat of DNS spoofing is of concern.
The authors in [14] combined the global-image (snapshot image) feature as in [8],[9] and local-
image (logo) feature as in [12], but relied purely on the image level to detect phishing webpages.
The authors used a novel technique to detect a logo and a modified EMD algorithm to calculate
the similarity score for a snapshot of websites. If a suspected webpage has the same logo as a legitimate
page and the global similarity score exceeds the defined threshold, then that page is marked as a
phishing webpage. The results proved the effectiveness of the proposed method with up to 90%
true positive rate and 97% true negative rate.
Other research focused on page layout similarity to detect phishing pages. A phishing webpage
often has a similar appearance to the legitimate webpage, and hence, the page layouts of the two
webpages are expected to be identical. In [15], the authors conducted research based on the
Document Object Model (DOM) tree in order to detect malicious pages. In their proposed
technique, when a user reuses the same information (user credentials) on a website that has a
different domain from that of previously visited pages, the DOM tree of the first webpage where
the user credentials were initially entered is compared with that of the recent webpage to detect a
phishing attack. Their results show a high positive rate, but their approach is ineffective if the
attacker changes the DOM tree for phishing pages.
Similar to [15] in terms of using page layout similarity to recognise a phishing website, but
focusing on the similarity of Cascading Style Sheets, the authors in [16] presented an algorithm
to compare the similarity of Cascading Style Sheets of malicious and genuine pages. In their
proposed approach, if the CSS layout of both a suspicious page and an original page exceeds a
defined threshold and the two pages have a different URL, the suspicious page is reported as
phishing. The results revealed a high detection rate. However, depending solely on the CSS to
detect phishing attacks is insufficient since not all websites have a CSS layout.
The authors in [17] combined page layout similarity and a snapshot of websites to detect phishing
attacks. They conducted research to propose a phishing detection approach based on image and
CSS layout. They used a database to store the CSS layout and screenshots of legitimate web
pages. Their approach helps detect suspicious websites that match the visual appearance or CSS
layout of original websites. The results revealed that 22% of phishing webpages had been
classified incorrectly because many phishing websites do not have a CSS layout.
In addition, several researchers have used a hybrid approach with a visual similarity mechanism
in order to increase the effectiveness of detecting a webpage phishing attack. The authors in
[18] proposed an approach that depends on the visual and textual features of pages. Their
approach included words that appear in a webpage and a set of visual features such as images,
page layout and logos. The research was conducted using a naïve Bayes classifier to extract
textual items from webpages and Earth Mover's Distance classifier to deal with the image of the
webpage. Moreover, they used a Bayesian method as a fusion algorithm to aggregate the
outcomes of the two classifiers and distinguish between phishing and legitimate pages. Their
results illustrated the capability of the proposed mechanism to increase the accuracy of detection
of a phishing attack. Furthermore, in [19], the authors computed a single similarity score called
the signature, which consists of website text pieces, images and visual appearance, to detect a
phishing attack. The authors compared the signature of a suspect website with the saved
signatures of legal webpages; if the two signatures had a high rate of similarity, a phishing attack
was identified. The findings showed the efficacy of the suggested strategy, with a 0% false
positive rate and a 0% false negative rate. Table 1 presents a summary of web phishing detection
approaches based on visual similarity.
Table 1. Summary of web phishing detection approaches based on visual similarity.

| Paper | Features | Search engine dependence | Advantages | Drawbacks |
|-------|----------|--------------------------|------------|-----------|
| [8] | Image (EMD) | No | High detection rate (99.6%) | Image processing is complex; cannot detect new phishing pages |
| [9] | Image (NCD) | No | High true positive rate of 99.99% | Complex method; cannot be implemented with real-time browsing |
| [10] | Image (HOG) | No | Detects zero-hour phishing attacks | Requires defining the similarity threshold for the phishing alarm |
| [11] | Image (compact visual descriptors) | No | Detects zero-hour phishing attacks | Time-consuming |
| [12] | Logo | Yes | Can detect new phishing webpages | High false negative rate of 13% |
| [13] | Favicon | Yes | Detects new phishing webpages | Not all websites have a favicon |
| [14] | Image and logo | Yes | 90% true positive rate, 97% true negative rate | FN increases if the phishing page does not contain an official logo |
| [15] | DOM layout | No | High true positive rate (100%) | Fails if the attacker changes the DOM; fails if the phishing page uses images only |
| [16] | CSS layout | Yes | High true positive rate (99%) | Cannot detect new phishing webpages |
| [17] | Image and CSS layout | No | Uses both image and CSS layout | 22% of phishing webpages classified incorrectly |
| [18] | Hybrid (textual and image) | No | Uses Bayesian theory to define the threshold | Cannot detect new phishing webpages; time-consuming |
| [19] | Hybrid (text, image & visual appearance) | No | Detects embedded objects | Takes a long time to measure website signatures |
Approaches to detect web phishing attacks based on URLs mainly differ in three things: type of
URL features, method deployed to extract features and the classification algorithms used.
Several researchers focus on URL string analysis and external information to define the feature
vectors of the URLs. The authors of [20] employed a method using a stacked restricted Boltzmann
machine to select a feature from the vector format of URL lexical and host-based features and
then fed these features into four classification algorithms: the artificial neural network (ANN),
the deep belief network (DBN), the Naïve Bayes (NB) and the support vector machine (SVM) in
order to detect malicious URLs. Their results demonstrate that the performance of the deep DBN
exceeds the NB, SVM and ANN in terms of accuracy, the true positive rate and the false positive
rate. The merit of their model is the ability to detect each class of malicious URLs: spam,
phishing and malware attacks. Furthermore, the authors of [21] analysed suspicious websites
using a method referred to as SHLR, which consists of three parts: first, recognising a benign
website solely by using the title tag content of the website as the main entry to a search engine;
second, if the website could not be identified as legitimate, extracting the URL features (word
lists) and using seven heuristic rules to detect phishing webpages; and, finally, using three
classifiers to recognise the remaining websites, namely logistic regression, Naive Bayes and
SVM, with URL features extracted from WHOIS, HTML, DNS, lexical features and similarity
with phishing vocabulary. Their results show a high accuracy rate of 98.8%, but their approach
depends on a search engine and a third party. Using web host characteristics and the features of
the URL string, the authors of [22] proposed a method that uses a bidirectional LSTM algorithm
based on a recurrent neural network and the convolution neural network that encompasses three
features: the URL static vocabulary feature, the texture fingerprint feature, and the URL word
vector feature. These three features are combined and applied to the model in order to detect a
phished website. Their results show a high accuracy rate of 99.45%.
Unlike the studies outlined above, which detected malicious URLs based on groups of the URL
lexical and host-based features, some researchers use a maximum of one or two URL external
features with lexical features. The authors of [23] proposed an approach that uses a support
vector machine algorithm with five lexical URL features and a similarity index feature to detect a
web phishing attack. Their results show that using the similarity index feature increased the
system detection rate by 21.8% to reach the highest recognition rate of 95.80%. Moreover, the
researchers in [24] proposed a method to detect phishing attacks based on a random forest
algorithm and by using eight features from the URL string and the webpage (page rank and
Google index). They evaluated the classifier in terms of various metrics using the random forest
algorithm, with an accuracy rate of 95%. However, their technique relies on a third-party service.
Most recent studies focused solely on the lexical features extracted from the URL string since this
can be extracted quickly, does not require any execution for the URL and has a good performance
[25]. The authors of [26] proposed a method that uses the URL string as a raw character sequence
which is input directly to a convolution neural network that extracts and aggregates locally
detected features and then classifies the malicious URLs. They compared their results, based on a
method that extracted features automatically, with two manual feature extraction methods and
concluded the effectiveness of their approach based on the convolution neural networks. The
authors in [27] proposed a method that extracted a byte values vector from URL characters and
fed them into a neural network algorithm to classify the input URL as a benign or phishing URL.
They investigated their model with three optimizers: SGD, Adam and AdaDelta, and declared
that the best optimizer for the deployed neural network model was Adam, with an accuracy rate
of 94.18%.
In [28] the authors extended what was done in [27] by using the character-level text features of
the URL string and investigating the performance of five deep learning algorithms: long short-
term memory (LSTM), convolutional neural network-long short-term memory (CNN-LSTM),
convolution neural network (CNN), identity-recurrent neural network (I-RNN) and recurrent
neural network (RNN) to differentiate between phishing and un-phishing webpages. Their
experimental results proved that extracting URL features by using any deep learning algorithm
outperformed the manual feature extraction methods in particular with the LSTM and the CNN-
LSTM model, which obtained the best accuracy rates at 99.96% and 99.95% respectively.
In contrast to the research by [27] and [28], the authors in [29] focused on NLP and word vector
features extracted from the URL string. They investigated the performance of seven machine-
learning algorithms (Adaboost, Naive Bayes, SMO, Random Forest, K-star, KNN and Decision
Tree) with three different types of URL string feature sets: Word Vectors, NLP and Hybrid (NLP
and word vectors) to detect phishing attacks. It is clear from their results that using NLP or
hybrid features delivers higher performance than using word vector features, and
the best accuracy rate (97.98%) was acquired by combining the RF classifier with NLP features. Their
method is independent of third-party services and can be executed as a real time anti-phishing
detection model.
A mix of the lexical URL features (characters and words features) has been used by the authors in
[25] to propose an URLNet technique that uses convolution neural networks as a way to
automatically learn features and do the classification task in order to distinguish malicious URLs
from benign URLs. Their results show that the performance of the URLNet model with
characters and words features significantly outperforms the URLNet model based on character
features only and the URLNet model based on word features only. Furthermore, the authors of
[30] rely on structural and semantic features of the URL string to propose a method using a
bidirectional LSTM network based on the recurrent neural network (RNN) to extract the global
features of the URL string, then using a convolution neural network to extract the local
features of the URL string, and merging the extracted characteristics into a fixed-length vector
which is used to classify the URLs into legitimate or phishing webpages. Their results show that
their approach achieved a high accuracy rate of 97%. Table 2 provides a summary of web
phishing detection approaches based on URLs.
Table 2. A summary of web phishing detection approaches based on URLs.

| Paper | Features | Search engine or third-party dependence | Advantages | Drawbacks |
|-------|----------|------------------------------------------|------------|-----------|
| [20] | Lexical and host-based | Yes | Ability to detect each class of malicious URLs | Complex method |
| [21] | Lexical and host-based | Yes | High accuracy rate (98.8%) | Feature extraction is time-consuming |
| [22] | Lexical and host-based | Yes | High accuracy rate (99.45%) | Feature extraction is time-consuming |
| [23] | Lexical and one external feature | Yes | Recognition rate of 95.80% | Small dataset size |
| [24] | Lexical and two external features | Yes | High accuracy rate (95%) | Low performance |
| [26] | Lexical features (characters) | No | Automatic feature extraction | Ignores features from words in the URLs |
| [27] | Lexical features (characters) | No | High accuracy (95.17%) | Results change when the dataset changes |
| [28] | Lexical features (characters) | No | High accuracy (99.96%) | Complex architecture |
| [29] | Lexical features (NLP and words) | No | Real-time anti-phishing detection model; high accuracy rate (97.98%) | NLP cannot directly handle symbols in the URLs |
| [25] | Lexical features (characters and words) | No | Automatic feature extraction | Requires large datasets to work in an end-to-end manner |
| [30] | Lexical features (structure and semantics) | No | Fast; high accuracy rate (97%) | URLs lacking the relevant semantics may be misclassified |
4. PROPOSED METHOD
Since the goal of our proposed method is to detect phished websites by using the URL and the
screenshots of suspicious webpages, we formulated this issue as a binary classification problem
with the URLs and images of websites as input, leading to their classification as either legitimate
websites or phished websites. The classification task is done by training a network to produce 0
if the input URL or website image is classified as a legitimate webpage, or 1 if the URL or the
screenshot of the website is classified as a phishing webpage.
Our proposed method can be divided into three parts, as shown in Figure 1. In the first part, the
pre-processing task is done on the URL string by representing each character individually as a
vector, so that the entire URL (a series of characters) is transformed into a matrix representation;
for the website image, the pre-processing task is completed by creating a matrix of pixel
values. In the second part, CNN1 receives the website image as input, then extracts its features.
The output is the classification of the webpage as either legitimate or malicious. Simultaneously,
CNN2 receives the URL of the website as input and then extracts its features. The outcome
is identification of the webpage as phishing or legitimate. Finally, the two results are combined to
decide whether there has been a phishing attack: the page is reported as phishing if either the
first or the second CNN classifies it as a phishing webpage; otherwise, the website is considered
to be legitimate.
Figure 1. Structure of our model
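A sketch of the pre-processing and fusion steps described above follows; the character vocabulary and the maximum URL length are assumptions for illustration, as the paper does not specify them at this point:

```python
import numpy as np

# Assumed character set; the paper does not list the exact vocabulary.
VOCAB = "abcdefghijklmnopqrstuvwxyz0123456789-._~:/?#@!$&'()*+,;=%"
CHAR_INDEX = {c: i for i, c in enumerate(VOCAB)}

def url_to_matrix(url: str, max_len: int = 200) -> np.ndarray:
    """One-hot encode a URL: one row per character position, one column per symbol."""
    m = np.zeros((max_len, len(VOCAB)), dtype=np.float32)
    for pos, ch in enumerate(url.lower()[:max_len]):
        if ch in CHAR_INDEX:
            m[pos, CHAR_INDEX[ch]] = 1.0
    return m

def combined_verdict(url_pred: int, image_pred: int) -> int:
    """Final decision: phishing (1) if either CNN flags the page, else legitimate (0)."""
    return 1 if (url_pred == 1 or image_pred == 1) else 0
```

The OR-style fusion mirrors the decision rule described above: a page survives as legitimate only if both CNNs classify it as legitimate.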
For the building blocks CNN1 and CNN2, we created a convolution neural network (CNN)
containing two convolutional layers and two max pooling layers to extract the most important
website image features and URL features. These features are fed into a fully connected layer to
classify websites as phishing or legitimate webpages.
The design of the convolution neural network that we deployed in each block was identical, but
CNN1 deals with the website image, whereas CNN2 works with the URLs. The operations
that are executed in each block can be split into three phases. In the initial phase, website
images and URLs pass through a set of convolution layers with kernels (filters), one by one, in
order to extract the local features of each image or URL. Using a collection of weights, the filter
convolves with the input images or URLs in order to compute a new feature map. The outcomes
of the convolution are then passed through a nonlinear activation function that is used to accelerate
the learning process. More formally, the convolution procedure to compute a feature map can be
defined as follows:
Yi = f (Wi * m) (1)
where m is the input; Wi is the convolution filter associated with the ith feature map;
* denotes the convolution operation that computes the new feature map with the ith filter; f
is the activation function; and Yi is the ith output feature map.
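Equation (1) can be illustrated with a direct, unoptimised 2-D sliding-window convolution followed by ReLU. This is a didactic sketch of the operation, not the authors' implementation, and the toy input and averaging filter are invented for the example:

```python
import numpy as np

def relu(x):
    # Rectified linear unit: f(x) = max(0, x)
    return np.maximum(0.0, x)

def conv2d_feature_map(m, w):
    """Valid sliding-window 2-D convolution of input m with filter w,
    i.e. Yi = f(Wi * m) with f = ReLU, as in eq. (1)."""
    h, width = m.shape
    kh, kw = w.shape
    out = np.zeros((h - kh + 1, width - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Elementwise product of the filter with one input window
            out[i, j] = np.sum(m[i:i + kh, j:j + kw] * w)
    return relu(out)

m = np.arange(36, dtype=float).reshape(6, 6)  # toy 6x6 input matrix
w = np.ones((5, 5)) / 25.0                    # toy 5x5 averaging filter
y = conv2d_feature_map(m, w)
print(y.shape)  # (2, 2)
```

A 5x5 filter over a 6x6 input yields a 2x2 feature map; in the real model one such map is produced per filter.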
In this research, we used two convolutional layers with a 5x5 filter size and rectified linear
units (ReLU) as the activation function. In the second phase, a pooling layer is applied to reduce
the feature dimension and retain the most significant features; we used two max-pooling layers.
Alternating convolutional and pooling layers makes it possible to combine the outputs of the two
convolution filters into the final feature vector, a one-dimensional vector representing the
outputs of the two convolutional and pooling layers [6],[31].
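The max-pooling step and the flattening into a one-dimensional feature vector can be sketched in a few lines of numpy. The 4x4 feature map below is an invented example; the window size of 2 is a common default and not stated in the paper:

```python
import numpy as np

def max_pool2d(x, size=2):
    """Non-overlapping max pooling: keep the largest value in each
    size x size window, shrinking the feature map."""
    h, w = x.shape
    h2, w2 = h // size, w // size
    return x[:h2 * size, :w2 * size].reshape(h2, size, w2, size).max(axis=(1, 3))

fm = np.array([[1., 3., 2., 0.],
               [4., 2., 1., 1.],
               [0., 1., 5., 2.],
               [2., 0., 1., 3.]])
pooled = max_pool2d(fm)           # largest value of each 2x2 block
feature_vector = pooled.ravel()   # flatten to a one-dimensional vector
print(pooled)
print(feature_vector)
```

The flattened vector is what the fully connected layer described next consumes.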
In the third phase, the feature vector extracted in the previous phase is passed to a fully
connected layer with a sigmoid function, which creates a non-linear combination of the extracted
features and classifies the input as a legitimate or phishing webpage. To evaluate the
performance of our model, we used accuracy as the main metric. This is the most commonly used
experimental criterion and is given by the following equation:
Accuracy = (TP + TN) / (TP + TN + FP + FN)    (2)
In addition, we evaluated the model using three metrics widely used in website phishing
detection research: precision, recall and F1 score, defined by the following equations [32]:
Precision = TP / (TP + FP)    (3)

Recall = TP / (TP + FN)    (4)

F1 = 2 * Precision * Recall / (Precision + Recall)    (5)
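Equations (2)-(5) translate directly into code. The confusion-matrix counts below are invented for illustration only; they are not the paper's experimental figures:

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F1 score (eqs. 2-5) computed
    from true/false positive/negative counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * recall * precision / (recall + precision)
    return accuracy, precision, recall, f1

# Hypothetical counts for a 200-sample test set (not from the paper)
acc, prec, rec, f1 = classification_metrics(tp=95, tn=90, fp=5, fn=10)
print(round(acc, 3), round(prec, 3), round(rec, 3), round(f1, 3))
```

Note that accuracy alone can mislead on imbalanced data, which is why precision, recall and F1 are reported alongside it.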
5. EXPERIMENT RESULTS AND COMPARISON
To implement our approach, we used the Python programming language and a dataset
containing 2,000 screenshots and URLs of legitimate and phishing websites. We measured the
performance of the proposed method in terms of accuracy, F1 score, precision and recall. The
experimental results are shown in Table 3. They reflect the effectiveness of using CNNs as an
automatic way to extract features and carry out a classification task that distinguishes between
legitimate and phishing webpages.
Table 3. The results of our model
Metric Value
Accuracy 99.67 %
Precision 99.43 %
F1 score 99.28 %
Recall 99.47 %
Moreover, since several variables play a significant role in the performance of a convolutional
neural network, we investigated the effect of the batch size on the proposed model. As shown in
Figure 2, the accuracy of our model decreased as the batch size increased, and the proposed model
obtained its highest accuracy with a batch size of 16.
Figure 2. Impact of batch size on the model
A great deal of research has used convolutional neural networks (CNNs), on their own or
combined with other algorithms, to detect web phishing attacks based on the URLs or the content
of websites. The novelty of our proposed method is its use of both a URL and a screenshot of a
website to detect a phishing attack using only CNNs. We compared our results with [33], [34],
[30] and [28], where only CNNs were used, as shown in Figures 3, 4, 5 and 6.
Figure 3. The accuracy of our model compared with other methods
Figure 4. The precision of our model compared with other methods
Figure 5. The F1 score of our model compared with other methods
Figure 6. The recall of our model compared with other methods
It is clear from Figures 3, 4, 5 and 6 that our proposed model performed best, outperforming
the models used in [33], [34], [30] and [28] in almost all respects. Furthermore, our
model relies only on a URL and a screenshot of a website, which makes it independent of the
website's language and contents; it can detect phishing even if the URLs are shortened or hidden.
6. CONCLUSIONS
This study explored the possibility of detecting phishing attacks by distinguishing between the
URLs and images of legitimate websites and those of phishing websites using
CNNs. We proposed a method that can detect newly created phishing webpages based
only on the URL and a screenshot of a suspicious website. The proposed model achieves a
classification accuracy of 99.67%.
Based on the obtained results, it can be concluded that combining URL features and visual
similarity using convolutional neural networks is a highly effective approach, superior to
other approaches, for automatic feature extraction and the classification of websites as
phishing or legitimate. It is also clear that increasing the batch size lowers the accuracy of
the model.
For future work, we suggest improving the model by finding a way to automatically identify the
smallest URL length and screenshot size that still yield the optimum performance of the
proposed method.
CONFLICT OF INTEREST
The authors declare no conflict of interest.
REFERENCES
[1] H. Thakur, “A Survey Paper on Phishing Detection,” vol. 7, no. 4, pp. 64–68, 2016. [Online].
Available: www.ijarcs.info.
[2] G. Varshney, M. Misra, and P. K. Atrey, “A survey and classification of web phishing detection
schemes,” Security and Communication Networks. 2016, doi: 10.1002/sec.1674.
[3] E. S. Aung, T. Zan, and H. Yamana, “A Survey of URL-based Phishing Detection,” pp. 1–8, 2019,
[Online]. Available: https://db-event.jpn.org/deim2019/post/papers/201.pdf.
[4] S. Nakayama, H. Yoshiura, and I. Echizen, “Preventing false positives in content-based phishing
detection,” in IIH-MSP 2009 - 2009 5th International Conference on Intelligent Information Hiding
and Multimedia Signal Processing, 2009, doi: 10.1109/IIH-MSP.2009.147.
[5] A. K. Jain and B. B. Gupta, “Phishing detection: Analysis of visual similarity based approaches,”
Security and Communication Networks. 2017, doi: 10.1155/2017/5421046.
[6] A. Khan, A. Sohail, U. Zahoora, and A. S. Qureshi, “A Survey of the Recent Architectures of Deep
Convolutional Neural Networks,” pp. 1–68, 2019, doi: 10.1007/s10462-020-09825-6.
[7] J. Mao et al., “Phishing page detection via learning classifiers from page layout feature,” Eurasip J.
Wirel. Commun. Netw., 2019, doi: 10.1186/s13638-019-1361-0.
[8] I. F. Lam, W. C. Xiao, S. C. Wang, and K. T. Chen, “Counteracting phishing page polymorphism: An
image layout analysis approach,” in Lecture Notes in Computer Science (including subseries Lecture
Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2009, doi: 10.1007/978-3-642-
02617-1_28.
[9] T. C. Chen, T. Stepan, S. Dick, and J. Miller, “An anti-phishing system employing diffused
information,” ACM Trans. Inf. Syst. Secur., vol. 16, no. 4, 2014, doi: 10.1145/2584680.
[10] A. S. Bozkir and E. A. Sezer, “Use of HOG descriptors in phishing detection,” in 4th International
Symposium on Digital Forensics and Security, ISDFS 2016 - Proceeding, 2016, doi:
10.1109/ISDFS.2016.7473534.
[11] F. C. Dalgic, A. S. Bozkir, and M. Aydos, “Phish-IRIS: A New Approach for Vision Based Brand
Prediction of Phishing Web Pages via Compact Visual Descriptors,” ISMSIT 2018 - 2nd Int. Symp.
Multidiscip. Stud. Innov. Technol. Proc., 2018, doi: 10.1109/ISMSIT.2018.8567299.
[12] K. L. Chiew, E. H. Chang, S. N. Sze, and W. K. Tiong, “Utilisation of website logo for phishing
detection,” Comput. Secur., 2015, doi: 10.1016/j.cose.2015.07.006.
[13] K. L. Chiew, J. S. F. Choo, S. N. Sze, and K. S. C. Yong, “Leverage Website Favicon to Detect
Phishing Websites,” Secur. Commun. Networks, 2018, doi: 10.1155/2018/7251750.
[14] Y. Zhou, Y. Zhang, J. Xiao, Y. Wang, and W. Lin, “Visual similarity based anti-phishing with the
combination of local and global features,” in Proceedings - 2014 IEEE 13th International Conference
on Trust, Security and Privacy in Computing and Communications, TrustCom 2014, 2015, doi:
10.1109/TrustCom.2014.28.
[15] A. P. E. Rosiello, E. Kirda, C. Kruegel, and F. Ferrandi, “A layout-similarity-based approach for
detecting phishing pages,” in Proceedings of the 3rd International Conference on Security and
Privacy in Communication Networks, SecureComm, 2007, doi: 10.1109/SECCOM.2007.4550367.
[16] J. Mao, P. Li, K. Li, T. Wei, and Z. Liang, “BaitAlarm: Detecting phishing sites using similarity in
fundamental visual features,” in Proceedings - 5th International Conference on Intelligent
Networking and Collaborative Systems, INCoS 2013, 2013, doi: 10.1109/INCoS.2013.151.
[17] S. Haruta, H. Asahina, and I. Sasase, “Visual Similarity-Based Phishing Detection Scheme Using
Image and CSS with Target Website Finder,” in 2017 IEEE Global Communications Conference,
GLOBECOM 2017 - Proceedings, 2017, doi: 10.1109/GLOCOM.2017.8254506.
[18] H. Zhang, G. Liu, T. W. S. Chow, and W. Liu, “Textual and visual content-based anti-phishing: A
Bayesian approach,” IEEE Trans. Neural Networks, 2011, doi: 10.1109/TNN.2011.2161999.
[19] E. Medvet, E. Kirda, and C. Kruegel, “Visual-similarity-based phishing detection,” Proc. 4th Int.
Conf. Secur. Priv. Commun. Networks, Secur., no. September 2008, 2008, doi:
10.1145/1460877.1460905.
[20] S. G. Selvaganapathy, M. Nivaashini, and H. P. Natarajan, “Deep belief network based detection and
categorization of malicious URLs,” Inf. Secur. J., vol. 27, no. 3, pp. 145–161, 2018, doi:
10.1080/19393555.2018.1456577.
[21] Y. Ding, N. Luktarhan, K. Li, and W. Slamu, “A keyword-based combination approach for detecting
phishing webpages,” Comput. Secur., vol. 84, pp. 256–275, 2019, doi: 10.1016/j.cose.2019.03.018.
[22] H. huan Wang, L. Yu, S. wei Tian, Y. fang Peng, and X. jun Pei, “Bidirectional LSTM Malicious
webpages detection algorithm based on convolutional neural network and independent recurrent
neural network,” Appl. Intell., vol. 49, no. 8, pp. 3016–3026, 2019, doi: 10.1007/s10489-019-01433-
4.
[23] M. Zouina and B. Outtaj, “A novel lightweight URL phishing detection system using SVM and
similarity index,” Human-centric Comput. Inf. Sci., vol. 7, no. 1, pp. 1–13, 2017, doi:
10.1186/s13673-017-0098-1.
[24] S. Parekh, D. Parikh, S. Kotak, and S. Sankhe, “A New Method for Detection of Phishing Websites:
URL Detection,” Proc. Int. Conf. Inven. Commun. Comput. Technol. ICICCT 2018, no. Icicct, pp.
949–952, 2018, doi: 10.1109/ICICCT.2018.8473085.
[25] H. Le, Q. Pham, D. Sahoo, and S. C. H. Hoi, “URLNet: Learning a URL Representation with Deep
Learning for Malicious URL Detection,” no. i, 2018, [Online]. Available:
http://arxiv.org/abs/1802.03162.
[26] J. Saxe and K. Berlin, “eXpose: A Character-Level Convolutional Neural Network with Embeddings
For Detecting Malicious URLs, File Paths and Registry Keys,” 2017, [Online]. Available:
http://arxiv.org/abs/1702.08568.
[27] K. Shima et al., “Classification of URL bitstreams using bag of bytes,” 21st Conf. Innov. Clouds,
Internet Networks, ICIN 2018, pp. 1–5, 2018, doi: 10.1109/ICIN.2018.8401597.
[28] R. Vinayakumar, K. P. Soman, and P. Poornachandran, “Evaluating deep learning approaches to
characterize and classify malicious URL’s,” J. Intell. Fuzzy Syst., vol. 34, no. 3, pp. 1333–1343,
2018, doi: 10.3233/JIFS-169429.
[29] O. K. Sahingoz, E. Buber, O. Demir, and B. Diri, “Machine learning based phishing detection from
URLs,” Expert Syst. Appl., vol. 117, pp. 345–357, 2019, doi: 10.1016/j.eswa.2018.09.029.
[30] W. Wang, F. Zhang, X. Luo, and S. Zhang, “PDRCNN: Precise Phishing Detection with Recurrent
Convolutional Neural Networks,” Secur. Commun. Networks, 2019, doi: 10.1155/2019/2595794.
[31] S. Khan, H. Rahmani, S. A. A. Shah, and M. Bennamoun, “A Guide to Convolutional Neural
Networks for Computer Vision,” Synth. Lect. Comput. Vis., 2018, doi:
10.2200/s00822ed1v01y201712cov015.
[32] V. Karthikeyani and S. Nagarajan, “Machine Learning Classification Algorithms to Recognize Chart
Types in Portable Document Format (PDF) Files,” Int. J. Comput. Appl., 2012, doi: 10.5120/4789-
6997.
[33] M. A. Adebowale, K. T. Lwin, and M. A. Hossain, “Deep learning with convolutional neural network
and long short-term memory for phishing detection,” 2019 13th Int. Conf. Software, Knowledge, Inf.
Manag. Appl. Ski. 2019, no. March 2019, doi: 10.1109/SKIMA47702.2019.8982427.
[34] C. Opara, B. Wei, and Y. Chen, “HTMLPhish: Enabling Phishing Web Page Detection by Applying
Deep Learning Techniques on HTML Analysis,” no. October 2018, 2019, [Online]. Available:
http://arxiv.org/abs/1909.01135.