SlideShare a Scribd company logo
1 of 5
Download to read offline
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 06 Issue: 06 | June 2019 www.irjet.net p-ISSN: 2395-0072
© 2019, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 269
DETECTING MALICIOUS URLS USING MACHINE
LEARNING TECHNIQUES: A COMPARATIVE LITERATURE REVIEW
Lekshmi A R1, Seena Thomas2
1M.Tech Student in Computer Science and Engineering at LBS Institute of Technology for Women, Trivandrum,
Kerala.
2Associaste professor in Computer Science and Engineering department at LBS Institute of Technology for Women,
Trivandrum, Kerala.
---------------------------------------------------------------------***---------------------------------------------------------------------
Abstract - Today the most important concern in the field of
cyber security is finding the serious problemsthatmakeloss in
secure information. It is mainly due to malicious URLs.
Malicious URLs are generated daily. This URLs are having a
short life span. Various techniques are used by researchers for
detecting such threats in a timely manner. Blacklist method is
famous among them. Researchers uses this blacklist method
for easily identifying the harmful URLs. They are very simple
and easy method. Due to their simplicity they are used as a
traditional method for detecting such URLs. But this method
suffers from many problems. The lack of ability in detecting
newly generated malicious URLs is one of the maindrawbacks
of Blacklist method. Heuristic approach is also used for
identifying some common attacks. It is an advancedtechnique
of Blacklist method. But this method cannot be used for all
type of attacks. So this method is used very shortly. For a good
experience, the researchers introduce machine learning
techniques. Machine Learning techniques go through several
phases and detect the malicious URLs in an accurate manner.
This method also gives the details about the falsepositiverate.
This review paper studies the different phases such as feature
extraction phase and feature representationphaseof machine
learning techniques for detecting malicious URLs. Different
machine learning algorithms used for such detection is also
discuss in this paper. And also gives a better understanding
about the advantage of using machine learning over other
techniques for detecting malicious URLs and problems it
suffers.
Key Words: Blacklist, Cyber Security, Malicious URL.
1. INTRODUCTION
The growth and promotion of businesses spanning across
many applications including online-banking, e-commerce,
and social networking due to the advent of new
communication technologies. The use of the World Wide
Web has increasing day by day. By using the Internet, most
of the time malicious software, shortly named malware, or
attacks are propagated. Delivering malicious content on the
web has become a usual technique for bad actors due to
increased internet access by more than half of the world
population. The explicit hacking attempts, drive-by exploits,
social engineering, phishing, watering hole, man-in-the
middle, SQL injections, loss/theft of devices, denial of
service, distributed denial of service, and many others are
variety of techniquesusedto implementwebsiteattacks.The
limitations of traditional security management technologies
are becoming more and more serious given this exponential
growth of new security threats, rapid changes of new IT
technologies, and significant shortage of security
professionals. By spreading compromised URLs most of
these attacking techniques are realized.
Malicious URLs are compromised URLs that are used for
cyber-attacks. To avoid information loss the URL
identification is the best solution. So the identification of
malicious URL is always a hot area of information security.
Drive-by Download, Phishing and Social Engineering, and
Spam are most popular types of attacks using malicious
URLs. Downloading of malware by visiting a URL is referred
as Drive-by-download. By exploiting vulnerabilities in
plugins or inserting malicious code through JavaScript,
Drive-by-download attack is usually carried out. For
affecting the genuine web pages, the phishing and social
engineering attacks ploy users into disclosingtheirsensitive
information. Spam is used as voluntary messages for the
purpose of advertising or phishing. Every year this types of
attacks leads several problems. So the main concern exists
today is detecting such malicious URLs in a timely manner.
In this paper we mainly discuss about the different methods
used for detecting malicious URLs. This paper mainly focus
on the advantage of machine learning techniques in thefield
of detecting malicious URLs over other techniques. The
classification of feature representation stage of machine
learning techniques is explained in detail. The methodical
classification different feature extraction stage of machine
learning techniques is well explained. The paper provide a
good explanation about different machine learning
techniques used for detection. Moreover this paper gives an
additional information about the difficulties caused by user
in using machine learning techniques for the malicious URL
detection.
2. METHODS OF DETECTING MALICIOUS URLS
The main methods for detecting maliciousURLsareblacklist
method, heuristic method and machine learning method.
These are explained as follows.
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 06 Issue: 06 | June 2019 www.irjet.net p-ISSN: 2395-0072
© 2019, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 270
2.1 Blacklist Approach
This is the most common method currently used for
malicious URL detection. This is one of the classical method
for detecting malicious URLs. According to Jian Zhang et al.
[1] the blacklist method is having a database consistsofa set
of URLs that are malicious in past. This URLs that are
malicious in past. This method is very fast and easy to
implement. Whenever a user visits into a new URL, then a
search for database is performed in blacklist. If the new URL
is already a member in the blacklist then a warning will be
generated to show that the URL is malicious otherwise the
URL is a benign one. This technique can have very low false
positive rates. The attackers making complications on
Blacklist Approaches.
S. Ganera et al. [2] identifies mainly four types of
complications in this approach. The first one is that using an
IP it makes confusion on host. Second reason for
complication is confusing the hostwithotherdomains.Third
one is making confusion on host by using large host names.
Final one is vitiate. A new complications in Blacklist
approach is found by the recent increase in the significant
extent of URL in a considerable amount i.e., the URL is made
substantially shorter and direct into a required page. Y.
Alshboul et al. [3] discovers this new complication on
blacklist approach that the battering of malicious URL
beyond a short URL. Fast-flux and algorithmic generation of
new URLs are some other techniques used by attackers to
avoid the blacklist approach.
S. Sinha et al. [4] proposes a reputation-based blacklist
approach. Here the compromised hosts as well as malicious
contents in URLs, network, and host can be identified. Then
using this information the access of web, emails and other
activities in malicious networks or host can be blocked.
Many organizations like intrusion detection, spamdetection
uses this type of approach. A SpamAssassin and DSpam are
two spam detector used in this approach for detecting
malicious URL in user’s mail. Manual training is needed for
DSpam for detection. A number of spam detectors are used
by SpamAssassin and scores of each detectors can be
assigned. According to his view a Searching for databases in
blacklist are done by IP address reversing, blacklistzone can
be appended and then a DNS lookup is done. But there
occurs many problems in this approach.Themostimportant
drawback of this approachis maintainingcomprehensive list
of malicious URLs is very difficult.
The blacklist approach is useless of predicting newly
generated new URLs. According to S. Sheng et al. [5] view it
is impossible to detect new ultimatum in daily generated
URLs. The other drawbacks of Blacklist method are which
halts the signature based tools from detecting the attacksby
complicating the code. More attacks are bring out by the
attackers and which amend the attack signature.
2.2 Heuristic or Rule-based Approach
The blacklist method can have the lack of ability to detect
newly generated malicious URLs. Soa supplement methodis
used by C. Seifert et al. [6] that is the Heuristic approach.
This method is mainly used for identifying the phishingsites
by extracting phishing site features. This method creates a
blacklist of signatures whenever a new URL is arrived. Then
it is analyzed with the available list of signatures. If there is a
match exists, then the URL is referred as malicious. In this
approach, a signature is assigned to each of the recognized
common attacks based on its behavior. Two methods are
used here to detect malicious URLsusingheuristicapproach.
They are Signature based method and Behavior based
method. The signature based detection method is described
in P. Gutmann [7]. It is like fingerprint property i.e., it’s an
eccentric property. From malicious sites, different patterns
are extracted. This type of method can have small error rate.
But this method can have a major disadvantage that it
requires additional amount of time, money and work force
for bring out distinctive signatures.
The behavior based detection method is described in [8].
Based on the behavior property it is concluded that whether
the URL is malicious or benign. Here the URLs with some
same behavior are collected for detection. Because of using
the system resources and services in a similarmanner,these
type of technique can be used in detectingURLsgeneratedin
a deviant form. According to G. Jacob et al. [9], this method
have a data collector, an interpreter and a matcher. The data
collector gather together both static and dynamic
information for execution. Interpreter is usedfortranslating
data collection module into intermediate representation.
Matcher examine the above representation with behavior
signature. The main goal of this method is to detect the
exotic and distinct malicious variants. But it uses large
amount of time for scanning, and it doesn’t contains details
about false positive ratios. Data mining techniquesas well as
machine learning techniques are used Heuristic approach to
acquire knowledge about the executable files behavior.In M.
Schultz et al. [11] describestheheuristicapproachuses Nave
Bayes and Multi Nave Bayes for classifying the URLs as
malicious or benign one. Nave Bayes is a classification
algorithm for binary and multi class classification problems.
While working on a dataset with millions of records with
some attributes this type of classification algorithm is used.
Nave Bayes method is from Bayes theorem. This classifier
assumes that all features are unrelated to each other.i.e.,the
presence or absence of a feature does not influence the
presence or absence of any other feature. But this algorithm
has a disadvantage that it cannot learns the relationship
between features since it consider all the feature to be
unrelated.
The Nguyen et al. [12] explains heuristic approach in his
paper. According to his view, the heuristic based detection
technique analyses and extracts phishing site features and
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 06 Issue: 06 | June 2019 www.irjet.net p-ISSN: 2395-0072
© 2019, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 271
detects phishing sites using that information. Then extract
features of URL in user-requested page and applies those
features to determine whether a requested site is a phishing
site. It helps to reduce damage caused due to phishing
attacks.
According to Nida Khan1 et al. [13], there are two module
for heuristic approach for detecting malicious URLs. In this
approach, the URL and DNS matching module is the first
module. This module contains a white-list like blacklist
method. The increase in running time and decrease in false
negative rate can be managed by the use of white-list. It
contains two parameters: domain name tie in with IP
address. When a user enters into a website, then the system
finding a match with domain name of current website and
white-list. If the website present in white-list then it can be
accessed by user, then for checkingDNSpoisoning attack the
system equates IP address of corresponding domain. At the
beginning the white-list start with zero. Because there is no
domain in the list. When a user access new web page or
enters a new URL, then the white-list start increasing. The
most important point is that while the website or URL
accessed by the user at the first time, the URL domain will
not present in the white-list. The second module is starts
from identification module. This module detect the phishing
web page or URL by extracting hyperlinks of web page and
applying the phishing detection algorithm such as ID3 or
C4.5 and finally a decision is taken such as whether the URL
of the web page is malicious or benign. If it is malicious a
warning will be generated. But the disadvantage of heuristic
approach is that it uses large amount of system resources
and the whenever visit in a website an immediate attack is
launched. Due to this the generated attack may go
undetected.
2.3 Machine Learning Approach
The drawbacks of blacklist methodandheuristic approachis
discussed in the previous two sections. To overcome these
problems, most of the researchers started applyingmachine
learning techniques in detectingmaliciousURLsfrombenign
one. From [14] the machine learning can have high demand
on Artificial Intelligence (AI). These technique provide a
system to learn by itself and improve from experience
without having any specific program. The main goal of
machine learning is to provide ability to the computer to
learn automatically without any interference with humans.
From [15] the important machine learning techniques are
supervised earning, unsupervised learning and semi
supervised learning. The supervised learning have a teacher
for giving guidance. This method take some data set and act
as a teacher. It guides the model or give training to the
machine. After proper guidance it starts take decisions on
the arrived new data. In unsupervised learning, they can
learn without the help of a teacher. I.e., from observations
and some structure of data. The working of this method is
different from supervised learning. This method create
clusters by finding patterns as well as relationships from a
given set of data. Semi supervised learning [14] is act
between supervised and unsupervisedlearning.Fortraining
it takes both labeled and unlabeled data. This method of
machine learning gives the better performance and high
accuracy. The machine learning techniques does the
following [15].
1. To create a model, the various machine learning
algorithm is trained by using a set of training data.
2. Once a new input data is entered into the machine
learning algorithm it takes some prediction on the basis of
the trained model.
3. The prediction taken in step 2 can be evaluated for
checking accuracy.
4. If the estimated accuracy is once tolerable, then the
machine learning algorithm is deployed. Otherwise using an
enhanced set of training data the machine learning
algorithm is again and again trained.
D. Sahoo et al. [10] states that for training the data, machine
learning techniques uses a set of URLs and for classifying a
URL as malicious or benign this method learns a prediction
model. This is mainly based on the statistical property of
URLs. The main advantage of this type method is that the
ability to specify new URLs.
3. MALICIOUS URL DETECTION: ALGORITHMS AND
PHASES
3.1. Batch learning algorithm and online learning
algorithm: The main algorithms used in machine learning
technique to detect malicious URLs are batch learning and
online learning. The batch learning in machine learning has
an assumption that before the training task the trainingdata
is available. SVM (Support Vector Machine) algorithm and
Nave Bayes algorithm are the two main families of batch
learning algorithm. SVM [17] is a supervised learning
method. It can make full use of structural risk minimization
principle with a maximum margin learning approach. The
main aim of SVM is to find a hyper plane for classifying the
data points in N dimensional space. So the main concern
keep in mind while finding a hyper plane is the hyper plane
can have maximum margin. That means the malicious class
and benign class can have maximum distance between the
data points. The hyper plane depends on the number of
input features. If the number of input feature is 2, then the
hyper plane is just a line. If the number of input feature is 3,
then the hyper plane becomes 2-dimensional. But it is very
difficult to imagine a hyper plane with more than 3 input
features. Due to this for considering the whole features of a
URL, SVM produces less accuracy. Nave Bayes [16] is an
important classification algorithm. It takes the tested URLs
as input and it gives the output that the testing domain
names with their attack type. It can do
the following steps.
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 06 Issue: 06 | June 2019 www.irjet.net p-ISSN: 2395-0072
© 2019, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 272
1. Using the training set, it estimate the sub-features for
training purpose and using some Gaussian distribution, a
classifier is created.
2. Estimate the mean and variance of sub features found in
step 1.
3. For classification, it takes the calculated features for
testing sample.
4. Calculate the anatomy for benign, spam, malware classes.
5. Examine the values of anatomy.
6. For testing domain, highest value obtained in step 6 can
be assigned.
This method gives more accuracy than SVM for all features
of a URL. Because in SVM considering the whole features
of URL makes some complication. So it may affect in whole
performance. But there is no such issues in nave Bayes
method. They can improve performance of the entire
technique and gives more accuracy. Online learning
algorithms [18] are used for handling data with highvolume
data high velocity. It’s a family of scalable learning
algorithms. This algorithm is mainly trying to find
a solution for classification problems. During training, this
type of algorithm make a label prediction at each step. Then
they receives an actual label. Perceptron algorithm, Logistic
regression with SGD (Stochastic Gradient Descent), and
ConfidenceWeighted (CW) algorithm are some family of
algorithms included in online learning. The perceptron
algorithms are classical algorithms. It update the weight
vector whenever a malicious URL is detected. It is a linear
classifier algorithm. The rule of update in perceptron
algorithm is very simple. This algorithm cannot have any
chance for misclassification since it update rate are fixed. So
this method gives better results for detecting malicious
URLs. That is the false positive ratio is low here. The Logistic
Regression with SGD is an online method for providing an
approximation of gradient descent used in batch learning
algorithms. This method express the sum over individual
examples. So this algorithm is gives the details of all false
positives and false negatives in detail. It increases the
accuracy. But its performance is least when compared to
perceptron algorithm. For each feature, the CW algorithm
maintains a different confidence measure. Here more
confidence weights are more submissive than less confident
weight. In contrast with other algorithms it describes per-
feature confidence with a Gaussian distribution. So the
classification happens here in an efficient manner. The
accuracy of performance is high. The advantages of online
learning algorithm when compared to batch learning is that
its computation speed is high because in this algorithm
makes one pass over the entire data. Sincethebatchlearning
algorithm possess multi-pass on data. Due to the one pass
behavior of online learning, this type of algorithms are easy
to implement. Sometimes the online learning makes hard to
learn. Because for several cases most of the algorithms
cannot act correctly or may be it misbehaves. Both the batch
learning algorithm and online learning algorithmgothrough
two phases of machine learning to detect malicious URLs.
They are feature extraction and feature representation. The
machine learning technique first extracts all features of a
URL, then represent those features and then implement the
above algorithms.
3.2. Feature extraction phase: The Feature Extraction
Phase is to extract the different features of a URL. From
Anjali et al. [16] the feature extraction module mainly
consists of six features. They are lexical features, link
popularity features, webpage content features, network
features, DNS features and DNS fluxiness features. The
lexical features is mainly based on URL string or URL
name properties. The length of URL, length of each
component of URL such as host name, top-level domain and
primary domain, the number ofspecial charactersinURLare
the most commonly used statistical propertiesofURLstring.
These statistical properties areincludedintraditional lexical
features. In this feature a dictionary is constructed for each
types of words. So each of this word become a feature. A
BOW (Bag-Of-Words) model is used in this feature for
showing the value of feature as 0 or 1.i.e., while the word in
the dictionary is present in URL, then the feature value
shows 1 otherwise 0. According to [10] some advanced
lexical features are also used. It include directory related
feature like length of directory and number of subdirectory,
file name features like length of file name and the number of
delimiters and finally the argument features such as length
of arguments and number of variables.
From [16], the link popularity means the total amount of
incoming links from possible websites. Link popularity
features helps to identify malicious URL and benign one.
Because the benign URL can have large amount of link
popularity when compared to the malicious one. Malicious
URLs only possess a few amount of link quality. In this
method, the URLs domain link popularity and address link
popularity is employed.
Host based features [10] are very important in detecting
malicious URL detection. This feature includes properties of
IP address, WHOIS information, properties of domain name,
location information and connection speed.
DNS fluxiness features [16] are include in host based
features. This DNS fluxiness features are used for proxy
network identification and host changing. This property is
helpful for identifying URLs fluxiness nature. Webpage
content feature is included in content based features. This
type of features are heavy weighted. Here the main concern
is to extend lot of information within a safely manner. The
content based features gives the information about the
threads in earlier even if the detection of malicious URLs
using URLs featuresfails.Thewebpagecontentfeatureshave
the ability to count the number of HTML tags, iframe, lines
and hyperlinks in webpage.
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 06 Issue: 06 | June 2019 www.irjet.net p-ISSN: 2395-0072
© 2019, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 273
DNS features and Network features are referred by the
authors in [16]. DNS features related to the name of an
address. It includes the count of resolved IP. Network
features gives the information about network such aslength
of downloaded packet content, number of bytes in
downloaded file, speed of download and domain lookup
time.
3.3. Feature representation phase: The Feature
Representation phase [10] includes featurescollectionstage
and feature pre-processing stage. The feature collection
stage collects the most important information about a URL.
The presence of URL in blacklist, URL string information,
host information, HTML content and JavaScript content of
the website, link popularity are included in this phase. The
feature pre-processing phase contains the URLs textual
description. i.e., the URLs unstructured information. This
information can fed into the various machine learning
algorithms by converting to a numerical vector.
4. CONCLUSION
Today the detection of malicious URL is very important. The
cybercrimes are increased day by day due to the unknown
malicious URLs usage in cyber security field. Various type of
methods are used in detecting this type of attacks. Among
this, machine learning technique is a favorable solution for
such attacks. In this survey different types of methods used
for detecting malicious URLs are discussed and also
identified the drawbacks of each method and advantages of
machine learning over all other methods. Even though
machine learning techniques arebetterandsuitablesolution
for detecting malicious URLs.
REFERENCES
[1] Jian Zhang, Phillip A. Porras, and Johannes Ullrich.Highly
predictive blacklisting. In Proceedings of the 17th USENIX
Security Symposium, July 28-August 1, 2008, San Jose, CA,
USA, pages 107122, 2008.
[2] S. Garera, N. Provos, M. Chew, and A. D. Rubin, A
framework for detection and measurement of phishing
attacks, in Proceedings of the 2007 ACM workshop on
Recurring malcode. ACM, 2007, pp. 18.
[3] Y. Alshboul, R. Nepali, and Y. Wang, Detecting malicious
short URLs on twitter, 2015.
[4] S. Sinha, M. Bailey, and F. Jahanian, Shades of grey:Onthe
effectiveness of reputation based blacklists,inMaliciousand
Unwanted Software, 2008. MALWARE 2008. 3rd
International Conference on. IEEE, 2008, pp. 5764.
[5] S. Sheng, B. Wardman, G. Warner, L.F.Cranor, J.Hong,and
C. Zhang, An empirical analysis of phishing blacklists, in
Proceedings of Sixth Conference on Email and AntiSpam
(CEAS), 2009.
[6] C. Seifert, I. Welch, and P. Komisarczuk, Identification of
malicious web pages with static heuristics, in
Telecommunication Networks and ApplicationsConference,
2008. ATNAC 2008. Australasian. IEEE, 2008, pp. 9196.
[7] P. Gutmann. The Commercial Malware Industry., 2007.
[8] KALPA, Introduction to Malware, 2011.
[9] G. Jacob, H. Debar, and E. Filiol, Behavioral detection of
malware: from a survey towards an established taxonomy,
Journal in Computer Virology, pp. 251266, 2008.
[10] Doyen Sahoo, Chenghao Liu, Steven C.H. Hoi, Malicious
URL Detection using Machine Learning: A Survey, 2017
International Conference on. IEEE.
[11] M. Schultz, E. Eskin, E. Zadok, and S. Stolfo, Data mining
methods for detection of new malicious executables.InIEEE
Symposium on Security and Privacy, pages 38-49. IEEE
COMPUTER SOCIETY, 2001.
[12] Nguyen, Luong Anh Tuan,”A novel approach for
phishing detection using URL-based heuristic.” Computing,
Management and Telecommunications (ComManTel), 2014
International Conference on. IEEE, 2014.
[13] Nida Khan1, Manliv Kaur1, Riddhi Panchal1, Prashant
Kumar Rai1, Nilesh Rathod2 Heuristic Based Approach for
Fraud Detection using Machine Learning International
Journal of Innovative Research in Computer and
Communication Engineering, Vol. 6, Issue 2, February 2018.
[14] Marco Varone, Daniel Mayer, Andrea Melegari
https://www.expertsystem.com/machinelearning-
definition/
[15] Atul Published on Jan 09, 2019
https://www.edureka.co/blog/what-ismachine-learning/
[16] Anjali B. Sayamber, Arati M. Dixit, Malicious URL
Detection and Identification International Journal of
Computer Applications, August 2014.
[17] M. A .Hearst, S. T. Dumais, E. Osman, J. PlattB.Scholkopf,
Support Vector Machines, Intelligent systems and their
applications, IEEE.vol.13, no.4.pp.18-28,1998.
[18] Justin M A, Berkeley Lawrence K. Saul, Stefan Savage
and Geoffrey M. Voelker, Learning to Detect Malicious URLs
ACM transactions on intelligent systemsandtechnology, vol.
2, no. 3, article 30, publication date: April.

More Related Content

What's hot

Malware Dectection Using Machine learning
Malware Dectection Using Machine learningMalware Dectection Using Machine learning
Malware Dectection Using Machine learningShubham Dubey
 
Internet worm-case-study
Internet worm-case-studyInternet worm-case-study
Internet worm-case-studyIan Sommerville
 
Introduction To OWASP
Introduction To OWASPIntroduction To OWASP
Introduction To OWASPMarco Morana
 
Understanding Cross-site Request Forgery
Understanding Cross-site Request ForgeryUnderstanding Cross-site Request Forgery
Understanding Cross-site Request ForgeryDaniel Miessler
 
Malware Detection - A Machine Learning Perspective
Malware Detection - A Machine Learning PerspectiveMalware Detection - A Machine Learning Perspective
Malware Detection - A Machine Learning PerspectiveChong-Kuan Chen
 
Traditional Firewall vs. Next Generation Firewall
Traditional Firewall vs. Next Generation FirewallTraditional Firewall vs. Next Generation Firewall
Traditional Firewall vs. Next Generation Firewall美兰 曾
 
Intrusion Detection And Prevention
Intrusion Detection And PreventionIntrusion Detection And Prevention
Intrusion Detection And PreventionNicholas Davis
 
Malware classification using Machine Learning
Malware classification using Machine LearningMalware classification using Machine Learning
Malware classification using Machine LearningJapneet Singh
 
Data mining presentation.ppt
Data mining presentation.pptData mining presentation.ppt
Data mining presentation.pptneelamoberoi1030
 
Wired and Wireless Network Forensics
Wired and Wireless Network ForensicsWired and Wireless Network Forensics
Wired and Wireless Network ForensicsSavvius, Inc
 
Deep Web
Deep WebDeep Web
Deep WebSt John
 
cyber security and forensic tools
cyber security and forensic toolscyber security and forensic tools
cyber security and forensic toolsSonu Sunaliya
 
Footprinting and reconnaissance
Footprinting and reconnaissanceFootprinting and reconnaissance
Footprinting and reconnaissanceNishaYadav177
 
Anti-Money Laundering Solution
Anti-Money Laundering SolutionAnti-Money Laundering Solution
Anti-Money Laundering SolutionSri Ambati
 
cyber security notes
cyber security notescyber security notes
cyber security notesSHIKHAJAIN163
 

What's hot (20)

Sécurité des applications web
Sécurité des applications webSécurité des applications web
Sécurité des applications web
 
Malware Dectection Using Machine learning
Malware Dectection Using Machine learningMalware Dectection Using Machine learning
Malware Dectection Using Machine learning
 
Internet worm-case-study
Internet worm-case-studyInternet worm-case-study
Internet worm-case-study
 
DDoS.pptx
DDoS.pptxDDoS.pptx
DDoS.pptx
 
Introduction To OWASP
Introduction To OWASPIntroduction To OWASP
Introduction To OWASP
 
Understanding Cross-site Request Forgery
Understanding Cross-site Request ForgeryUnderstanding Cross-site Request Forgery
Understanding Cross-site Request Forgery
 
Malware Detection - A Machine Learning Perspective
Malware Detection - A Machine Learning PerspectiveMalware Detection - A Machine Learning Perspective
Malware Detection - A Machine Learning Perspective
 
Traditional Firewall vs. Next Generation Firewall
Traditional Firewall vs. Next Generation FirewallTraditional Firewall vs. Next Generation Firewall
Traditional Firewall vs. Next Generation Firewall
 
Phishing ppt
Phishing pptPhishing ppt
Phishing ppt
 
Intrusion Detection And Prevention
Intrusion Detection And PreventionIntrusion Detection And Prevention
Intrusion Detection And Prevention
 
Malware classification using Machine Learning
Malware classification using Machine LearningMalware classification using Machine Learning
Malware classification using Machine Learning
 
Data mining presentation.ppt
Data mining presentation.pptData mining presentation.ppt
Data mining presentation.ppt
 
Wired and Wireless Network Forensics
Wired and Wireless Network ForensicsWired and Wireless Network Forensics
Wired and Wireless Network Forensics
 
Deep Web
Deep WebDeep Web
Deep Web
 
cyber security and forensic tools
cyber security and forensic toolscyber security and forensic tools
cyber security and forensic tools
 
OWASP TOP 10 VULNERABILITIS
OWASP TOP 10 VULNERABILITISOWASP TOP 10 VULNERABILITIS
OWASP TOP 10 VULNERABILITIS
 
Footprinting and reconnaissance
Footprinting and reconnaissanceFootprinting and reconnaissance
Footprinting and reconnaissance
 
malware analysis
malware  analysismalware  analysis
malware analysis
 
Anti-Money Laundering Solution
Anti-Money Laundering SolutionAnti-Money Laundering Solution
Anti-Money Laundering Solution
 
cyber security notes
cyber security notescyber security notes
cyber security notes
 

Similar to Machine Learning Techniques for Detecting Malicious URLs

Detecting malicious URLs using binary classification through ada boost algori...
Detecting malicious URLs using binary classification through ada boost algori...Detecting malicious URLs using binary classification through ada boost algori...
Detecting malicious URLs using binary classification through ada boost algori...IJECEIAES
 
MALICIOUS URL DETECTION USING CONVOLUTIONAL NEURAL NETWORK
MALICIOUS URL DETECTION USING CONVOLUTIONAL NEURAL NETWORKMALICIOUS URL DETECTION USING CONVOLUTIONAL NEURAL NETWORK
MALICIOUS URL DETECTION USING CONVOLUTIONAL NEURAL NETWORKijcseit
 
MALICIOUS URL DETECTION USING CONVOLUTIONAL NEURAL NETWORK
MALICIOUS URL DETECTION USING CONVOLUTIONAL NEURAL NETWORKMALICIOUS URL DETECTION USING CONVOLUTIONAL NEURAL NETWORK
MALICIOUS URL DETECTION USING CONVOLUTIONAL NEURAL NETWORKijcseit
 
State of the Art Analysis Approach for Identification of the Malignant URLs
State of the Art Analysis Approach for Identification of the Malignant URLsState of the Art Analysis Approach for Identification of the Malignant URLs
State of the Art Analysis Approach for Identification of the Malignant URLsIOSRjournaljce
 
IRJET- Detecting the Phishing Websites using Enhance Secure Algorithm
IRJET- Detecting the Phishing Websites using Enhance Secure AlgorithmIRJET- Detecting the Phishing Websites using Enhance Secure Algorithm
IRJET- Detecting the Phishing Websites using Enhance Secure AlgorithmIRJET Journal
 
Malicious Link Detection System
Malicious Link Detection SystemMalicious Link Detection System
Malicious Link Detection SystemIRJET Journal
 
PHISHING URL DETECTION USING LSTM BASED ENSEMBLE LEARNING APPROACHES
PHISHING URL DETECTION USING LSTM BASED ENSEMBLE LEARNING APPROACHESPHISHING URL DETECTION USING LSTM BASED ENSEMBLE LEARNING APPROACHES
PHISHING URL DETECTION USING LSTM BASED ENSEMBLE LEARNING APPROACHESIJCNCJournal
 
Phishing URL Detection using LSTM Based Ensemble Learning Approaches
Phishing URL Detection using LSTM Based Ensemble Learning ApproachesPhishing URL Detection using LSTM Based Ensemble Learning Approaches
Phishing URL Detection using LSTM Based Ensemble Learning ApproachesIJCNCJournal
 
IRJET- Preventing Phishing Attack using Evolutionary Algorithms
IRJET-  	  Preventing Phishing Attack using Evolutionary AlgorithmsIRJET-  	  Preventing Phishing Attack using Evolutionary Algorithms
IRJET- Preventing Phishing Attack using Evolutionary AlgorithmsIRJET Journal
 
IRJET - An Automated System for Detection of Social Engineering Phishing Atta...
IRJET - An Automated System for Detection of Social Engineering Phishing Atta...IRJET - An Automated System for Detection of Social Engineering Phishing Atta...
IRJET - An Automated System for Detection of Social Engineering Phishing Atta...IRJET Journal
 
detection of malicious URLs.pptx
detection of malicious URLs.pptxdetection of malicious URLs.pptx
detection of malicious URLs.pptxmanash40
 
IRJET- Phishing Website Detection based on Machine Learning
IRJET- Phishing Website Detection based on Machine LearningIRJET- Phishing Website Detection based on Machine Learning
IRJET- Phishing Website Detection based on Machine LearningIRJET Journal
 
IRJET - Detection and Prevention of Phishing Websites using Machine Learning ...
IRJET - Detection and Prevention of Phishing Websites using Machine Learning ...IRJET - Detection and Prevention of Phishing Websites using Machine Learning ...
IRJET - Detection and Prevention of Phishing Websites using Machine Learning ...IRJET Journal
 
IRJET - Chrome Extension for Detecting Phishing Websites
IRJET -  	  Chrome Extension for Detecting Phishing WebsitesIRJET -  	  Chrome Extension for Detecting Phishing Websites
IRJET - Chrome Extension for Detecting Phishing WebsitesIRJET Journal
 
IRJET- Phishing Website Detection System
IRJET- Phishing Website Detection SystemIRJET- Phishing Website Detection System
IRJET- Phishing Website Detection SystemIRJET Journal
 
Knowledge base compound approach against phishing attacks using some parsing ...
Knowledge base compound approach against phishing attacks using some parsing ...Knowledge base compound approach against phishing attacks using some parsing ...
Knowledge base compound approach against phishing attacks using some parsing ...csandit
 
KNOWLEDGE BASE COMPOUND APPROACH AGAINST PHISHING ATTACKS USING SOME PARSING ...
KNOWLEDGE BASE COMPOUND APPROACH AGAINST PHISHING ATTACKS USING SOME PARSING ...KNOWLEDGE BASE COMPOUND APPROACH AGAINST PHISHING ATTACKS USING SOME PARSING ...
KNOWLEDGE BASE COMPOUND APPROACH AGAINST PHISHING ATTACKS USING SOME PARSING ...cscpconf
 
IRJET - Phishing Attack Detection and Prevention using Linkguard Algorithm
IRJET - Phishing Attack Detection and Prevention using Linkguard AlgorithmIRJET - Phishing Attack Detection and Prevention using Linkguard Algorithm
IRJET - Phishing Attack Detection and Prevention using Linkguard AlgorithmIRJET Journal
 
Phishing Website Detection Using Machine Learning
Phishing Website Detection Using Machine LearningPhishing Website Detection Using Machine Learning
Phishing Website Detection Using Machine LearningIRJET Journal
 

Similar to Machine Learning Techniques for Detecting Malicious URLs (20)

Detecting malicious URLs using binary classification through ada boost algori...
Detecting malicious URLs using binary classification through ada boost algori...Detecting malicious URLs using binary classification through ada boost algori...
Detecting malicious URLs using binary classification through ada boost algori...
 
MALICIOUS URL DETECTION USING CONVOLUTIONAL NEURAL NETWORK
MALICIOUS URL DETECTION USING CONVOLUTIONAL NEURAL NETWORKMALICIOUS URL DETECTION USING CONVOLUTIONAL NEURAL NETWORK
MALICIOUS URL DETECTION USING CONVOLUTIONAL NEURAL NETWORK
 
MALICIOUS URL DETECTION USING CONVOLUTIONAL NEURAL NETWORK
MALICIOUS URL DETECTION USING CONVOLUTIONAL NEURAL NETWORKMALICIOUS URL DETECTION USING CONVOLUTIONAL NEURAL NETWORK
MALICIOUS URL DETECTION USING CONVOLUTIONAL NEURAL NETWORK
 
State of the Art Analysis Approach for Identification of the Malignant URLs
State of the Art Analysis Approach for Identification of the Malignant URLsState of the Art Analysis Approach for Identification of the Malignant URLs
State of the Art Analysis Approach for Identification of the Malignant URLs
 
IRJET- Detecting the Phishing Websites using Enhance Secure Algorithm
IRJET- Detecting the Phishing Websites using Enhance Secure AlgorithmIRJET- Detecting the Phishing Websites using Enhance Secure Algorithm
IRJET- Detecting the Phishing Websites using Enhance Secure Algorithm
 
Malicious Link Detection System
Malicious Link Detection SystemMalicious Link Detection System
Malicious Link Detection System
 
PHISHING URL DETECTION USING LSTM BASED ENSEMBLE LEARNING APPROACHES
PHISHING URL DETECTION USING LSTM BASED ENSEMBLE LEARNING APPROACHESPHISHING URL DETECTION USING LSTM BASED ENSEMBLE LEARNING APPROACHES
PHISHING URL DETECTION USING LSTM BASED ENSEMBLE LEARNING APPROACHES
 
Phishing URL Detection using LSTM Based Ensemble Learning Approaches
Phishing URL Detection using LSTM Based Ensemble Learning ApproachesPhishing URL Detection using LSTM Based Ensemble Learning Approaches
Phishing URL Detection using LSTM Based Ensemble Learning Approaches
 
IRJET- Preventing Phishing Attack using Evolutionary Algorithms
IRJET-  	  Preventing Phishing Attack using Evolutionary AlgorithmsIRJET-  	  Preventing Phishing Attack using Evolutionary Algorithms
IRJET- Preventing Phishing Attack using Evolutionary Algorithms
 
IRJET - An Automated System for Detection of Social Engineering Phishing Atta...
IRJET - An Automated System for Detection of Social Engineering Phishing Atta...IRJET - An Automated System for Detection of Social Engineering Phishing Atta...
IRJET - An Automated System for Detection of Social Engineering Phishing Atta...
 
detection of malicious URLs.pptx
detection of malicious URLs.pptxdetection of malicious URLs.pptx
detection of malicious URLs.pptx
 
IRJET- Phishing Website Detection based on Machine Learning
IRJET- Phishing Website Detection based on Machine LearningIRJET- Phishing Website Detection based on Machine Learning
IRJET- Phishing Website Detection based on Machine Learning
 
IRJET - Detection and Prevention of Phishing Websites using Machine Learning ...
IRJET - Detection and Prevention of Phishing Websites using Machine Learning ...IRJET - Detection and Prevention of Phishing Websites using Machine Learning ...
IRJET - Detection and Prevention of Phishing Websites using Machine Learning ...
 
IRJET - Chrome Extension for Detecting Phishing Websites
IRJET -  	  Chrome Extension for Detecting Phishing WebsitesIRJET -  	  Chrome Extension for Detecting Phishing Websites
IRJET - Chrome Extension for Detecting Phishing Websites
 
IRJET- Phishing Website Detection System
IRJET- Phishing Website Detection SystemIRJET- Phishing Website Detection System
IRJET- Phishing Website Detection System
 
Knowledge base compound approach against phishing attacks using some parsing ...
Knowledge base compound approach against phishing attacks using some parsing ...Knowledge base compound approach against phishing attacks using some parsing ...
Knowledge base compound approach against phishing attacks using some parsing ...
 
KNOWLEDGE BASE COMPOUND APPROACH AGAINST PHISHING ATTACKS USING SOME PARSING ...
KNOWLEDGE BASE COMPOUND APPROACH AGAINST PHISHING ATTACKS USING SOME PARSING ...KNOWLEDGE BASE COMPOUND APPROACH AGAINST PHISHING ATTACKS USING SOME PARSING ...
KNOWLEDGE BASE COMPOUND APPROACH AGAINST PHISHING ATTACKS USING SOME PARSING ...
 
[IJET V2I5P15] Authors: V.Preethi, G.Velmayil
[IJET V2I5P15] Authors: V.Preethi, G.Velmayil[IJET V2I5P15] Authors: V.Preethi, G.Velmayil
[IJET V2I5P15] Authors: V.Preethi, G.Velmayil
 
IRJET - Phishing Attack Detection and Prevention using Linkguard Algorithm
IRJET - Phishing Attack Detection and Prevention using Linkguard AlgorithmIRJET - Phishing Attack Detection and Prevention using Linkguard Algorithm
IRJET - Phishing Attack Detection and Prevention using Linkguard Algorithm
 
Phishing Website Detection Using Machine Learning
Phishing Website Detection Using Machine LearningPhishing Website Detection Using Machine Learning
Phishing Website Detection Using Machine Learning
 

More from IRJET Journal

TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...IRJET Journal
 
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURESTUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTUREIRJET Journal
 
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...IRJET Journal
 
Effect of Camber and Angles of Attack on Airfoil Characteristics
Effect of Camber and Angles of Attack on Airfoil CharacteristicsEffect of Camber and Angles of Attack on Airfoil Characteristics
Effect of Camber and Angles of Attack on Airfoil CharacteristicsIRJET Journal
 
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...IRJET Journal
 
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...IRJET Journal
 
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...IRJET Journal
 
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...IRJET Journal
 
A REVIEW ON MACHINE LEARNING IN ADAS
A REVIEW ON MACHINE LEARNING IN ADASA REVIEW ON MACHINE LEARNING IN ADAS
A REVIEW ON MACHINE LEARNING IN ADASIRJET Journal
 
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...IRJET Journal
 
P.E.B. Framed Structure Design and Analysis Using STAAD Pro
P.E.B. Framed Structure Design and Analysis Using STAAD ProP.E.B. Framed Structure Design and Analysis Using STAAD Pro
P.E.B. Framed Structure Design and Analysis Using STAAD ProIRJET Journal
 
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...IRJET Journal
 
Survey Paper on Cloud-Based Secured Healthcare System
Survey Paper on Cloud-Based Secured Healthcare SystemSurvey Paper on Cloud-Based Secured Healthcare System
Survey Paper on Cloud-Based Secured Healthcare SystemIRJET Journal
 
Review on studies and research on widening of existing concrete bridges
Review on studies and research on widening of existing concrete bridgesReview on studies and research on widening of existing concrete bridges
Review on studies and research on widening of existing concrete bridgesIRJET Journal
 
React based fullstack edtech web application
React based fullstack edtech web applicationReact based fullstack edtech web application
React based fullstack edtech web applicationIRJET Journal
 
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...IRJET Journal
 
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.IRJET Journal
 
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...IRJET Journal
 
Multistoried and Multi Bay Steel Building Frame by using Seismic Design
Multistoried and Multi Bay Steel Building Frame by using Seismic DesignMultistoried and Multi Bay Steel Building Frame by using Seismic Design
Multistoried and Multi Bay Steel Building Frame by using Seismic DesignIRJET Journal
 
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...IRJET Journal
 

More from IRJET Journal (20)

TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
 
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURESTUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
 
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
 
Effect of Camber and Angles of Attack on Airfoil Characteristics
Effect of Camber and Angles of Attack on Airfoil CharacteristicsEffect of Camber and Angles of Attack on Airfoil Characteristics
Effect of Camber and Angles of Attack on Airfoil Characteristics
 
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
 
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
 
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
 
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
 
A REVIEW ON MACHINE LEARNING IN ADAS
A REVIEW ON MACHINE LEARNING IN ADASA REVIEW ON MACHINE LEARNING IN ADAS
A REVIEW ON MACHINE LEARNING IN ADAS
 
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
 
P.E.B. Framed Structure Design and Analysis Using STAAD Pro
P.E.B. Framed Structure Design and Analysis Using STAAD ProP.E.B. Framed Structure Design and Analysis Using STAAD Pro
P.E.B. Framed Structure Design and Analysis Using STAAD Pro
 
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
 
Survey Paper on Cloud-Based Secured Healthcare System
Survey Paper on Cloud-Based Secured Healthcare SystemSurvey Paper on Cloud-Based Secured Healthcare System
Survey Paper on Cloud-Based Secured Healthcare System
 
Review on studies and research on widening of existing concrete bridges
Review on studies and research on widening of existing concrete bridgesReview on studies and research on widening of existing concrete bridges
Review on studies and research on widening of existing concrete bridges
 
React based fullstack edtech web application
React based fullstack edtech web applicationReact based fullstack edtech web application
React based fullstack edtech web application
 
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
 
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
 
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
 
Multistoried and Multi Bay Steel Building Frame by using Seismic Design
Multistoried and Multi Bay Steel Building Frame by using Seismic DesignMultistoried and Multi Bay Steel Building Frame by using Seismic Design
Multistoried and Multi Bay Steel Building Frame by using Seismic Design
 
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
 

Recently uploaded

Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxupamatechverse
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escortsranjana rawat
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxpurnimasatapathy1234
 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSKurinjimalarL3
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...Soham Mondal
 
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)Suman Mia
 
UNIT-III FMM. DIMENSIONAL ANALYSIS
UNIT-III FMM.        DIMENSIONAL ANALYSISUNIT-III FMM.        DIMENSIONAL ANALYSIS
UNIT-III FMM. DIMENSIONAL ANALYSISrknatarajan
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Dr.Costas Sachpazis
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxAsutosh Ranjan
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
(TARA) Talegaon Dabhade Call Girls Just Call 7001035870 [ Cash on Delivery ] ...
(TARA) Talegaon Dabhade Call Girls Just Call 7001035870 [ Cash on Delivery ] ...(TARA) Talegaon Dabhade Call Girls Just Call 7001035870 [ Cash on Delivery ] ...
(TARA) Talegaon Dabhade Call Girls Just Call 7001035870 [ Cash on Delivery ] ...ranjana rawat
 
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSHARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSRajkumarAkumalla
 
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...ranjana rawat
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )Tsuyoshi Horigome
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130Suhani Kapoor
 
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...RajaP95
 
High Profile Call Girls Dahisar Arpita 9907093804 Independent Escort Service ...
High Profile Call Girls Dahisar Arpita 9907093804 Independent Escort Service ...High Profile Call Girls Dahisar Arpita 9907093804 Independent Escort Service ...
High Profile Call Girls Dahisar Arpita 9907093804 Independent Escort Service ...Call girls in Ahmedabad High profile
 

Recently uploaded (20)

Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptx
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptx
 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
 
Roadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and RoutesRoadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and Routes
 
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
 
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINEDJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
 
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
 
UNIT-III FMM. DIMENSIONAL ANALYSIS
UNIT-III FMM.        DIMENSIONAL ANALYSISUNIT-III FMM.        DIMENSIONAL ANALYSIS
UNIT-III FMM. DIMENSIONAL ANALYSIS
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptx
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
 
(TARA) Talegaon Dabhade Call Girls Just Call 7001035870 [ Cash on Delivery ] ...
(TARA) Talegaon Dabhade Call Girls Just Call 7001035870 [ Cash on Delivery ] ...(TARA) Talegaon Dabhade Call Girls Just Call 7001035870 [ Cash on Delivery ] ...
(TARA) Talegaon Dabhade Call Girls Just Call 7001035870 [ Cash on Delivery ] ...
 
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSHARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
 
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
 
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
 
High Profile Call Girls Dahisar Arpita 9907093804 Independent Escort Service ...
High Profile Call Girls Dahisar Arpita 9907093804 Independent Escort Service ...High Profile Call Girls Dahisar Arpita 9907093804 Independent Escort Service ...
High Profile Call Girls Dahisar Arpita 9907093804 Independent Escort Service ...
 

Machine Learning Techniques for Detecting Malicious URLs

  • 1. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 06 Issue: 06 | June 2019 www.irjet.net p-ISSN: 2395-0072 © 2019, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 269 DETECTING MALICIOUS URLS USING MACHINE LEARNING TECHNIQUES: A COMPARATIVE LITERATURE REVIEW Lekshmi A R1, Seena Thomas2 1M.Tech Student in Computer Science and Engineering at LBS Institute of Technology for Women, Trivandrum, Kerala. 2Associaste professor in Computer Science and Engineering department at LBS Institute of Technology for Women, Trivandrum, Kerala. ---------------------------------------------------------------------***--------------------------------------------------------------------- Abstract - Today the most important concern in the field of cyber security is finding the serious problemsthatmakeloss in secure information. It is mainly due to malicious URLs. Malicious URLs are generated daily. This URLs are having a short life span. Various techniques are used by researchers for detecting such threats in a timely manner. Blacklist method is famous among them. Researchers uses this blacklist method for easily identifying the harmful URLs. They are very simple and easy method. Due to their simplicity they are used as a traditional method for detecting such URLs. But this method suffers from many problems. The lack of ability in detecting newly generated malicious URLs is one of the maindrawbacks of Blacklist method. Heuristic approach is also used for identifying some common attacks. It is an advancedtechnique of Blacklist method. But this method cannot be used for all type of attacks. So this method is used very shortly. For a good experience, the researchers introduce machine learning techniques. Machine Learning techniques go through several phases and detect the malicious URLs in an accurate manner. This method also gives the details about the falsepositiverate. This review paper studies the different phases such as feature extraction phase and feature representationphaseof machine learning techniques for detecting malicious URLs. Different machine learning algorithms used for such detection is also discuss in this paper. And also gives a better understanding about the advantage of using machine learning over other techniques for detecting malicious URLs and problems it suffers. Key Words: Blacklist, Cyber Security, Malicious URL. 1. INTRODUCTION The growth and promotion of businesses spanning across many applications including online-banking, e-commerce, and social networking due to the advent of new communication technologies. The use of the World Wide Web has increasing day by day. By using the Internet, most of the time malicious software, shortly named malware, or attacks are propagated. Delivering malicious content on the web has become a usual technique for bad actors due to increased internet access by more than half of the world population. The explicit hacking attempts, drive-by exploits, social engineering, phishing, watering hole, man-in-the middle, SQL injections, loss/theft of devices, denial of service, distributed denial of service, and many others are variety of techniquesusedto implementwebsiteattacks.The limitations of traditional security management technologies are becoming more and more serious given this exponential growth of new security threats, rapid changes of new IT technologies, and significant shortage of security professionals. By spreading compromised URLs most of these attacking techniques are realized. Malicious URLs are compromised URLs that are used for cyber-attacks. To avoid information loss the URL identification is the best solution. So the identification of malicious URL is always a hot area of information security. Drive-by Download, Phishing and Social Engineering, and Spam are most popular types of attacks using malicious URLs. Downloading of malware by visiting a URL is referred as Drive-by-download. By exploiting vulnerabilities in plugins or inserting malicious code through JavaScript, Drive-by-download attack is usually carried out. For affecting the genuine web pages, the phishing and social engineering attacks ploy users into disclosingtheirsensitive information. Spam is used as voluntary messages for the purpose of advertising or phishing. Every year this types of attacks leads several problems. So the main concern exists today is detecting such malicious URLs in a timely manner. In this paper we mainly discuss about the different methods used for detecting malicious URLs. This paper mainly focus on the advantage of machine learning techniques in thefield of detecting malicious URLs over other techniques. The classification of feature representation stage of machine learning techniques is explained in detail. The methodical classification different feature extraction stage of machine learning techniques is well explained. The paper provide a good explanation about different machine learning techniques used for detection. Moreover this paper gives an additional information about the difficulties caused by user in using machine learning techniques for the malicious URL detection. 2. METHODS OF DETECTING MALICIOUS URLS The main methods for detecting maliciousURLsareblacklist method, heuristic method and machine learning method. These are explained as follows.
  • 2. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 06 Issue: 06 | June 2019 www.irjet.net p-ISSN: 2395-0072 © 2019, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 270 2.1 Blacklist Approach This is the most common method currently used for malicious URL detection. This is one of the classical method for detecting malicious URLs. According to Jian Zhang et al. [1] the blacklist method is having a database consistsofa set of URLs that are malicious in past. This URLs that are malicious in past. This method is very fast and easy to implement. Whenever a user visits into a new URL, then a search for database is performed in blacklist. If the new URL is already a member in the blacklist then a warning will be generated to show that the URL is malicious otherwise the URL is a benign one. This technique can have very low false positive rates. The attackers making complications on Blacklist Approaches. S. Ganera et al. [2] identifies mainly four types of complications in this approach. The first one is that using an IP it makes confusion on host. Second reason for complication is confusing the hostwithotherdomains.Third one is making confusion on host by using large host names. Final one is vitiate. A new complications in Blacklist approach is found by the recent increase in the significant extent of URL in a considerable amount i.e., the URL is made substantially shorter and direct into a required page. Y. Alshboul et al. [3] discovers this new complication on blacklist approach that the battering of malicious URL beyond a short URL. Fast-flux and algorithmic generation of new URLs are some other techniques used by attackers to avoid the blacklist approach. S. Sinha et al. [4] proposes a reputation-based blacklist approach. Here the compromised hosts as well as malicious contents in URLs, network, and host can be identified. Then using this information the access of web, emails and other activities in malicious networks or host can be blocked. Many organizations like intrusion detection, spamdetection uses this type of approach. A SpamAssassin and DSpam are two spam detector used in this approach for detecting malicious URL in user’s mail. Manual training is needed for DSpam for detection. A number of spam detectors are used by SpamAssassin and scores of each detectors can be assigned. According to his view a Searching for databases in blacklist are done by IP address reversing, blacklistzone can be appended and then a DNS lookup is done. But there occurs many problems in this approach.Themostimportant drawback of this approachis maintainingcomprehensive list of malicious URLs is very difficult. The blacklist approach is useless of predicting newly generated new URLs. According to S. Sheng et al. [5] view it is impossible to detect new ultimatum in daily generated URLs. The other drawbacks of Blacklist method are which halts the signature based tools from detecting the attacksby complicating the code. More attacks are bring out by the attackers and which amend the attack signature. 2.2 Heuristic or Rule-based Approach The blacklist method can have the lack of ability to detect newly generated malicious URLs. Soa supplement methodis used by C. Seifert et al. [6] that is the Heuristic approach. This method is mainly used for identifying the phishingsites by extracting phishing site features. This method creates a blacklist of signatures whenever a new URL is arrived. Then it is analyzed with the available list of signatures. If there is a match exists, then the URL is referred as malicious. In this approach, a signature is assigned to each of the recognized common attacks based on its behavior. Two methods are used here to detect malicious URLsusingheuristicapproach. They are Signature based method and Behavior based method. The signature based detection method is described in P. Gutmann [7]. It is like fingerprint property i.e., it’s an eccentric property. From malicious sites, different patterns are extracted. This type of method can have small error rate. But this method can have a major disadvantage that it requires additional amount of time, money and work force for bring out distinctive signatures. The behavior based detection method is described in [8]. Based on the behavior property it is concluded that whether the URL is malicious or benign. Here the URLs with some same behavior are collected for detection. Because of using the system resources and services in a similarmanner,these type of technique can be used in detectingURLsgeneratedin a deviant form. According to G. Jacob et al. [9], this method have a data collector, an interpreter and a matcher. The data collector gather together both static and dynamic information for execution. Interpreter is usedfortranslating data collection module into intermediate representation. Matcher examine the above representation with behavior signature. The main goal of this method is to detect the exotic and distinct malicious variants. But it uses large amount of time for scanning, and it doesn’t contains details about false positive ratios. Data mining techniquesas well as machine learning techniques are used Heuristic approach to acquire knowledge about the executable files behavior.In M. Schultz et al. [11] describestheheuristicapproachuses Nave Bayes and Multi Nave Bayes for classifying the URLs as malicious or benign one. Nave Bayes is a classification algorithm for binary and multi class classification problems. While working on a dataset with millions of records with some attributes this type of classification algorithm is used. Nave Bayes method is from Bayes theorem. This classifier assumes that all features are unrelated to each other.i.e.,the presence or absence of a feature does not influence the presence or absence of any other feature. But this algorithm has a disadvantage that it cannot learns the relationship between features since it consider all the feature to be unrelated. The Nguyen et al. [12] explains heuristic approach in his paper. According to his view, the heuristic based detection technique analyses and extracts phishing site features and
  • 3. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 06 Issue: 06 | June 2019 www.irjet.net p-ISSN: 2395-0072 © 2019, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 271 detects phishing sites using that information. Then extract features of URL in user-requested page and applies those features to determine whether a requested site is a phishing site. It helps to reduce damage caused due to phishing attacks. According to Nida Khan1 et al. [13], there are two module for heuristic approach for detecting malicious URLs. In this approach, the URL and DNS matching module is the first module. This module contains a white-list like blacklist method. The increase in running time and decrease in false negative rate can be managed by the use of white-list. It contains two parameters: domain name tie in with IP address. When a user enters into a website, then the system finding a match with domain name of current website and white-list. If the website present in white-list then it can be accessed by user, then for checkingDNSpoisoning attack the system equates IP address of corresponding domain. At the beginning the white-list start with zero. Because there is no domain in the list. When a user access new web page or enters a new URL, then the white-list start increasing. The most important point is that while the website or URL accessed by the user at the first time, the URL domain will not present in the white-list. The second module is starts from identification module. This module detect the phishing web page or URL by extracting hyperlinks of web page and applying the phishing detection algorithm such as ID3 or C4.5 and finally a decision is taken such as whether the URL of the web page is malicious or benign. If it is malicious a warning will be generated. But the disadvantage of heuristic approach is that it uses large amount of system resources and the whenever visit in a website an immediate attack is launched. Due to this the generated attack may go undetected. 2.3 Machine Learning Approach The drawbacks of blacklist methodandheuristic approachis discussed in the previous two sections. To overcome these problems, most of the researchers started applyingmachine learning techniques in detectingmaliciousURLsfrombenign one. From [14] the machine learning can have high demand on Artificial Intelligence (AI). These technique provide a system to learn by itself and improve from experience without having any specific program. The main goal of machine learning is to provide ability to the computer to learn automatically without any interference with humans. From [15] the important machine learning techniques are supervised earning, unsupervised learning and semi supervised learning. The supervised learning have a teacher for giving guidance. This method take some data set and act as a teacher. It guides the model or give training to the machine. After proper guidance it starts take decisions on the arrived new data. In unsupervised learning, they can learn without the help of a teacher. I.e., from observations and some structure of data. The working of this method is different from supervised learning. This method create clusters by finding patterns as well as relationships from a given set of data. Semi supervised learning [14] is act between supervised and unsupervisedlearning.Fortraining it takes both labeled and unlabeled data. This method of machine learning gives the better performance and high accuracy. The machine learning techniques does the following [15]. 1. To create a model, the various machine learning algorithm is trained by using a set of training data. 2. Once a new input data is entered into the machine learning algorithm it takes some prediction on the basis of the trained model. 3. The prediction taken in step 2 can be evaluated for checking accuracy. 4. If the estimated accuracy is once tolerable, then the machine learning algorithm is deployed. Otherwise using an enhanced set of training data the machine learning algorithm is again and again trained. D. Sahoo et al. [10] states that for training the data, machine learning techniques uses a set of URLs and for classifying a URL as malicious or benign this method learns a prediction model. This is mainly based on the statistical property of URLs. The main advantage of this type method is that the ability to specify new URLs. 3. MALICIOUS URL DETECTION: ALGORITHMS AND PHASES 3.1. Batch learning algorithm and online learning algorithm: The main algorithms used in machine learning technique to detect malicious URLs are batch learning and online learning. The batch learning in machine learning has an assumption that before the training task the trainingdata is available. SVM (Support Vector Machine) algorithm and Nave Bayes algorithm are the two main families of batch learning algorithm. SVM [17] is a supervised learning method. It can make full use of structural risk minimization principle with a maximum margin learning approach. The main aim of SVM is to find a hyper plane for classifying the data points in N dimensional space. So the main concern keep in mind while finding a hyper plane is the hyper plane can have maximum margin. That means the malicious class and benign class can have maximum distance between the data points. The hyper plane depends on the number of input features. If the number of input feature is 2, then the hyper plane is just a line. If the number of input feature is 3, then the hyper plane becomes 2-dimensional. But it is very difficult to imagine a hyper plane with more than 3 input features. Due to this for considering the whole features of a URL, SVM produces less accuracy. Nave Bayes [16] is an important classification algorithm. It takes the tested URLs as input and it gives the output that the testing domain names with their attack type. It can do the following steps.
  • 4. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 06 Issue: 06 | June 2019 www.irjet.net p-ISSN: 2395-0072 © 2019, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 272 1. Using the training set, it estimate the sub-features for training purpose and using some Gaussian distribution, a classifier is created. 2. Estimate the mean and variance of sub features found in step 1. 3. For classification, it takes the calculated features for testing sample. 4. Calculate the anatomy for benign, spam, malware classes. 5. Examine the values of anatomy. 6. For testing domain, highest value obtained in step 6 can be assigned. This method gives more accuracy than SVM for all features of a URL. Because in SVM considering the whole features of URL makes some complication. So it may affect in whole performance. But there is no such issues in nave Bayes method. They can improve performance of the entire technique and gives more accuracy. Online learning algorithms [18] are used for handling data with highvolume data high velocity. It’s a family of scalable learning algorithms. This algorithm is mainly trying to find a solution for classification problems. During training, this type of algorithm make a label prediction at each step. Then they receives an actual label. Perceptron algorithm, Logistic regression with SGD (Stochastic Gradient Descent), and ConfidenceWeighted (CW) algorithm are some family of algorithms included in online learning. The perceptron algorithms are classical algorithms. It update the weight vector whenever a malicious URL is detected. It is a linear classifier algorithm. The rule of update in perceptron algorithm is very simple. This algorithm cannot have any chance for misclassification since it update rate are fixed. So this method gives better results for detecting malicious URLs. That is the false positive ratio is low here. The Logistic Regression with SGD is an online method for providing an approximation of gradient descent used in batch learning algorithms. This method express the sum over individual examples. So this algorithm is gives the details of all false positives and false negatives in detail. It increases the accuracy. But its performance is least when compared to perceptron algorithm. For each feature, the CW algorithm maintains a different confidence measure. Here more confidence weights are more submissive than less confident weight. In contrast with other algorithms it describes per- feature confidence with a Gaussian distribution. So the classification happens here in an efficient manner. The accuracy of performance is high. The advantages of online learning algorithm when compared to batch learning is that its computation speed is high because in this algorithm makes one pass over the entire data. Sincethebatchlearning algorithm possess multi-pass on data. Due to the one pass behavior of online learning, this type of algorithms are easy to implement. Sometimes the online learning makes hard to learn. Because for several cases most of the algorithms cannot act correctly or may be it misbehaves. Both the batch learning algorithm and online learning algorithmgothrough two phases of machine learning to detect malicious URLs. They are feature extraction and feature representation. The machine learning technique first extracts all features of a URL, then represent those features and then implement the above algorithms. 3.2. Feature extraction phase: The Feature Extraction Phase is to extract the different features of a URL. From Anjali et al. [16] the feature extraction module mainly consists of six features. They are lexical features, link popularity features, webpage content features, network features, DNS features and DNS fluxiness features. The lexical features is mainly based on URL string or URL name properties. The length of URL, length of each component of URL such as host name, top-level domain and primary domain, the number ofspecial charactersinURLare the most commonly used statistical propertiesofURLstring. These statistical properties areincludedintraditional lexical features. In this feature a dictionary is constructed for each types of words. So each of this word become a feature. A BOW (Bag-Of-Words) model is used in this feature for showing the value of feature as 0 or 1.i.e., while the word in the dictionary is present in URL, then the feature value shows 1 otherwise 0. According to [10] some advanced lexical features are also used. It include directory related feature like length of directory and number of subdirectory, file name features like length of file name and the number of delimiters and finally the argument features such as length of arguments and number of variables. From [16], the link popularity means the total amount of incoming links from possible websites. Link popularity features helps to identify malicious URL and benign one. Because the benign URL can have large amount of link popularity when compared to the malicious one. Malicious URLs only possess a few amount of link quality. In this method, the URLs domain link popularity and address link popularity is employed. Host based features [10] are very important in detecting malicious URL detection. This feature includes properties of IP address, WHOIS information, properties of domain name, location information and connection speed. DNS fluxiness features [16] are include in host based features. This DNS fluxiness features are used for proxy network identification and host changing. This property is helpful for identifying URLs fluxiness nature. Webpage content feature is included in content based features. This type of features are heavy weighted. Here the main concern is to extend lot of information within a safely manner. The content based features gives the information about the threads in earlier even if the detection of malicious URLs using URLs featuresfails.Thewebpagecontentfeatureshave the ability to count the number of HTML tags, iframe, lines and hyperlinks in webpage.
  • 5. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 06 Issue: 06 | June 2019 www.irjet.net p-ISSN: 2395-0072 © 2019, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 273 DNS features and Network features are referred by the authors in [16]. DNS features related to the name of an address. It includes the count of resolved IP. Network features gives the information about network such aslength of downloaded packet content, number of bytes in downloaded file, speed of download and domain lookup time. 3.3. Feature representation phase: The Feature Representation phase [10] includes featurescollectionstage and feature pre-processing stage. The feature collection stage collects the most important information about a URL. The presence of URL in blacklist, URL string information, host information, HTML content and JavaScript content of the website, link popularity are included in this phase. The feature pre-processing phase contains the URLs textual description. i.e., the URLs unstructured information. This information can fed into the various machine learning algorithms by converting to a numerical vector. 4. CONCLUSION Today the detection of malicious URL is very important. The cybercrimes are increased day by day due to the unknown malicious URLs usage in cyber security field. Various type of methods are used in detecting this type of attacks. Among this, machine learning technique is a favorable solution for such attacks. In this survey different types of methods used for detecting malicious URLs are discussed and also identified the drawbacks of each method and advantages of machine learning over all other methods. Even though machine learning techniques arebetterandsuitablesolution for detecting malicious URLs. REFERENCES [1] Jian Zhang, Phillip A. Porras, and Johannes Ullrich.Highly predictive blacklisting. In Proceedings of the 17th USENIX Security Symposium, July 28-August 1, 2008, San Jose, CA, USA, pages 107122, 2008. [2] S. Garera, N. Provos, M. Chew, and A. D. Rubin, A framework for detection and measurement of phishing attacks, in Proceedings of the 2007 ACM workshop on Recurring malcode. ACM, 2007, pp. 18. [3] Y. Alshboul, R. Nepali, and Y. Wang, Detecting malicious short URLs on twitter, 2015. [4] S. Sinha, M. Bailey, and F. Jahanian, Shades of grey:Onthe effectiveness of reputation based blacklists,inMaliciousand Unwanted Software, 2008. MALWARE 2008. 3rd International Conference on. IEEE, 2008, pp. 5764. [5] S. Sheng, B. Wardman, G. Warner, L.F.Cranor, J.Hong,and C. Zhang, An empirical analysis of phishing blacklists, in Proceedings of Sixth Conference on Email and AntiSpam (CEAS), 2009. [6] C. Seifert, I. Welch, and P. Komisarczuk, Identification of malicious web pages with static heuristics, in Telecommunication Networks and ApplicationsConference, 2008. ATNAC 2008. Australasian. IEEE, 2008, pp. 9196. [7] P. Gutmann. The Commercial Malware Industry., 2007. [8] KALPA, Introduction to Malware, 2011. [9] G. Jacob, H. Debar, and E. Filiol, Behavioral detection of malware: from a survey towards an established taxonomy, Journal in Computer Virology, pp. 251266, 2008. [10] Doyen Sahoo, Chenghao Liu, Steven C.H. Hoi, Malicious URL Detection using Machine Learning: A Survey, 2017 International Conference on. IEEE. [11] M. Schultz, E. Eskin, E. Zadok, and S. Stolfo, Data mining methods for detection of new malicious executables.InIEEE Symposium on Security and Privacy, pages 38-49. IEEE COMPUTER SOCIETY, 2001. [12] Nguyen, Luong Anh Tuan,”A novel approach for phishing detection using URL-based heuristic.” Computing, Management and Telecommunications (ComManTel), 2014 International Conference on. IEEE, 2014. [13] Nida Khan1, Manliv Kaur1, Riddhi Panchal1, Prashant Kumar Rai1, Nilesh Rathod2 Heuristic Based Approach for Fraud Detection using Machine Learning International Journal of Innovative Research in Computer and Communication Engineering, Vol. 6, Issue 2, February 2018. [14] Marco Varone, Daniel Mayer, Andrea Melegari https://www.expertsystem.com/machinelearning- definition/ [15] Atul Published on Jan 09, 2019 https://www.edureka.co/blog/what-ismachine-learning/ [16] Anjali B. Sayamber, Arati M. Dixit, Malicious URL Detection and Identification International Journal of Computer Applications, August 2014. [17] M. A .Hearst, S. T. Dumais, E. Osman, J. PlattB.Scholkopf, Support Vector Machines, Intelligent systems and their applications, IEEE.vol.13, no.4.pp.18-28,1998. [18] Justin M A, Berkeley Lawrence K. Saul, Stefan Savage and Geoffrey M. Voelker, Learning to Detect Malicious URLs ACM transactions on intelligent systemsandtechnology, vol. 2, no. 3, article 30, publication date: April.