This document presents a literature review and proposed methodology for detecting malicious URLs. It discusses prior work on blacklisting, heuristic approaches, and machine learning techniques for malicious URL detection. The proposed methodology develops three categories of features - URL lexical features based on TF-IDF, source code features to identify obfuscated JavaScript, and network features like payload size. These features would be used to train a machine learning model like SVM to classify URLs as malicious or benign in real-time. The goal is to automatically detect new web attacks and force attackers to make tradeoffs to evade detection.
3. INTRODUCTION
Many rogue websites trick users into revealing
sensitive information, which leads to theft of money or
identity, or into installing malware on the user's system.
A URL (Uniform Resource Locator) is the global address
of documents (resources) on the World Wide Web.
A URL has two components:
protocol identifier
resource name (specifies the IP address or the
domain name where the resource is located)
4. INTRODUCTION
Types of attacks using malicious URLs include:
Drive-by Download
Phishing and Social Engineering
Spam
Drive-by Download is the unintentional download of
malware upon merely visiting a URL. These attacks are
carried out by exploiting vulnerabilities in plug-ins or by
inserting malicious code through JavaScript.
5. INTRODUCTION
Phishing and social engineering attacks trick the user
into revealing private information by pretending to be
genuine web pages.
Spam is the use of unsolicited messages for the
purpose of advertising or phishing.
6. LITERATURE SURVEY
Title of
Paper
Details of
Publication
Description
Malicious
URL
detection
using
Machine
Learning :
A survey
Doyen
Sahoo,Chengh
ao Liu &
Steven CH Hoi
August 2019
The authors presented a
survey on malicious URL
detection using machine
learning techniques. They
discussed the existing
studies for malicious URL
detection paritcularly in the
forms of developing new
feature representation &
designing new learning
algorithm
7. LITERATURE SURVEY
Title of Paper: Automatic Detection for JavaScript
Obfuscation Attacks in Web Pages through String
Pattern Analysis
Details of Publication: YoungHan Choi, TaeGhyoon Kim,
SeokJin Choi
Description: The authors present an analysis system to
detect lexical and string obfuscation in JavaScript
malware. They identify a set of 11 features that
characterize obfuscated code and use them to train a
machine learning classifier.
8. LITERATURE SURVEY
Title of Paper: Kopis: Detecting Malware Domains at the
Upper DNS Hierarchy
Details of Publication: Manos Antonakakis et al.
Description: The authors propose a novel detection
system called Kopis for detecting malware-related
domain names. Kopis passively monitors DNS traffic at
the upper levels of the DNS hierarchy.
9. PROBLEM DEFINITION
The current situation demands strong information
security, since many people have suffered from leakage
of personal information.
Detection of malicious URLs and identification of threat
types using machine learning are critical to thwart cyber
attacks like spamming, phishing and malware.
10. OBJECTIVES
The main objectives of our work are to:
Survey the evolving trends in malicious URL detection
Analyse the variety of detection techniques that have
changed over time
11. METHODOLOGY
The categories of strategies used for detecting
malicious URLs are:
Blacklists (and Heuristics)
Machine Learning
12. METHODOLOGY
Blacklisting or Heuristic Approaches: These
approaches maintain a list of URLs that are known to
be malicious. Whenever a new URL is visited, a
database lookup is performed. If the URL is present in
the blacklist, it is considered malicious and a warning
is generated; otherwise it is assumed to be benign.
Blacklisting suffers from the inability to maintain an
exhaustive list of all possible malicious URLs, as new
URLs are easily generated daily, making it impossible
for blacklists to detect new threats.
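The lookup described above amounts to a simple set-membership test. A minimal sketch (the blacklist entries here are hypothetical, not from any real feed) might look like:

```python
# Minimal sketch of a blacklist lookup. Entries are hypothetical examples.
BLACKLIST = {
    "malware-host.example/dropper.exe",
    "phish-login.example/account/verify",
}

def check_url(url: str) -> str:
    """Return 'malicious' if the URL is blacklisted, else assume 'benign'."""
    # Strip the scheme so the lookup is scheme-independent
    bare = url.split("://", 1)[-1]
    return "malicious" if bare in BLACKLIST else "benign"

print(check_url("http://malware-host.example/dropper.exe"))  # malicious
print(check_url("http://new-threat.example/payload"))        # benign (a miss: unseen URLs pass)
```

The second call illustrates the weakness discussed above: any URL not already in the list is silently assumed benign.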
13. METHODOLOGY
This is of particularly critical concern when attackers
generate new URLs algorithmically and can thus bypass
all blacklists. Despite these problems, blacklists, due to
their simplicity and efficiency, continue to be one of the
most commonly used techniques in many anti-virus
systems today.
14. METHODOLOGY
Heuristic approaches are an extension of blacklist
methods, wherein the idea is to create a blacklist of
signatures. Common attacks are identified and a
signature is assigned to each attack type. Intrusion
Detection Systems can scan web pages for such
signatures and raise a flag if suspicious behaviour is
found. These methods have better generalization
capabilities than blacklisting, as they can detect threats
in new URLs as well. However, such methods can be
designed for only a limited number of common threats
and cannot generalize to all types of (novel) attacks.
Moreover, using obfuscation techniques, it is not
difficult to bypass them.
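The signature scanning described above can be sketched as pattern matching over page source. The two signatures below are hypothetical toy rules; real IDS rule sets are far richer:

```python
import re

# Hypothetical toy signatures; real IDS rules are much more elaborate.
SIGNATURES = {
    "iframe-injection": re.compile(r"<iframe[^>]*\bvisibility\s*:\s*hidden", re.I),
    "eval-unescape":    re.compile(r"eval\s*\(\s*unescape\s*\(", re.I),
}

def scan_page(html: str) -> list[str]:
    """Return the names of all signatures matched in the page source."""
    return [name for name, pattern in SIGNATURES.items() if pattern.search(html)]

page = '<html><iframe style="visibility:hidden" src="x"></iframe></html>'
print(scan_page(page))  # ['iframe-injection']
```

As the slide notes, trivial obfuscation (e.g. building the iframe tag from string fragments at runtime) defeats such fixed patterns.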
15. METHODOLOGY
A more specific version of heuristic approaches is the
analysis of the execution dynamics of the webpage.
Here too, the idea is to look for signatures of malicious
activity, such as unusual process creation, repeated
redirection, etc. These methods require visiting the
webpage, so the URL can actually launch an attack. As
a result, such techniques are often implemented in a
controlled environment like a disposable virtual
machine. They are very resource intensive and require
complete execution of the code. Another drawback is
that a website may not launch an attack immediately
after being visited and may thus go undetected.
16. METHODOLOGY
Machine Learning Approaches: These analyze the
information of a URL and its corresponding website by
extracting good feature representations of URLs and
training a prediction model on training data of both
malicious and benign URLs. There are two types of
features: static features and dynamic features. In static
analysis we analyze the webpage based on information
available without executing the URL (i.e., without
executing JavaScript or other code). The features
extracted include lexical features from the URL string,
information about the host, and sometimes even the
HTML and JavaScript content. Since no execution is
required, these methods are safer than dynamic
methods.
17. METHODOLOGY
The underlying assumption is that the distribution of
these features differs between malicious and benign
URLs. Using this distribution information, a prediction
model can be built which makes predictions on new
URLs. Due to the relatively safer environment for
extracting important information, and the ability to
generalize to all types of threats, static analysis
techniques have been extensively explored with
machine learning. Dynamic analysis techniques include
monitoring the behaviour of systems that are potential
victims to look for anomalies, for example by monitoring
system call sequences for abnormal behaviour.
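The static machine-learning pipeline described above can be sketched roughly as follows, assuming scikit-learn is available. The URLs and labels are toy, hand-labeled placeholders, not real data:

```python
# Rough sketch of the static ML approach: tokenize URLs, weight tokens
# with TF-IDF, and fit a linear SVM. All URLs/labels are illustrative toys.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

urls = [
    "login-verify.example/account/update",  # toy malicious
    "free-prize.example/win/now",           # toy malicious
    "en.wikipedia.org/wiki/URL",            # toy benign
    "docs.python.org/3/library/re.html",    # toy benign
]
labels = [1, 1, 0, 0]  # 1 = malicious, 0 = benign

# Tokens are runs of alphanumerics, so domain and path both contribute.
model = make_pipeline(
    TfidfVectorizer(token_pattern=r"[A-Za-z0-9]+"),
    LinearSVC(),
)
model.fit(urls, labels)
print(model.predict(["secure-login-verify.example/update"]))
```

A real system would of course train on a large labeled corpus and combine lexical features with the source-code and network features developed below.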
18. METHODOLOGY
FEATURES: We develop 3 different categories of
features to detect malicious URLs.
1. URL lexical features:
We approach the URL as an NLP problem. We use
term frequency-inverse document frequency (tf-idf)
to weigh the importance of a token in the URL as a way
to associate URL tokens with labels. Tokens include
anything in the URL, including both the domain and the
path. tf-idf can be defined as
tf-idf(t, d, D) = tf(t, d) * idf(t, D), where
tf(t, d) = f(t, d) / max{f(w, d) : w ∈ d}
idf(t, D) = log(|D| / |{d ∈ D : t ∈ d}|)
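The two formulas above can be computed directly. This sketch tokenizes URLs on non-alphanumeric separators and scores one token; the example URLs are hypothetical:

```python
# Direct implementation of the tf and idf definitions above:
#   tf(t, d)  = f(t, d) / max{f(w, d) : w in d}
#   idf(t, D) = log(|D| / |{d in D : t in d}|)
import math
import re
from collections import Counter

def tokenize(url: str) -> list[str]:
    # Split a URL into tokens on any non-alphanumeric separator
    return re.findall(r"[A-Za-z0-9]+", url.lower())

def tf(t: str, d: list[str]) -> float:
    counts = Counter(d)
    return counts[t] / max(counts.values())

def idf(t: str, D: list[list[str]]) -> float:
    containing = sum(1 for d in D if t in d)
    return math.log(len(D) / containing)

# Toy corpus of tokenized URLs (hypothetical examples)
corpus = [tokenize(u) for u in (
    "paypal-login.example/verify",
    "example.com/login",
    "news.example.org/article",
)]
doc = corpus[0]
print(round(tf("login", doc) * idf("login", corpus), 3))  # 0.405
```

Here "login" appears in two of the three documents, so its idf is log(3/2); every token in the first URL occurs once, so its tf is 1.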
19. METHODOLOGY
We also exploit the hierarchical nature of the
subdomains by splitting along each separator and
saving a bigram consisting of any subdomain plus the
top-level domain. We hope to catch phishing
patterns or other suspicious URLs in the process.
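The subdomain-plus-TLD bigram idea above can be sketched in a few lines; the hostname used here is a hypothetical phishing-style example:

```python
# Sketch of the subdomain-bigram feature: pair each label in the
# hostname with the top-level domain. Hostname is illustrative only.
def subdomain_bigrams(host: str) -> list[tuple[str, str]]:
    labels = host.split(".")
    tld = labels[-1]
    # Pair every label except the TLD itself with the TLD
    return [(label, tld) for label in labels[:-1]]

print(subdomain_bigrams("secure.paypal-login.example.com"))
# [('secure', 'com'), ('paypal-login', 'com'), ('example', 'com')]
```

A bigram like ('paypal-login', 'com') on a non-PayPal domain is exactly the kind of phishing pattern this feature hopes to surface.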
2. Source code features: JavaScript exploits are
typically obfuscated to prevent detection by automated
or manual analysis. Here is an example of one
exploitative script we found in our malicious sample
21. METHODOLOGY
Thus, we can use the ratio of special character
subsequences (non-English for "en" websites) to script
length.
In addition, attackers who choose to reconstruct
functions before calling them require the use of special
functions, such as fromCharCode, eval, document.write,
escape, etc. They can also include the malicious code in
an iframe. We count these keywords and use them as
one feature.
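Both source-code features above, the keyword count and the special-character ratio, can be sketched as follows. The keyword list mirrors the slide; the sample script is a harmless stand-in, not real malware:

```python
# Sketch of the two source-code features: a count of suspicious
# keywords (simple substring count) and the ratio of special
# characters to script length. Sample script is a harmless stand-in.
import re

SUSPICIOUS = ("fromCharCode", "eval", "document.write", "escape", "iframe")

def keyword_count(script: str) -> int:
    return sum(script.count(k) for k in SUSPICIOUS)

def special_char_ratio(script: str) -> float:
    # Fraction of characters outside letters, digits, and whitespace
    specials = len(re.findall(r"[^A-Za-z0-9\s]", script))
    return specials / max(len(script), 1)

sample = 'eval(unescape("%68%65%6C%6C%6F"))'
print(keyword_count(sample))                 # 2 ("eval", "escape" inside "unescape")
print(round(special_char_ratio(sample), 2))  # 0.33
```

The high special-character ratio here comes from the percent-encoded payload, the pattern the ratio feature is meant to flag.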
22. METHODOLOGY
3. Network features:
Although we have explored a variety of network features,
including latency, DNS query data, domain registry
data, and payload size, we have only captured
payload size for our tests. Executables can be arbitrarily
long, and obfuscated scripts may add to payload size as
well.
23. METHODOLOGY
Attacker strategy:
The growing threat to mobile web users could be
mitigated by automatic URL detection. By using a
trained SVM, one could check URLs fast enough to
deploy in a real-time service.
This means users can use a preemptive service without
impacting their mobile experience. As the old saying
goes, an ounce of prevention is worth a pound of cure,
but only if the solution is palatable. Attackers may
certainly make tradeoffs to outwit the features we have
selected. However, such evasion isn't free. For example,
using more legitimate-sounding URLs in phishing
attempts may bypass suspicious
24. METHODOLOGY
bigram detection, but may result in fewer click-throughs
by scrupulous users. Or, reducing special character
code sequences in obfuscation may work, but only by
increasing script size or by using less obfuscation and
risking detection by malicious code pattern detectors.
Our hope is that by adding the appropriate features, a
machine learning based system would be able to force
attackers to make tradeoffs in web-based attacks.
25. CONCLUSION
By using a trained SVM, it is possible to provide a
real-time service to check for malware URLs, regardless
of the browsing device used. In general, using a
machine learning approach to discover malicious URLs
and web attackers is a potentially significant approach,
especially when considering the scale at which
machines themselves have been used to automatically
generate, obfuscate, or permute attacks. We hope to
see more research put forward in this endeavor to
further reduce the space of feasible attacks.
26. REFERENCES
[1] Antonakakis, Manos, et al. "Kopis: Detecting Malware
Domains at the Upper DNS Hierarchy."
http://static.usenix.org/events/sec11/tech/slides/antonakakis.pdf
[2] Choi, YoungHan, TaeGhyoon Kim, SeokJin Choi.
"Automatic Detection for JavaScript Obfuscation Attacks
in Web Pages through String Pattern Analysis."
http://www.sersc.org/journals/IJSIA/vol4 no2 2010/2.pdf
[3] Doyen Sahoo, Chenghao Liu, Steven C.H. Hoi.
"Malicious URL Detection Using Machine Learning: A
Survey." August 2019, 37 pages.