Countermeasures against Phishing sites
Phishilla – An anti-phishing extension for Mozilla Firefox
Nagarajan Kuppuswami
Department of Computer Science
Virginia Tech
Blacksburg, VA
nagara7@vt.edu
Venkatasubramaniam Ganesan
Department of Computer Science
Virginia Tech
Blacksburg, VA
venkatg@vt.edu
Ashwin Palani
Department of Computer Science
Virginia Tech
Blacksburg, VA
ashwinp7@vt.edu
Abstract
Phishing has been in prominence since 1987 and has done considerable damage to the Internet user community. The level of expertise of adversaries in attacking sites has increased along with advancements in security. The attacks can come either from a malicious website or through emails. There is an urgent need to combat such attacks, as the losses they cause have been growing exponentially. The first part of our project addresses the existing countermeasures in place in various anti-phishing tools, the advantages of using them, and possible disadvantages that fraudsters could exploit. The second part explains the features of our proposed extension, how it works, and its advantages. We have also evaluated our scheme by assessing its performance on a set of phishing as well as legitimate sites. We conclude by mentioning enhancements and improvements that could be added to the current scheme.
I. INTRODUCTION TO PHISHING AND EXTENSIONS
Phishing is a criminally fraudulent process that attempts to acquire sensitive information such as usernames, passwords, and financial details such as credit card numbers. Attackers typically target users by means of fake URLs, emails, and instant messaging. Modern browsers have developed capabilities to detect such fraudulent sites, but some phishing schemes can still slip past these built-in checks. Hence, extensions/plug-ins can serve effectively in providing these additional detection capabilities.
II. EXISTING COUNTERMEASURES AND SCHEMES
A. Black Lists and White Lists Check Scheme
This scheme uses a database or list published by a trusted
party, where known phishing web sites are blacklisted. Tools
include Websense, McAfee’s anti–phishing filter, Netcraft anti-
phishing system, Cloudmark SafetyBar, Microsoft Phishing
Filter.
A similar white list is also maintained for sites that are
valid and legal. White lists usually contain sites that have been
the targets of phishing attacks.
Advantages: Simple to implement; it involves only a lookup of the domain against the blacklist.
Disadvantages: The weakness of this approach is its poor
scalability and its timeliness. Phishing sites mushroom
randomly and last for only a few days.
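For illustration, a minimal JavaScript sketch of such a list lookup (Phishilla itself is written in JavaScript); the list contents and the extractDomain helper are illustrative and not taken from any particular tool.

// Minimal sketch of a black/white list lookup; the list contents are illustrative.
const whiteList = ["paypal.com", "ebay.com", "bankofamerica.com"];
const blackList = ["phishing-example.test"];

// Simplified domain extraction: take the hostname and drop a leading "www.".
function extractDomain(url) {
  return new URL(url).hostname.replace(/^www\./, "");
}

// Returns "white", "black", or "unknown" for the given URL.
function listLookup(url) {
  const domain = extractDomain(url);
  if (whiteList.includes(domain)) return "white";
  if (blackList.includes(domain)) return "black";
  return "unknown";
}

console.log(listLookup("https://www.paypal.com/signin")); // "white"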
B. Server based schemes
a) Server Authentication: This is used to verify the credentials presented by a web server. The credentials are issued by a trusted third party that can vouch for the bearer's identity. Generally, the toolbar displays the brand's logos, icons, and seals in the browser window. This scheme is used by anti-phishing toolbars such as Content Verification Certificates, GeoTrust ToolBar, and Trustbar. TrustWatch is a toolbar which authenticates through a third party.
Advantages: This is one of the more robust ways of checking the authenticity of a server. It reduces the chances of raising false alarms on legitimate sites as well as of false negatives.
Disadvantages: Because of the lack of a global public key infrastructure, users may tend to blindly trust or reject the credentials presented by the web server.
b) Shared Secret Schemes: This scheme is currently used in Dynamic Security Skins. It works by having users visually compare client-side images with the ones provided by the server.
Advantages: The user makes the decision in detecting the phishing site, based on recall.
Disadvantages: The user must be attentive and have prior knowledge of the intended domain.
C. Information Retrieval Based Schemes
a) Term Frequency Calculation: This scheme calculates tf-idf weights for the terms on a page, picks the highest-weighted terms, searches for them in a search engine like Google, and checks whether the page's domain appears within the top 'n' results.
b) Support Vector Machines: These perform binary classification of sites as phishing or non-phishing based on identity information obtained from DOM objects (A HREF, IMG), etc.
Advantages: The major advantage of these schemes is their strong mathematical foundations and their use of probabilistic, learning-based techniques.
Disadvantages: These schemes can raise false alarms and require manual classification of the initial training data and specification of rules.
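A brief JavaScript sketch of the tf-idf step, assuming a precomputed document-frequency table df and corpus size N; the subsequent search-engine query is omitted.

// Return the k highest tf-idf weighted terms of a page's visible text.
// df maps a term to the number of corpus documents containing it; N is the corpus size.
function topTfIdfTerms(pageText, df, N, k = 5) {
  const terms = pageText.toLowerCase().match(/[a-z]+/g) || [];
  const tf = {};
  for (const t of terms) tf[t] = (tf[t] || 0) + 1;

  return Object.keys(tf)
    .map(t => ({ term: t, weight: tf[t] * Math.log(N / ((df[t] || 0) + 1)) }))
    .sort((a, b) => b.weight - a.weight)
    .slice(0, k)
    .map(x => x.term); // these terms would then be searched and the domain
                       // checked against the top 'n' results
}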
D. Page Ranking based Schemes
PageRank is a link analysis algorithm, named after Larry Page,
used by the Google Internet search engine that assigns a
numerical weighting to each element of a hyperlinked set of
documents, such as the World Wide Web, with the purpose of
"measuring" its relative importance within the set. The
algorithm may be applied to any collection of entities with
reciprocal quotations and references. The numerical weight that
it assigns to any given element E is also called the PageRank of
E and denoted by PR(E).
Page rank determines the popularity of a URL in the web. The
higher the Page Rank, the more important is the page. Phishing
web pages most often either have a very low page rank or their
page rank does not exist. Very few phishing pages manage to
increase their page rank, possibly by using link spamming
techniques.
Page Index is defined as the number of pages from a particular
website that Google has in its database. Phishing web pages
usually are accessible only for a short period and hence many
might not be found in the index.
Advantages: Page Rank and Page Index values are strong features for identifying whether a URL is non-phishing, especially if the crawl is from a reputed search engine like Google.
Disadvantages: Freshly created pages, especially in new domains, would rank very low and hence could sometimes result in a false positive.
E. DOM objects Retrieval Schemes
a) Keywords and Meta Tags: This scheme searches for domain information in meta tags with "name = description" and in those whose name or http-equiv is "copyright". It also retrieves the title tag and matches its contents with the domain. If no match is found, the suspicion weight for that site is increased.
b) Request URLs: DOM elements like <img> tags load information from other URLs. Most of these URLs would be within the same domain, or the objects would be loaded from an image server belonging to that domain.
i) In order to achieve the same look and feel as the targeted domain, phishing sites often specify the targeted domain's image server in their <img src> attributes.
ii) Phishing sites mostly maintain only a single URL or a few URLs similar to that of the targeted site, so the <img src> URLs would differ from the page's own domain.
The above two points help us formulate a heuristic: the number of external domain references (including different image servers) is counted, and if it crosses a threshold, the site is marked as suspect and a degree of suspicion is assigned to it, as in the sketch below.
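A minimal JavaScript sketch of this external-reference count, assuming access to the page's DOM; the threshold shown in the comment is illustrative.

// Count <img> elements whose source is served from a domain other than the page's own.
function externalImageCount(doc, pageDomain) {
  let external = 0;
  for (const img of doc.getElementsByTagName("img")) {
    try {
      const host = new URL(img.src, doc.location.href).hostname;
      if (!host.endsWith(pageDomain)) external++;
    } catch (e) { /* ignore malformed src values */ }
  }
  return external;
}

// The page is marked suspect when the count crosses a threshold, e.g.:
// if (externalImageCount(document, "example.com") > 3) suspicion += weight;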
c) AURL – Anchor URLs: This scheme detects a phish based on the anchor URLs present in the site. The following deductions can be made based on AURLs and other URL information in a webpage.
i) As already mentioned in the previous paragraph, the number of external anchors in an illegitimate webpage tends to be high. This property can be used to mark a page as a phishing page.
ii) The hyperlink provides a DNS domain name in the anchor text, but the destination DNS name shown to the user does not match the one in the actual link.
iii) Dotted-decimal IP addresses used in the URI can be counted to help detect malicious websites.
d) Form Tags: Legitimate websites usually have the form's action set to a valid URL, mostly within the same domain. Illegitimate sites often set the action to a URL in a different domain from that of the page, or sometimes leave it null.
e) Body Tags: Some websites provide a description of themselves in the body portion, and this can be used to identify a phishing site.
f) SSL Certificates: The distinguished name in a phished site's certificate differs from its claimed identity. This check can be employed to detect forgeries in the website.
Advantages: These schemes are fast and easy ways to detect phishing sites. They perform their checks irrespective of any change/manipulation in the website.
Disadvantages: False positives - certain legitimate sites could have lengthy URLs, a large number of dots, etc.
F. URL Check Schemes
One method to detect phishing sites is by observing the URL
of the page and examining characteristics such as its length,
presence of suspicious punctuations, etc. Below are some of
the checks performed on the URL to determine its validity
against phishing attacks.
a) URL check for the presence of other domain names: The URL is checked against a list of whitelisted sites for the presence of any of those site names in the URL path rather than in the host name. Such an occurrence indicates that the URL is trying to pass itself off as a valid site.
b) Length of the URL: An abnormally long URL raises suspicion, and sites carrying long host names or large strings of words are checked for phishing.
c) Presence of suspicious special characters: Adversaries use the character '@' in the path because an '@' symbol in a URL causes the string to its left to be disregarded, with the string on the right treated as the actual URL for retrieving the page. Combined with the limited size of the browser address bar, this allows an attacker to write URLs that appear valid within the address bar but actually contain a malicious path after the @ symbol. Checking for this symbol in the URL helps in detecting phishing URLs.
d) Checking obfuscation of URLs: URLs can be obfuscated by inserting hexadecimal escape sequences instead of individual characters. Attackers take advantage of the ignorance of certain users regarding the structure of a URL. For example, the symbol @ can be represented as %40 and dots can be replaced by '%2e'. IP addresses can also be represented in hexadecimal form, so a suspect IP address that appears on a blacklist can be hidden by these characters and thus escape the URL blacklist check. This attack can be identified by maintaining a map of hexadecimal escape sequences and their possible conversions.
e) Suggestive word tokens in the URL: Phishing URLs aim to
extract confidential user information such as their usernames
and passwords in a particular domain. Hence, a check for
keywords such as login, sign-in, confirm, etc. in the path
suggests that the page looks for user information and the URL
is double checked for phishing. These tokens are extracted
from the blacklisted URL paths and it is found they occur more
frequently than other tokens.
f) Dots in URL: It is found that phishing sites use many dots in
their URL but legitimate sites do not. The given URL is
considered a phish if the number of dots in the URL exceeds
five.
These URL schemes are used in anti-phishing tools like
SpoofGuard, WebSpoof and SpoofStick.
Disadvantages: There are cases where some valid sites fail
some of the URL checks such as the length check.
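A minimal JavaScript sketch combining several of the URL checks above (length, dots, '@' symbol, hexadecimal decoding, IP-address host, suggestive keywords); the thresholds and flag names are illustrative, not taken from any particular tool.

// Combined sketch of the URL checks above; thresholds and flag names are illustrative.
function urlChecks(url) {
  const flags = [];
  let decoded;
  try { decoded = decodeURIComponent(url); }   // undo %40, %2e style obfuscation
  catch (e) { decoded = url; }

  const match = decoded.match(/^(\w+:\/\/)?([^\/]*)(.*)$/) || [];
  const host = match[2] || "";
  const path = match[3] || "";

  if (url.length > 80) flags.push("long-url");
  if ((host.match(/\./g) || []).length > 5) flags.push("many-dots");
  if (decoded.includes("@")) flags.push("at-symbol");
  if (/^\d+\.\d+\.\d+\.\d+$/.test(host)) flags.push("ip-host");
  if (/login|sign-?in|confirm|account/i.test(path)) flags.push("credential-keywords");

  return flags;  // each flag contributes to the page's overall suspicion weight
}

console.log(urlChecks("http://203.0.113.7/paypal%40secure/login.php"));
// ["at-symbol", "ip-host", "credential-keywords"]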
G. WHOIS Lookup based Schemes
WHOIS is a query/response protocol which is widely used to
query databases to retrieve details of Internet resources such as
the domain name, IP address block and autonomous system
number. Primarily, it serves as an effective tool to search
domain information, registrar data, admin data and the name
servers used.
WHOIS lookup can be used in anti-phishing schemes to detect
information such as age of the domain and IP address
resolution.
Checking the age of a domain helps assess whether a site is likely to be a phish. The APWG states that the average age of a phishing site is 4.5 days, and many sites last only a few days. Hence, a WHOIS lookup of the suspect site can be performed to check the age of its domain. If the domain was registered more than 12 months ago, it is considered legitimate; if less, it is checked more stringently for phishing. Some sites do not return data on a WHOIS lookup, and these could be considered a phish.
Providing an IP address to the WHOIS database yields the corresponding domain and its registration details. This helps in anti-phishing schemes where the URL contains only an IP address.
Disadvantages: Contacting the WHOIS database for each website fails when the phishing site is hosted on an existing valid server, for example when criminals manage to break into that server. Here, the WHOIS lookup yields a valid value and the check fails to find the phish. The check also fails when businesses outsource some of their web operations to contractors with different domain names.
eBay Toolbar and SpoofGuard are tools using the WHOIS
lookup scheme.
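A short JavaScript sketch of the domain-age heuristic; fetching and parsing the actual WHOIS record is service-specific and omitted, so registrationDate is assumed to have been extracted already.

// Map a domain registration date to a determinant value:
// negative (leans legitimate) if older than 12 months, positive otherwise.
function domainAgeDeterminant(registrationDate, now = new Date()) {
  if (!(registrationDate instanceof Date) || isNaN(registrationDate)) {
    return +1;                                   // no WHOIS data: treat as suspicious
  }
  const ageMonths = (now - registrationDate) / (1000 * 60 * 60 * 24 * 30);
  return ageMonths >= 12 ? -1 : +1;
}

console.log(domainAgeDeterminant(new Date("2009-10-01"), new Date("2009-11-15"))); // +1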
H. Client side Defense schemes
These are anti-phishing schemes which require the user/browser to maintain databases of objects that are generally present in a web page, such as passwords and images. SpoofGuard uses the client-side defense schemes mentioned here. Some of the schemes are briefly described below.
a) Outgoing password check: By maintaining (domain, username, password) triplets in a database, an anti-phishing scheme can detect when credentials are about to be leaked. Every time a user enters a password into a site, the password, hashed using an algorithm such as SHA-1, is compared against the stored hashes, and a warning is issued if the same username-password combination has already been used for a different domain.
This scheme is particularly helpful when a spoof site uses an image of the word "password" instead of HTML text to request the user's password. Since all the passwords are hashed and stored in a database, such a phishing site can still be detected.
Disadvantages: It is practically impossible to store the passwords and usernames for all domains. It is also a security risk: leakage of the stored password hashes could have a greater impact.
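A sketch of the hash-and-compare step using the Web Crypto API for SHA-1; the triplet store and how it is populated and persisted are assumed.

// Hash a string with SHA-1 and return the hex digest (Web Crypto API).
async function sha1Hex(text) {
  const digest = await crypto.subtle.digest("SHA-1", new TextEncoder().encode(text));
  return Array.from(new Uint8Array(digest))
    .map(b => b.toString(16).padStart(2, "0")).join("");
}

// store: array of { domain, userHash, passHash } triplets collected over time.
// Returns true when the same username/password pair was already used on another domain.
async function passwordReuseWarning(store, domain, username, password) {
  const userHash = await sha1Hex(username);
  const passHash = await sha1Hex(password);
  return store.some(t =>
    t.domain !== domain && t.userHash === userHash && t.passHash === passHash);
}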
b) History Check: Most of the above anti-phishing measures
are bound to raise alarms for legitimate sites. Hence, this is a
scheme that is employed to avoid any false alarms in phishing
schemes. It checks the user’s browser history and does not
issue any warnings to sites that are in the user’s history file.
Disadvantages: If the user inadvertently bypasses the initial warning, the site will never be checked for phishing again and might cause considerable damage.
c) Domain Check: This scheme checks the user's browser history to see whether the domain of the current page closely resembles any previously visited page/domain. The comparison is done by calculating the Hamming distance. For example, the site 'wikifedia.org' will raise a warning if the user has previously visited 'wikipedia.org'.
This scheme is devised to prevent adversaries from hosting sites whose domains are misspelled versions of popular sites, as sketched below.
Disadvantages: The check fails when legitimate sites have closely resembling domain names, which raises false alarms.
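A small JavaScript sketch of this comparison; Hamming distance is only defined for strings of equal length, so domains of different lengths are skipped here (an edit-distance measure would be needed otherwise). The distance threshold is illustrative.

// Number of positions at which two equal-length strings differ.
function hammingDistance(a, b) {
  if (a.length !== b.length) return Infinity;
  let d = 0;
  for (let i = 0; i < a.length; i++) if (a[i] !== b[i]) d++;
  return d;
}

// Warn when the current domain is a near-miss of a previously visited one.
function resemblesVisitedDomain(domain, historyDomains, maxDistance = 2) {
  return historyDomains.some(h =>
    h !== domain && hammingDistance(domain, h) <= maxDistance);
}

console.log(resemblesVisitedDomain("wikifedia.org", ["wikipedia.org"])); // true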
d) Referring site check: The browser maintains a record of referring pages, i.e., the links the user followed to reach the current page. Typical phishing attacks arrive via email, and if the user follows a link from such an email, the referring page is the email host. Use of IP addresses by phishing sites can be tracked by doing a reverse DNS lookup; if the resolved hostname is not in the referring-sites list, the site is deemed a phish.
e) Image-Domain associations: The scheme maintains a database of images associated with each domain. The initial static database is assembled using crawler-type tools and is augmented with the individual's browser history. The database can store either the images themselves or their hashes. The scheme helps in finding phishing sites, which may serve images with hash values different from those stored in the database, raising an alert.
Disadvantages: Storing images and their hash values for many domains is largely infeasible and is limited by client-side configuration and storage restrictions.
f) Profiling/Cache: The cached copy of a web site can be obtained through Google Cache, and the date of the last cache can be retrieved. This information can be used in determining whether a site is a phish.
III. PHISHILLA FEATURES
A. Introduction to Phishilla
Phishilla is a plug-in or extension for the Mozilla Firefox web browser. It is embedded in the browser and runs in the same memory context as the browser. It checks for any malice in the site entered by the user and, if any is found, provides a popup box warning the user against entering the site. Phishilla uses features such as URL checks, WHOIS lookups to retrieve site information, page rank, page index, and a host of other features which are described in detail in Section III.B.
A typical Firefox extension is packaged in a ZIP file or bundle with the file extension .xpi. It follows a fixed folder structure and contains an XUL file that adds functionality to the browser. XUL is an XML grammar that provides user interface widgets like buttons, menus, toolbars, trees, etc. These XUL files contain references to JavaScript code that provides the additional functions to the browser.
Phishilla uses the XPI file structure maintained for Firefox extensions, with an XUL file that calls a JavaScript function. The JavaScript function performs the necessary checks on the site entered by the user and returns a result based on a weighted-sum calculation. This weighted sum is computed for each site from the features described in Section III.B. Phishilla is a lightweight browser plug-in for Mozilla Firefox and supports versions from the 1.x series through 3.5 and higher.
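A hedged sketch of how the overlay JavaScript referenced from the XUL file might hook page loads in a classic (pre-WebExtensions) Firefox extension; checkPage and warnUser are hypothetical names standing in for Phishilla's weighted-sum evaluation and warning dialog.

// Overlay script: run Phishilla's checks whenever a content document finishes loading.
window.addEventListener("load", function () {
  gBrowser.addEventListener("DOMContentLoaded", function (event) {
    const doc = event.originalTarget;                 // the document that just loaded
    if (!(doc instanceof HTMLDocument)) return;       // skip XUL/frame documents
    const score = checkPage(doc.location.href, doc);  // weighted-sum evaluation
    if (score > 35) {
      warnUser(doc.location.href, score);             // popup warning / blacklist prompt
    }
  }, true);
}, false);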
Figure 1 depicts the workflow of the plug-in.
The webpage is first checked against the maintained white list and, if found in the list, the extension proceeds to load the page. Phishilla displays the location of the page in the status bar of the browser; if the user suspects that the true location is somewhere other than what is displayed, he/she can click on it and the browser shows a popup asking whether to proceed.
The user decides whether to proceed, and the site is added to the blacklist if the user accepts the warning. If not, Phishilla proceeds to do the mandatory checks: blacklist check, DOM objects check, Google page rank check, inbound links check, traffic information check, page index check, URL information check, and domain age check. A weighted sum is calculated based on the outcome of all these checks, and if the weight is more than 35, Phishilla displays a warning to the user and adds the site to the blacklist if the user accepts the warning. If the user chooses to ignore the warning, Phishilla asks for confirmation on whether to add the site to the white list and proceeds to load the page upon user confirmation.
Figure 1. Phishilla working design
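A minimal sketch of this weighted-sum decision in JavaScript; the heuristic names, weights, and helper functions (blackListHit, pageRankDeterminant, and so on) are hypothetical stand-ins for the checks described in Section III.B.

// Sketch of the weighted-sum decision; weights and helpers are illustrative.
const heuristics = [
  { name: "blacklist", weight: 40, score: url => (blackListHit(url) ? 1 : 0) },
  { name: "pageRank",  weight: 10, score: url => pageRankDeterminant(url) },
  { name: "domainAge", weight: 10, score: url => domainAgeDeterminant(url) },
  { name: "urlChecks", weight: 5,  score: url => urlChecks(url).length },
];

function phishScore(url) {
  return heuristics.reduce((sum, h) => sum + h.weight * h.score(url), 0);
}

// A total above 35 triggers the warning dialog, e.g.:
// if (phishScore(url) > 35) warnUser(url);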
B. Features in Phishilla
Phishilla incorporates most of the schemes described in Section II, making use of client-side checking.
Since no single method is good enough to detect a wide variety of phishing sites, we use a combination of schemes: a score is computed for each web page as a weighted sum over the set of applied heuristics.
a) White Lists: There is an initial list of trustworthy/popular sites. The user can also manually add his/her own domains to the white list based upon prior knowledge of these sites. When a web page is entered in the browser's address bar and loads, the domain of the web page is compared with the domains present in the white list. In the case of a match, no further checks are performed and the user proceeds to the intended destination URL.
Sites which have been frequent targets of phishing were collected from statistics provided by PhishTank.com and the APWG and added to the white list we maintain.
b) Black Lists: We maintain a black list of 50 phishing sites as training data. Each time a web page is loaded, its domain is looked up against the domains in the blacklist. If a match occurs, the user is warned right away that the site is a phishing site and a message pops up asking him not to proceed. At this juncture, the user is left with one of two options:
i) The user is navigated away from the site on pressing the cancel button.
ii) The user proceeds to the site on pressing the OK button.
c) Location of Domain: Certain countries rank high in the number of fraudulent sites hosted there. A user may also have prior knowledge or experience of an intended site and hence be immediately aware if the location of a suspicious site is in an unlikely country. The location of the site is obtained through a reverse lookup on a WHOIS database (my-addr.com).
This location is displayed to the user in the status bar. The information is especially useful when the user has prior knowledge of the intended site, as it provides a visual cue. The user, if suspicious of the location of a particular site, can click on the status bar, which pops up a window asking the user to confirm adding the site to the blacklist.
E.g., the Indian Railways site "irctc.co.in" and the popular bank site "barclays.co.uk" are unlikely to be hosted in countries like China or Russia.
d) Age of a Website: The age of a website is queried by looking up the domain on a WHOIS database (vitzo.com/en/whois) and retrieving the registration date of the domain. Most phishing sites tend to have very little or no history, hence the age of a website is a factor to take into consideration when evaluating it for phishing.
If the age of the site is below a threshold, a positive determining factor is added for the site. Similarly, if the age is above an upper limit (i.e. the site has existed for a long time), a negative determining factor is given for the site.
e) Meta Information: The content property of the META tag whose name or http-equiv is "description" is retrieved. If there is no match between the information in the URL and the information in the META tag, a positive phishing determining factor is added for the domain. Otherwise, a negative determining factor is attributed to the domain.
f) Google Page Rank: Google assigns a page rank on a scale of 0-10 to every webpage. Page rank reflects 'the importance' of a webpage: the higher the rank, the more important the page. Here we retrieve the domain of the webpage and query Google (at toolbarqueries.google.com) programmatically to obtain the page rank of that domain.
Once the page rank is obtained, a weighting scale is applied. Let Pr = PageRank(web page); the weight of the web page for the page rank heuristic is

W(web page) = (-1) * Pr * factor,  if Pr is between 2 and 10
W(web page) = 10 * factor,         if Pr is 0, 1 or -1 (page rank not found)

Here, factor is based on how much information the page rank provides in classifying a website as phishing or non-phishing. A page rank of 2-10 ensures that the phishing determinant value for this heuristic is negative (i.e. leans towards this page not being a phish), whereas a page rank of 0, 1 or not found computes a positive determinant value for this heuristic.
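A direct JavaScript transcription of this rule; the value of factor is a tuning constant and the sample values below are illustrative.

// Weight contribution of the page rank heuristic, as defined above.
function pageRankWeight(pr, factor) {
  if (pr >= 2 && pr <= 10) return -1 * pr * factor; // known, reasonably ranked page
  return 10 * factor;                               // pr of 0, 1 or -1 (not found)
}

console.log(pageRankWeight(6, 1));  // -6: leans towards non-phishing
console.log(pageRankWeight(-1, 1)); // 10: leans towards phishing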
g) Google Page Index: The web page's domain is looked up in Google's index database; if the page is indexed, a negative determinant score is provided for the index heuristic. If no index information is obtained on querying Google, a positive determinant score is provided for the index heuristic.
h) Inbound Links: The popularity of a page or to some
extent the level of dependence or trustworthiness is gauged by
the number of external references from other pages to this
page.
Thus the number of inbound links to a webpage's domain is programmatically queried and obtained through the Alexa database. The number of inbound links to a phishing site is typically minimal, with at most a few instances. Hence this data can be applied effectively in determining whether a page falls into the phishing category.
i) Traffic Information: The popularity of a webpage can also be gauged through the number of visits to the page. The traffic rank of a page is obtained programmatically by querying the Alexa database. The lower the rank value, the greater the traffic flow to the site. This rank is computed from a combination of the number of user visits and the number of page views over a period of three months.
A threshold is set on the traffic rank: if the traffic rank of the web page's domain is above (i.e. worse than) the threshold, a positive determinant value is set for the traffic heuristic; if it is below the threshold, a negative determinant value is set. This is also a very useful heuristic in gauging whether a site is a phish.
j) Anchor tags: Anchor tags are a very common way of cheating unsuspecting users. An anchor tag has an href portion not visible to the user and a visible text portion for the user to click on. A fraudster can manipulate this to his benefit.
For example,
<a href="http://www.phising.com/">www.icicibank.com</a> could mislead the user into thinking that the link leads to icicibank.com.
Hence we check whether the visible text is a URL and, if so, whether it matches the actual href, as sketched below. If they differ, a positive determinant value is applied for this heuristic. Similarly, hrefs in anchor tags go through all the checks applied to URLs, such as checking for IP addresses, obfuscated URLs, length, number of dots, special characters, etc. If any of these heuristics triggers, it results in a positive determinant which is added to the total.
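A JavaScript sketch of the visible-text versus href comparison; the URL-shaped-text test and the domain comparison are simplified.

// Collect anchors whose visible text looks like a URL/domain that does not
// match the domain of the actual href.
function mismatchedAnchors(doc) {
  const suspicious = [];
  for (const a of doc.getElementsByTagName("a")) {
    const text = a.textContent.trim();
    if (!/^(https?:\/\/)?[\w.-]+\.[a-z]{2,}/i.test(text)) continue; // not URL-like text
    const visibleHost = text.replace(/^https?:\/\//i, "").split("/")[0].replace(/^www\./, "");
    const actualHost = new URL(a.href, doc.location.href).hostname;
    if (!actualHost.endsWith(visibleHost)) suspicious.push(a);
  }
  return suspicious; // each mismatch adds a positive determinant to the total score
}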
k) Server Form Handlers: Documents contain form tags whose action attribute specifies the URL to pass control to when an action such as pressing the submit button occurs. Phishing sites usually maintain only a single URL, or a limited set of URLs, on a domain that merely resembles the target website.
Hence a check is performed to confirm whether the action property of a form references an external location different from the URL in the address bar, as sketched below. If the check succeeds, a positive determinant value is assigned to the form handler heuristic.
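A JavaScript sketch of this check; a null or empty action is also counted as suspicious here, matching the form-tag observation in Section II.

// Count forms whose action posts to a domain other than the one in the address bar.
function externalFormActions(doc) {
  const pageHost = doc.location.hostname;
  let count = 0;
  for (const form of doc.getElementsByTagName("form")) {
    const action = form.getAttribute("action");
    if (!action) { count++; continue; }                    // null/empty action
    const actionHost = new URL(action, doc.location.href).hostname;
    if (actionHost !== pageHost) count++;                  // credentials posted elsewhere
  }
  return count; // a non-zero count sets a positive determinant for this heuristic
}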
l) Images and Other External Objects: Phishing sites, in order to look like the target website and trick users into believing they are on the original site, use images from the target's image server.
A compilation (white list) of popular sites, especially those of financial institutions, is maintained in our project, along with a map of the image servers used by these websites. Hence, when a fraudulent site tries to use images from any of these sites, its img src information is matched against the image server URLs of the white-listed sites, and if there is a match, a positive determinant weight is set for the external object heuristic.
m) Length of URL: Adversaries try to manipulate sites by including more information in the website URL. Exploiting the limited number of characters the browser shows in the address bar, fraudulent websites can be made to look legitimate. Hence, we have included a heuristic check that finds the length of the URL and increases the weight if the length exceeds 80 characters.
Studies have shown that phishing URLs are unusually long, and hence a check on URL length helps in detecting fraudulent websites.
n) Dots in URL: We have included a check that counts the number of dots in a URL and increases the weighted sum if the count exceeds a set threshold. It was found that most phishing sites have more than an acceptable number of dots in their URLs.
o) '@' symbol check in URL: Phishing sites make use of the characteristic behavior of the @ symbol, which causes the browser to disregard the address to its left; anything between the http:// and the @ is not considered by the browser.
Adversaries exploit this and mimic legitimate sites by inserting legitimate-looking strings between http:// and the '@' symbol to trick users. Our scheme checks for the @ symbol or its hexadecimal equivalent '%40' in the URL and increases the weighted sum if it is found.
p) Pattern matching: Most phishing sites try to mimic target sites such as eBay, PayPal, and Bank of America (the top three targets). The most common way a user is misled into visiting such a phishing site is by being enticed by catch words such as eBay or PayPal.
Hence we apply a basic check: if the URL has a domain other than the popular sites in the whitelist but contains a string which closely resembles one of these possible target sites, we add a positive determinant value for this heuristic, as sketched below.
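A JavaScript sketch of this brand-name check; the target-name list is illustrative and a real implementation would use the full whitelist described above.

// Flag URLs that are not on a whitelisted domain yet contain the name of a
// commonly targeted brand anywhere in the URL.
const targetNames = ["ebay", "paypal", "bankofamerica"]; // illustrative list

function mimicsTargetBrand(url, whiteList) {
  const host = new URL(url).hostname.replace(/^www\./, "");
  if (whiteList.includes(host)) return false;   // the genuine site itself
  const lowered = url.toLowerCase();
  return targetNames.some(name => lowered.includes(name));
}

console.log(mimicsTargetBrand("http://it-paypal.com/PayPal.It.html",
                              ["paypal.com", "ebay.com"])); // true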
IV. EVALUATION AND ANALYSIS OF PHISHILLA
Phishilla is a rule-based heuristic tool. It may at times cause
false positives (treat non-phishing site as phishing site) and
false negatives (i.e., treat phishing site as non-phishing site).
Phishilla was evaluated on a sample of 64 URLs. 32 of them were phishing URLs obtained from PhishTank.com and other web resources; these URLs were also confirmed to be phish URLs by checking them against the phishing filters of browsers such as Google Chrome and Mozilla Firefox.
The remaining 32 were legitimate URLs chosen at random through the Yahoo Random URL generator (http://random.yahoo.com/fast/ryl), along with certain URLs of known people and known domains. The two tables below show the evaluation results for phishing and legitimate sites respectively.
No. PHISHING URLS RESULT
1 http://www.setuplogecount.co.uk/index.php Found
2 http://info.kuspuk.net/phpMyAdmin/config/ppusa/ Found
3 http://aimm.ye.ro/zboard/data/pesmm.html Found
4 http://grapelove.co.kr/_gabia/fs3_gongji/gtbplc/ibank_gtbplc_com.php Found
5 http://www.goodcreditahead.com/forum/bancoposta/index.php?MfcISAPICommand=SignInFPP&UsingSSL=1&emai=&userid= Found
6 http://singine4baylogisny8iaznwaz.nm.ru/by-Brownie-wise_W032879327328929Qitem1QQDJSyyyd37sdcmbbyloginpag23za32wa32w2azZza3ewsaz.html Found
7 http://server.e-foto.lt/js/228411.paypal.com/webscr_cmd_login-run.php Found
8 http://www.olancompany.com/images/redirecting.html Found
9 http://forumoficial.hostrator.com/de.html Failed
10 http://nvbchannel.net/forum/paypal.htm Found
11 http://www.web-page.com.ar/win/133847.paypal.com/webscr_cmd_login-run.php Found
12 http://activex.emenace.com/us Failed
13 http://it-paypal.com/PayPal.It.html Found
14 http://publidisco.com/catalog/images/microsoft/index2.htm Found
15 http://muziekschoolallmusic.nl/vakanties/ib.html Failed
16 http://kbic.info/bbs/data/portal/server.pt/ Found
17 http://vonage.id1114555.online-webforms.com/ Found
18 http://motors-support.net Failed
19 http://ba03.pochta.ru/ehay.html Found
20 http://www.portaljenipapo.com/login.htm Failed
21 http://soullovebags.com/images/www.mybank.alliance-leicester.co.uk/index.html Found
22 http://www.stentend.com/de/ Found
23 http://signin.ebay.com.ws.ebayisapi.dll.ciczdztxtwsdyhsfpndr.virtualbattlespace2.com/frogstar/down Found
24 http://signin-ebay.adacorrigan.co.uk/ Found
25 http://www.centralfilms.net/locaciones/moore/scripts.php Found
26 http://e-mind.be/img/hp/base/b049/gdxow.php Found
27 http://www.parkdaeli.com/bbs/file/new.egg.com/logon.htm Found
28 http://www.stentend.com/de/ Found
29 http://www.candelaradio.fm/los15img/ Found
30 http://www.skype.com.ofi.uni.cc/?id=49126&lc=us Failed
31 http://fwqdeq.mail2k.ru/n.html Found
32 http://mobile-me.org Found
Table-1 Evaluation results for phishing sites
No. LEGITIMATE SITES RESULTS
1 http://sportsillustrated.cnn.com/basketball/ncaa/women/teams/youngstown/ Passed
2 http://www.zumbrolutheran.org/ Passed
3 http://www.socialcouch.com/interview-with-richard-binhammer-dell-social-media/ Passed
4 http://amazwi.blogspot.com/ Passed
5 http://cs.vt.edu/ Passed
6 http://www.alphabusinesscentre.com/ Passed
7 http://www.christchurchpompton.org/ Passed
8 http://www.wongbrothers.com/ Passed
9 http://www.nzembassy.com/ Passed
10 http://geogratis.cgdi.gc.ca/ Passed
11 http://www.prapa.com/ Passed
12 http://www.circuit8.org/ Passed
13 http://www.findstolenart.com/ Passed
14 http://www.maroc.net/ Passed
15 http://www.minorleagueballparks.com/neds_oh.html Passed
16 http://www.elth.pub.ro/ Passed
17 http://www.finleys.com/ Passed
18 http://www.spravi.8m.com/ Passed
19 http://www.paconcours.com/ Passed
20 http://www.rmadhavan.com/ Passed
21 http://us.com/ Passed
22 https://home.americanexpress.com/home/global_splash.html Passed
23 http://www.lamega.com/ Passed
24 http://www.everydaymaternity.com/ Passed
25 http://www.shop-cliftonparkcenter.com/ Passed
26 http://www.eqc.govt.nz/ Passed
27 http://www.yorkarchaeology.co.uk/ Failed
28 http://mikeshost.110mb.com/xy.php Failed
29 http://weather.mgnetwork.com/cgi-bin/weatherIMD3/weather.cgi?user=TBO&forecast=zandh&pands=Miami%2C+FL Failed
30 http://www.atifitnuts.com/ Failed
31 http://www.asgsherman.com/ Failed
32 http://www.ambache.co.uk/ Failed
Table-2 Evaluation results for legitimate sites
A. Evaluation measures
The following measures were adopted in evaluating
Phishilla:
a) Total Catch Rate: Number of phish URLs that were
correctly blocked or warned.
Number of correctly caught phish URLs = 28
Total number of phish URLs = 32
Percentage of correctly caught URLS = 28 / 32 * 100
= 87.5 %
b) False Negatives: Number of phish URLs that were
incorrectly allowed.
Number of incorrectly allowed phish URLs = 4
Total number of phish URLs = 32
Percentage of false negatives = 4 / 32 * 100
= 12.5 %
c) Allows: Number of good URLs that were correctly allowed.
Number of correctly allowed good URLs = 26
Total number of good URLs = 32
Percentage of correctly allowed URLs = 26 / 32 * 100
= 81.25 %
d) False Positives: Number of good URLS that were
incorrectly blocked.
Number of incorrectly blocked good URLs = 6
Total number of good URLs = 32
Percentage of false positives = 6/32 * 100
= 18.75%
B. Analysis of Phishilla
Through our evaluation we verified that Phishilla may
sometimes result in false positives for relatively unknown sites
but is unlikely to cause false negatives of major impact.
a) Analysis of False Positives: False positives are the number of good URLs that are incorrectly blocked.
1) Phishilla reports false positives in the case of good URLs with abnormal URL lengths or a larger number of dots than standard conventions would suggest.
2) If a dotted-decimal IP address is provided instead of a name, reporting an error outright could sometimes result in a false positive, since this kind of address may occasionally be legitimate. Hence in this case Phishilla only reports a warning that an IP address is being used in the URL and that it could possibly be an illegitimate site.
3) False positives are also possible when the site is relatively new or unknown, with very few or no inbound links and little traffic.
b) Analysis of False Negatives: False negatives are the number of phish URLs that are incorrectly allowed.
It is imperative that any good anti-phishing scheme or tool reduces the number of false negatives, and Phishilla addresses this issue well. False negatives occur mostly when there is very little DOM element information that can be compared against the standard heuristics.
c) Performance:
The performance of Phishilla is good since only JavaScript
is used and all operations are done on the client side.
V. ADVANTAGES OF PHISHILLA
Phishilla is thus a browser plug-in which accomplishes the task of detecting a phishing site by following a set of well-proven and established methods.
It has the following advantages:
1) Lightweight
2) Follows combination of well-tested and successful
anti-phishing schemes.
3) Computes a weighted sum where heuristics are
assigned different values based upon their ability to
classify the malicious content in the website.
4) Excellent catch rate.
VI. CONCLUSION
In this paper we have discussed the set of existing countermeasures against phishing, the possible merits and flaws in these schemes, and the adoption of these schemes by existing marketplace tools. We have identified that a single heuristic, or a single class of heuristics, is not sufficient to reliably determine a phishing site. Hence we have adopted a scheme which combines several phishing classification schemes used across several tools and assigns weights to each scheme depending on its effectiveness in classification, i.e. its detection accuracy.
Phishilla provides phishing alerts to the user in a non-intrusive manner without affecting the browsing experience. It follows a client-side approach where all the logic is executed in client-side code. This makes Phishilla efficient and imposes only a minimal set of requirements.
While Phishilla has a good catch rate and detects a majority
of the phishing sites, possible avenues of enhancements in
Phishilla include incorporating features such as profiling,
checking of SSL certificates, image matching, etc. which
would need server side functionalities. The GUI could be
enhanced to provide more virtual cues to the user and possible
display of color codes. This would indicate the level of
determining whether a site is malicious or not. Similarly, the
users could be profiled when they mark a site as phishing and
weights could be provided to users based upon their previous
phish-reporting history. Other learning-based methods could also be incorporated, where the effectiveness of each heuristic is monitored over time and the weights re-adjusted accordingly. Phishilla could also be extended to track and report phishing e-mails, the current plague on the Internet which leads users to unsolicited phishing sites.
VII. ACKNOWLEDGMENTS
This work was carried out at the Virginia Polytechnic Institute and State University. We thank Dr. Jung Min Park for providing the impetus for this paper.
REFERENCES
[1] PhishTank, available at: http://www.phishtank.com/
[2] N. Chou, R. Ledesma, Y. Teraguchi, and J. C. Mitchell, “Client-Side
Defense against Web-Based Identity Theft", in Proceedings of the
Network and Distributed System Security Symposium, (NDSS '04),
February 2004.
[3] S. Garera, N. Provos, M. Chew, and A. D. Rubin, “A Framework for
Detection and Measurement of Phishing Attacks", in Proceedings of the
2007 ACM Workshop on Recurring Malcode (WORM '07), Nov. 2007,
pp. 1–8.
[4] FireFox, “Phishing Protection". Available at:
http://www.mozilla.com/en-US/firefox/phishing-protection/
[5] Y. Pan and X. Ding, “Anomaly Based Web Phishing Page Detection", in
Proceedings of the 22nd Annual Computer Security Applications
Conference (ACSAC '06), December 2006, pp. 381–392.
[6] Y. Zhang, J. I. Hong, and L. F. Cranor, “Cantina: A Content-Based
Approach to Detecting Phishing Web Sites”, in Proceedings of 16th
International World Wide Web Conference (WWW '07), May 2007, pp.
639–648.
[7] Paul Robichaux, Devin L. Ganger, “Gone Phishing: Evaluating Anti-
Phishing Tools for Windows", September 2006
[8] D. Kevin McGrath, Minaxi Gupta, "Behind Phishing: An Examination
of Phisher Modi Operandi", Proceedings of the 1st Usenix Workshop on
Large-Scale Exploits and Emergent Threats, San Francisco, California,
Article No. 4, 2008
[9] Bayesian Classification of Phishing :
http://www.sonicwall.com/downloads/WP-ENG-025_Phishing-
Bayesian-Classification.pdf
[10] Google Page Rank Information:
http://abhinavsingh.com/blog/2009/04/getting-google-page-rank-using-
javascript-for-adobe-air-apps/
[11] Introduction to Phishing: http://en.wikipedia.org/wiki/Phishing
[12] Who-Is Information: http://vitzo.com/en/whois
[13] Traffic Information : http://www.alexa.com/siteinfo
[14] Reverse Domain Lookup: http://my-addr.com/reverse-lookup-domain-
hostname/free-reverse-ip-lookup-service/reverse_lookup.php
[15] Anti-Phishing Information: http://www.antiphishing.org/

More Related Content

What's hot

Malicious Url Detection Using Machine Learning
Malicious Url Detection Using Machine LearningMalicious Url Detection Using Machine Learning
Malicious Url Detection Using Machine Learningsecurityxploded
 
A SURVEY ON WEB SPAM DETECTION METHODS: TAXONOMY
A SURVEY ON WEB SPAM DETECTION METHODS: TAXONOMYA SURVEY ON WEB SPAM DETECTION METHODS: TAXONOMY
A SURVEY ON WEB SPAM DETECTION METHODS: TAXONOMYIJNSA Journal
 
State of the Art Analysis Approach for Identification of the Malignant URLs
State of the Art Analysis Approach for Identification of the Malignant URLsState of the Art Analysis Approach for Identification of the Malignant URLs
State of the Art Analysis Approach for Identification of the Malignant URLsIOSRjournaljce
 
Improving Phishing URL Detection Using Fuzzy Association Mining
Improving Phishing URL Detection Using Fuzzy Association MiningImproving Phishing URL Detection Using Fuzzy Association Mining
Improving Phishing URL Detection Using Fuzzy Association Miningtheijes
 
2014_protect_presentation
2014_protect_presentation2014_protect_presentation
2014_protect_presentationJeff Holland
 
IRJET- Detecting Malicious URLS using Machine Learning Techniques: A Comp...
IRJET-  	  Detecting Malicious URLS using Machine Learning Techniques: A Comp...IRJET-  	  Detecting Malicious URLS using Machine Learning Techniques: A Comp...
IRJET- Detecting Malicious URLS using Machine Learning Techniques: A Comp...IRJET Journal
 
Classification Model to Detect Malicious URL via Behaviour Analysis
Classification Model to Detect Malicious URL via Behaviour AnalysisClassification Model to Detect Malicious URL via Behaviour Analysis
Classification Model to Detect Malicious URL via Behaviour AnalysisEditor IJCATR
 
A Comparative Analysis of Different Feature Set on the Performance of Differe...
A Comparative Analysis of Different Feature Set on the Performance of Differe...A Comparative Analysis of Different Feature Set on the Performance of Differe...
A Comparative Analysis of Different Feature Set on the Performance of Differe...gerogepatton
 
A Deep Learning Technique for Web Phishing Detection Combined URL Features an...
A Deep Learning Technique for Web Phishing Detection Combined URL Features an...A Deep Learning Technique for Web Phishing Detection Combined URL Features an...
A Deep Learning Technique for Web Phishing Detection Combined URL Features an...IJCNCJournal
 
Review of the machine learning methods in the classification of phishing attack
Review of the machine learning methods in the classification of phishing attackReview of the machine learning methods in the classification of phishing attack
Review of the machine learning methods in the classification of phishing attackjournalBEEI
 
A web content analytics
A web content analyticsA web content analytics
A web content analyticscsandit
 
IRJET - An Automated System for Detection of Social Engineering Phishing Atta...
IRJET - An Automated System for Detection of Social Engineering Phishing Atta...IRJET - An Automated System for Detection of Social Engineering Phishing Atta...
IRJET - An Automated System for Detection of Social Engineering Phishing Atta...IRJET Journal
 
Done rerea dwebspam paper good
Done rerea dwebspam paper goodDone rerea dwebspam paper good
Done rerea dwebspam paper goodJames Arnold
 
Network paperthesis2
Network paperthesis2Network paperthesis2
Network paperthesis2Dhara Shah
 
Gam Documentation
Gam DocumentationGam Documentation
Gam DocumentationDavid Chen
 
Low Cost Page Quality Factors To Detect Web Spam
Low Cost Page Quality Factors To Detect Web Spam Low Cost Page Quality Factors To Detect Web Spam
Low Cost Page Quality Factors To Detect Web Spam ieijjournal
 

What's hot (18)

Malicious Url Detection Using Machine Learning
Malicious Url Detection Using Machine LearningMalicious Url Detection Using Machine Learning
Malicious Url Detection Using Machine Learning
 
A SURVEY ON WEB SPAM DETECTION METHODS: TAXONOMY
A SURVEY ON WEB SPAM DETECTION METHODS: TAXONOMYA SURVEY ON WEB SPAM DETECTION METHODS: TAXONOMY
A SURVEY ON WEB SPAM DETECTION METHODS: TAXONOMY
 
State of the Art Analysis Approach for Identification of the Malignant URLs
State of the Art Analysis Approach for Identification of the Malignant URLsState of the Art Analysis Approach for Identification of the Malignant URLs
State of the Art Analysis Approach for Identification of the Malignant URLs
 
Improving Phishing URL Detection Using Fuzzy Association Mining
Improving Phishing URL Detection Using Fuzzy Association MiningImproving Phishing URL Detection Using Fuzzy Association Mining
Improving Phishing URL Detection Using Fuzzy Association Mining
 
2014_protect_presentation
2014_protect_presentation2014_protect_presentation
2014_protect_presentation
 
IRJET- Detecting Malicious URLS using Machine Learning Techniques: A Comp...
IRJET-  	  Detecting Malicious URLS using Machine Learning Techniques: A Comp...IRJET-  	  Detecting Malicious URLS using Machine Learning Techniques: A Comp...
IRJET- Detecting Malicious URLS using Machine Learning Techniques: A Comp...
 
Classification Model to Detect Malicious URL via Behaviour Analysis
Classification Model to Detect Malicious URL via Behaviour AnalysisClassification Model to Detect Malicious URL via Behaviour Analysis
Classification Model to Detect Malicious URL via Behaviour Analysis
 
A Comparative Analysis of Different Feature Set on the Performance of Differe...
A Comparative Analysis of Different Feature Set on the Performance of Differe...A Comparative Analysis of Different Feature Set on the Performance of Differe...
A Comparative Analysis of Different Feature Set on the Performance of Differe...
 
A Deep Learning Technique for Web Phishing Detection Combined URL Features an...
A Deep Learning Technique for Web Phishing Detection Combined URL Features an...A Deep Learning Technique for Web Phishing Detection Combined URL Features an...
A Deep Learning Technique for Web Phishing Detection Combined URL Features an...
 
Review of the machine learning methods in the classification of phishing attack
Review of the machine learning methods in the classification of phishing attackReview of the machine learning methods in the classification of phishing attack
Review of the machine learning methods in the classification of phishing attack
 
A web content analytics
A web content analyticsA web content analytics
A web content analytics
 
Learning to detect phishing ur ls
Learning to detect phishing ur lsLearning to detect phishing ur ls
Learning to detect phishing ur ls
 
IRJET - An Automated System for Detection of Social Engineering Phishing Atta...
IRJET - An Automated System for Detection of Social Engineering Phishing Atta...IRJET - An Automated System for Detection of Social Engineering Phishing Atta...
IRJET - An Automated System for Detection of Social Engineering Phishing Atta...
 
Done rerea dwebspam paper good
Done rerea dwebspam paper goodDone rerea dwebspam paper good
Done rerea dwebspam paper good
 
Network paperthesis2
Network paperthesis2Network paperthesis2
Network paperthesis2
 
50120140504017
5012014050401750120140504017
50120140504017
 
Gam Documentation
Gam DocumentationGam Documentation
Gam Documentation
 
Low Cost Page Quality Factors To Detect Web Spam
Low Cost Page Quality Factors To Detect Web Spam Low Cost Page Quality Factors To Detect Web Spam
Low Cost Page Quality Factors To Detect Web Spam
 

Viewers also liked

A new successful project -lamp product--wit mold
A new successful project -lamp product--wit moldA new successful project -lamp product--wit mold
A new successful project -lamp product--wit moldBeta Jiang
 
Skyscape 2015-onboces-pdf
Skyscape 2015-onboces-pdfSkyscape 2015-onboces-pdf
Skyscape 2015-onboces-pdfJeff Paye
 
Abuse_in_the_Cloud_Palani_Ashwin
Abuse_in_the_Cloud_Palani_AshwinAbuse_in_the_Cloud_Palani_Ashwin
Abuse_in_the_Cloud_Palani_AshwinAshwin Palani
 
Some automotive parts made by WIT MOLD
Some automotive parts made by WIT MOLDSome automotive parts made by WIT MOLD
Some automotive parts made by WIT MOLDBeta Jiang
 

Viewers also liked (6)

A new successful project -lamp product--wit mold
A new successful project -lamp product--wit moldA new successful project -lamp product--wit mold
A new successful project -lamp product--wit mold
 
Skyscape 2015-onboces-pdf
Skyscape 2015-onboces-pdfSkyscape 2015-onboces-pdf
Skyscape 2015-onboces-pdf
 
Norma iram 4501
Norma iram 4501Norma iram 4501
Norma iram 4501
 
Abuse_in_the_Cloud_Palani_Ashwin
Abuse_in_the_Cloud_Palani_AshwinAbuse_in_the_Cloud_Palani_Ashwin
Abuse_in_the_Cloud_Palani_Ashwin
 
Tugas eka
Tugas ekaTugas eka
Tugas eka
 
Some automotive parts made by WIT MOLD
Some automotive parts made by WIT MOLDSome automotive parts made by WIT MOLD
Some automotive parts made by WIT MOLD
 

Similar to Report - Final_New_phishila

Detecting Phishing using Machine Learning
Detecting Phishing using Machine LearningDetecting Phishing using Machine Learning
Detecting Phishing using Machine Learningijtsrd
 
Detecting Phishing Websites Using Machine Learning
Detecting Phishing Websites Using Machine LearningDetecting Phishing Websites Using Machine Learning
Detecting Phishing Websites Using Machine LearningIRJET Journal
 
Lab-3 Cyber Threat Analysis In Lab-3, you will do some c.docx
Lab-3 Cyber Threat Analysis        In Lab-3, you will do some c.docxLab-3 Cyber Threat Analysis        In Lab-3, you will do some c.docx
Lab-3 Cyber Threat Analysis In Lab-3, you will do some c.docxLaticiaGrissomzz
 
Detection of Phishing Websites
Detection of Phishing Websites Detection of Phishing Websites
Detection of Phishing Websites Nikhil Soni
 
Low Cost Page Quality Factors To Detect Web Spam
Low Cost Page Quality Factors To Detect Web SpamLow Cost Page Quality Factors To Detect Web Spam
Low Cost Page Quality Factors To Detect Web Spamieijjournal
 
Low Cost Page Quality Factors To Detect Web Spam
Low Cost Page Quality Factors To Detect Web SpamLow Cost Page Quality Factors To Detect Web Spam
Low Cost Page Quality Factors To Detect Web Spamieijjournal
 
A security note for web developers
A security note for web developersA security note for web developers
A security note for web developersJohn Ombagi
 
Phishing Website Detection using Classification Algorithms
Phishing Website Detection using Classification AlgorithmsPhishing Website Detection using Classification Algorithms
Phishing Website Detection using Classification AlgorithmsIRJET Journal
 
IRJET- Phishing Website Detection System
IRJET- Phishing Website Detection SystemIRJET- Phishing Website Detection System
IRJET- Phishing Website Detection SystemIRJET Journal
 
PUMMP: PHISHING URL DETECTION USING MACHINE LEARNING WITH MONOMORPHIC AND POL...
PUMMP: PHISHING URL DETECTION USING MACHINE LEARNING WITH MONOMORPHIC AND POL...PUMMP: PHISHING URL DETECTION USING MACHINE LEARNING WITH MONOMORPHIC AND POL...
PUMMP: PHISHING URL DETECTION USING MACHINE LEARNING WITH MONOMORPHIC AND POL...IJCNCJournal
 
PUMMP: Phishing URL Detection using Machine Learning with Monomorphic and Pol...
PUMMP: Phishing URL Detection using Machine Learning with Monomorphic and Pol...PUMMP: Phishing URL Detection using Machine Learning with Monomorphic and Pol...
PUMMP: Phishing URL Detection using Machine Learning with Monomorphic and Pol...IJCNCJournal
 
ChongLiu-MaliciousURLDetection
ChongLiu-MaliciousURLDetectionChongLiu-MaliciousURLDetection
ChongLiu-MaliciousURLDetectionDaniel Liu
 
200+ SEO factors.docx
200+ SEO factors.docx200+ SEO factors.docx
200+ SEO factors.docxSuman456834
 
200+ SEO factors.docx
200+ SEO factors.docx200+ SEO factors.docx
200+ SEO factors.docxSuman456834
 
Smart Crawler Automation with RMI
Smart Crawler Automation with RMISmart Crawler Automation with RMI
Smart Crawler Automation with RMIIRJET Journal
 
A Novel Interface to a Web Crawler using VB.NET Technology
A Novel Interface to a Web Crawler using VB.NET TechnologyA Novel Interface to a Web Crawler using VB.NET Technology
A Novel Interface to a Web Crawler using VB.NET TechnologyIOSR Journals
 
Detection of Phishing Websites
Detection of Phishing WebsitesDetection of Phishing Websites
Detection of Phishing WebsitesIRJET Journal
 
IRJET - Chrome Extension for Detecting Phishing Websites
IRJET -  	  Chrome Extension for Detecting Phishing WebsitesIRJET -  	  Chrome Extension for Detecting Phishing Websites
IRJET - Chrome Extension for Detecting Phishing WebsitesIRJET Journal
 
IRJET- Phishing Website Detection based on Machine Learning
IRJET- Phishing Website Detection based on Machine LearningIRJET- Phishing Website Detection based on Machine Learning
IRJET- Phishing Website Detection based on Machine LearningIRJET Journal
 
IRJET - Phishing Attack Detection and Prevention using Linkguard Algorithm
IRJET - Phishing Attack Detection and Prevention using Linkguard AlgorithmIRJET - Phishing Attack Detection and Prevention using Linkguard Algorithm
IRJET - Phishing Attack Detection and Prevention using Linkguard AlgorithmIRJET Journal
 

Similar to Report - Final_New_phishila (20)

Detecting Phishing using Machine Learning
Detecting Phishing using Machine LearningDetecting Phishing using Machine Learning
It works by visually comparing client-side images chosen by the user with the ones provided by the server.
Advantages: The user makes the detection decision for a phishing site through recall of the shared image.
Disadvantages: The user must be aware of the scheme and have prior knowledge of the intended domain.

C. Information Retrieval Based Schemes
a) Term Frequency Calculation: This scheme calculates the tf-idf weights of the page's terms, selects the terms with the highest weights, submits them to a search engine such as Google, and checks whether the page's domain appears within the top 'n' results (a sketch of this idea is given at the end of this subsection).
b) Support Vector Machines: These perform binary classification of sites into phishing or non-phishing based on identity information obtained from DOM objects (A HREF, IMG), etc.
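To make the term-frequency step concrete, the following is a minimal JavaScript sketch, not taken from any existing tool, of how tf-idf weights could be computed over a page's visible terms; the idf table, the fallback idf value and the follow-up search-engine query are assumptions and are only indicated in comments.

    // Minimal tf-idf sketch (illustrative only, not taken from any existing tool).
    // 'idfTable' maps a term to its inverse-document-frequency value and is assumed
    // to be precomputed from a background corpus.
    function topTfIdfTerms(pageText, idfTable, n) {
      var terms = pageText.toLowerCase().match(/[a-z]{3,}/g) || [];
      var counts = {};
      terms.forEach(function (t) { counts[t] = (counts[t] || 0) + 1; });

      var scored = Object.keys(counts).map(function (t) {
        var tf = counts[t] / terms.length;            // term frequency within the page
        var idf = idfTable[t] || Math.log(1000);      // fallback idf for unseen terms (assumption)
        return { term: t, score: tf * idf };
      });

      scored.sort(function (a, b) { return b.score - a.score; });
      return scored.slice(0, n).map(function (s) { return s.term; });
    }
    // The returned terms would then be submitted to a search engine and the page's
    // domain checked against the top 'n' results (the search call itself is omitted).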
Advantages: The major advantage of these schemes is their strong mathematical foundations, their use of probabilistic values and their learning-based techniques.
Disadvantages: These schemes raise false alarms and require manual classification of the initial training data and specification of rules.

D. Page Ranking based Schemes
PageRank is a link-analysis algorithm, named after Larry Page and used by the Google search engine, that assigns a numerical weighting to each element of a hyperlinked set of documents, such as the World Wide Web, with the purpose of "measuring" its relative importance within the set. The algorithm may be applied to any collection of entities with reciprocal quotations and references. The numerical weight assigned to any given element E is called the PageRank of E and is denoted PR(E). PageRank reflects the popularity of a URL on the web: the higher the PageRank, the more important the page. Phishing web pages most often have either a very low PageRank or no PageRank at all. Very few phishing pages manage to increase their PageRank, possibly by using link-spamming techniques. Page Index is defined as the number of pages from a particular website that Google has in its database. Phishing web pages are usually accessible only for a short period and hence many are not found in the index.
Advantages: PageRank and Page Index values are strong features for identifying a URL as non-phishing, especially if the crawl is from a reputed search engine such as Google.
Disadvantages: Freshly created pages, especially in new domains, rank very low and could therefore sometimes result in a false alarm.

E. DOM Objects Retrieval Schemes
a) Keywords and Meta Tags: This scheme searches for domain information in meta tags with name = "description" and in those whose name or http-equiv is "copyright". It also retrieves the contents of the title tag and matches them against the domain. If no match is found, the suspicion weight for the site is increased.
b) Request URLs: DOM elements such as <img> tags load information from other URLs. Most of these URLs lie within the same domain, or the objects are loaded from an image server belonging to that domain.
i) Phishing sites, in order to achieve the same look and feel as the phished domain, specify the phished domain's image server in their <img src> attributes.
ii) Phishing sites also usually maintain only one or a few URLs similar to those of the targeted site, so the <img src> URLs differ from the page's own domain.
These two observations lead to a heuristic in which the number of external domain references (including different image servers) is counted; if the count crosses a threshold, the site can be marked as a possible phishing site and a degree of suspicion assigned to it (see the sketch below).
c) AURL - Anchor URLs: This scheme detects a phish based on the anchor URLs present in the site. The following deductions can be made from the anchor URLs and other URL information in a web page:
i) As mentioned above, the number of external anchors in an illegitimate web page is high. This property can be used to mark a page as a phishing page.
ii) The hyperlink provides a DNS domain name in the anchor text, but the destination DNS name in the visible link does not match the one in the actual link.
iii) Dotted decimals used in the URI can be examined to detect malicious websites.
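As a concrete illustration of the external-reference heuristic above, here is a minimal JavaScript sketch; 'doc' is assumed to be an already parsed HTML document, 'pageHost' the host of the page under test, and the threshold value is illustrative rather than taken from any tool.

    // Illustrative sketch of the external-domain reference heuristic described above.
    // 'doc' is a parsed HTML document, 'pageHost' is the host of the page being
    // checked, and 'threshold' (a fraction between 0 and 1) is an assumed value.
    function externalReferenceSuspicion(doc, pageHost, threshold) {
      var external = 0, total = 0;
      var nodes = doc.querySelectorAll('img[src], a[href]');
      for (var i = 0; i < nodes.length; i++) {
        var url = nodes[i].getAttribute('src') || nodes[i].getAttribute('href') || '';
        var m = /^https?:\/\/([^\/]+)/i.exec(url);
        if (!m) continue;                                   // relative URLs stay within the domain
        total++;
        if (m[1].toLowerCase() !== pageHost.toLowerCase()) external++;
      }
      // The page is marked suspicious when external references dominate.
      return total > 0 && (external / total) > threshold;
    }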
d) Form Tags: Legitimate websites usually have the form's action attribute set to a valid URL, mostly within their own domain. Illegitimate sites tend to have an action attribute containing a URL in a domain different from that of the page, or sometimes a null action (see the sketch below).
e) Body Tags: Some websites provide a description of themselves in the body of the page, and this can be used to identify a phishing site.
f) SSL Certificates: The distinguished names of phished sites in the certificate differ from the claimed identity. This check can be employed to detect forgeries in the website.
Advantages: These schemes are fast and make it easy to detect phishing sites. They perform their checks irrespective of any date change or manipulation in the website.
Disadvantages: False positives - certain legitimate sites can have lengthy URLs, a large number of dots, etc.
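A minimal sketch of the form-handler check from d) above, assuming 'doc' is the parsed document and 'pageHost' the host shown in the address bar (the names and the handling of empty actions are illustrative).

    // Sketch of the form-handler check described under d) above (illustrative only).
    // Returns true when a form submits to a host other than the page's own host,
    // or has an empty action, both of which raise the suspicion weight.
    function hasSuspiciousFormAction(doc, pageHost) {
      var forms = doc.getElementsByTagName('form');
      for (var i = 0; i < forms.length; i++) {
        var action = (forms[i].getAttribute('action') || '').trim();
        if (action === '' || action === '#') return true;          // null or empty action
        var m = /^https?:\/\/([^\/]+)/i.exec(action);
        if (m && m[1].toLowerCase() !== pageHost.toLowerCase()) return true;  // external handler
      }
      return false;
    }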
F. URL Check Schemes
One method of detecting phishing sites is to observe the URL of the page and examine characteristics such as its length, the presence of suspicious punctuation, etc. Below are some of the checks performed on the URL to determine its validity against phishing attacks; a combined sketch of these checks follows this list.
a) URL check for the presence of other domain names: The URL is checked against a white list of valid sites; the presence of any of these sites in the URL path, but not in the host name, indicates that the URL being checked is trying to impersonate a valid site.
b) Length of the URL: An abnormally long URL raises suspicion, so sites carrying long host names or large strings of words are checked for phishing.
c) Presence of suspicious special characters: Adversaries use the character '@' in the path because an '@' symbol in a URL causes the string to its left to be disregarded, with the string on the right treated as the actual URL for retrieving the page. Combined with the limited size of the browser address bar, this allows an attacker to write URLs that appear valid within the address bar but actually contain a malicious path after the '@' symbol. Checking for this symbol in the URL helps in detecting phishing URLs.
d) Checking obfuscation of URLs: URLs can be obfuscated by inserting hexadecimal escape sequences instead of individual characters. Attackers take advantage of the ignorance of some users regarding the structure of a URL. For example, the symbol '@' can be represented as '%40' and the dots can be replaced by '%2e'. IP addresses can also be represented in hexadecimal, so a suspect IP address that is part of a blacklist can be hidden by these characters and thus escape the URL blacklist check. This attack can be identified by maintaining a map of the hexadecimal escapes and their possible conversions.
e) Suggestive word tokens in the URL: Phishing URLs aim to extract confidential user information such as usernames and passwords for a particular domain. Hence, a check for keywords such as login, sign-in, confirm, etc. in the path suggests that the page asks for user information, and the URL is double-checked for phishing. These tokens are extracted from blacklisted URL paths, where they are found to occur more frequently than other tokens.
f) Dots in URL: Phishing sites tend to use many dots in their URLs, whereas legitimate sites do not. A given URL is considered a phish if the number of dots in the URL exceeds five.
These URL schemes are used in anti-phishing tools such as SpoofGuard, WebSpoof and SpoofStick.
Disadvantages: There are cases where valid sites fail some of the URL checks, such as the length check.
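The following is a minimal JavaScript sketch combining the URL checks above; the length and dot thresholds follow the text, while the suggestive-token list and the simple one-point-per-check scoring are illustrative assumptions rather than any tool's exact weighting.

    // Illustrative combination of the URL checks above.
    function urlSuspicionScore(url) {
      var decoded;
      try { decoded = decodeURIComponent(url); } catch (e) { decoded = url; }

      var score = 0;
      if (url.length > 80) score++;                                   // unusually long URL
      if ((decoded.match(/\./g) || []).length > 5) score++;           // more than five dots
      if (/@|%40/i.test(url)) score++;                                // '@' trick, plain or encoded
      if (/^https?:\/\/\d{1,3}(\.\d{1,3}){3}/.test(decoded)) score++; // raw IP address as host
      if (/login|sign-?in|confirm|account|secure/i.test(decoded)) score++; // suggestive tokens
      return score;
    }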
G. WHOIS Lookup based Schemes
WHOIS is a query/response protocol widely used to query databases for details of Internet resources such as a domain name, an IP address block or an autonomous system number. Primarily, it serves as an effective tool for looking up domain information, registrar data, administrative data and the name servers used. A WHOIS lookup can be used in anti-phishing schemes to obtain information such as the age of the domain and IP address resolution. Checking the age of the domain helps establish the validity of a suspected phishing site: APWG states that the average lifetime of a phishing site is 4.5 days, and many sites last only a few days. Hence, a WHOIS lookup of the suspected site can be performed to check the age of the domain. If the domain was registered more than 12 months ago, it is considered legitimate; if less, it is checked more stringently for phishing. Some sites return no data on a WHOIS lookup and can be considered a possible phish. Providing an IP address to the WHOIS database returns the details of its domain and its registration, which helps in cases where the path contains only IP address information.
Disadvantages: This check, which contacts the WHOIS database for each website, fails when the phishing site is hosted on an existing valid server into which criminals have managed to break; the WHOIS lookup then yields a valid value and the check fails to find the phish. The check also fails when businesses outsource some of their web operations to contractors with different domain names. eBay Toolbar and SpoofGuard are tools that use the WHOIS lookup scheme.

H. Client-side Defense Schemes
These are anti-phishing schemes that require the client to maintain databases of objects generally present in a web page. The user/browser stores information such as passwords, images, etc. SpoofGuard uses the client-side defense schemes mentioned here. Some of the schemes are described briefly below.
a) Outgoing password check: By maintaining triplets (domain, username, password) for each domain in a database, this scheme can prevent information from being leaked. Every time the user enters a password into a site, the stored password, which is hashed using an algorithm such as SHA-1, is compared, and a warning is issued to the user if the same username-password combination is being sent to a different domain (a sketch of this check is given below). This scheme is particularly helpful when a spoof site uses an image of the word "password" instead of HTML text to request the user's password; since all passwords are hashed and stored in a database, such a phishing site can still be detected.
Disadvantages: It is practically impossible to include the passwords and usernames for all domains. It is also a security risk: leakage of the stored password hashes could have a greater impact.
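A minimal sketch of the outgoing-password check, where 'sha1Hex' stands in for any SHA-1 implementation and 'store' is the client-side database of previously used credential hashes; both names are illustrative.

    // Sketch of the outgoing-password check described above (illustrative only).
    // 'store' maps sha1Hex(password) -> { domain, username } for previously used credentials.
    function checkOutgoingPassword(store, domain, username, password, sha1Hex) {
      var key = sha1Hex(password);
      var previous = store[key];
      if (previous && previous.domain !== domain) {
        // The same password is being sent to a different domain: warn the user.
        return { warn: true, previouslyUsedAt: previous.domain };
      }
      store[key] = { domain: domain, username: username };
      return { warn: false };
    }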
b) History Check: Most of the above anti-phishing measures are bound to raise alarms for legitimate sites, so this scheme is employed to avoid false alarms: it checks the user's browser history and issues no warnings for sites that are already in the user's history file.
Disadvantages: If the user inadvertently bypasses an initial warning, the site will never be checked for phishing again and might cause considerable damage.
c) Domain Check: This scheme checks the user's browser history and determines whether the domain of the current page closely resembles any previously visited domain. The comparison is done by calculating the Hamming distance (see the sketch at the end of this section). For example, the site 'wikifedia.org' will raise a warning if the user has previously visited 'wikipedia.org'. The scheme is devised to prevent adversaries from hosting sites whose names are misspelled versions of popular sites.
Disadvantages: The check fails when legitimate sites have closely resembling domain names, which raises false alarms.
d) Referring site check: The browser maintains a record of referring pages, i.e. the links the user followed to reach the current page. Typical phishing attacks arrive by email, so if the user follows a link from a phishing email, the referring page is the email host. The use of IP addresses by phishing sites can be tracked by performing a reverse DNS lookup; if the resolved host name is not listed among the referring sites, the site is deemed a phish.
e) Image-Domain associations: This scheme maintains a database of images associated with each domain. The initial static database is assembled using crawler-type tools and is augmented from the individual's browser history. The database can contain either the images themselves or hashed images. The scheme helps find phishing sites whose images have hash values different from those stored in the database, which raises an alert.
Disadvantages: Storing images and their hash values is often infeasible and is limited by the client-side configuration and storage restrictions.
f) Profiling/Cache: The cache of a website can be obtained through Google Cache, and the last cache date can be found. This information can be used in determining whether a site is a phish.
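A minimal sketch of the Hamming-distance comparison described under c) above; the tolerance of up to two differing characters is an illustrative choice.

    // Sketch of the domain-similarity check described under c) above. Hamming
    // distance is only defined for equal-length strings, so domains of different
    // lengths are skipped here.
    function looksLikeVisitedDomain(domain, visitedDomains) {
      function hamming(a, b) {
        var d = 0;
        for (var i = 0; i < a.length; i++) if (a[i] !== b[i]) d++;
        return d;
      }
      for (var i = 0; i < visitedDomains.length; i++) {
        var v = visitedDomains[i];
        if (v === domain) return false;                     // exact match: a site the user knows
        if (v.length === domain.length && hamming(v, domain) <= 2) {
          return true;                                      // near miss of a known domain: suspicious
        }
      }
      return false;
    }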
III. PHISHILLA FEATURES
A. Introduction to Phishilla
Phishilla is a plug-in, or extension, for the Mozilla Firefox web browser. It is embedded in the browser and runs in the same memory context as the browser. It checks for any malice in the site entered by the user and, if malice is found, shows a popup box warning the user against entering the site. Phishilla uses features such as URL checks, WHOIS lookups to retrieve site information, page rank, page index and a host of other features described in detail later in this section.
A typical Firefox extension is packaged as a ZIP bundle with the file extension .xpi. It follows a fixed folder structure and contains a XUL file that adds functionality to the browser. XUL is an XML grammar that provides user-interface widgets such as buttons, menus, toolbars, trees, etc. These XUL files contain references to JavaScript, which provides additional functions to the browser. Phishilla uses the XPI file structure defined for Firefox extensions, with a XUL file that calls a JavaScript function. The JavaScript function performs the necessary checks on the site entered by the user and returns a result based on a weighted-sum calculation. This weighted-sum calculation is done for each site based on the features described in this section. Phishilla is a lightweight browser plug-in for Mozilla Firefox and supports versions 1.x through 3.5 and higher.
Figure 1 depicts the working flow of the plug-in. The web page is first checked against the maintained white list and, if found in the list, the extension proceeds to load the page. Phishilla displays the location of the page in the status bar of the browser; if the user suspects the actual location to be somewhere other than what is displayed, he or she can click on it and the browser shows a popup asking whether to proceed. The user decides whether to proceed, and the site is added to the blacklist if the user accepts the warning. Otherwise, Phishilla proceeds to perform the mandatory checks: the blacklist check, DOM objects check, Google page rank check, inbound links check, traffic information check, page index check, URL information check and domain age check. A weighted sum is calculated from the outcome of all these checks; if the weight is more than 35, Phishilla displays a warning to the user and adds the site to the blacklist if the user accepts the warning. If the user chooses to ignore the warning, Phishilla asks for confirmation on whether to add the site to the white list and proceeds to load the page on user confirmation. A sketch of this decision flow is given below.
Figure 1. Phishilla working design
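The decision flow of Figure 1 can be summarised by the following minimal JavaScript sketch; the individual check functions are placeholders, and 35 is the threshold stated in the text.

    // High-level sketch of the decision flow of Figure 1 (illustrative only).
    function evaluatePage(url, domain, whiteList, blackList, checks) {
      if (whiteList.indexOf(domain) !== -1) return { action: 'load' };
      if (blackList.indexOf(domain) !== -1) return { action: 'warn', reason: 'blacklisted' };

      // Each check returns a signed determinant; the weighted sum decides the outcome.
      var weight = 0;
      checks.forEach(function (check) { weight += check(url, domain); });

      if (weight > 35) return { action: 'warn', reason: 'weighted sum = ' + weight };
      return { action: 'load', weight: weight };
    }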
B. Features in Phishilla
Phishilla incorporates most of the schemes from Section II using client-side checking. Since no single method is good enough to detect a wide variety of phishing sites, we use a combination of schemes in which a score is computed for each web page as a weighted sum of the outcomes of a set of heuristics.
a) White Lists: There is an initial list of trustworthy and popular sites. The user can also manually add his or her own domains to the white list based on prior knowledge of these sites. When a web page entered in the browser's address bar loads, its domain is compared with the domains in the white list. In the case of a match, no further checks are performed and the user proceeds to the intended destination URL. The sites that have been frequent targets of phishing attacks were collected from statistics provided by PhishTank.com and the APWG and were added to the white list maintained by us.
b) Black Lists: We maintain a black list of 50 phishing sites as training data. Each time a web page loads, its domain is looked up against the domains in the black list. If a match occurs, the user is warned right away that the site is a phishing site and a message pops up asking the user not to proceed. At this juncture, the user is left with one of two options:
i) the user is navigated away from the site on pressing the cancel button;
ii) the user proceeds to the site on pressing the OK button.
c) Location of Domain: Specific countries rank high in the number of fraudulent sites they host. A user may also have prior knowledge or experience of an intended site and hence be immediately suspicious if the site is located in an unlikely country. The location of the site is obtained through a reverse lookup on a WHOIS database (my-addr.com) and is displayed to the user in the status bar. This information is especially useful when the user has prior knowledge of the intended site, and it is a means of providing cues to the user. The user, if suspicious of the location of a particular site, can click on the status bar, which pops up a window requesting confirmation to proceed and to add the site to the blacklist. For example, the Indian Railways site 'irctc.co.in' and the popular bank site 'barclays.co.uk' are unlikely to be hosted in countries such as China or Russia.
d) Age of a Website: The age of a website is obtained by looking up the domain in a WHOIS database (vitzo.com/en/whois) and retrieving the registration date of the domain. Most phishing sites have very little or no history, so the age of a website is a factor to consider when evaluating it for phishing. If the age of the site is below a threshold, a positive determining factor is added for the site; if the age is above an upper limit (i.e. the site has existed for a long time), a negative determining factor is assigned (see the sketch below).
e) Meta Information: The content property of the META tag whose name or http-equiv is "description" is retrieved. If there is no match between the information in the URL and the information in the META tag, a positive phishing determining factor is added for the domain; otherwise a negative determining factor is attributed to the domain.
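A minimal sketch of the domain-age check from d) above. 'whoisText' is assumed to be the raw WHOIS record already fetched for the domain, the field names in the regular expression are common WHOIS conventions rather than a guaranteed format, and the single 12-month cut-off borrows the figure from the WHOIS discussion in Section II in place of the two thresholds mentioned in the text.

    // Sketch of the domain-age heuristic (illustrative only).
    // Returns +1 (leans towards phishing) or -1 (leans towards legitimate).
    function domainAgeDeterminant(whoisText) {
      var m = /(Creation Date|Created On|Registered on):\s*(\d{4}-\d{2}-\d{2})/i.exec(whoisText);
      if (!m) return 1;                                  // no registration data: positive determinant
      var ageMs = Date.now() - new Date(m[2]).getTime();
      var months = ageMs / (1000 * 60 * 60 * 24 * 30);
      return months < 12 ? 1 : -1;                       // young domain: positive determinant
    }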
f) Google Page Rank: Google assigns a page rank on a scale of 0-10 to every web page; the page rank measures the 'importance' of a page, and the higher the rank, the more important the page. We retrieve the domain of the web page and query Google programmatically (at toolbarqueries.google.com) to obtain the page rank of that domain. Once the page rank Pr = PageRank(web page) is obtained, a weight is assigned as follows, where 'factor' reflects how much information the page-rank heuristic provides in classifying a website as phishing or non-phishing.
If Pr is between 2 and 10:
    W(web page) = (-1) * Pr * factor
A page rank of 2-10 ensures that the determinant value for this heuristic is negative (i.e. it leans towards the page not being a phish).
If Pr is 0, 1 or -1 (page rank not found):
    W(web page) = 10 * factor
A page rank of 0, 1 or 'not found' yields a positive determinant value for this heuristic (see the code sketch below).
g) Google Page Index: The web-page domain is looked up in the Google index database. If the page is indexed, a negative determinant score is assigned for the index heuristic; if no index information is obtained when querying Google, a positive determinant score is assigned.
h) Inbound Links: The popularity of a page, and to some extent its level of trustworthiness, can be gauged by the number of external references from other pages to it. The number of inbound links to a web-page domain is therefore queried programmatically from the Alexa database. The number of inbound links to a phishing site is very small, with perhaps only a few instances, so this data can be used effectively in determining whether a page falls into the phishing category.
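The page-rank weighting above, restated as a small JavaScript helper (illustrative only; 'factor' is the weight assigned to this heuristic):

    // Direct restatement of the page-rank weighting described above.
    function pageRankWeight(pr, factor) {
      if (pr >= 2 && pr <= 10) return -1 * pr * factor;  // ranked page: negative determinant
      return 10 * factor;                                // rank 0, 1 or not found: positive determinant
    }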
i) Traffic Information: The popularity of a web page can also be gauged through the number of visits to the page. The traffic rank of a page is obtained programmatically by querying the Alexa database; the lower the rank value, the greater the traffic to the site. This rank is computed from a combination of the number of user visits and the number of page views over a period of three months. A threshold is set on the traffic rank: if the traffic rank of the web-page domain is worse (numerically higher) than the threshold, indicating little traffic, a positive determinant value is set for this heuristic; if it is better than the threshold, a negative determinant value is set. This is also a very useful heuristic in gauging whether a site is a phish.
j) Anchor tags: Anchor tags are a very common way of cheating unsuspecting users. An anchor tag has an href portion that is not visible to the user and a visible text portion for the user to click on. A fraudster can manipulate this to his advantage. For example, <a href="http://www.phising.com/">www.icicibank.com</a> could mislead the user into thinking that the link leads to icicibank.com. Hence we check whether the visible text is a URL and, if so, whether it matches the actual link; if they differ, a positive determinant value is applied for this heuristic (see the sketch below). Similarly, the hrefs in anchor tags go through all the checks applied to URLs, such as checking for IP addresses, obfuscated URLs, length, number of dots, special characters, etc. If any of these heuristics triggers, it contributes a positive determinant which is added to the total.
k) Server Form Handlers: Documents contain form tags whose action attribute specifies the URL to which control is passed on an event such as pressing the submit button. Phishing sites usually have only a single URL, or a limited number of URLs, similar to the target website. Hence a check is performed to confirm whether the action property of the form references an external location different from the URL in the address bar. If the check succeeds, a positive determinant value is assigned for the form-handler heuristic.
l) Images and Other External Objects: Phishing sites, in order to look like the target website and trick users into believing they are the original site, use images from the image server of the target (original) site. A compilation (white list) of popular sites, especially financial institutions, is maintained in our project, together with a map of the image servers used by these websites. When a fraudulent site tries to use images from any of these sites, the img src information of the site is matched against the image-server URLs of the white-listed sites; if there is a match, a positive determinant weight is set for the external-object heuristic.
m) Length of URL: Adversaries try to disguise sites by including extra information in the website URL. By exploiting the limited number of characters the browser shows in the address bar, fraudulent websites can be made to appear legitimate. Hence, we include a heuristic that checks the length of the URL and increases the weight if the length exceeds 80 characters. Studies have shown that phishing URLs are unusually long, so a check on URL length helps in determining fraudulent websites.
n) Dots in URL: We include a check that counts the number of dots in a URL and increases the weighted sum if the count exceeds a set threshold, since most phishing sites have more than the usual number of dots in their URLs.
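A minimal sketch of the anchor-text check from j) above; the pattern used to decide whether the visible text "looks like" a URL is an assumption.

    // Sketch of the anchor-text check described under j) above (illustrative only).
    function anchorTextMismatch(doc) {
      var anchors = doc.getElementsByTagName('a');
      for (var i = 0; i < anchors.length; i++) {
        var visible = (anchors[i].textContent || '').trim();
        var href = anchors[i].getAttribute('href') || '';
        var vm = /^(?:https?:\/\/)?((?:www\.)?[a-z0-9-]+(?:\.[a-z0-9-]+)+)/i.exec(visible);
        var hm = /^https?:\/\/([^\/]+)/i.exec(href);
        if (!vm || !hm) continue;                  // only compare when both look like URLs
        var advertised = vm[1].replace(/^www\./i, '').toLowerCase();
        if (hm[1].toLowerCase().indexOf(advertised) === -1) {
          return true;                             // link text advertises one host, href points elsewhere
        }
      }
      return false;
    }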
o) '@' symbol check in URL: Phishing sites exploit the behaviour of the '@' symbol, which causes the browser to disregard everything to its left: anything between 'http://' and '@' is not considered by the browser. Adversaries exploit this by inserting legitimate-looking text between 'http://' and the '@' symbol to trick users. Our scheme checks for the '@' symbol, or its hexadecimal equivalent '%40', in the URL and increases the weighted sum if it is found.
p) Pattern matching: Most phishing sites try to mimic heavily targeted sites such as eBay, PayPal and Bank of America (the top three targets). The most common way a user is misled into visiting such a phishing site is by being enticed by catch words such as eBay or PayPal. Hence we apply a basic check: if the URL has a domain other than the popular sites in the white list but contains a string closely resembling one of these possible target sites, we add a positive determinant value for this heuristic (see the sketch below).
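A minimal sketch of the pattern-matching check from p) above; taking the first label of a white-listed domain as the "brand name" is an illustrative simplification.

    // Sketch of the pattern-matching check described under p) above (illustrative only).
    // 'whiteList' holds the domains of commonly targeted brands (e.g. ebay.com, paypal.com).
    function impersonatesWhitelistedBrand(url, pageHost, whiteList) {
      var host = pageHost.toLowerCase();
      for (var i = 0; i < whiteList.length; i++) {
        var brandDomain = whiteList[i].toLowerCase();        // e.g. "paypal.com"
        var brandName = brandDomain.split('.')[0];           // e.g. "paypal"
        var hostIsBrand = host === brandDomain ||
                          host.slice(-(brandDomain.length + 1)) === '.' + brandDomain;
        if (!hostIsBrand && url.toLowerCase().indexOf(brandName) !== -1) {
          return true;    // brand name appears in the URL, but the host is not the brand's domain
        }
      }
      return false;
    }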
IV. EVALUATION AND ANALYSIS OF PHISHILLA
Phishilla is a rule-based heuristic tool. It may at times produce false positives (treating a non-phishing site as a phishing site) and false negatives (treating a phishing site as a non-phishing site). Phishilla was evaluated on a sample of 64 URLs. Thirty-two of them were phishing URLs obtained from PhishTank.com and other web resources; these were also confirmed to be phishing URLs by checking them against the phishing filters of browsers such as Google Chrome and Mozilla Firefox. The other 32 were URLs chosen at random through the Yahoo Random URL generator (http://random.yahoo.com/fast/ryl), together with certain URLs of known people and known domains. The two tables below show the evaluation results for phishing and legitimate sites, respectively.

No.  Phishing URL  Result
1.  http://www.setuplogecount.co.uk/index.php  Found
2.  http://info.kuspuk.net/phpMyAdmin/config/ppusa/  Found
3.  http://aimm.ye.ro/zboard/data/pesmm.html  Found
4.  http://grapelove.co.kr/_gabia/fs3_gongji/gtbplc/ibank_gtbplc_com.php  Found
5.  http://www.goodcreditahead.com/forum/bancoposta/index.php?MfcISAPICommand=SignInFPP&UsingSSL=1&emai=&userid=  Found
6.  http://singine4baylogisny8iaznwaz.nm.ru/by-Brownie-wise_W032879327328929Qitem1QQDJSyyyd37sdcmbbyloginpag23za32wa32w2azZza3ewsaz.html  Found
7.  http://server.e-foto.lt/js/228411.paypal.com/webscr_cmd_login-run.php  Found
8.  http://www.olancompany.com/images/redirecting.html  Found
9.  http://forumoficial.hostrator.com/de.html  Failed
10.  http://nvbchannel.net/forum/paypal.htm  Found
11.  http://www.web-page.com.ar/win/133847.paypal.com/webscr_cmd_login-run.php  Found
12.  http://activex.emenace.com/us  Failed
13.  http://it-paypal.com/PayPal.It.html  Found
14.  http://publidisco.com/catalog/images/microsoft/index2.htm  Found
15.  http://muziekschoolallmusic.nl/vakanties/ib.html  Failed
16.  http://kbic.info/bbs/data/portal/server.pt/  Found
17.  http://vonage.id1114555.online-webforms.com/  Found
18.  http://motors-support.net  Failed
19.  http://ba03.pochta.ru/ehay.html  Found
20.  http://www.portaljenipapo.com/login.htm  Failed
21.  http://soullovebags.com/images/www.mybank.alliance-leicester.co.uk/index.html  Found
22.  http://www.stentend.com/de/  Found
23.  http://signin.ebay.com.ws.ebayisapi.dll.ciczdztxtwsdyhsfpndr.virtualbattlespace2.com/frogstar/down  Found
24.  http://signin-ebay.adacorrigan.co.uk/  Found
25.  http://www.centralfilms.net/locaciones/moore/scripts.php  Found
26.  http://e-mind.be/img/hp/base/b049/gdxow.php  Found
27.  http://www.parkdaeli.com/bbs/file/new.egg.com/logon.htm  Found
28.  http://www.stentend.com/de/  Found
29.  http://www.candelaradio.fm/los15img/  Found
30.  http://www.skype.com.ofi.uni.cc/?id=49126&lc=us  Failed
31.  http://fwqdeq.mail2k.ru/n.html  Found
32.  http://mobile-me.org  Found
Table-1. Evaluation results for phishing sites
No.  Legitimate URL  Result
1.  http://sportsillustrated.cnn.com/basketball/ncaa/women/teams/youngstown/  Passed
2.  http://www.zumbrolutheran.org/  Passed
3.  http://www.socialcouch.com/interview-with-richard-binhammer-dell-social-media/  Passed
4.  http://amazwi.blogspot.com/  Passed
5.  http://cs.vt.edu/  Passed
6.  http://www.alphabusinesscentre.com/  Passed
7.  http://www.christchurchpompton.org/  Passed
8.  http://www.wongbrothers.com/  Passed
9.  http://www.nzembassy.com/  Passed
10.  http://geogratis.cgdi.gc.ca/  Passed
11.  http://www.prapa.com/  Passed
12.  http://www.circuit8.org/  Passed
13.  http://www.findstolenart.com/  Passed
14.  http://www.maroc.net/  Passed
15.  http://www.minorleagueballparks.com/neds_oh.html  Passed
16.  http://www.elth.pub.ro/  Passed
17.  http://www.finleys.com/  Passed
18.  http://www.spravi.8m.com/  Passed
19.  http://www.paconcours.com/  Passed
20.  http://www.rmadhavan.com/  Passed
21.  http://us.com/  Passed
22.  https://home.americanexpress.com/home/global_splash.html  Passed
23.  http://www.lamega.com/  Passed
24.  http://www.everydaymaternity.com/  Passed
25.  http://www.shop-cliftonparkcenter.com/  Passed
26.  http://www.eqc.govt.nz/  Passed
27.  http://www.yorkarchaeology.co.uk/  Failed
28.  http://mikeshost.110mb.com/xy.php  Failed
29.  http://weather.mgnetwork.com/cgi-bin/weatherIMD3/weather.cgi?user=TBO&forecast=zandh&pands=Miami%2C+FL  Failed
30.  http://www.atifitnuts.com/  Failed
31.  http://www.asgsherman.com/  Failed
32.  http://www.ambache.co.uk/  Failed
Table-2. Evaluation results for legitimate sites

A. Evaluation measures
The following measures were adopted in evaluating Phishilla:
a) Total Catch Rate: Number of phishing URLs that were correctly blocked or warned.
Number of correctly caught phishing URLs = 28
Total number of phishing URLs = 32
Percentage of correctly caught URLs = 28 / 32 * 100 = 87.5 %
b) False Negatives: Number of phishing URLs that were incorrectly allowed.
Number of incorrectly allowed phishing URLs = 4
Total number of phishing URLs = 32
Percentage of false negatives = 4 / 32 * 100 = 12.5 %
c) Allows: Number of good URLs that were correctly allowed.
Number of correctly allowed good URLs = 26
Total number of good URLs = 32
Percentage of correctly allowed URLs = 26 / 32 * 100 = 81.25 %
d) False Positives: Number of good URLs that were incorrectly blocked.
Number of incorrectly blocked good URLs = 6
Total number of good URLs = 32
Percentage of false positives = 6 / 32 * 100 = 18.75 %

B. Analysis of Phishilla
Our evaluation shows that Phishilla may sometimes produce false positives for relatively unknown sites but is unlikely to cause false negatives of major impact.
a) Analysis of False Positives: False positives are the good URLs that are incorrectly blocked.
1) Phishilla reports false positives for good URLs with abnormal lengths or an unusually large number of dots compared with standard conventions.
2) If a dotted-decimal IP address is provided instead of a host name, reporting an outright error could sometimes produce a false positive, since this kind of address is occasionally desirable. Hence, in this case Phishilla only reports a warning that an IP address is being used in the URL and that the site could possibly be illegitimate.
3) False positives are also possible when the site is relatively new or unknown, with very few or no inbound links and little traffic.
b) Analysis of False Negatives: False negatives are the phishing URLs that are incorrectly allowed. It is imperative that any good anti-phishing scheme or tool reduces the number of false negatives, and Phishilla addresses this issue well. False negatives occur mostly when there is very little DOM element information that can be compared against the standard heuristics.
c) Performance: The performance of Phishilla is good, since only JavaScript is used and all operations are done on the client side.

V. ADVANTAGES OF PHISHILLA
Phishilla is a browser plug-in that accomplishes the task of detecting a phishing site by following a set of well proven and established methods. It has the following advantages:
1) It is lightweight.
2) It follows a combination of well-tested and successful anti-phishing schemes.
3) It computes a weighted sum in which heuristics are assigned different values based on their ability to classify the malicious content of a website.
4) It has an excellent catch rate.

VI. CONCLUSION
In this paper we have discussed the set of existing countermeasures against phishing, the possible merits and flaws of these schemes, and their adoption by existing marketplace tools. We have identified that a single heuristic, or a single class of heuristics, is not sufficient to reliably determine a phishing site. Hence we have adopted a scheme which combines several phishing classification schemes used across several tools and assigns weights to each scheme depending on its effectiveness in classification, i.e. its detection accuracy. Phishilla provides phishing alerts to the user in a non-intrusive manner without affecting the browsing experience. It follows a client-side approach where all the logic is executed in client-side code; this makes Phishilla efficient and imposes only a minimal set of requirements.
While Phishilla has a good catch rate and detects a majority of phishing sites, possible avenues of enhancement include incorporating features such as profiling, SSL certificate checks, image matching, etc., which would require server-side functionality. The GUI could be enhanced to provide more visual cues to the user, possibly by displaying color codes that indicate the degree of confidence in classifying a site as malicious.
Similarly, users could be profiled when they mark a site as phishing, and weights could be assigned to users based on their previous phish-reporting history. Other learning-based methods could also be incorporated, in which the effectiveness of each heuristic is monitored over time and the weights re-adjusted accordingly. Phishilla could also be extended to track and report phishing e-mails, the current plague of the internet that leads users to unsolicited phishing sites.

VII. ACKNOWLEDGMENTS
This work was carried out at Virginia Polytechnic Institute and State University. We thank Dr. Jung-Min Park for providing the impetus for this paper.

REFERENCES
[1] PhishTank, available at: http://www.phishtank.com/
[2] N. Chou, R. Ledesma, Y. Teraguchi, and J. C. Mitchell, "Client-Side Defense against Web-Based Identity Theft", in Proceedings of the Network and Distributed System Security Symposium (NDSS '04), February 2004.
[3] S. Garera, N. Provos, M. Chew, and A. D. Rubin, "A Framework for Detection and Measurement of Phishing Attacks", in Proceedings of the 2007 ACM Workshop on Recurring Malcode (WORM '07), November 2007, pp. 1-8.
[4] Firefox, "Phishing Protection", available at: http://www.mozilla.com/en-US/firefox/phishing-protection/
[5] Y. Pan and X. Ding, "Anomaly Based Web Phishing Page Detection", in Proceedings of the 22nd Annual Computer Security Applications Conference (ACSAC '06), December 2006, pp. 381-392.
[6] Y. Zhang, J. I. Hong, and L. F. Cranor, "Cantina: A Content-Based Approach to Detecting Phishing Web Sites", in Proceedings of the 16th International World Wide Web Conference (WWW '07), May 2007, pp. 639-648.
[7] P. Robichaux and D. L. Ganger, "Gone Phishing: Evaluating Anti-Phishing Tools for Windows", September 2006.
[8] D. K. McGrath and M. Gupta, "Behind Phishing: An Examination of Phisher Modi Operandi", in Proceedings of the 1st USENIX Workshop on Large-Scale Exploits and Emergent Threats, San Francisco, California, Article No. 4, 2008.
[9] Bayesian Classification of Phishing: http://www.sonicwall.com/downloads/WP-ENG-025_Phishing-Bayesian-Classification.pdf
[10] Google Page Rank Information: http://abhinavsingh.com/blog/2009/04/getting-google-page-rank-using-javascript-for-adobe-air-apps/
[11] Introduction to Phishing: http://en.wikipedia.org/wiki/Phishing
[12] WHOIS Information: http://vitzo.com/en/whois
[13] Traffic Information: http://www.alexa.com/siteinfo
[14] Reverse Domain Lookup: http://my-addr.com/reverse-lookup-domain-hostname/free-reverse-ip-lookup-service/reverse_lookup.php
[15] Anti-Phishing Information: http://www.antiphishing.org/