Countermeasures against Phishing sites
Phishilla – An anti-phishing extension for Mozilla Firefox
Nagarajan Kuppuswami
Department of Computer Science
Virginia Tech
Blacksburg, VA
nagara7@vt.edu
Venkatasubramaniam Ganesan
Department of Computer Science
Virginia Tech
Blacksburg, VA
venkatg@vt.edu
Ashwin Palani
Department of Computer Science
Virginia Tech
Blacksburg, VA
ashwinp7@vt.edu
Abstract
Phishing has been in prominence since 1987 and has done considerable damage to the Internet user community. The level of expertise of adversaries in attacking sites has increased along with advancements in security. The attacks can come either from a malicious website or through emails. There is an urgent need to combat such attacks, as the losses they cause have been growing exponentially. The first part of our project addresses the existing countermeasures in place in various anti-phishing tools, the advantages of using them, and possible disadvantages that fraudsters could exploit. The second part explains the features of our proposed extension, how it works, and its advantages. We have also evaluated our scheme by assessing its performance on a set of phishing as well as legitimate sites. We conclude by mentioning enhancements and improvements that could be added to the current scheme.
I. INTRODUCTION TO PHISHING AND EXTENSIONS
Phishing is a criminally fraudulent process that attempts to acquire sensitive information such as usernames, passwords, and financial details such as credit card numbers. Attackers typically target users by means of fake URLs, emails, and instant messaging. Modern browsers have developed capabilities to detect such fraudulent sites, but some phishing schemes can still slip past these built-in checks. Hence, extensions/plug-ins can serve effectively in providing these additional detection capabilities.
II. EXISTING COUNTERMEASURES AND SCHEMES
A. Black Lists and White Lists Check Scheme
This scheme uses a database or list published by a trusted
party, where known phishing web sites are blacklisted. Tools
include Websense, McAfee’s anti–phishing filter, Netcraft anti-
phishing system, Cloudmark SafetyBar, Microsoft Phishing
Filter.
A similar white list is also maintained for sites that are
valid and legal. White lists usually contain sites that have been
the targets of phishing attacks.
Advantages: Simple to implement; it involves only a lookup of the domain against the blacklist.
Disadvantages: The weakness of this approach is its poor
scalability and its timeliness. Phishing sites mushroom
randomly and last for only a few days.
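For illustration, a minimal JavaScript sketch of such a list lookup (Phishilla itself is written in JavaScript); the list contents and the extractDomain helper are illustrative and not taken from any particular tool.

// Minimal sketch of a black/white list lookup; the list contents are illustrative.
const whiteList = ["paypal.com", "ebay.com", "bankofamerica.com"];
const blackList = ["phishing-example.test"];

// Simplified domain extraction: take the hostname and drop a leading "www.".
function extractDomain(url) {
  return new URL(url).hostname.replace(/^www\./, "");
}

// Returns "white", "black", or "unknown" for the given URL.
function listLookup(url) {
  const domain = extractDomain(url);
  if (whiteList.includes(domain)) return "white";
  if (blackList.includes(domain)) return "black";
  return "unknown";
}

console.log(listLookup("https://www.paypal.com/signin")); // "white"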
B. Server based schemes
a) Server Authentication: This is used to verify the credentials presented by a web server. The credentials are issued by a trusted third party that can vouch for the bearer's identity. Generally, the toolbar displays the brand's logos, icons, and seals in the browser window. This scheme is used by anti-phishing toolbars such as Content Verification Certificates, GeoTrust ToolBar, and Trustbar. TrustWatch is a toolbar which authenticates through a third party.
Advantages: This is one of the more robust ways of checking the authenticity of a server. It reduces the chances of raising false alarms on legitimate sites as well as of false negatives.
Disadvantages: Because of the lack of a global public key infrastructure, users may tend to blindly trust or reject the credentials presented by the web server.
b) Shared Secret Schemes: This scheme is currently used in Dynamic Security Skins. It works by having users visually compare client-side images with the ones provided by the server.
Advantages: The user makes the decision in detecting the phishing site, based on recall.
Disadvantages: The user must be attentive and have prior knowledge of the intended domain.
C. Information Retrieval Based Schemes
a) Term Frequency Calculation: This scheme calculates tf-idf weights for the terms on a page, picks the highest-weighted terms, searches for them in a search engine like Google, and checks whether the page's domain appears within the top 'n' results.
b) Support Vector Machines: These perform binary classification of sites as phishing or non-phishing based on identity information obtained from DOM objects (A HREF, IMG), etc.
Advantages: The major advantage of these schemes is their strong mathematical foundations and their use of probabilistic, learning-based techniques.
Disadvantages: These schemes can raise false alarms and require manual classification of the initial training data and specification of rules.
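A brief JavaScript sketch of the tf-idf step, assuming a precomputed document-frequency table df and corpus size N; the subsequent search-engine query is omitted.

// Return the k highest tf-idf weighted terms of a page's visible text.
// df maps a term to the number of corpus documents containing it; N is the corpus size.
function topTfIdfTerms(pageText, df, N, k = 5) {
  const terms = pageText.toLowerCase().match(/[a-z]+/g) || [];
  const tf = {};
  for (const t of terms) tf[t] = (tf[t] || 0) + 1;

  return Object.keys(tf)
    .map(t => ({ term: t, weight: tf[t] * Math.log(N / ((df[t] || 0) + 1)) }))
    .sort((a, b) => b.weight - a.weight)
    .slice(0, k)
    .map(x => x.term); // these terms would then be searched and the domain
                       // checked against the top 'n' results
}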
D. Page Ranking based Schemes
PageRank is a link analysis algorithm, named after Larry Page,
used by the Google Internet search engine that assigns a
numerical weighting to each element of a hyperlinked set of
documents, such as the World Wide Web, with the purpose of
"measuring" its relative importance within the set. The
algorithm may be applied to any collection of entities with
reciprocal quotations and references. The numerical weight that
it assigns to any given element E is also called the PageRank of
E and denoted by PR(E).
Page rank determines the popularity of a URL in the web. The
higher the Page Rank, the more important is the page. Phishing
web pages most often either have a very low page rank or their
page rank does not exist. Very few phishing pages manage to
increase their page rank, possibly by using link spamming
techniques.
Page Index is defined as the number of pages from a particular
website that Google has in its database. Phishing web pages
usually are accessible only for a short period and hence many
might not be found in the index.
Advantages: Page Rank and Page Index values are strong features for identifying whether a URL is non-phishing, especially if the crawl is from a reputed search engine like Google.
Disadvantages: Freshly created pages, especially in new domains, would rank very low and hence could sometimes result in a false positive.
E. DOM objects Retrieval Schemes
a) Keywords and Meta Tags: This scheme searches for domain information in meta tags with "name = description" and in those whose name or http-equiv is "copyright". It also retrieves the title tag and matches its contents with the domain. If no match is found, the suspicion weight for that site is increased.
b) Request URLs: DOM elements like <img> tags load information from other URLs. Most of these URLs would be within the same domain, or the objects would be loaded from an image server belonging to that domain.
i) In order to achieve the same look and feel as the targeted domain, phishing sites often specify the targeted domain's image server in their <img src> attributes.
ii) Phishing sites mostly maintain only a single URL or a few URLs similar to that of the targeted site, so the <img src> URLs would differ from the page's own domain.
The above two points help us formulate a heuristic: the number of external domain references (including different image servers) is counted, and if it crosses a threshold, the site is marked as suspect and a degree of suspicion is assigned to it, as in the sketch below.
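A minimal JavaScript sketch of this external-reference count, assuming access to the page's DOM; the threshold shown in the comment is illustrative.

// Count <img> elements whose source is served from a domain other than the page's own.
function externalImageCount(doc, pageDomain) {
  let external = 0;
  for (const img of doc.getElementsByTagName("img")) {
    try {
      const host = new URL(img.src, doc.location.href).hostname;
      if (!host.endsWith(pageDomain)) external++;
    } catch (e) { /* ignore malformed src values */ }
  }
  return external;
}

// The page is marked suspect when the count crosses a threshold, e.g.:
// if (externalImageCount(document, "example.com") > 3) suspicion += weight;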
c) AURL – Anchor URLs: This scheme detects a phish based on the anchor URLs present in the site. The following deductions can be made based on AURLs and other URL information in a webpage.
i) As already mentioned in the previous paragraph, the number of external anchors in an illegitimate webpage tends to be high. This property can be used to mark a page as a phishing page.
ii) The hyperlink provides a DNS domain name in the anchor text, but the destination DNS name shown to the user does not match the one in the actual link.
iii) Dotted-decimal IP addresses used in the URI can be counted to help detect malicious websites.
d) Form Tags: Legitimate websites usually have the form's action set to a valid URL, mostly within the same domain. Illegitimate sites often set the action to a URL in a different domain from that of the page, or sometimes leave it null.
e) Body Tags: Some websites provide a description of themselves in the body portion, and this can be used to identify a phishing site.
f) SSL Certificates: The distinguished name in a phished site's certificate differs from its claimed identity. This check can be employed to detect forgeries in the website.
Advantages: These schemes are fast and easy ways to detect phishing sites. They perform their checks irrespective of any change/manipulation in the website.
Disadvantages: False positives - certain legitimate sites could have lengthy URLs, a large number of dots, etc.
F. URL Check Schemes
One method to detect phishing sites is by observing the URL
of the page and examining characteristics such as its length,
presence of suspicious punctuations, etc. Below are some of
the checks performed on the URL to determine its validity
against phishing attacks.
a) URL check for the presence of other domain names: The URL is checked against a list of whitelisted sites for the presence of any of those site names in the URL path rather than in the host name. Such an occurrence indicates that the URL is trying to pass itself off as a valid site.
b) Length of the URL: An abnormally long URL raises suspicion, and sites carrying long host names or large strings of words are checked for phishing.
c) Presence of suspicious special characters: Adversaries use the character '@' in the path because an '@' symbol in a URL causes the string to its left to be disregarded, with the string on the right treated as the actual URL for retrieving the page. Combined with the limited size of the browser address bar, this allows an attacker to write URLs that appear valid within the address bar but actually contain a malicious path after the @ symbol. Checking for this symbol in the URL helps in detecting phishing URLs.
d) Checking obfuscation of URLs: URLs can be obfuscated by inserting hexadecimal escape sequences instead of individual characters. Attackers take advantage of the ignorance of certain users regarding the structure of a URL. For example, the symbol @ can be represented as %40 and dots can be replaced by '%2e'. IP addresses can also be represented in hexadecimal form, so a suspect IP address that appears on a blacklist can be hidden by these characters and thus escape the URL blacklist check. This attack can be identified by maintaining a map of hexadecimal escape sequences and their possible conversions.
e) Suggestive word tokens in the URL: Phishing URLs aim to
extract confidential user information such as their usernames
and passwords in a particular domain. Hence, a check for
keywords such as login, sign-in, confirm, etc. in the path
suggests that the page looks for user information and the URL
is double checked for phishing. These tokens are extracted
from the blacklisted URL paths and it is found they occur more
frequently than other tokens.
f) Dots in URL: It is found that phishing sites use many dots in
their URL but legitimate sites do not. The given URL is
considered a phish if the number of dots in the URL exceeds
five.
These URL schemes are used in anti-phishing tools like
SpoofGuard, WebSpoof and SpoofStick.
Disadvantages: There are cases where some valid sites fail
some of the URL checks such as the length check.
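A minimal JavaScript sketch combining several of the URL checks above (length, dots, '@' symbol, hexadecimal decoding, IP-address host, suggestive keywords); the thresholds and flag names are illustrative, not taken from any particular tool.

// Combined sketch of the URL checks above; thresholds and flag names are illustrative.
function urlChecks(url) {
  const flags = [];
  let decoded;
  try { decoded = decodeURIComponent(url); }   // undo %40, %2e style obfuscation
  catch (e) { decoded = url; }

  const match = decoded.match(/^(\w+:\/\/)?([^\/]*)(.*)$/) || [];
  const host = match[2] || "";
  const path = match[3] || "";

  if (url.length > 80) flags.push("long-url");
  if ((host.match(/\./g) || []).length > 5) flags.push("many-dots");
  if (decoded.includes("@")) flags.push("at-symbol");
  if (/^\d+\.\d+\.\d+\.\d+$/.test(host)) flags.push("ip-host");
  if (/login|sign-?in|confirm|account/i.test(path)) flags.push("credential-keywords");

  return flags;  // each flag contributes to the page's overall suspicion weight
}

console.log(urlChecks("http://203.0.113.7/paypal%40secure/login.php"));
// ["at-symbol", "ip-host", "credential-keywords"]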
G. WHOIS Lookup based Schemes
WHOIS is a query/response protocol which is widely used to
query databases to retrieve details of Internet resources such as
the domain name, IP address block and autonomous system
number. Primarily, it serves as an effective tool to search
domain information, registrar data, admin data and the name
servers used.
WHOIS lookup can be used in anti-phishing schemes to detect
information such as age of the domain and IP address
resolution.
Checking the age of a domain helps assess whether a site is likely to be a phish. The APWG states that the average age of a phishing site is 4.5 days, and many sites last only a few days. Hence, a WHOIS lookup of the suspect site can be performed to check the age of its domain. If the domain was registered more than 12 months ago, it is considered legitimate; if less, it is checked more stringently for phishing. Some sites do not return data on a WHOIS lookup, and these could be considered a phish.
Providing an IP address to the WHOIS database yields the corresponding domain and its registration details. This helps in anti-phishing schemes where the URL contains only an IP address.
Disadvantages: Contacting the WHOIS database for each website fails when the phishing site is hosted on an existing valid server, for example when criminals manage to break into that server. Here, the WHOIS lookup yields a valid value and the check fails to find the phish. The check also fails when businesses outsource some of their web operations to contractors with different domain names.
eBay Toolbar and SpoofGuard are tools using the WHOIS
lookup scheme.
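A short JavaScript sketch of the domain-age heuristic; fetching and parsing the actual WHOIS record is service-specific and omitted, so registrationDate is assumed to have been extracted already.

// Map a domain registration date to a determinant value:
// negative (leans legitimate) if older than 12 months, positive otherwise.
function domainAgeDeterminant(registrationDate, now = new Date()) {
  if (!(registrationDate instanceof Date) || isNaN(registrationDate)) {
    return +1;                                   // no WHOIS data: treat as suspicious
  }
  const ageMonths = (now - registrationDate) / (1000 * 60 * 60 * 24 * 30);
  return ageMonths >= 12 ? -1 : +1;
}

console.log(domainAgeDeterminant(new Date("2009-10-01"), new Date("2009-11-15"))); // +1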
H. Client side Defense schemes
These are anti-phishing schemes which require the user/browser to maintain databases of objects that are generally present in a web page, such as passwords and images. SpoofGuard uses the client-side defense schemes mentioned here. Some of the schemes are briefly described below.
a) Outgoing password check: By maintaining (domain, username, password) triplets in a database, an anti-phishing scheme can detect when credentials are about to be leaked. Every time a user enters a password into a site, the password, hashed using an algorithm such as SHA-1, is compared against the stored hashes, and a warning is issued if the same username-password combination has already been used for a different domain.
This scheme is particularly helpful when a spoof site uses an image of the word "password" instead of HTML text to request the user's password. Since all the passwords are hashed and stored in a database, such a phishing site can still be detected.
Disadvantages: It is practically impossible to store the passwords and usernames for all domains. It is also a security risk: leakage of the stored password hashes could have a greater impact.
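A sketch of the hash-and-compare step using the Web Crypto API for SHA-1; the triplet store and how it is populated and persisted are assumed.

// Hash a string with SHA-1 and return the hex digest (Web Crypto API).
async function sha1Hex(text) {
  const digest = await crypto.subtle.digest("SHA-1", new TextEncoder().encode(text));
  return Array.from(new Uint8Array(digest))
    .map(b => b.toString(16).padStart(2, "0")).join("");
}

// store: array of { domain, userHash, passHash } triplets collected over time.
// Returns true when the same username/password pair was already used on another domain.
async function passwordReuseWarning(store, domain, username, password) {
  const userHash = await sha1Hex(username);
  const passHash = await sha1Hex(password);
  return store.some(t =>
    t.domain !== domain && t.userHash === userHash && t.passHash === passHash);
}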
b) History Check: Most of the above anti-phishing measures
are bound to raise alarms for legitimate sites. Hence, this is a
scheme that is employed to avoid any false alarms in phishing
schemes. It checks the user’s browser history and does not
issue any warnings to sites that are in the user’s history file.
Disadvantages: If the user inadvertently bypasses the initial warning, the site will never be checked for phishing again and might cause considerable damage.
c) Domain Check: This scheme checks the user's browser history to see whether the domain of the current page closely resembles any previously visited page/domain. The comparison is done by calculating the Hamming distance. For example, the site 'wikifedia.org' will raise a warning if the user has previously visited 'wikipedia.org'.
This scheme is devised to prevent adversaries from hosting sites whose domains are misspelled versions of popular sites, as sketched below.
Disadvantages: The check fails when legitimate sites have closely resembling domain names, which raises false alarms.
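A small JavaScript sketch of this comparison; Hamming distance is only defined for strings of equal length, so domains of different lengths are skipped here (an edit-distance measure would be needed otherwise). The distance threshold is illustrative.

// Number of positions at which two equal-length strings differ.
function hammingDistance(a, b) {
  if (a.length !== b.length) return Infinity;
  let d = 0;
  for (let i = 0; i < a.length; i++) if (a[i] !== b[i]) d++;
  return d;
}

// Warn when the current domain is a near-miss of a previously visited one.
function resemblesVisitedDomain(domain, historyDomains, maxDistance = 2) {
  return historyDomains.some(h =>
    h !== domain && hammingDistance(domain, h) <= maxDistance);
}

console.log(resemblesVisitedDomain("wikifedia.org", ["wikipedia.org"])); // true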
d) Referring site check: The browser maintains a record of referring pages, i.e., the links the user followed to reach the current page. Typical phishing attacks arrive via email, and if the user follows a link from such an email, the referring page is the email host. Use of IP addresses by phishing sites can be tracked by doing a reverse DNS lookup; if the resolved hostname is not in the referring-sites list, the site is deemed a phish.
e) Image-Domain associations: The scheme maintains a database of images associated with each domain. The initial static database is assembled using crawler-type tools and is augmented with the individual's browser history. The database can store either the images themselves or their hashes. The scheme helps in finding phishing sites, which may serve images with hash values different from those stored in the database, raising an alert.
Disadvantages: Storing images and their hash values for many domains is largely infeasible and is limited by client-side configuration and storage restrictions.
f) Profiling/Cache: The cached copy of a web site can be obtained through Google Cache, and the date of the last cache can be retrieved. This information can be used in determining whether a site is a phish.
III. PHISHILLA FEATURES
A. Introduction to Phishilla
Phishilla is a plug-in or extension for the Mozilla Firefox web browser. It is embedded in the browser and runs in the same memory context as the browser. It checks for any malice in the site entered by the user and, if any is found, provides a popup box warning the user against entering the site. Phishilla uses features such as URL checks, WHOIS lookups to retrieve site information, page rank, page index, and a host of other features which are described in detail in Section III.B.
A typical Firefox extension is packaged in a ZIP file or bundle with the file extension .xpi. It follows a fixed folder structure and contains an XUL file that adds functionality to the browser. XUL is an XML grammar that provides user interface widgets like buttons, menus, toolbars, trees, etc. These XUL files contain references to JavaScript code that provides the additional functions to the browser.
Phishilla uses the XPI file structure maintained for Firefox extensions, with an XUL file that calls a JavaScript function. The JavaScript function performs the necessary checks on the site entered by the user and returns a result based on a weighted-sum calculation. This weighted sum is computed for each site from the features described in Section III.B. Phishilla is a lightweight browser plug-in for Mozilla Firefox and supports versions from the 1.x series through 3.5 and higher.
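A hedged sketch of how the overlay JavaScript referenced from the XUL file might hook page loads in a classic (pre-WebExtensions) Firefox extension; checkPage and warnUser are hypothetical names standing in for Phishilla's weighted-sum evaluation and warning dialog.

// Overlay script: run Phishilla's checks whenever a content document finishes loading.
window.addEventListener("load", function () {
  gBrowser.addEventListener("DOMContentLoaded", function (event) {
    const doc = event.originalTarget;                 // the document that just loaded
    if (!(doc instanceof HTMLDocument)) return;       // skip XUL/frame documents
    const score = checkPage(doc.location.href, doc);  // weighted-sum evaluation
    if (score > 35) {
      warnUser(doc.location.href, score);             // popup warning / blacklist prompt
    }
  }, true);
}, false);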
Figure 1 depicts the workflow of the plug-in.
The webpage is first checked against the maintained white list and, if found in the list, the extension proceeds to load the page. Phishilla displays the location of the page in the status bar of the browser; if the user suspects that the true location is somewhere other than what is displayed, he/she can click on it and the browser shows a popup asking whether to proceed.
The user decides whether to proceed, and the site is added to the blacklist if the user accepts the warning. If not, Phishilla proceeds to do the mandatory checks: blacklist check, DOM objects check, Google page rank check, inbound links check, traffic information check, page index check, URL information check, and domain age check. A weighted sum is calculated based on the outcome of all these checks, and if the weight is more than 35, Phishilla displays a warning to the user and adds the site to the blacklist if the user accepts the warning. If the user chooses to ignore the warning, Phishilla asks for confirmation on whether to add the site to the white list and proceeds to load the page upon user confirmation.
Figure 1. Phishilla working design
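A minimal sketch of this weighted-sum decision in JavaScript; the heuristic names, weights, and helper functions (blackListHit, pageRankDeterminant, and so on) are hypothetical stand-ins for the checks described in Section III.B.

// Sketch of the weighted-sum decision; weights and helpers are illustrative.
const heuristics = [
  { name: "blacklist", weight: 40, score: url => (blackListHit(url) ? 1 : 0) },
  { name: "pageRank",  weight: 10, score: url => pageRankDeterminant(url) },
  { name: "domainAge", weight: 10, score: url => domainAgeDeterminant(url) },
  { name: "urlChecks", weight: 5,  score: url => urlChecks(url).length },
];

function phishScore(url) {
  return heuristics.reduce((sum, h) => sum + h.weight * h.score(url), 0);
}

// A total above 35 triggers the warning dialog, e.g.:
// if (phishScore(url) > 35) warnUser(url);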
B. Features in Phishilla
Phishilla incorporates most of the schemes described in Section II, making use of client-side checking.
Since no single method is good enough to detect a wide variety of phishing sites, we use a combination of schemes: a score is computed for each web page as a weighted sum over the set of applied heuristics.
a) White Lists: There is an initial list of trustworthy/popular sites. The user can also manually add his/her own domains to the white list based upon prior knowledge of these sites. When a web page is entered in the browser's address bar and loads, the domain of the web page is compared with the domains present in the white list. In the case of a match, no further checks are performed and the user proceeds to the intended destination URL.
Sites which have been frequent targets of phishing were collected from statistics provided by PhishTank.com and the APWG and added to the white list we maintain.
b) Black Lists: We maintain a black list of 50 phishing sites as training data. Each time a web page is loaded, its domain is looked up against the domains in the blacklist. If a match occurs, the user is warned right away that the site is a phishing site and a message pops up asking him not to proceed. At this juncture, the user is left with one of two options:
i) The user is navigated away from the site on pressing the cancel button.
ii) The user proceeds to the site on pressing the OK button.
c) Location of Domain: Certain countries rank high in the number of fraudulent sites hosted there. A user may also have prior knowledge or experience of an intended site and hence be immediately aware if the location of a suspicious site is in an unlikely country. The location of the site is obtained through a reverse lookup on a WHOIS database (my-addr.com).
This location is displayed to the user in the status bar. The information is especially useful when the user has prior knowledge of the intended site, as it provides a visual cue. The user, if suspicious of the location of a particular site, can click on the status bar, which pops up a window asking the user to confirm adding the site to the blacklist.
E.g., the Indian Railways site "irctc.co.in" and the popular bank site "barclays.co.uk" are unlikely to be hosted in countries like China or Russia.
d) Age of a Website: The age of a website is queried by looking up the domain on a WHOIS database (vitzo.com/en/whois) and retrieving the registration date of the domain. Most phishing sites tend to have very little or no history, hence the age of a website is a factor to take into consideration when evaluating it for phishing.
If the age of the site is below a threshold, a positive determining factor is added for the site. Similarly, if the age is above an upper limit (i.e. the site has existed for a long time), a negative determining factor is given for the site.
e) Meta Information: The content property of the META tag whose name or http-equiv is "description" is retrieved. If there is no match between the information in the URL and the information in the META tag, a positive phishing determining factor is added for the domain. Otherwise, a negative determining factor is attributed to the domain.
f) Google Page Rank: Google assigns a page rank on a scale of 0-10 to every webpage. Page rank reflects 'the importance' of a webpage: the higher the rank, the more important the page. Here we retrieve the domain of the webpage and query Google (at toolbarqueries.google.com) programmatically to obtain the page rank of that domain.
Once the page rank is obtained, a weighting scale is applied. Let Pr = PageRank(web page); the weight of the web page for the page rank heuristic is

W(web page) = (-1) * Pr * factor,  if Pr is between 2 and 10
W(web page) = 10 * factor,         if Pr is 0, 1 or -1 (page rank not found)

Here, factor is based on how much information the page rank provides in classifying a website as phishing or non-phishing. A page rank of 2-10 ensures that the phishing determinant value for this heuristic is negative (i.e. leans towards this page not being a phish), whereas a page rank of 0, 1 or not found computes a positive determinant value for this heuristic.
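A direct JavaScript transcription of this rule; the value of factor is a tuning constant and the sample values below are illustrative.

// Weight contribution of the page rank heuristic, as defined above.
function pageRankWeight(pr, factor) {
  if (pr >= 2 && pr <= 10) return -1 * pr * factor; // known, reasonably ranked page
  return 10 * factor;                               // pr of 0, 1 or -1 (not found)
}

console.log(pageRankWeight(6, 1));  // -6: leans towards non-phishing
console.log(pageRankWeight(-1, 1)); // 10: leans towards phishing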
g) Google Page Index: The web page's domain is looked up in Google's index database; if the page is indexed, a negative determinant score is provided for the index heuristic. If no index information is obtained on querying Google, a positive determinant score is provided for the index heuristic.
h) Inbound Links: The popularity of a page or to some
extent the level of dependence or trustworthiness is gauged by
the number of external references from other pages to this
page.
Thus the number of inbound links to a webpage's domain is programmatically queried and obtained through the Alexa database. The number of inbound links to a phishing site is typically minimal, with at most a few instances. Hence this data can be applied effectively in determining whether a page falls into the phishing category.
i) Traffic Information: The popularity of a webpage can also be gauged through the number of visits to the page. The traffic rank of a page is obtained programmatically by querying the Alexa database. The lower the rank value, the greater the traffic flow to the site. This rank is computed from a combination of the number of user visits and the number of page views over a period of three months.
A threshold is set on the traffic rank: if the traffic rank of the web page's domain is above (i.e. worse than) the threshold, a positive determinant value is set for the traffic heuristic; if it is below the threshold, a negative determinant value is set. This is also a very useful heuristic in gauging whether a site is a phish.
j) Anchor tags: Anchor tags are a very common way of cheating unsuspecting users. An anchor tag has an href portion not visible to the user and a visible text portion for the user to click on. A fraudster can manipulate this to his benefit.
For example,
<a href="http://www.phising.com/">www.icicibank.com</a> could mislead the user into thinking that the link leads to icicibank.com.
Hence we check whether the visible text is a URL and, if so, whether it matches the actual href, as sketched below. If they differ, a positive determinant value is applied for this heuristic. Similarly, hrefs in anchor tags go through all the checks applied to URLs, such as checking for IP addresses, obfuscated URLs, length, number of dots, special characters, etc. If any of these heuristics triggers, it results in a positive determinant which is added to the total.
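A JavaScript sketch of the visible-text versus href comparison; the URL-shaped-text test and the domain comparison are simplified.

// Collect anchors whose visible text looks like a URL/domain that does not
// match the domain of the actual href.
function mismatchedAnchors(doc) {
  const suspicious = [];
  for (const a of doc.getElementsByTagName("a")) {
    const text = a.textContent.trim();
    if (!/^(https?:\/\/)?[\w.-]+\.[a-z]{2,}/i.test(text)) continue; // not URL-like text
    const visibleHost = text.replace(/^https?:\/\//i, "").split("/")[0].replace(/^www\./, "");
    const actualHost = new URL(a.href, doc.location.href).hostname;
    if (!actualHost.endsWith(visibleHost)) suspicious.push(a);
  }
  return suspicious; // each mismatch adds a positive determinant to the total score
}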
k) Server Form Handlers: Documents contain form tags whose action attribute specifies the URL to pass control to when an action such as pressing the submit button occurs. Phishing sites usually maintain only a single URL, or a limited set of URLs, on a domain that merely resembles the target website.
Hence a check is performed to confirm whether the action property of a form references an external location different from the URL in the address bar, as sketched below. If the check succeeds, a positive determinant value is assigned to the form handler heuristic.
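A JavaScript sketch of this check; a null or empty action is also counted as suspicious here, matching the form-tag observation in Section II.

// Count forms whose action posts to a domain other than the one in the address bar.
function externalFormActions(doc) {
  const pageHost = doc.location.hostname;
  let count = 0;
  for (const form of doc.getElementsByTagName("form")) {
    const action = form.getAttribute("action");
    if (!action) { count++; continue; }                    // null/empty action
    const actionHost = new URL(action, doc.location.href).hostname;
    if (actionHost !== pageHost) count++;                  // credentials posted elsewhere
  }
  return count; // a non-zero count sets a positive determinant for this heuristic
}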
l) Images and Other External Objects: Phishing sites, in order to look like the target website and trick users into believing they are on the original site, use images from the target's image server.
A compilation (white list) of popular sites, especially those of financial institutions, is maintained in our project, along with a map of the image servers used by these websites. Hence, when a fraudulent site tries to use images from any of these sites, its img src information is matched against the image server URLs of the white-listed sites, and if there is a match, a positive determinant weight is set for the external object heuristic.
m) Length of URL: Adversaries try to manipulate sites by including more information in the website URL. Exploiting the limited number of characters the browser shows in the address bar, fraudulent websites can be made to look legitimate. Hence, we have included a heuristic check that finds the length of the URL and increases the weight if the length exceeds 80 characters.
Studies have shown that phishing URLs are unusually long, and hence a check on URL length helps in detecting fraudulent websites.
n) Dots in URL: We have included a check that counts the number of dots in a URL and increases the weighted sum if the count exceeds a set threshold. It was found that most phishing sites have more than an acceptable number of dots in their URLs.
o) '@' symbol check in URL: Phishing sites make use of the characteristic behavior of the @ symbol, which causes the browser to disregard the address to its left; anything between the http:// and the @ is not considered by the browser.
Adversaries exploit this and mimic legitimate sites by inserting legitimate-looking strings between http:// and the '@' symbol to trick users. Our scheme checks for the @ symbol or its hexadecimal equivalent '%40' in the URL and increases the weighted sum if it is found.
p) Pattern matching: Most phishing sites try to mimic target sites such as eBay, PayPal, and Bank of America (the top three targets). The most common way a user is misled into visiting such a phishing site is by being enticed by catch words such as eBay or PayPal.
Hence we apply a basic check: if the URL has a domain other than the popular sites in the whitelist but contains a string which closely resembles one of these possible target sites, we add a positive determinant value for this heuristic, as sketched below.
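A JavaScript sketch of this brand-name check; the target-name list is illustrative and a real implementation would use the full whitelist described above.

// Flag URLs that are not on a whitelisted domain yet contain the name of a
// commonly targeted brand anywhere in the URL.
const targetNames = ["ebay", "paypal", "bankofamerica"]; // illustrative list

function mimicsTargetBrand(url, whiteList) {
  const host = new URL(url).hostname.replace(/^www\./, "");
  if (whiteList.includes(host)) return false;   // the genuine site itself
  const lowered = url.toLowerCase();
  return targetNames.some(name => lowered.includes(name));
}

console.log(mimicsTargetBrand("http://it-paypal.com/PayPal.It.html",
                              ["paypal.com", "ebay.com"])); // true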
IV. EVALUATION AND ANALYSIS OF PHISHILLA
Phishilla is a rule-based heuristic tool. It may at times cause
false positives (treat non-phishing site as phishing site) and
false negatives (i.e., treat phishing site as non-phishing site).
Phishilla was evaluated on a sample of 64 URLs. 32 of them were phishing URLs obtained from PhishTank.com and other web resources; these URLs were also confirmed to be phish URLs by checking them against the phishing filters of browsers such as Google Chrome and Mozilla Firefox.
The remaining 32 were legitimate URLs chosen at random through the Yahoo Random URL generator (http://random.yahoo.com/fast/ryl), along with certain URLs of known people and known domains. The two tables below show the evaluation results for phishing and legitimate sites respectively.
No. PHISHING URLS RESULT
1 http://www.setuplogecount.co.uk/index.php Found
2 http://info.kuspuk.net/phpMyAdmin/config/ppusa/ Found
3 http://aimm.ye.ro/zboard/data/pesmm.html Found
4 http://grapelove.co.kr/_gabia/fs3_gongji/gtbplc/ibank_gtbplc_com.php Found
5 http://www.goodcreditahead.com/forum/bancoposta/index.php?MfcISAPICommand=SignInFPP&UsingSSL=1&emai=&userid= Found
6 http://singine4baylogisny8iaznwaz.nm.ru/by-Brownie-wise_W032879327328929Qitem1QQDJSyyyd37sdcmbbyloginpag23za32wa32w2azZza3ewsaz.html Found
7 http://server.e-foto.lt/js/228411.paypal.com/webscr_cmd_login-run.php Found
8 http://www.olancompany.com/images/redirecting.html Found
9 http://forumoficial.hostrator.com/de.html Failed
10 http://nvbchannel.net/forum/paypal.htm Found
11 http://www.web-page.com.ar/win/133847.paypal.com/webscr_cmd_login-run.php Found
12 http://activex.emenace.com/us Failed
13 http://it-paypal.com/PayPal.It.html Found
14 http://publidisco.com/catalog/images/microsoft/index2.htm Found
15 http://muziekschoolallmusic.nl/vakanties/ib.html Failed
16 http://kbic.info/bbs/data/portal/server.pt/ Found
17 http://vonage.id1114555.online-webforms.com/ Found
18 http://motors-support.net Failed
19 http://ba03.pochta.ru/ehay.html Found
20 http://www.portaljenipapo.com/login.htm Failed
21 http://soullovebags.com/images/www.mybank.alliance-leicester.co.uk/index.html Found
22 http://www.stentend.com/de/ Found
23 http://signin.ebay.com.ws.ebayisapi.dll.ciczdztxtwsdyhsfpndr.virtualbattlespace2.com/frogstar/down Found
24 http://signin-ebay.adacorrigan.co.uk/ Found
25 http://www.centralfilms.net/locaciones/moore/scripts.php Found
26 http://e-mind.be/img/hp/base/b049/gdxow.php Found
27 http://www.parkdaeli.com/bbs/file/new.egg.com/logon.htm Found
28 http://www.stentend.com/de/ Found
29 http://www.candelaradio.fm/los15img/ Found
30 http://www.skype.com.ofi.uni.cc/?id=49126&lc=us Failed
31 http://fwqdeq.mail2k.ru/n.html Found
32 http://mobile-me.org Found
Table-1 Evaluation results for phishing sites
No. LEGITIMATE SITES RESULTS
1 http://sportsillustrated.cnn.com/basketball/ncaa/women/teams/youngstown/ Passed
2 http://www.zumbrolutheran.org/ Passed
3 http://www.socialcouch.com/interview-with-richard-binhammer-dell-social-media/ Passed
4 http://amazwi.blogspot.com/ Passed
5 http://cs.vt.edu/ Passed
6 http://www.alphabusinesscentre.com/ Passed
7 http://www.christchurchpompton.org/ Passed
8 http://www.wongbrothers.com/ Passed
9 http://www.nzembassy.com/ Passed
10 http://geogratis.cgdi.gc.ca/ Passed
11 http://www.prapa.com/ Passed
12 http://www.circuit8.org/ Passed
13 http://www.findstolenart.com/ Passed
14 http://www.maroc.net/ Passed
15 http://www.minorleagueballparks.com/neds_oh.html Passed
16 http://www.elth.pub.ro/ Passed
17 http://www.finleys.com/ Passed
18 http://www.spravi.8m.com/ Passed
19 http://www.paconcours.com/ Passed
20 http://www.rmadhavan.com/ Passed
21 http://us.com/ Passed
22 https://home.americanexpress.com/home/global_splash.html Passed
23 http://www.lamega.com/ Passed
24 http://www.everydaymaternity.com/ Passed
25 http://www.shop-cliftonparkcenter.com/ Passed
26 http://www.eqc.govt.nz/ Passed
27 http://www.yorkarchaeology.co.uk/ Failed
28 http://mikeshost.110mb.com/xy.php Failed
29 http://weather.mgnetwork.com/cgi-bin/weatherIMD3/weather.cgi?user=TBO&forecast=zandh&pands=Miami%2C+FL Failed
30 http://www.atifitnuts.com/ Failed
31 http://www.asgsherman.com/ Failed
32 http://www.ambache.co.uk/ Failed
Table-2 Evaluation results for legitimate sites
A. Evaluation measures
The following measures were adopted in evaluating
Phishilla:
a) Total Catch Rate: Number of phish URLs that were
correctly blocked or warned.
Number of correctly caught phish URLs = 28
Total number of phish URLs = 32
Percentage of correctly caught URLS = 28 / 32 * 100
= 87.5 %
b) False Negatives: Number of phish URLs that were
incorrectly allowed.
Number of incorrectly allowed phish URLs = 4
Total number of phish URLs = 32
Percentage of false negatives = 4 / 32 * 100
= 12.5 %
c) Allows: Number of good URLs that were correctly allowed.
Number of correctly allowed good URLs = 26
Total number of good URLs = 32
Percentage of correctly allowed URLs = 26 / 32 * 100
= 81.25 %
d) False Positives: Number of good URLS that were
incorrectly blocked.
Number of incorrectly blocked good URLs = 6
Total number of good URLs = 32
Percentage of false positives = 6/32 * 100
= 18.75%
B. Analysis of Phishilla
Through our evaluation we verified that Phishilla may
sometimes result in false positives for relatively unknown sites
but is unlikely to cause false negatives of major impact.
a) Analysis of False Positives: False positives are the number of good URLs that are incorrectly blocked.
1) Phishilla reports false positives in the case of good URLs with abnormal URL lengths or a larger number of dots than standard conventions would suggest.
2) If a dotted-decimal IP address is provided instead of a name, reporting an error outright could sometimes result in a false positive, since this kind of address may occasionally be legitimate. Hence in this case Phishilla only reports a warning that an IP address is being used in the URL and that it could possibly be an illegitimate site.
3) False positives are also possible when the site is relatively new or unknown, with very few or no inbound links and little traffic.
b) Analysis of False Negatives: False negatives are the number of phish URLs that are incorrectly allowed.
It is imperative that any good anti-phishing scheme or tool reduces the number of false negatives, and Phishilla addresses this issue well. False negatives occur mostly when there is very little DOM element information that can be compared against the standard heuristics.
c) Performance:
The performance of Phishilla is good since only JavaScript
is used and all operations are done on the client side.
V. ADVANTAGES OF PHISHILLA
Phishilla is thus a browser plug-in which accomplishes the task of detecting a phishing site by following a set of well-proven and established methods.
It has the following advantages:
1) Lightweight
2) Follows combination of well-tested and successful
anti-phishing schemes.
3) Computes a weighted sum where heuristics are
assigned different values based upon their ability to
classify the malicious content in the website.
4) Excellent catch rate.
VI. CONCLUSION
In this paper we have discussed the set of existing countermeasures against phishing, the possible merits and flaws in these schemes, and the adoption of these schemes by existing marketplace tools. We have identified that a single heuristic, or a single class of heuristics, is not sufficient to reliably determine a phishing site. Hence we have adopted a scheme which combines several phishing classification schemes used across several tools and assigns weights to each scheme depending on its effectiveness in classification, i.e. its detection accuracy.
Phishilla provides phishing alerts to the user in a non-intrusive manner without affecting the browsing experience. It follows a client-side approach where all the logic is executed in client-side code. This makes Phishilla efficient and imposes only a minimal set of requirements.
While Phishilla has a good catch rate and detects a majority
of the phishing sites, possible avenues of enhancements in
Phishilla include incorporating features such as profiling,
checking of SSL certificates, image matching, etc. which
would need server side functionalities. The GUI could be
enhanced to provide more virtual cues to the user and possible
display of color codes. This would indicate the level of
determining whether a site is malicious or not. Similarly, the
users could be profiled when they mark a site as phishing and
weights could be provided to users based upon their previous
phish-reporting history. Other learning-based methods could also be incorporated, where the effectiveness of each heuristic is monitored over time and the weights re-adjusted accordingly. Phishilla could also be extended to track and report phishing e-mails, the current plague on the Internet which leads users to unsolicited phishing sites.
VII. ACKNOWLEDGMENTS
This work was carried out at the Virginia Polytechnic Institute and State University. We thank Dr. Jung Min Park for providing the impetus for this paper.
REFERENCES
[1] PhishTank, available at: http://www.phishtank.com/
[2] N. Chou, R. Ledesma, Y. Teraguchi, and J. C. Mitchell, “Client-Side
Defense against Web-Based Identity Theft", in Proceedings of the
Network and Distributed System Security Symposium, (NDSS '04),
February 2004.
[3] S. Garera, N. Provos, M. Chew, and A. D. Rubin, “A Framework for
Detection and Measurement of Phishing Attacks", in Proceedings of the
2007 ACM Workshop on Recurring Malcode (WORM '07), Nov. 2007,
pp. 1–8.
[4] FireFox, “Phishing Protection". Available at:
http://www.mozilla.com/en-US/firefox/phishing-protection/
[5] Y. Pan and X. Ding, “Anomaly Based Web Phishing Page Detection", in
Proceedings of the 22nd Annual Computer Security Applications
Conference (ACSAC '06), December 2006, pp. 381–392.
[6] Y. Zhang, J. I. Hong, and L. F. Cranor, “Cantina: A Content-Based
Approach to Detecting Phishing Web Sites”, in Proceedings of 16th
International World Wide Web Conference (WWW '07), May 2007, pp.
639–648.
[7] Paul Robichaux, Devin L. Ganger, “Gone Phishing: Evaluating Anti-
Phishing Tools for Windows", September 2006
[8] D. Kevin McGrath, Minaxi Gupta, "Behind Phishing: An Examination
of Phisher Modi Operandi", Proceedings of the 1st Usenix Workshop on
Large-Scale Exploits and Emergent Threats, San Francisco, California,
Article No. 4, 2008
[9] Bayesian Classification of Phishing :
http://www.sonicwall.com/downloads/WP-ENG-025_Phishing-
Bayesian-Classification.pdf
[10] Google Page Rank Information:
http://abhinavsingh.com/blog/2009/04/getting-google-page-rank-using-
javascript-for-adobe-air-apps/
[11] Introduction to Phishing: http://en.wikipedia.org/wiki/Phishing
[12] Who-Is Information: http://vitzo.com/en/whois
[13] Traffic Information : http://www.alexa.com/siteinfo
[14] Reverse Domain Lookup: http://my-addr.com/reverse-lookup-domain-
hostname/free-reverse-ip-lookup-service/reverse_lookup.php
[15] Anti-Phishing Information: http://www.antiphishing.org/

More Related Content

What's hot

Malicious Url Detection Using Machine Learning
Malicious Url Detection Using Machine LearningMalicious Url Detection Using Machine Learning
Malicious Url Detection Using Machine Learningsecurityxploded
 
A SURVEY ON WEB SPAM DETECTION METHODS: TAXONOMY
A SURVEY ON WEB SPAM DETECTION METHODS: TAXONOMYA SURVEY ON WEB SPAM DETECTION METHODS: TAXONOMY
A SURVEY ON WEB SPAM DETECTION METHODS: TAXONOMYIJNSA Journal
 
State of the Art Analysis Approach for Identification of the Malignant URLs
State of the Art Analysis Approach for Identification of the Malignant URLsState of the Art Analysis Approach for Identification of the Malignant URLs
State of the Art Analysis Approach for Identification of the Malignant URLsIOSRjournaljce
 
Improving Phishing URL Detection Using Fuzzy Association Mining
Improving Phishing URL Detection Using Fuzzy Association MiningImproving Phishing URL Detection Using Fuzzy Association Mining
Improving Phishing URL Detection Using Fuzzy Association Miningtheijes
 
2014_protect_presentation
2014_protect_presentation2014_protect_presentation
2014_protect_presentationJeff Holland
 
IRJET- Detecting Malicious URLS using Machine Learning Techniques: A Comp...
IRJET-  	  Detecting Malicious URLS using Machine Learning Techniques: A Comp...IRJET-  	  Detecting Malicious URLS using Machine Learning Techniques: A Comp...
IRJET- Detecting Malicious URLS using Machine Learning Techniques: A Comp...IRJET Journal
 
Classification Model to Detect Malicious URL via Behaviour Analysis
Classification Model to Detect Malicious URL via Behaviour AnalysisClassification Model to Detect Malicious URL via Behaviour Analysis
Classification Model to Detect Malicious URL via Behaviour AnalysisEditor IJCATR
 
A Comparative Analysis of Different Feature Set on the Performance of Differe...
A Comparative Analysis of Different Feature Set on the Performance of Differe...A Comparative Analysis of Different Feature Set on the Performance of Differe...
A Comparative Analysis of Different Feature Set on the Performance of Differe...gerogepatton
 
A Deep Learning Technique for Web Phishing Detection Combined URL Features an...
A Deep Learning Technique for Web Phishing Detection Combined URL Features an...A Deep Learning Technique for Web Phishing Detection Combined URL Features an...
A Deep Learning Technique for Web Phishing Detection Combined URL Features an...IJCNCJournal
 
Review of the machine learning methods in the classification of phishing attack
Review of the machine learning methods in the classification of phishing attackReview of the machine learning methods in the classification of phishing attack
Review of the machine learning methods in the classification of phishing attackjournalBEEI
 
A web content analytics
A web content analyticsA web content analytics
A web content analyticscsandit
 
IRJET - An Automated System for Detection of Social Engineering Phishing Atta...
IRJET - An Automated System for Detection of Social Engineering Phishing Atta...IRJET - An Automated System for Detection of Social Engineering Phishing Atta...
IRJET - An Automated System for Detection of Social Engineering Phishing Atta...IRJET Journal
 
Done rerea dwebspam paper good
Done rerea dwebspam paper goodDone rerea dwebspam paper good
Done rerea dwebspam paper goodJames Arnold
 
Network paperthesis2
Network paperthesis2Network paperthesis2
Network paperthesis2Dhara Shah
 
Gam Documentation
Gam DocumentationGam Documentation
Gam DocumentationDavid Chen
 
Low Cost Page Quality Factors To Detect Web Spam
Low Cost Page Quality Factors To Detect Web Spam Low Cost Page Quality Factors To Detect Web Spam
Low Cost Page Quality Factors To Detect Web Spam ieijjournal
 

What's hot (18)

Malicious Url Detection Using Machine Learning
Malicious Url Detection Using Machine LearningMalicious Url Detection Using Machine Learning
Malicious Url Detection Using Machine Learning
 
A SURVEY ON WEB SPAM DETECTION METHODS: TAXONOMY
A SURVEY ON WEB SPAM DETECTION METHODS: TAXONOMYA SURVEY ON WEB SPAM DETECTION METHODS: TAXONOMY
A SURVEY ON WEB SPAM DETECTION METHODS: TAXONOMY
 
State of the Art Analysis Approach for Identification of the Malignant URLs
State of the Art Analysis Approach for Identification of the Malignant URLsState of the Art Analysis Approach for Identification of the Malignant URLs
State of the Art Analysis Approach for Identification of the Malignant URLs
 
Improving Phishing URL Detection Using Fuzzy Association Mining
Improving Phishing URL Detection Using Fuzzy Association MiningImproving Phishing URL Detection Using Fuzzy Association Mining
Improving Phishing URL Detection Using Fuzzy Association Mining
 
2014_protect_presentation
2014_protect_presentation2014_protect_presentation
2014_protect_presentation
 
IRJET- Detecting Malicious URLS using Machine Learning Techniques: A Comp...
IRJET-  	  Detecting Malicious URLS using Machine Learning Techniques: A Comp...IRJET-  	  Detecting Malicious URLS using Machine Learning Techniques: A Comp...
IRJET- Detecting Malicious URLS using Machine Learning Techniques: A Comp...
 
Classification Model to Detect Malicious URL via Behaviour Analysis
Classification Model to Detect Malicious URL via Behaviour AnalysisClassification Model to Detect Malicious URL via Behaviour Analysis
Classification Model to Detect Malicious URL via Behaviour Analysis
 
A Comparative Analysis of Different Feature Set on the Performance of Differe...
A Comparative Analysis of Different Feature Set on the Performance of Differe...A Comparative Analysis of Different Feature Set on the Performance of Differe...
A Comparative Analysis of Different Feature Set on the Performance of Differe...
 
A Deep Learning Technique for Web Phishing Detection Combined URL Features an...
A Deep Learning Technique for Web Phishing Detection Combined URL Features an...A Deep Learning Technique for Web Phishing Detection Combined URL Features an...
A Deep Learning Technique for Web Phishing Detection Combined URL Features an...
 
Review of the machine learning methods in the classification of phishing attack
Review of the machine learning methods in the classification of phishing attackReview of the machine learning methods in the classification of phishing attack
Review of the machine learning methods in the classification of phishing attack
 
A web content analytics
A web content analyticsA web content analytics
A web content analytics
 
Learning to detect phishing ur ls
Learning to detect phishing ur lsLearning to detect phishing ur ls
Learning to detect phishing ur ls
 
IRJET - An Automated System for Detection of Social Engineering Phishing Atta...
IRJET - An Automated System for Detection of Social Engineering Phishing Atta...IRJET - An Automated System for Detection of Social Engineering Phishing Atta...
IRJET - An Automated System for Detection of Social Engineering Phishing Atta...
 
Done rerea dwebspam paper good
Done rerea dwebspam paper goodDone rerea dwebspam paper good
Done rerea dwebspam paper good
 
Network paperthesis2
Network paperthesis2Network paperthesis2
Network paperthesis2
 
50120140504017
5012014050401750120140504017
50120140504017
 
Gam Documentation
Gam DocumentationGam Documentation
Gam Documentation
 
Low Cost Page Quality Factors To Detect Web Spam
Low Cost Page Quality Factors To Detect Web Spam Low Cost Page Quality Factors To Detect Web Spam
Low Cost Page Quality Factors To Detect Web Spam
 

Viewers also liked

A new successful project -lamp product--wit mold
A new successful project -lamp product--wit moldA new successful project -lamp product--wit mold
A new successful project -lamp product--wit moldBeta Jiang
 
Skyscape 2015-onboces-pdf
Skyscape 2015-onboces-pdfSkyscape 2015-onboces-pdf
Skyscape 2015-onboces-pdfJeff Paye
 
Abuse_in_the_Cloud_Palani_Ashwin
Abuse_in_the_Cloud_Palani_AshwinAbuse_in_the_Cloud_Palani_Ashwin
Abuse_in_the_Cloud_Palani_AshwinAshwin Palani
 
Some automotive parts made by WIT MOLD
Some automotive parts made by WIT MOLDSome automotive parts made by WIT MOLD
Some automotive parts made by WIT MOLDBeta Jiang
 

Viewers also liked (6)

A new successful project -lamp product--wit mold
A new successful project -lamp product--wit moldA new successful project -lamp product--wit mold
A new successful project -lamp product--wit mold
 
Skyscape 2015-onboces-pdf
Skyscape 2015-onboces-pdfSkyscape 2015-onboces-pdf
Skyscape 2015-onboces-pdf
 
Norma iram 4501
Norma iram 4501Norma iram 4501
Norma iram 4501
 
Abuse_in_the_Cloud_Palani_Ashwin
Abuse_in_the_Cloud_Palani_AshwinAbuse_in_the_Cloud_Palani_Ashwin
Abuse_in_the_Cloud_Palani_Ashwin
 
Tugas eka
Tugas ekaTugas eka
Tugas eka
 
Some automotive parts made by WIT MOLD
Some automotive parts made by WIT MOLDSome automotive parts made by WIT MOLD
Some automotive parts made by WIT MOLD
 

Similar to Report - Final_New_phishila

Detecting Phishing using Machine Learning
Detecting Phishing using Machine LearningDetecting Phishing using Machine Learning
Detecting Phishing using Machine Learningijtsrd
 
Detecting Phishing Websites Using Machine Learning
Detecting Phishing Websites Using Machine LearningDetecting Phishing Websites Using Machine Learning
Detecting Phishing Websites Using Machine LearningIRJET Journal
 
Lab-3 Cyber Threat Analysis In Lab-3, you will do some c.docx
Lab-3 Cyber Threat Analysis        In Lab-3, you will do some c.docxLab-3 Cyber Threat Analysis        In Lab-3, you will do some c.docx
Lab-3 Cyber Threat Analysis In Lab-3, you will do some c.docxLaticiaGrissomzz
 
Detection of Phishing Websites
Detection of Phishing Websites Detection of Phishing Websites
Detection of Phishing Websites Nikhil Soni
 
Low Cost Page Quality Factors To Detect Web Spam
Low Cost Page Quality Factors To Detect Web SpamLow Cost Page Quality Factors To Detect Web Spam
Low Cost Page Quality Factors To Detect Web Spamieijjournal
 
Low Cost Page Quality Factors To Detect Web Spam
Low Cost Page Quality Factors To Detect Web SpamLow Cost Page Quality Factors To Detect Web Spam
Low Cost Page Quality Factors To Detect Web Spamieijjournal
 
A security note for web developers
A security note for web developersA security note for web developers
A security note for web developersJohn Ombagi
 
Phishing Website Detection using Classification Algorithms
Phishing Website Detection using Classification AlgorithmsPhishing Website Detection using Classification Algorithms
Phishing Website Detection using Classification AlgorithmsIRJET Journal
 
IRJET- Phishing Website Detection System
IRJET- Phishing Website Detection SystemIRJET- Phishing Website Detection System
IRJET- Phishing Website Detection SystemIRJET Journal
 
PUMMP: PHISHING URL DETECTION USING MACHINE LEARNING WITH MONOMORPHIC AND POL...
PUMMP: PHISHING URL DETECTION USING MACHINE LEARNING WITH MONOMORPHIC AND POL...PUMMP: PHISHING URL DETECTION USING MACHINE LEARNING WITH MONOMORPHIC AND POL...
PUMMP: PHISHING URL DETECTION USING MACHINE LEARNING WITH MONOMORPHIC AND POL...IJCNCJournal
 
PUMMP: Phishing URL Detection using Machine Learning with Monomorphic and Pol...
PUMMP: Phishing URL Detection using Machine Learning with Monomorphic and Pol...PUMMP: Phishing URL Detection using Machine Learning with Monomorphic and Pol...
PUMMP: Phishing URL Detection using Machine Learning with Monomorphic and Pol...IJCNCJournal
 
ChongLiu-MaliciousURLDetection
ChongLiu-MaliciousURLDetectionChongLiu-MaliciousURLDetection
ChongLiu-MaliciousURLDetectionDaniel Liu
 
200+ SEO factors.docx
200+ SEO factors.docx200+ SEO factors.docx
200+ SEO factors.docxSuman456834
 
200+ SEO factors.docx
200+ SEO factors.docx200+ SEO factors.docx
200+ SEO factors.docxSuman456834
 
Smart Crawler Automation with RMI
Smart Crawler Automation with RMISmart Crawler Automation with RMI
Smart Crawler Automation with RMIIRJET Journal
 
A Novel Interface to a Web Crawler using VB.NET Technology
A Novel Interface to a Web Crawler using VB.NET TechnologyA Novel Interface to a Web Crawler using VB.NET Technology
A Novel Interface to a Web Crawler using VB.NET TechnologyIOSR Journals
 
Detection of Phishing Websites
Detection of Phishing WebsitesDetection of Phishing Websites
Detection of Phishing WebsitesIRJET Journal
 
IRJET - Chrome Extension for Detecting Phishing Websites
IRJET -  	  Chrome Extension for Detecting Phishing WebsitesIRJET -  	  Chrome Extension for Detecting Phishing Websites
IRJET - Chrome Extension for Detecting Phishing WebsitesIRJET Journal
 
IRJET- Phishing Website Detection based on Machine Learning
IRJET- Phishing Website Detection based on Machine LearningIRJET- Phishing Website Detection based on Machine Learning
IRJET- Phishing Website Detection based on Machine LearningIRJET Journal
 
IRJET - Phishing Attack Detection and Prevention using Linkguard Algorithm
IRJET - Phishing Attack Detection and Prevention using Linkguard AlgorithmIRJET - Phishing Attack Detection and Prevention using Linkguard Algorithm
IRJET - Phishing Attack Detection and Prevention using Linkguard AlgorithmIRJET Journal
 

Similar to Report - Final_New_phishila (20)

Detecting Phishing using Machine Learning
Detecting Phishing using Machine LearningDetecting Phishing using Machine Learning
It works by visually comparing client-side images chosen by the user with the ones provided by the server.
Advantages: The user makes the detection decision for a phishing site through recall of the shared image.
Disadvantages: The user must be aware of the scheme and have prior knowledge of the intended domain.

C. Information Retrieval Based Schemes
a) Term Frequency Calculation: This scheme calculates the tf-idf weights of the page's terms, selects the terms with the highest weights, submits them to a search engine such as Google, and checks whether the page's domain appears within the top 'n' results (a sketch of this idea is given at the end of this subsection).
b) Support Vector Machines: These perform binary classification of sites into phishing or non-phishing based on identity information obtained from DOM objects (A HREF, IMG), etc.
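To make the term-frequency step concrete, the following is a minimal JavaScript sketch, not taken from any existing tool, of how tf-idf weights could be computed over a page's visible terms; the idf table, the fallback idf value and the follow-up search-engine query are assumptions and are only indicated in comments.

    // Minimal tf-idf sketch (illustrative only, not taken from any existing tool).
    // 'idfTable' maps a term to its inverse-document-frequency value and is assumed
    // to be precomputed from a background corpus.
    function topTfIdfTerms(pageText, idfTable, n) {
      var terms = pageText.toLowerCase().match(/[a-z]{3,}/g) || [];
      var counts = {};
      terms.forEach(function (t) { counts[t] = (counts[t] || 0) + 1; });

      var scored = Object.keys(counts).map(function (t) {
        var tf = counts[t] / terms.length;            // term frequency within the page
        var idf = idfTable[t] || Math.log(1000);      // fallback idf for unseen terms (assumption)
        return { term: t, score: tf * idf };
      });

      scored.sort(function (a, b) { return b.score - a.score; });
      return scored.slice(0, n).map(function (s) { return s.term; });
    }
    // The returned terms would then be submitted to a search engine and the page's
    // domain checked against the top 'n' results (the search call itself is omitted).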
Advantages: The major advantage of these schemes is their strong mathematical foundations, their use of probabilistic values and their learning-based techniques.
Disadvantages: These schemes raise false alarms and require manual classification of the initial training data and specification of rules.

D. Page Ranking based Schemes
PageRank is a link-analysis algorithm, named after Larry Page and used by the Google search engine, that assigns a numerical weighting to each element of a hyperlinked set of documents, such as the World Wide Web, with the purpose of "measuring" its relative importance within the set. The algorithm may be applied to any collection of entities with reciprocal quotations and references. The numerical weight assigned to any given element E is called the PageRank of E and is denoted PR(E). PageRank reflects the popularity of a URL on the web: the higher the PageRank, the more important the page. Phishing web pages most often have either a very low PageRank or no PageRank at all. Very few phishing pages manage to increase their PageRank, possibly by using link-spamming techniques. Page Index is defined as the number of pages from a particular website that Google has in its database. Phishing web pages are usually accessible only for a short period and hence many are not found in the index.
Advantages: PageRank and Page Index values are strong features for identifying a URL as non-phishing, especially if the crawl is from a reputed search engine such as Google.
Disadvantages: Freshly created pages, especially in new domains, rank very low and could therefore sometimes result in a false alarm.

E. DOM Objects Retrieval Schemes
a) Keywords and Meta Tags: This scheme searches for domain information in meta tags with name = "description" and in those whose name or http-equiv is "copyright". It also retrieves the contents of the title tag and matches them against the domain. If no match is found, the suspicion weight for the site is increased.
b) Request URLs: DOM elements such as <img> tags load information from other URLs. Most of these URLs lie within the same domain, or the objects are loaded from an image server belonging to that domain.
i) Phishing sites, in order to achieve the same look and feel as the phished domain, specify the phished domain's image server in their <img src> attributes.
ii) Phishing sites also usually maintain only one or a few URLs similar to those of the targeted site, so the <img src> URLs differ from the page's own domain.
These two observations lead to a heuristic in which the number of external domain references (including different image servers) is counted; if the count crosses a threshold, the site can be marked as a possible phishing site and a degree of suspicion assigned to it (see the sketch below).
c) AURL - Anchor URLs: This scheme detects a phish based on the anchor URLs present in the site. The following deductions can be made from the anchor URLs and other URL information in a web page:
i) As mentioned above, the number of external anchors in an illegitimate web page is high. This property can be used to mark a page as a phishing page.
ii) The hyperlink provides a DNS domain name in the anchor text, but the destination DNS name in the visible link does not match the one in the actual link.
iii) Dotted decimals used in the URI can be examined to detect malicious websites.
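As a concrete illustration of the external-reference heuristic above, here is a minimal JavaScript sketch; 'doc' is assumed to be an already parsed HTML document, 'pageHost' the host of the page under test, and the threshold value is illustrative rather than taken from any tool.

    // Illustrative sketch of the external-domain reference heuristic described above.
    // 'doc' is a parsed HTML document, 'pageHost' is the host of the page being
    // checked, and 'threshold' (a fraction between 0 and 1) is an assumed value.
    function externalReferenceSuspicion(doc, pageHost, threshold) {
      var external = 0, total = 0;
      var nodes = doc.querySelectorAll('img[src], a[href]');
      for (var i = 0; i < nodes.length; i++) {
        var url = nodes[i].getAttribute('src') || nodes[i].getAttribute('href') || '';
        var m = /^https?:\/\/([^\/]+)/i.exec(url);
        if (!m) continue;                                   // relative URLs stay within the domain
        total++;
        if (m[1].toLowerCase() !== pageHost.toLowerCase()) external++;
      }
      // The page is marked suspicious when external references dominate.
      return total > 0 && (external / total) > threshold;
    }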
d) Form Tags: Legitimate websites usually have the form's action attribute set to a valid URL, mostly within their own domain. Illegitimate sites tend to have an action attribute containing a URL in a domain different from that of the page, or sometimes a null action (see the sketch below).
e) Body Tags: Some websites provide a description of themselves in the body of the page, and this can be used to identify a phishing site.
f) SSL Certificates: The distinguished names of phished sites in the certificate differ from the claimed identity. This check can be employed to detect forgeries in the website.
Advantages: These schemes are fast and make it easy to detect phishing sites. They perform their checks irrespective of any date change or manipulation in the website.
Disadvantages: False positives - certain legitimate sites can have lengthy URLs, a large number of dots, etc.
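A minimal sketch of the form-handler check from d) above, assuming 'doc' is the parsed document and 'pageHost' the host shown in the address bar (the names and the handling of empty actions are illustrative).

    // Sketch of the form-handler check described under d) above (illustrative only).
    // Returns true when a form submits to a host other than the page's own host,
    // or has an empty action, both of which raise the suspicion weight.
    function hasSuspiciousFormAction(doc, pageHost) {
      var forms = doc.getElementsByTagName('form');
      for (var i = 0; i < forms.length; i++) {
        var action = (forms[i].getAttribute('action') || '').trim();
        if (action === '' || action === '#') return true;          // null or empty action
        var m = /^https?:\/\/([^\/]+)/i.exec(action);
        if (m && m[1].toLowerCase() !== pageHost.toLowerCase()) return true;  // external handler
      }
      return false;
    }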
F. URL Check Schemes
One method of detecting phishing sites is to observe the URL of the page and examine characteristics such as its length, the presence of suspicious punctuation, etc. Below are some of the checks performed on the URL to determine its validity against phishing attacks; a combined sketch of these checks follows this list.
a) URL check for the presence of other domain names: The URL is checked against a white list of valid sites; the presence of any of these sites in the URL path, but not in the host name, indicates that the URL being checked is trying to impersonate a valid site.
b) Length of the URL: An abnormally long URL raises suspicion, so sites carrying long host names or large strings of words are checked for phishing.
c) Presence of suspicious special characters: Adversaries use the character '@' in the path because an '@' symbol in a URL causes the string to its left to be disregarded, with the string on the right treated as the actual URL for retrieving the page. Combined with the limited size of the browser address bar, this allows an attacker to write URLs that appear valid within the address bar but actually contain a malicious path after the '@' symbol. Checking for this symbol in the URL helps in detecting phishing URLs.
d) Checking obfuscation of URLs: URLs can be obfuscated by inserting hexadecimal escape sequences instead of individual characters. Attackers take advantage of the ignorance of some users regarding the structure of a URL. For example, the symbol '@' can be represented as '%40' and the dots can be replaced by '%2e'. IP addresses can also be represented in hexadecimal, so a suspect IP address that is part of a blacklist can be hidden by these characters and thus escape the URL blacklist check. This attack can be identified by maintaining a map of the hexadecimal escapes and their possible conversions.
e) Suggestive word tokens in the URL: Phishing URLs aim to extract confidential user information such as usernames and passwords for a particular domain. Hence, a check for keywords such as login, sign-in, confirm, etc. in the path suggests that the page asks for user information, and the URL is double-checked for phishing. These tokens are extracted from blacklisted URL paths, where they are found to occur more frequently than other tokens.
f) Dots in URL: Phishing sites tend to use many dots in their URLs, whereas legitimate sites do not. A given URL is considered a phish if the number of dots in the URL exceeds five.
These URL schemes are used in anti-phishing tools such as SpoofGuard, WebSpoof and SpoofStick.
Disadvantages: There are cases where valid sites fail some of the URL checks, such as the length check.
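The following is a minimal JavaScript sketch combining the URL checks above; the length and dot thresholds follow the text, while the suggestive-token list and the simple one-point-per-check scoring are illustrative assumptions rather than any tool's exact weighting.

    // Illustrative combination of the URL checks above.
    function urlSuspicionScore(url) {
      var decoded;
      try { decoded = decodeURIComponent(url); } catch (e) { decoded = url; }

      var score = 0;
      if (url.length > 80) score++;                                   // unusually long URL
      if ((decoded.match(/\./g) || []).length > 5) score++;           // more than five dots
      if (/@|%40/i.test(url)) score++;                                // '@' trick, plain or encoded
      if (/^https?:\/\/\d{1,3}(\.\d{1,3}){3}/.test(decoded)) score++; // raw IP address as host
      if (/login|sign-?in|confirm|account|secure/i.test(decoded)) score++; // suggestive tokens
      return score;
    }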
G. WHOIS Lookup based Schemes
WHOIS is a query/response protocol widely used to query databases for details of Internet resources such as a domain name, an IP address block or an autonomous system number. Primarily, it serves as an effective tool for looking up domain information, registrar data, administrative data and the name servers used. A WHOIS lookup can be used in anti-phishing schemes to obtain information such as the age of the domain and IP address resolution. Checking the age of the domain helps establish the validity of a suspected phishing site: APWG states that the average lifetime of a phishing site is 4.5 days, and many sites last only a few days. Hence, a WHOIS lookup of the suspected site can be performed to check the age of the domain. If the domain was registered more than 12 months ago, it is considered legitimate; if less, it is checked more stringently for phishing. Some sites return no data on a WHOIS lookup and can be considered a possible phish. Providing an IP address to the WHOIS database returns the details of its domain and its registration, which helps in cases where the path contains only IP address information.
Disadvantages: This check, which contacts the WHOIS database for each website, fails when the phishing site is hosted on an existing valid server into which criminals have managed to break; the WHOIS lookup then yields a valid value and the check fails to find the phish. The check also fails when businesses outsource some of their web operations to contractors with different domain names. eBay Toolbar and SpoofGuard are tools that use the WHOIS lookup scheme.

H. Client-side Defense Schemes
These are anti-phishing schemes that require the client to maintain databases of objects generally present in a web page. The user/browser stores information such as passwords, images, etc. SpoofGuard uses the client-side defense schemes mentioned here. Some of the schemes are described briefly below.
a) Outgoing password check: By maintaining triplets (domain, username, password) for each domain in a database, this scheme can prevent information from being leaked. Every time the user enters a password into a site, the stored password, which is hashed using an algorithm such as SHA-1, is compared, and a warning is issued to the user if the same username-password combination is being sent to a different domain (a sketch of this check is given below). This scheme is particularly helpful when a spoof site uses an image of the word "password" instead of HTML text to request the user's password; since all passwords are hashed and stored in a database, such a phishing site can still be detected.
Disadvantages: It is practically impossible to include the passwords and usernames for all domains. It is also a security risk: leakage of the stored password hashes could have a greater impact.
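A minimal sketch of the outgoing-password check, where 'sha1Hex' stands in for any SHA-1 implementation and 'store' is the client-side database of previously used credential hashes; both names are illustrative.

    // Sketch of the outgoing-password check described above (illustrative only).
    // 'store' maps sha1Hex(password) -> { domain, username } for previously used credentials.
    function checkOutgoingPassword(store, domain, username, password, sha1Hex) {
      var key = sha1Hex(password);
      var previous = store[key];
      if (previous && previous.domain !== domain) {
        // The same password is being sent to a different domain: warn the user.
        return { warn: true, previouslyUsedAt: previous.domain };
      }
      store[key] = { domain: domain, username: username };
      return { warn: false };
    }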
b) History Check: Most of the above anti-phishing measures are bound to raise alarms for legitimate sites, so this scheme is employed to avoid false alarms: it checks the user's browser history and issues no warnings for sites that are already in the user's history file.
Disadvantages: If the user inadvertently bypasses an initial warning, the site will never be checked for phishing again and might cause considerable damage.
c) Domain Check: This scheme checks the user's browser history and determines whether the domain of the current page closely resembles any previously visited domain. The comparison is done by calculating the Hamming distance (see the sketch at the end of this section). For example, the site 'wikifedia.org' will raise a warning if the user has previously visited 'wikipedia.org'. The scheme is devised to prevent adversaries from hosting sites whose names are misspelled versions of popular sites.
Disadvantages: The check fails when legitimate sites have closely resembling domain names, which raises false alarms.
d) Referring site check: The browser maintains a record of referring pages, i.e. the links the user followed to reach the current page. Typical phishing attacks arrive by email, so if the user follows a link from a phishing email, the referring page is the email host. The use of IP addresses by phishing sites can be tracked by performing a reverse DNS lookup; if the resolved host name is not listed among the referring sites, the site is deemed a phish.
e) Image-Domain associations: This scheme maintains a database of images associated with each domain. The initial static database is assembled using crawler-type tools and is augmented from the individual's browser history. The database can contain either the images themselves or hashed images. The scheme helps find phishing sites whose images have hash values different from those stored in the database, which raises an alert.
Disadvantages: Storing images and their hash values is often infeasible and is limited by the client-side configuration and storage restrictions.
f) Profiling/Cache: The cache of a website can be obtained through Google Cache, and the last cache date can be found. This information can be used in determining whether a site is a phish.
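A minimal sketch of the Hamming-distance comparison described under c) above; the tolerance of up to two differing characters is an illustrative choice.

    // Sketch of the domain-similarity check described under c) above. Hamming
    // distance is only defined for equal-length strings, so domains of different
    // lengths are skipped here.
    function looksLikeVisitedDomain(domain, visitedDomains) {
      function hamming(a, b) {
        var d = 0;
        for (var i = 0; i < a.length; i++) if (a[i] !== b[i]) d++;
        return d;
      }
      for (var i = 0; i < visitedDomains.length; i++) {
        var v = visitedDomains[i];
        if (v === domain) return false;                     // exact match: a site the user knows
        if (v.length === domain.length && hamming(v, domain) <= 2) {
          return true;                                      // near miss of a known domain: suspicious
        }
      }
      return false;
    }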
III. PHISHILLA FEATURES
A. Introduction to Phishilla
Phishilla is a plug-in, or extension, for the Mozilla Firefox web browser. It is embedded in the browser and runs in the same memory context as the browser. It checks for any malice in the site entered by the user and, if malice is found, shows a popup box warning the user against entering the site. Phishilla uses features such as URL checks, WHOIS lookups to retrieve site information, page rank, page index and a host of other features described in detail later in this section.
A typical Firefox extension is packaged as a ZIP bundle with the file extension .xpi. It follows a fixed folder structure and contains a XUL file that adds functionality to the browser. XUL is an XML grammar that provides user-interface widgets such as buttons, menus, toolbars, trees, etc. These XUL files contain references to JavaScript, which provides additional functions to the browser. Phishilla uses the XPI file structure defined for Firefox extensions, with a XUL file that calls a JavaScript function. The JavaScript function performs the necessary checks on the site entered by the user and returns a result based on a weighted-sum calculation. This weighted-sum calculation is done for each site based on the features described in this section. Phishilla is a lightweight browser plug-in for Mozilla Firefox and supports versions 1.x through 3.5 and higher.
Figure 1 depicts the working flow of the plug-in. The web page is first checked against the maintained white list and, if found in the list, the extension proceeds to load the page. Phishilla displays the location of the page in the status bar of the browser; if the user suspects the actual location to be somewhere other than what is displayed, he or she can click on it and the browser shows a popup asking whether to proceed. The user decides whether to proceed, and the site is added to the blacklist if the user accepts the warning. Otherwise, Phishilla proceeds to perform the mandatory checks: the blacklist check, DOM objects check, Google page rank check, inbound links check, traffic information check, page index check, URL information check and domain age check. A weighted sum is calculated from the outcome of all these checks; if the weight is more than 35, Phishilla displays a warning to the user and adds the site to the blacklist if the user accepts the warning. If the user chooses to ignore the warning, Phishilla asks for confirmation on whether to add the site to the white list and proceeds to load the page on user confirmation. A sketch of this decision flow is given below.
Figure 1. Phishilla working design
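The decision flow of Figure 1 can be summarised by the following minimal JavaScript sketch; the individual check functions are placeholders, and 35 is the threshold stated in the text.

    // High-level sketch of the decision flow of Figure 1 (illustrative only).
    function evaluatePage(url, domain, whiteList, blackList, checks) {
      if (whiteList.indexOf(domain) !== -1) return { action: 'load' };
      if (blackList.indexOf(domain) !== -1) return { action: 'warn', reason: 'blacklisted' };

      // Each check returns a signed determinant; the weighted sum decides the outcome.
      var weight = 0;
      checks.forEach(function (check) { weight += check(url, domain); });

      if (weight > 35) return { action: 'warn', reason: 'weighted sum = ' + weight };
      return { action: 'load', weight: weight };
    }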
B. Features in Phishilla
Phishilla incorporates most of the schemes from Section II using client-side checking. Since no single method is good enough to detect a wide variety of phishing sites, we use a combination of schemes in which a score is computed for each web page as a weighted sum of the outcomes of a set of heuristics.
a) White Lists: There is an initial list of trustworthy and popular sites. The user can also manually add his or her own domains to the white list based on prior knowledge of these sites. When a web page entered in the browser's address bar loads, its domain is compared with the domains in the white list. In the case of a match, no further checks are performed and the user proceeds to the intended destination URL. The sites that have been frequent targets of phishing attacks were collected from statistics provided by PhishTank.com and the APWG and were added to the white list maintained by us.
b) Black Lists: We maintain a black list of 50 phishing sites as training data. Each time a web page loads, its domain is looked up against the domains in the black list. If a match occurs, the user is warned right away that the site is a phishing site and a message pops up asking the user not to proceed. At this juncture, the user is left with one of two options:
i) the user is navigated away from the site on pressing the cancel button;
ii) the user proceeds to the site on pressing the OK button.
c) Location of Domain: Specific countries rank high in the number of fraudulent sites they host. A user may also have prior knowledge or experience of an intended site and hence be immediately suspicious if the site is located in an unlikely country. The location of the site is obtained through a reverse lookup on a WHOIS database (my-addr.com) and is displayed to the user in the status bar. This information is especially useful when the user has prior knowledge of the intended site, and it is a means of providing cues to the user. The user, if suspicious of the location of a particular site, can click on the status bar, which pops up a window requesting confirmation to proceed and to add the site to the blacklist. For example, the Indian Railways site 'irctc.co.in' and the popular bank site 'barclays.co.uk' are unlikely to be hosted in countries such as China or Russia.
d) Age of a Website: The age of a website is obtained by looking up the domain in a WHOIS database (vitzo.com/en/whois) and retrieving the registration date of the domain. Most phishing sites have very little or no history, so the age of a website is a factor to consider when evaluating it for phishing. If the age of the site is below a threshold, a positive determining factor is added for the site; if the age is above an upper limit (i.e. the site has existed for a long time), a negative determining factor is assigned (see the sketch below).
e) Meta Information: The content property of the META tag whose name or http-equiv is "description" is retrieved. If there is no match between the information in the URL and the information in the META tag, a positive phishing determining factor is added for the domain; otherwise a negative determining factor is attributed to the domain.
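A minimal sketch of the domain-age check from d) above. 'whoisText' is assumed to be the raw WHOIS record already fetched for the domain, the field names in the regular expression are common WHOIS conventions rather than a guaranteed format, and the single 12-month cut-off borrows the figure from the WHOIS discussion in Section II in place of the two thresholds mentioned in the text.

    // Sketch of the domain-age heuristic (illustrative only).
    // Returns +1 (leans towards phishing) or -1 (leans towards legitimate).
    function domainAgeDeterminant(whoisText) {
      var m = /(Creation Date|Created On|Registered on):\s*(\d{4}-\d{2}-\d{2})/i.exec(whoisText);
      if (!m) return 1;                                  // no registration data: positive determinant
      var ageMs = Date.now() - new Date(m[2]).getTime();
      var months = ageMs / (1000 * 60 * 60 * 24 * 30);
      return months < 12 ? 1 : -1;                       // young domain: positive determinant
    }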
f) Google Page Rank: Google assigns a page rank on a scale of 0-10 to every web page; the page rank measures the 'importance' of a page, and the higher the rank, the more important the page. We retrieve the domain of the web page and query Google programmatically (at toolbarqueries.google.com) to obtain the page rank of that domain. Once the page rank Pr = PageRank(web page) is obtained, a weight is assigned as follows, where 'factor' reflects how much information the page-rank heuristic provides in classifying a website as phishing or non-phishing.
If Pr is between 2 and 10:
    W(web page) = (-1) * Pr * factor
A page rank of 2-10 ensures that the determinant value for this heuristic is negative (i.e. it leans towards the page not being a phish).
If Pr is 0, 1 or -1 (page rank not found):
    W(web page) = 10 * factor
A page rank of 0, 1 or 'not found' yields a positive determinant value for this heuristic (see the code sketch below).
g) Google Page Index: The web-page domain is looked up in the Google index database. If the page is indexed, a negative determinant score is assigned for the index heuristic; if no index information is obtained when querying Google, a positive determinant score is assigned.
h) Inbound Links: The popularity of a page, and to some extent its level of trustworthiness, can be gauged by the number of external references from other pages to it. The number of inbound links to a web-page domain is therefore queried programmatically from the Alexa database. The number of inbound links to a phishing site is very small, with perhaps only a few instances, so this data can be used effectively in determining whether a page falls into the phishing category.
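The page-rank weighting above, restated as a small JavaScript helper (illustrative only; 'factor' is the weight assigned to this heuristic):

    // Direct restatement of the page-rank weighting described above.
    function pageRankWeight(pr, factor) {
      if (pr >= 2 && pr <= 10) return -1 * pr * factor;  // ranked page: negative determinant
      return 10 * factor;                                // rank 0, 1 or not found: positive determinant
    }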
i) Traffic Information: The popularity of a web page can also be gauged through the number of visits to the page. The traffic rank of a page is obtained programmatically by querying the Alexa database; the lower the rank value, the greater the traffic to the site. This rank is computed from a combination of the number of user visits and the number of page views over a period of three months. A threshold is set on the traffic rank: if the traffic rank of the web-page domain is worse (numerically higher) than the threshold, indicating little traffic, a positive determinant value is set for this heuristic; if it is better than the threshold, a negative determinant value is set. This is also a very useful heuristic in gauging whether a site is a phish.
j) Anchor tags: Anchor tags are a very common way of cheating unsuspecting users. An anchor tag has an href portion that is not visible to the user and a visible text portion for the user to click on. A fraudster can manipulate this to his advantage. For example, <a href="http://www.phising.com/">www.icicibank.com</a> could mislead the user into thinking that the link leads to icicibank.com. Hence we check whether the visible text is a URL and, if so, whether it matches the actual link; if they differ, a positive determinant value is applied for this heuristic (see the sketch below). Similarly, the hrefs in anchor tags go through all the checks applied to URLs, such as checking for IP addresses, obfuscated URLs, length, number of dots, special characters, etc. If any of these heuristics triggers, it contributes a positive determinant which is added to the total.
k) Server Form Handlers: Documents contain form tags whose action attribute specifies the URL to which control is passed on an event such as pressing the submit button. Phishing sites usually have only a single URL, or a limited number of URLs, similar to the target website. Hence a check is performed to confirm whether the action property of the form references an external location different from the URL in the address bar. If the check succeeds, a positive determinant value is assigned for the form-handler heuristic.
l) Images and Other External Objects: Phishing sites, in order to look like the target website and trick users into believing they are the original site, use images from the image server of the target (original) site. A compilation (white list) of popular sites, especially financial institutions, is maintained in our project, together with a map of the image servers used by these websites. When a fraudulent site tries to use images from any of these sites, the img src information of the site is matched against the image-server URLs of the white-listed sites; if there is a match, a positive determinant weight is set for the external-object heuristic.
m) Length of URL: Adversaries try to disguise sites by including extra information in the website URL. By exploiting the limited number of characters the browser shows in the address bar, fraudulent websites can be made to appear legitimate. Hence, we include a heuristic that checks the length of the URL and increases the weight if the length exceeds 80 characters. Studies have shown that phishing URLs are unusually long, so a check on URL length helps in determining fraudulent websites.
n) Dots in URL: We include a check that counts the number of dots in a URL and increases the weighted sum if the count exceeds a set threshold, since most phishing sites have more than the usual number of dots in their URLs.
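A minimal sketch of the anchor-text check from j) above; the pattern used to decide whether the visible text "looks like" a URL is an assumption.

    // Sketch of the anchor-text check described under j) above (illustrative only).
    function anchorTextMismatch(doc) {
      var anchors = doc.getElementsByTagName('a');
      for (var i = 0; i < anchors.length; i++) {
        var visible = (anchors[i].textContent || '').trim();
        var href = anchors[i].getAttribute('href') || '';
        var vm = /^(?:https?:\/\/)?((?:www\.)?[a-z0-9-]+(?:\.[a-z0-9-]+)+)/i.exec(visible);
        var hm = /^https?:\/\/([^\/]+)/i.exec(href);
        if (!vm || !hm) continue;                  // only compare when both look like URLs
        var advertised = vm[1].replace(/^www\./i, '').toLowerCase();
        if (hm[1].toLowerCase().indexOf(advertised) === -1) {
          return true;                             // link text advertises one host, href points elsewhere
        }
      }
      return false;
    }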
o) '@' symbol check in URL: Phishing sites exploit the behaviour of the '@' symbol, which causes the browser to disregard everything to its left: anything between 'http://' and '@' is not considered by the browser. Adversaries exploit this by inserting legitimate-looking text between 'http://' and the '@' symbol to trick users. Our scheme checks for the '@' symbol, or its hexadecimal equivalent '%40', in the URL and increases the weighted sum if it is found.
p) Pattern matching: Most phishing sites try to mimic heavily targeted sites such as eBay, PayPal and Bank of America (the top three targets). The most common way a user is misled into visiting such a phishing site is by being enticed by catch words such as eBay or PayPal. Hence we apply a basic check: if the URL has a domain other than the popular sites in the white list but contains a string closely resembling one of these possible target sites, we add a positive determinant value for this heuristic (see the sketch below).
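A minimal sketch of the pattern-matching check from p) above; taking the first label of a white-listed domain as the "brand name" is an illustrative simplification.

    // Sketch of the pattern-matching check described under p) above (illustrative only).
    // 'whiteList' holds the domains of commonly targeted brands (e.g. ebay.com, paypal.com).
    function impersonatesWhitelistedBrand(url, pageHost, whiteList) {
      var host = pageHost.toLowerCase();
      for (var i = 0; i < whiteList.length; i++) {
        var brandDomain = whiteList[i].toLowerCase();        // e.g. "paypal.com"
        var brandName = brandDomain.split('.')[0];           // e.g. "paypal"
        var hostIsBrand = host === brandDomain ||
                          host.slice(-(brandDomain.length + 1)) === '.' + brandDomain;
        if (!hostIsBrand && url.toLowerCase().indexOf(brandName) !== -1) {
          return true;    // brand name appears in the URL, but the host is not the brand's domain
        }
      }
      return false;
    }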
IV. EVALUATION AND ANALYSIS OF PHISHILLA
Phishilla is a rule-based heuristic tool. It may at times produce false positives (treating a non-phishing site as a phishing site) and false negatives (treating a phishing site as a non-phishing site). Phishilla was evaluated on a sample of 64 URLs. Thirty-two of them were phishing URLs obtained from PhishTank.com and other web resources; these were also confirmed to be phishing URLs by checking them against the phishing filters of browsers such as Google Chrome and Mozilla Firefox. The other 32 were URLs chosen at random through the Yahoo Random URL generator (http://random.yahoo.com/fast/ryl), together with certain URLs of known people and known domains. The two tables below show the evaluation results for phishing and legitimate sites, respectively.

No.  Phishing URL  Result
1.  http://www.setuplogecount.co.uk/index.php  Found
2.  http://info.kuspuk.net/phpMyAdmin/config/ppusa/  Found
3.  http://aimm.ye.ro/zboard/data/pesmm.html  Found
4.  http://grapelove.co.kr/_gabia/fs3_gongji/gtbplc/ibank_gtbplc_com.php  Found
5.  http://www.goodcreditahead.com/forum/bancoposta/index.php?MfcISAPICommand=SignInFPP&UsingSSL=1&emai=&userid=  Found
6.  http://singine4baylogisny8iaznwaz.nm.ru/by-Brownie-wise_W032879327328929Qitem1QQDJSyyyd37sdcmbbyloginpag23za32wa32w2azZza3ewsaz.html  Found
7.  http://server.e-foto.lt/js/228411.paypal.com/webscr_cmd_login-run.php  Found
8.  http://www.olancompany.com/images/redirecting.html  Found
9.  http://forumoficial.hostrator.com/de.html  Failed
10.  http://nvbchannel.net/forum/paypal.htm  Found
11.  http://www.web-page.com.ar/win/133847.paypal.com/webscr_cmd_login-run.php  Found
12.  http://activex.emenace.com/us  Failed
13.  http://it-paypal.com/PayPal.It.html  Found
14.  http://publidisco.com/catalog/images/microsoft/index2.htm  Found
15.  http://muziekschoolallmusic.nl/vakanties/ib.html  Failed
16.  http://kbic.info/bbs/data/portal/server.pt/  Found
17.  http://vonage.id1114555.online-webforms.com/  Found
18.  http://motors-support.net  Failed
19.  http://ba03.pochta.ru/ehay.html  Found
20.  http://www.portaljenipapo.com/login.htm  Failed
21.  http://soullovebags.com/images/www.mybank.alliance-leicester.co.uk/index.html  Found
22.  http://www.stentend.com/de/  Found
23.  http://signin.ebay.com.ws.ebayisapi.dll.ciczdztxtwsdyhsfpndr.virtualbattlespace2.com/frogstar/down  Found
24.  http://signin-ebay.adacorrigan.co.uk/  Found
25.  http://www.centralfilms.net/locaciones/moore/scripts.php  Found
26.  http://e-mind.be/img/hp/base/b049/gdxow.php  Found
27.  http://www.parkdaeli.com/bbs/file/new.egg.com/logon.htm  Found
28.  http://www.stentend.com/de/  Found
29.  http://www.candelaradio.fm/los15img/  Found
30.  http://www.skype.com.ofi.uni.cc/?id=49126&lc=us  Failed
31.  http://fwqdeq.mail2k.ru/n.html  Found
32.  http://mobile-me.org  Found
Table-1. Evaluation results for phishing sites
No.  Legitimate URL  Result
1.  http://sportsillustrated.cnn.com/basketball/ncaa/women/teams/youngstown/  Passed
2.  http://www.zumbrolutheran.org/  Passed
3.  http://www.socialcouch.com/interview-with-richard-binhammer-dell-social-media/  Passed
4.  http://amazwi.blogspot.com/  Passed
5.  http://cs.vt.edu/  Passed
6.  http://www.alphabusinesscentre.com/  Passed
7.  http://www.christchurchpompton.org/  Passed
8.  http://www.wongbrothers.com/  Passed
9.  http://www.nzembassy.com/  Passed
10.  http://geogratis.cgdi.gc.ca/  Passed
11.  http://www.prapa.com/  Passed
12.  http://www.circuit8.org/  Passed
13.  http://www.findstolenart.com/  Passed
14.  http://www.maroc.net/  Passed
15.  http://www.minorleagueballparks.com/neds_oh.html  Passed
16.  http://www.elth.pub.ro/  Passed
17.  http://www.finleys.com/  Passed
18.  http://www.spravi.8m.com/  Passed
19.  http://www.paconcours.com/  Passed
20.  http://www.rmadhavan.com/  Passed
21.  http://us.com/  Passed
22.  https://home.americanexpress.com/home/global_splash.html  Passed
23.  http://www.lamega.com/  Passed
24.  http://www.everydaymaternity.com/  Passed
25.  http://www.shop-cliftonparkcenter.com/  Passed
26.  http://www.eqc.govt.nz/  Passed
27.  http://www.yorkarchaeology.co.uk/  Failed
28.  http://mikeshost.110mb.com/xy.php  Failed
29.  http://weather.mgnetwork.com/cgi-bin/weatherIMD3/weather.cgi?user=TBO&forecast=zandh&pands=Miami%2C+FL  Failed
30.  http://www.atifitnuts.com/  Failed
31.  http://www.asgsherman.com/  Failed
32.  http://www.ambache.co.uk/  Failed
Table-2. Evaluation results for legitimate sites

A. Evaluation measures
The following measures were adopted in evaluating Phishilla:
a) Total Catch Rate: Number of phishing URLs that were correctly blocked or warned.
Number of correctly caught phishing URLs = 28
Total number of phishing URLs = 32
Percentage of correctly caught URLs = 28 / 32 * 100 = 87.5 %
b) False Negatives: Number of phishing URLs that were incorrectly allowed.
Number of incorrectly allowed phishing URLs = 4
Total number of phishing URLs = 32
Percentage of false negatives = 4 / 32 * 100 = 12.5 %
c) Allows: Number of good URLs that were correctly allowed.
Number of correctly allowed good URLs = 26
Total number of good URLs = 32
Percentage of correctly allowed URLs = 26 / 32 * 100 = 81.25 %
d) False Positives: Number of good URLs that were incorrectly blocked.
Number of incorrectly blocked good URLs = 6
Total number of good URLs = 32
Percentage of false positives = 6 / 32 * 100 = 18.75 %

B. Analysis of Phishilla
Our evaluation shows that Phishilla may sometimes produce false positives for relatively unknown sites but is unlikely to cause false negatives of major impact.
a) Analysis of False Positives: False positives are the good URLs that are incorrectly blocked.
1) Phishilla reports false positives for good URLs with abnormal lengths or an unusually large number of dots compared with standard conventions.
2) If a dotted-decimal IP address is provided instead of a host name, reporting an outright error could sometimes produce a false positive, since this kind of address is occasionally desirable. Hence, in this case Phishilla only reports a warning that an IP address is being used in the URL and that the site could possibly be illegitimate.
3) False positives are also possible when the site is relatively new or unknown, with very few or no inbound links and little traffic.
b) Analysis of False Negatives: False negatives are the phishing URLs that are incorrectly allowed. It is imperative that any good anti-phishing scheme or tool reduces the number of false negatives, and Phishilla addresses this issue well. False negatives occur mostly when there is very little DOM element information that can be compared against the standard heuristics.
c) Performance: The performance of Phishilla is good, since only JavaScript is used and all operations are done on the client side.

V. ADVANTAGES OF PHISHILLA
Phishilla is a browser plug-in that accomplishes the task of detecting a phishing site by following a set of well proven and established methods. It has the following advantages:
1) It is lightweight.
2) It follows a combination of well-tested and successful anti-phishing schemes.
3) It computes a weighted sum in which heuristics are assigned different values based on their ability to classify the malicious content of a website.
4) It has an excellent catch rate.

VI. CONCLUSION
In this paper we have discussed the set of existing countermeasures against phishing, the possible merits and flaws of these schemes, and their adoption by existing marketplace tools. We have identified that a single heuristic, or a single class of heuristics, is not sufficient to reliably determine a phishing site. Hence we have adopted a scheme which combines several phishing classification schemes used across several tools and assigns weights to each scheme depending on its effectiveness in classification, i.e. its detection accuracy. Phishilla provides phishing alerts to the user in a non-intrusive manner without affecting the browsing experience. It follows a client-side approach where all the logic is executed in client-side code; this makes Phishilla efficient and imposes only a minimal set of requirements.
While Phishilla has a good catch rate and detects a majority of phishing sites, possible avenues of enhancement include incorporating features such as profiling, SSL certificate checks, image matching, etc., which would require server-side functionality. The GUI could be enhanced to provide more visual cues to the user, possibly by displaying color codes that indicate the degree of confidence in classifying a site as malicious.
Similarly, users could be profiled when they mark a site as phishing, and weights could be assigned to users based on their previous phish-reporting history. Other learning-based methods could also be incorporated, in which the effectiveness of each heuristic is monitored over time and the weights re-adjusted accordingly. Phishilla could also be extended to track and report phishing e-mails, the current plague of the internet that leads users to unsolicited phishing sites.

VII. ACKNOWLEDGMENTS
This work was carried out at Virginia Polytechnic Institute and State University. We thank Dr. Jung-Min Park for providing the impetus for this paper.

REFERENCES
[1] PhishTank, available at: http://www.phishtank.com/
[2] N. Chou, R. Ledesma, Y. Teraguchi, and J. C. Mitchell, "Client-Side Defense against Web-Based Identity Theft", in Proceedings of the Network and Distributed System Security Symposium (NDSS '04), February 2004.
[3] S. Garera, N. Provos, M. Chew, and A. D. Rubin, "A Framework for Detection and Measurement of Phishing Attacks", in Proceedings of the 2007 ACM Workshop on Recurring Malcode (WORM '07), November 2007, pp. 1-8.
[4] Firefox, "Phishing Protection", available at: http://www.mozilla.com/en-US/firefox/phishing-protection/
[5] Y. Pan and X. Ding, "Anomaly Based Web Phishing Page Detection", in Proceedings of the 22nd Annual Computer Security Applications Conference (ACSAC '06), December 2006, pp. 381-392.
[6] Y. Zhang, J. I. Hong, and L. F. Cranor, "Cantina: A Content-Based Approach to Detecting Phishing Web Sites", in Proceedings of the 16th International World Wide Web Conference (WWW '07), May 2007, pp. 639-648.
[7] P. Robichaux and D. L. Ganger, "Gone Phishing: Evaluating Anti-Phishing Tools for Windows", September 2006.
[8] D. K. McGrath and M. Gupta, "Behind Phishing: An Examination of Phisher Modi Operandi", in Proceedings of the 1st USENIX Workshop on Large-Scale Exploits and Emergent Threats, San Francisco, California, Article No. 4, 2008.
[9] Bayesian Classification of Phishing: http://www.sonicwall.com/downloads/WP-ENG-025_Phishing-Bayesian-Classification.pdf
[10] Google Page Rank Information: http://abhinavsingh.com/blog/2009/04/getting-google-page-rank-using-javascript-for-adobe-air-apps/
[11] Introduction to Phishing: http://en.wikipedia.org/wiki/Phishing
[12] WHOIS Information: http://vitzo.com/en/whois
[13] Traffic Information: http://www.alexa.com/siteinfo
[14] Reverse Domain Lookup: http://my-addr.com/reverse-lookup-domain-hostname/free-reverse-ip-lookup-service/reverse_lookup.php
[15] Anti-Phishing Information: http://www.antiphishing.org/