2. What is Phishing?
A social engineering attack
An attempt to trick individuals into revealing personal credentials (username, password, credit card information, etc.)
Based on fake emails and websites
A threat to Internet users
Damages
- 73 million US adults receive more than 50 phishing emails a year
- $2.8 billion in losses a year
3. Phishing Methods
Set up websites whose interface/URL imitates that of well-known websites
Set up fraudulent websites to collect users' personal information
Set up a transparent (man-in-the-middle) website between the original website and its users
Send emails containing malicious URLs
Send emails with embedded malicious Flash/picture files to evade the text checks of anti-phishing filters
4. False Positive/Negative Rates of Anti-Phishing Approaches
False negative rate: the fraction of all phishing websites that are classified as good
False positive rate: the fraction of all good websites that are classified as phishing
So, the lower the false rates, the better the anti-phishing approach
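$p_f = \dfrac{N_{\text{good} \to \text{phish}}}{N_{\text{good} \to \text{phish}} + N_{\text{good} \to \text{good}}}$
$n_f = \dfrac{N_{\text{phish} \to \text{good}}}{N_{\text{phish} \to \text{good}} + N_{\text{phish} \to \text{phish}}}$
where $N_{a \to b}$ is the number of websites of actual class $a$ that are classified as $b$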
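The false positive rate (pf) and false negative rate (nf) defined above can be computed directly from labeled evaluation results. This Python sketch assumes each result is an (actual, predicted) pair using the labels "good" and "phish"; the function name `false_rates` and the sample data are hypothetical.

```python
from collections import Counter

def false_rates(pairs):
    """Return (pf, nf) from (actual, predicted) label pairs.

    pf = good sites classified as phish / all good sites
    nf = phish sites classified as good / all phish sites
    Labels are assumed to be the strings 'good' and 'phish'.
    """
    counts = Counter(pairs)
    total_good = counts[("good", "good")] + counts[("good", "phish")]
    total_phish = counts[("phish", "phish")] + counts[("phish", "good")]
    # Guard against division by zero when a class is absent
    pf = counts[("good", "phish")] / total_good if total_good else 0.0
    nf = counts[("phish", "good")] / total_phish if total_phish else 0.0
    return pf, nf

# Hypothetical evaluation: 4 good sites (1 misclassified), 5 phishing sites (1 missed)
results = [
    ("good", "good"), ("good", "good"), ("good", "good"), ("good", "phish"),
    ("phish", "phish"), ("phish", "phish"), ("phish", "phish"),
    ("phish", "phish"), ("phish", "good"),
]
pf, nf = false_rates(results)
print(f"pf = {pf:.2f}, nf = {nf:.2f}")  # pf = 0.25, nf = 0.20
```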
5. Anti-Phishing Approaches for Specific Websites
Typically designed by the companies that run the websites
An example is the SiteKey mechanism of Bank of America's online banking
Pro: False negative rate is low
False positive rate can be zero
Con: Not applicable to phishing emails
6. Anti-Phishing Approaches Based on Database
Anti-phishing firewall: Kaspersky
Anti-phishing toolbar: Netcraft
Both rely on an online database
The toolbar can provide statistics about a URL in advance
Pro: Applicable to both websites and emails
False negative rate can be low
False positive rate is low
Con: Needs frequent updates
Relatively hard to implement
False negative rate increases if the database is not up to date
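At its core, a database-based check normalizes the URL and looks its host up in a list of known phishing sites. The sketch below is not how Kaspersky or Netcraft implement it: it assumes a hypothetical in-memory set `BLACKLIST` standing in for the online database, and the helper names are illustrative.

```python
from urllib.parse import urlsplit

# Hypothetical stand-in for the online database of known phishing hosts
BLACKLIST = {"paypal-login.example", "secure-bank.example.net"}

def normalize_host(url):
    """Lowercase host name of a URL, without a leading 'www.'."""
    if "//" not in url:  # urlsplit only finds the host after '//'
        url = "http://" + url
    host = urlsplit(url).hostname or ""  # .hostname is already lowercase, port removed
    return host[4:] if host.startswith("www.") else host

def is_blacklisted(url, blacklist=BLACKLIST):
    """True if the URL's host or any of its parent domains is listed."""
    labels = normalize_host(url).split(".")
    # e.g. for a.b.example.net check a.b.example.net, b.example.net, example.net, net
    return any(".".join(labels[i:]) in blacklist for i in range(len(labels)))

print(is_blacklisted("http://www.paypal-login.example/fix"))             # True
print(is_blacklisted("https://accounts.secure-bank.example.net:8443/"))  # True
print(is_blacklisted("https://www.paypal.com/"))                         # False
print(is_blacklisted("http://paypal-verify.example/"))                   # False
```

The last call illustrates the update problem: a newly registered phishing host that has not yet been added to the database passes the check, which counts as a false negative.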
7. Anti-Phishing Approaches Based on Content
PILFER: email phishing detection based on machine learning that combines 10 filters:
- IP-based URL: 192.168.0.1/paypal.cgi?fix=account
- Domain age from whois.net
- Non-matching URL: <a href="phishingsite.com">paypal.com</a>
- HTML email: hidden URLs
- Malicious JavaScript
- …and more
Pro: In practice, false positive and false negative rates are relatively low
Machine learning methods make it possible to improve accuracy
No constant updates are needed
Con: Training data and filters still need updates to adapt to new styles of phishing emails
Network cost is a problem (e.g., WHOIS lookups for domain age)
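Two of the filters listed above, the IP-based URL and the non-matching URL, can be expressed as simple binary features. This Python sketch is an illustration under assumptions: the regular expression, helper names and parsing choices are hypothetical, not PILFER's actual code. In PILFER, the outputs of all filters would be combined by a machine-learning classifier rather than used alone.

```python
import ipaddress
import re
from html.parser import HTMLParser
from urllib.parse import urlsplit

def host_of(url):
    """Lowercase host name of a URL (an http scheme is assumed if missing)."""
    if "//" not in url:
        url = "http://" + url
    return urlsplit(url).hostname or ""

def has_ip_based_url(url):
    """Feature: the URL's host is a literal IP address."""
    try:
        ipaddress.ip_address(host_of(url))
        return True
    except ValueError:
        return False

class AnchorCollector(HTMLParser):
    """Collect (href, visible text) pairs for every <a> tag in an HTML email."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href") or ""
            self._text = []

    def handle_data(self, data):
        if self._href is not None:  # only record text inside an <a> tag
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None

# Visible link text that looks like a bare domain name, e.g. "paypal.com"
DOMAIN_LIKE = re.compile(r"^[\w.-]+\.[a-z]{2,}$", re.IGNORECASE)

def has_non_matching_url(html):
    """Feature: a link's text shows one domain but its href points to another."""
    parser = AnchorCollector()
    parser.feed(html)
    parser.close()
    return any(
        DOMAIN_LIKE.match(text) and host_of(text) != host_of(href)
        for href, text in parser.links
    )

print(has_ip_based_url("192.168.0.1/paypal.cgi?fix=account"))                  # True
print(has_non_matching_url('<a href="phishingsite.com"> paypal.com</a>'))       # True
print(has_non_matching_url('<a href="https://www.paypal.com/">www.paypal.com</a>'))  # False
```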
8. Anti-Phishing Approaches Based on Content (cont.)
CANTINA: phishing website detection based on TF-IDF weights
- TF: the number of times a given term appears in a specific document
- IDF: a measure of the general importance of the term across all documents (terms that occur in many documents get a low IDF)
- TF-IDF = TF × IDF, which weights a term by its frequency in the given document, discounted by how common the term is across all documents
- Search for the top five TF-IDF terms of the current web page in a search engine such as Google
- The current web page should appear in the top N (N = 30) search results to be considered legitimate
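The TF-IDF step that selects the five search terms can be sketched in Python as follows. It assumes a simple lowercase word tokenizer, the common logarithmic form idf = log(N / df), and a small hypothetical background corpus; the exact tokenization and corpus used by CANTINA are not specified here.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def top_tfidf_terms(page_text, corpus, k=5):
    """Return the k terms of page_text with the highest TF-IDF weight.

    corpus is a list of other documents used to estimate IDF; the page
    itself is counted as one more document.
    """
    corpus_terms = [set(tokenize(doc)) for doc in corpus]
    n_docs = len(corpus_terms) + 1
    tf = Counter(tokenize(page_text))  # TF: raw count of each term on the page
    scores = {}
    for term, count in tf.items():
        # Document frequency; the +1 accounts for the page itself
        df = 1 + sum(term in terms for terms in corpus_terms)
        idf = math.log(n_docs / df)
        scores[term] = count * idf  # TF-IDF = TF x IDF
    # Highest score first; ties broken alphabetically so the result is deterministic
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]

page = "PayPal login: log in to your PayPal account. PayPal protects your account."
corpus = [
    "Log in to your email account.",
    "Your bank account and your card.",
    "Welcome to the news site. Log in to comment.",
]
print(top_tfidf_terms(page, corpus))
# ['paypal', 'login', 'protects', 'account', 'your']
```

Common words such as "to" and "in" appear in several documents, so their IDF (and hence their weight) is low, while page-specific terms such as "paypal" rise to the top.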
CANTINA also uses filters similar to PILFER's to reduce false positives
Pro: False positive and false negative rates are very low
No constant updates are needed
Search engine ranking is relatively hard to manipulate
Con: Network cost is a problem (each check requires a search engine query)
Many searches for phishing websites may affect those websites' ranking
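The final decision rule, checking whether the page shows up in the top N search results, is sketched below. Assumptions: `search_results` is a list of result URLs the caller has already obtained from a search engine (no real search API is called here), and "appears" is taken to mean that a result has the same host name as the page, ignoring a leading "www.". The function names are hypothetical.

```python
from urllib.parse import urlsplit

def host_of(url):
    """Lowercase host name of a URL, without a leading 'www.'."""
    if "//" not in url:
        url = "http://" + url
    host = urlsplit(url).hostname or ""
    return host[4:] if host.startswith("www.") else host

def looks_legitimate(page_url, search_results, n=30):
    """True if the page's host appears among the first n search results.

    An empty result list returns False, i.e. the page is flagged as phishing.
    """
    page_host = host_of(page_url)
    return any(host_of(r) == page_host for r in search_results[:n])

# Hypothetical results for a query built from the page's top five TF-IDF terms
results = [
    "https://www.paypal.com/us/signin",
    "https://en.wikipedia.org/wiki/PayPal",
]
print(looks_legitimate("https://www.paypal.com/signin", results))       # True
print(looks_legitimate("http://paypal-login.example/signin", results))  # False
```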
9. Summary of the Anti-Phishing Approaches Discussed
Approach                   | False Positive | False Negative | Implementation Effort | Adaptation        | Update Cycle
For specific websites      | Zero           | Low            | Easy                  | Specific website  | None
Firewall based on database | Low            | Medium         | Medium                | General Web/Email | Very frequent
Toolbar based on database  | Low            | Low            | Hard                  | General Web/Email | Very frequent
PILFER                     | Low            | Low            | Medium                | General Email     | Occasional
CANTINA                    | Very low       | Low            | Medium                | General Websites  | Rare