User Interfaces and Algorithms
for Fighting Phishing
Jason I. Hong
Carnegie Mellon University
User Interfaces and Algorithms for Fighting Phishing, at Google Tech Talk Jan 2007
Discusses some of our group's work at Carnegie Mellon University on developing user interfaces and algorithms to combat phishing attacks.

Published in: Technology
  • 2-3.5 million http://www.gartner.com/it/page.jsp?id=498245
  • Email #16 was from CardMember Services with the subject "Your Online Statement Is Now Available" Email #17 was from [email_address] with the subject "Reactivate your PayPal Account"
  • Transcript of "User Interfaces and Algorithms for Fighting Phishing, at Google Tech Talk Jan 2007"

    1. 1. User Interfaces and Algorithms for Fighting Phishing Jason I. Hong Carnegie Mellon University
    2. 2. Everyday Privacy and Security Problem
    3. 3. This entire process known as phishing
    4. 4. Fast Facts on Phishing • Estimated 3.5 million people have fallen for phishing • Estimated to cost $1-2 billion a year (and growing) • 9255 unique phishing sites reported in June 2006 • Easier (and safer) to phish than rob a bank
    5. 5. Supporting Trust Decisions • Goal: help people make better trust decisions – Focus on anti-phishing • Large multi-disciplinary team project at CMU – Supported by NSF, ARO, CMU CyLab – Six faculty, five PhD students, undergrads, staff – Computer science, human-computer interaction, public policy, social and decision sciences, CERT
    6. 6. Our Multi-Pronged Approach • Human side – Interviews to understand decision-making – Embedded training – Anti-phishing game • Computer side – Email anti-phishing filter – Automated testbed for anti-phishing toolbars – Our anti-phishing toolbar Automate where possible, support where necessary
    7. 7. What do users know about phishing?
    8. 8. Interview Study • Interviewed 40 Internet users, included 35 non-experts • “Mental models” interviews included email role play and open ended questions • Interviews recorded and coded J. Downs, M. Holbrook, and L. Cranor. Decision Strategies and Susceptibility to Phishing. In Proceedings of the 2006 Symposium On Usable Privacy and Security, 12-14 July 2006, Pittsburgh, PA.
    9. 9. Little Knowledge of Phishing • Only about half knew meaning of the term “phishing” “Something to do with the band Phish, I take it.”
    10. 10. Minimal Knowledge of Lock Icon “I think that it means secured, it symbolizes some kind of security, somehow.” • 85% of participants were aware of lock icon • Only 40% of those knew that it was supposed to be in the browser chrome • Only 35% had noticed https, and many of those did not know what it meant
    11. 11. Little Attention Paid to URLs • Only 55% of participants said they had ever noticed an unexpected or strange-looking URL • Most did not consider them to be suspicious
    12. 12. Some Knowledge of Scams • 55% of participants reported being cautious when email asks for sensitive financial info – But very few reported being suspicious of email asking for passwords • Knowledge of financial phish reduced likelihood of falling for these scams – But did not transfer to other scams, such as amazon.com password phish
    13. 13. Naive Evaluation Strategies • The most frequent strategies don’t help much in identifying phish – This email appears to be for me – It’s normal to hear from companies you do business with – Reputable companies will send emails “I will probably give them the information that they asked for. And I would assume that I had already given them that information at some point so I will feel comfortable giving it to them again.”
    14. 14. Other Findings • Web security pop-ups are confusing “Yeah, like the certificate has expired. I don’t actually know what that means.” • Don’t know what encryption means • Summary – People generally not good at identifying scams they haven’t specifically seen before – People don’t use good strategies to protect themselves
    15. 15. Can we train people not to fall for phishing?
    16. 16. Web Site Training Study • Laboratory study of 28 non-expert computer users • Two conditions, both asked to evaluate 20 web sites – Control group evaluated 10 web sites, took 15 minute break to read email or play solitaire, evaluated 10 more web sites – Experimental group same as above, but spent 15 minute break reading web-based training materials • Experimental group performed significantly better identifying phish after training – Less reliance on “professional-looking” designs – Looking at and understanding URLs – Web site asks for too much information People can learn from web-based training materials, if only we could get them to read them!
    17. 17. How Do We Get People Trained? • Most people don’t proactively look for training materials on the web • Many companies send “security notice” emails to their employees and/or customers • But these tend to be ignored – Too much to read – People don’t consider them relevant – People think they already know how to protect themselves
    18. 18. Embedded Training • Can we “train” people during their normal use of email to avoid phishing attacks? – Periodically, people get sent a training email – Training email looks like a phishing attack – If person falls for it, intervention warns and highlights what cues to look for in succinct and engaging format P. Kumaraguru, Y. Rhee, A. Acquisti, L. Cranor, J. Hong, and E. Nunge. Protecting People from Phishing: The Design and Evaluation of an Embedded Training Email System. CyLab Technical Report. CMU-CyLab-06-017, 2006. http://www.cylab.cmu.edu/default.aspx?id=2253 [to be presented at CHI 2007]
    19. 19. Diagram Intervention
    20. 20. Diagram Intervention Explains why they are seeing this message
    21. 21. Diagram Intervention Explains how to identify a phishing scam
    22. 22. Diagram Intervention Explains what a phishing scam is
    23. 23. Diagram Intervention Explains simple things you can do to protect self
    24. 24. Comic Strip Intervention
    25. 25. Embedded Training Evaluation • Lab study comparing our prototypes to standard security notices – EBay, PayPal notices – Diagram that explains phishing – Comic strip that tells a story • 10 participants in each condition (30 total) • Roughly, go through 19 emails, 4 phishing attacks scattered throughout, 2 training emails too – Emails are in context of working in an office
    26. 26. Embedded Training Results • Existing practice of security notices is ineffective • Diagram intervention somewhat better • Comic strip intervention worked best – Statistically significant • Pilot study showed interventions most effective when based on real brands
    27. 27. Next Steps • Iterate on intervention design – Have already created newer designs, ready for testing • Understand why comic strip worked better – Story? Comic format? • Preparing for larger scale deployment – Include more people – Evaluate retention over time – Deploy outside lab conditions if possible • Real world deployment and evaluation – Need corporate partners to let us spoof their brand
    28. 28. Anti-Phishing Phil • A game to teach people not to fall for phish – Embedded training focuses on email – Game focuses on web browser, URLs • Goals – How to parse URLs – Where to look for URLs – Use search engines instead • Available on our web site soon
    29. 29. Anti-Phishing Phil
    30. 30. Outline • Human side – Interviews to understand decision-making – Embedded training – Anti-phishing game • Computer side – Email anti-phishing filter – Automated testbed for anti-phishing toolbars – Our anti-phishing toolbar
    31. 31. How accurate are today’s anti-phishing toolbars?
    32. 32. Some Users Rely on Toolbars • Dozens of anti-phishing toolbars offered – Built into security software suites – Offered by ISPs – Free downloads – Built into latest version of popular web browsers
    33. 33. Some Users Rely on Toolbars • Dozens of anti-phishing toolbars offered – Built into security software suites – Offered by ISPs – Free downloads – Built into latest version of popular web browsers • Previous studies demonstrated usability problems that need further work • But how well do they detect phish?
    34. 34. Testing the Toolbars • April 2006: Manual evaluation of 5 toolbars – Required lots of undergraduate labor over 2-week period • Summer 2006: Created a semi-automated test bed • September 2006: Automated evaluation of 5 toolbars – Used APWG feed as source of phishing URLs • November 2006: Automated evaluation of 10 toolbars – Used phishtank.com as source of phishing URLs – Evaluated 100 phish and 510 legit sites in just 2 days L. Cranor, S. Egelman, J. Hong and Y. Zhang. Phinding Phish: An Evaluation of Anti-Phishing Toolbars. CyLab Technical Report. CMU-CyLab-06-018, 2006. http://www.cylab.cmu.edu/default.aspx?id=2255 [to be presented at NDSS]
    35. 35. Testbed for Anti-Phishing Toolbars • Manual evaluation was tedious, slow, error-prone • Created a testbed that could semi-automatically evaluate these toolbars – Just give it a set of URLs to check (labeled as phish or not) – Checks all the toolbars, aggregates statistics • How to automate this for different toolbars? – Different APIs (if any), different browsers – Image-based approach, take screenshots of web browser and compare relevant portions to known states
    36. 36. Testbed System Architecture
    37. 37. Finding Fresh Phish for Test • Need a source with lots of fresh phishing URLs – Can’t use toolbar black lists if we are testing their tools – Sites get taken down within a few days, need phish less than one day old • To observe how fast black lists get updated, the fresher the better • Experimented with several sources – APWG - high volume, but many duplicates and legitimate URLs included – Phishtank.com - lower volume but easier to extract phish – Other phish archives - often low volume or not fresh enough • Choice of feed impacts results
    38. 38. November 2006 Evaluation • Tested 10 toolbars – Microsoft Internet Explorer v7.0.5700.6 – Netscape Navigator v8.1.2 – EarthLink v3.3.44.0 – eBay v2.3.2.0 – McAfee SiteAdvisor v1.7.0.53 – NetCraft v1.7.0 – TrustWatch v3.0.4.0.1.2 – SpoofGuard – Cloudmark v1.0. – Google Toolbar v2.1 (Firefox) • Most use blacklists and simple heuristics – SpoofGuard only one to rely solely on heuristics
    39. 39. November 2006 Evaluation • Test URLs – 100 manually confirmed fresh phish from phishtank.com (reported within 6 hours) • Did not use the fully confirmed ones – 60 legitimate sites linked to by phishing messages – 510 legitimate sites tested by 3Sharp in Sept 2006 report
    40. 40. Results [Chart: percentage of phishing sites correctly identified (0% to 100%) versus time in hours (0, 1, 2, 12, 24) for SpoofGuard, EarthLink, Netcraft, Google, IE7, Cloudmark, TrustWatch, eBay, Netscape, and McAfee. Annotations on the chart: 38% false positives; 1% false positives.]
    41. 41. Results • Only toolbar >90% accuracy has high false positive rate • Several catch 70-85% of phish with few false positives – After 15 minutes of training, users seem to do as well • Few improvements in catch rates seen over 24 hours – Suggests most toolbars not taking advantage of available phish feeds to quickly update black lists • Combination of heuristics and frequently updated black list (and white list?) seems to be most promising approach • Plan to periodically repeat study every quarter • Should only consider this a rough ordering – Different sources of phishing URLs lead to different results
    42. 42. Our Anti-Phishing Toolbar
    43. 43. Robust Hyperlinks • Developed by Phelps and Wilensky to solve “404 not found” problem • Key idea was to add a lexical signature to URLs that could be fed to a search engine if URL failed – Ex. http://abc.com/page.html?sig=“word1+word2+...+word5” • How to generate signature? – Found that TF-IDF was fairly effective • Informal evaluation found five words was sufficient for most web pages
    44. 44. Adapting TF-IDF for Anti-Phishing • Can same basic approach be used for anti-phishing? – Scammers often directly copy web pages – With Google search engine, fake should have low page rank Fake Real
    45. 45. Adapting TF-IDF for Anti-Phishing • Rough algorithm – Given a web page, calculate TF-IDF for each word on page – Take five terms with highest TF-IDF weights – Feed these terms into a search engine (Google) – If domain name of current web page is in top N search results, consider it legitimate (N=30 worked well)
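The rough algorithm on slide 45 can be sketched in a few lines of Python. This is a minimal illustration, not the toolbar's actual code. The names `top_tfidf_terms` and `looks_legitimate`, the smoothed IDF formula, the background corpus, and the `search` callable (a stand-in for a real search engine query that returns result URLs) are all assumptions made for the example. Comparing full hostnames is also a simplification of "domain name of current web page".

```python
import math
import re
from collections import Counter
from urllib.parse import urlparse


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def top_tfidf_terms(page_text, corpus_texts, k=5):
    """Return the k words on the page with the highest TF-IDF weight.

    corpus_texts is a hypothetical background corpus used to estimate
    document frequencies; the slides do not say which corpus was used.
    """
    tf = Counter(tokenize(page_text))
    doc_sets = [set(tokenize(t)) for t in corpus_texts]
    n_docs = len(doc_sets)
    scores = {}
    for word, count in tf.items():
        df = sum(1 for words in doc_sets if word in words)
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0  # smoothed IDF (assumption)
        scores[word] = count * idf
    # Highest score first; ties broken alphabetically so output is deterministic.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:k]


def looks_legitimate(page_url, page_text, corpus_texts, search, n=30):
    """Apply the slide's rule: legitimate if the page's host is in the top n results.

    search is a caller-supplied function mapping a query string to a list of
    result URLs (hypothetical; the talk used Google).
    """
    terms = top_tfidf_terms(page_text, corpus_texts, k=5)
    results = search(" ".join(terms))[:n]
    page_host = urlparse(page_url).hostname
    return any(urlparse(r).hostname == page_host for r in results)
```

Passing the search function in as a parameter keeps the sketch testable offline and makes the cost noted later in the talk (one search query per page) explicit.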
    46. 46. Evaluation #1 • 100 phishing URLs from PhishTank.com • 100 legitimate URLs from 3Sharp’s study [Chart, bars labeled true negative / false positive, axis 0% to 100%: Basic-TF-IDF 94% / 30%; Basic-TF-IDF+domain 67% / 10%; Basic-TF-IDF+ZMP 94% / 31%; Basic-TF-IDF+domain+ZMP 97% / 10%]
    47. 47. Discussion of Evaluation #1 • Very good results (97%), but false positives (10%) • Added several heuristics to reduce false positives – Many of these heuristics used by other toolbars – Age of domain – Known images – Suspicious URLs (has @ or -) – Suspicious links (see above) – IP Address in URL – Dots in URL (>= 5 dots) – Page contains text entry field – TF-IDF • Used simple forward linear model to weight these
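Three of the URL heuristics listed on slide 47 (an @ or - in the URL, an IP address in the URL, and five or more dots) are simple enough to sketch directly. The code below is only an illustration: the function names are invented, the check for "@" and "-" is applied to the whole URL string as a simplifying assumption, and the weights passed to `suspicion_score` are hypothetical placeholders, since the slides do not give the weights of the linear model.

```python
import ipaddress
from urllib.parse import urlparse


def url_features(url):
    """Compute three boolean URL heuristics named on the slide."""
    host = urlparse(url).hostname or ""
    try:
        ipaddress.ip_address(host)  # raises ValueError if host is not an IP literal
        has_ip = True
    except ValueError:
        has_ip = False
    return {
        "has_at_or_dash": ("@" in url) or ("-" in url),
        "ip_in_url": has_ip,
        "many_dots": url.count(".") >= 5,
    }


def suspicion_score(url, weights):
    """Sum the weights of the heuristics that fire (a plain linear model).

    weights maps feature name to a number; the values are illustrative only.
    """
    features = url_features(url)
    return sum(weights[name] for name, fired in features.items() if fired)
```

A higher score means more heuristics fired; where to set the threshold would have to be tuned against labeled phishing and legitimate URLs.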
    48. 48. Evaluation #2 • Compared to SpoofGuard and NetCraft – SpoofGuard uses all heuristics – NetCraft 1.7.0 uses heuristics (?) and extensive blacklist • 100 phishing URLs from PhishTank.com • 100 legitimate URLs – Sites often attacked (citibank, paypal) – Top pages from Alexa (most popular sites) – Random web pages from random.yahoo.com
    49. 49. Results of Evaluation #2 [Chart, bars labeled true negative / false positive, axis 0% to 100%: Final-TF-IDF 97% / 6%; Final-TF-IDF + Heuristics 89% / 1%; SpoofGuard 91% / 48%; Netcraft 97% / 0%]
    50. 50. Discussion • Pretty good results for TF-IDF approach – 97% with 6% false positive, 89% with 1% false positive – False positives due to JavaScript phishing sites • Limitations – Does not work well for non-English web sites (TF-IDF) – System performance (querying Google each time) • Attacks by criminals – Using images instead of words – Invisible text – Circumventing TF-IDF and PageRank (hard in practice?)
    51. 51. Summary • Large multi-disciplinary team project at CMU looking at trust decisions, currently anti-phishing • Human side – Interviews to understand decision-making – Embedded training – Anti-phishing game • Computer side – Automated testbed for anti-phishing toolbars – Our anti-phishing toolbar
    52. 52. Embedded Training Results [Chart: percentage of users who clicked on a link (0 to 100) for each email that had links in it (3: Phish, 5: Training, 7: Real, 8: Spam, 11: Training, 12: Spam, 13: Real, 14: Phish, 16: Phish, 17: Phish), shown for Group A, Group B, and Group C]
    53. 53. Email Anti-Phishing Filter • Philosophy: automate where possible, support where necessary • Goal: Create an email filter that detects phishing emails – Well explored area for spam – Can we do better for phishing?
    54. 54. Email Anti-Phishing Filter • Heuristics combined in SVM – IP addresses in links (http://128.23.34.45/blah) – Age of linked-to domains (younger domains likely phishing) – Non-matching URLs (ex. most links point to PayPal) – “Click here to restore your account” – HTML email – Number of links – Number of domain names in links – Number of dots in URLs (http://www.paypal.update.example.com/update.cgi) – JavaScript – SpamAssassin rating
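Several of the email heuristics on slide 54 can be turned into a numeric feature vector before being handed to a classifier such as an SVM. The sketch below covers only the link-based and JavaScript features and leaves out the classifier itself, domain age, and the SpamAssassin rating. The function name `email_features`, the regular expressions, and the feature order are assumptions made for illustration, not the filter's real implementation.

```python
import re
from urllib.parse import urlparse

# Matches href="..." or href='...' attributes in an HTML body.
HREF_RE = re.compile(r'href\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)
# Matches a dotted-quad IPv4 literal such as 128.23.34.45.
IPV4_RE = re.compile(r"\d{1,3}(\.\d{1,3}){3}")


def email_features(html_body):
    """Return [links, distinct hosts, max dots in a URL, IP link flag, JavaScript flag]."""
    links = HREF_RE.findall(html_body)
    hosts = [urlparse(link).hostname or "" for link in links]
    return [
        len(links),                                            # number of links
        len({h for h in hosts if h}),                          # number of domain names in links
        max((link.count(".") for link in links), default=0),   # number of dots in URLs
        int(any(IPV4_RE.fullmatch(h) for h in hosts)),         # IP address in links
        int("<script" in html_body.lower()),                   # JavaScript present
    ]
```

Each email then becomes one row of a feature matrix, with the label (phishing or ham) taken from the corpus it came from.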
    55. 55. Email Anti-Phishing Filter Evaluation • Ham corpora from SpamAssassin (2002 and 2003) – 6950 good emails • Phishing corpus – 860 phishing emails
    56. 56. Email Anti-Phishing Filter Evaluation
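    57. 57. Our label vs. is it legitimate? • Our label Yes, legitimate Yes: true positive • Our label Yes, legitimate No: false positive • Our label No, legitimate Yes: false negative • Our label No, legitimate No: true negative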
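Under the labeling convention on slide 57, "positive" means the filter labeled an email legitimate, so a false positive is a phishing email that was let through. A small sketch of how error rates would follow from the four counts is shown below. The function name `filter_rates` is invented for illustration, and the transcript does not report the filter's actual counts, so none are assumed here.

```python
def filter_rates(tp, fp, fn, tn):
    """Compute error rates from the four cells of the slide's table.

    tp: legitimate email labeled legitimate
    fp: phishing email labeled legitimate
    fn: legitimate email labeled phishing
    tn: phishing email labeled phishing
    """
    phish_total = fp + tn   # all emails that are really phishing
    legit_total = tp + fn   # all emails that are really legitimate
    return {
        "false_positive_rate": fp / phish_total,
        "false_negative_rate": fn / legit_total,
        "accuracy": (tp + tn) / (phish_total + legit_total),
    }
```

With the corpora on slide 55, `tp + fn` would total the 6950 good emails and `fp + tn` the 860 phishing emails.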