A Method for Automated Detection of Phishing
Websites: Through Both Site Characteristics and
Image Analysis



Joshua S. White
Jeanna N. Matthews, PhD
Outline
 • Problem
 • Method
    – Image Analysis (in detail)
 • Method Verification
 • Results
 • Conclusion
 • References
Problem
 • Phishing site detection
   – A largely manual process
      • Requires human visual review of site to
        eliminate false positives / negatives
   – URLs come from actual phishing attempts
      • Email and other user-reported URLs
   – Analysis is reactive, not proactive
Method (Overview)
Method
 • For rapid proof of concept
   – Data collected using the 140Dev PHP script
     and MySQL schema




 • Page characteristics collected using PHP for
   DOM object parsing
   – Links, Images, Forms, Iframes, Meta Tags
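The characteristic collection above can be sketched as follows. This is a minimal illustration in Python rather than the PHP DOM parsing named on the slide; the names `TRACKED_TAGS`, `CharacteristicCounter`, and `page_characteristics` are hypothetical, and counting tags is only one assumed way to summarize a page.

```python
from collections import Counter
from html.parser import HTMLParser

# Map HTML tag names to the characteristics listed on the slide.
TRACKED_TAGS = {
    "a": "links",
    "img": "images",
    "form": "forms",
    "iframe": "iframes",
    "meta": "meta_tags",
}


class CharacteristicCounter(HTMLParser):
    """Counts occurrences of the tracked tags while parsing a page."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this method.
        name = TRACKED_TAGS.get(tag)
        if name:
            self.counts[name] += 1


def page_characteristics(html):
    """Return a dict of characteristic name -> count for one HTML document."""
    parser = CharacteristicCounter()
    parser.feed(html)
    parser.close()
    return {name: parser.counts[name] for name in TRACKED_TAGS.values()}
```

Two pages with similar counts are only candidate matches; the deck pairs this kind of characteristic comparison with image analysis.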
Image Analysis
 • Collected using headless web-browser
   – CutyCapt, run under xvfb-run
 • Hashing of resultant images
   – MD5Sum, SHA512, PHash
      • Final choice was PHash (Perceptual Hash)
         – Uses the discrete cosine transform (DCT)
           » Reduces sampling frequency
 • Hamming distance used to compare
  each pair of hash values
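The hash comparison can be sketched as below, assuming each hash is held as a Python integer. The `threshold` value and the helper names are hypothetical; the slides do not state a cutoff. The SHA-512 helper illustrates a general property of cryptographic hashes (a small input change alters many output bits), which is one plausible reason, not stated on the slide, why a perceptual hash suits near-duplicate screenshots better.

```python
import hashlib


def hamming_distance(hash_a: int, hash_b: int) -> int:
    """Number of bit positions in which the two hashes differ."""
    return bin(hash_a ^ hash_b).count("1")


def is_probable_match(hash_a: int, hash_b: int, threshold: int = 10) -> bool:
    """Treat two hashes as a match if few bits differ (threshold is an assumption)."""
    return hamming_distance(hash_a, hash_b) <= threshold


def sha512_as_int(data: bytes) -> int:
    """Cryptographic hash as an integer, for comparison with the same function."""
    return int.from_bytes(hashlib.sha512(data).digest(), "big")


if __name__ == "__main__":
    # Two inputs differing in one byte: the SHA-512 distance is typically large.
    print(hamming_distance(sha512_as_int(b"login page v1"),
                           sha512_as_int(b"login page v2")))
```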
Image Analysis
• Process:
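  – Reduce the image to 32 x 32 pixels
  – Reduce the color to greyscale
  – Calculate the DCT (produces a grid of frequency coefficients)
  – Reduce the DCT to the 8 x 8 block of lowest-frequency coefficients
  – Compute the average of those coefficients, then set each bit to 1 or 0
    depending on whether its coefficient is above or below the average
  – Assemble the resulting 64 bits into the hash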
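The process above can be sketched in pure Python as follows. This is an illustration under stated assumptions, not the pHash library's implementation: the input is assumed to be a list of rows of `(r, g, b)` tuples, resizing is a simple block average, the resize and greyscale steps are fused, and all function names are hypothetical.

```python
import math


def luma(pixel):
    """Greyscale value of an (r, g, b) pixel using common luma weights."""
    r, g, b = pixel
    return 0.299 * r + 0.587 * g + 0.114 * b


def shrink_to_grey(pixels, size=32):
    """Block-average an RGB image down to a size x size greyscale grid."""
    height, width = len(pixels), len(pixels[0])
    small = []
    for y in range(size):
        y0 = y * height // size
        y1 = max((y + 1) * height // size, y0 + 1)
        row = []
        for x in range(size):
            x0 = x * width // size
            x1 = max((x + 1) * width // size, x0 + 1)
            block = [luma(pixels[j][i])
                     for j in range(y0, y1) for i in range(x0, x1)]
            row.append(sum(block) / len(block))
        small.append(row)
    return small


def dct_1d(values):
    """Unnormalized DCT-II of a sequence."""
    n = len(values)
    return [sum(v * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for i, v in enumerate(values))
            for k in range(n)]


def dct_2d(matrix):
    """2-D DCT: transform every row, then every column."""
    rows = [dct_1d(row) for row in matrix]
    cols = [dct_1d(col) for col in zip(*rows)]
    return [list(r) for r in zip(*cols)]  # transpose back to row-major


def perceptual_hash(pixels):
    """64-bit hash built from the 8 x 8 lowest-frequency DCT coefficients."""
    freq = dct_2d(shrink_to_grey(pixels, 32))
    low = [c for row in freq[:8] for c in row[:8]]
    mean = sum(low) / len(low)
    bits = 0
    for c in low:
        bits = (bits << 1) | (1 if c > mean else 0)
    return bits
```

Because each bit records only whether a coefficient lies above or below the mean, uniformly scaling the brightness of an image leaves the hash from this sketch unchanged, while a structurally different image changes some bits.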
Method Verification
Results
 • After our method was verified, we concentrated
   on the top 5 most spoofed sites:




 • Some False Characteristic Matches:
Conclusion
 • Phishing URL posting on social media networks
   is a growing problem
 • We have developed a tool that quickly and
   effectively detects matches between legitimate
   and spoofed sites
 • Future work includes:
   – Integration of our characteristic mapping and
     image analysis technique into our social
     media analytics toolkit
Questions?
References

Phishing spie 2012 presentation - jsw - d2
