Use of hog descriptors in phishing detection

This paper details an anti-phishing detection system that employs HOG features.

* The presentation is built with voice recording

Published in: Science

  1. 1. Use of HOG Descriptors in Phishing Detection • Ahmet Selman Bozkir, Ebru Akcapinar Sezer • Hacettepe University, Computer Engineering Department, Turkey • ISDFS 2016
  2. 2. Topics • What is phishing? • Facts and the rise of phishing attacks • Existing approaches • Why vision based scheme? • HOG descriptors • Demonstration of developed method • Experiments and Results • Conclusion
  3. 3. What is phishing? • Phishing is a scam that creates a visual illusion for computer users by serving fake web pages that mimic their legitimate targets in order to steal valuable digital data such as credit card information or e-mail passwords. • Phone phreaking + fishing -> «phishing»
  4. 4. Facts and figures * Source: PhishLabs 2016 Phishing Trends & Intelligence Report
  5. 5. Facts and figures • In 2012-2013, 37.3 million users were affected by phishing attacks* * Source: 2013 Verizon Data Breach Investigation Report
  6. 6. Facts and figures • 1 million confirmed malicious phishing sites on over 130,000 unique domains (as of 2013). * Source: PhishLabs 2016 Phishing Trends & Intelligence Report
  7. 7. Facts and figures • The average lifetime of a phishing page is 32 hours • The risk of zero-day attacks is rising because new phishing pages have not yet been discovered by blacklists * Source: APWG, Phishing activity trends paper. [Online]. Available at http://www.antiphishing.org/resources/apwg-papers/
  8. 8. Facts and figures • 90% of consumer-oriented phishing attacks targeted: • financial institutions • cloud storage/file hosting sites • webmail and online services • ecommerce sites • payment services. * Source: PhishLabs 2016 Phishing Trends & Intelligence Report
  9. 9. Facts and figures • financial institutions • payment services • cloud storage/file hosting sites * Source: PhishLabs 2016 Phishing Trends & Intelligence Report
  10. 10. Existing Anti-Phishing Approaches • Content & Blacklist: CANTINA [1], SpoofGuard [2], Netcraft [3] • DOM based: Medvet et al. [4], Zhang et al. [5], Fu et al. [6] • Vision based: Maurer et al. [7], Verilogo [8] • Other: Chen et al. [9]
  11. 11. Why vision based scheme? • Substitution of textual HTML elements with <IMG> or applet-like contents • Zero-day attacks need proactive solutions • Dynamic / AJAX type content loading • Different DOM organizations between legitimate and fake web pages • More robust to complex backgrounds or page layouts • Most importantly, vision based solutions are in concordance with human perception * Source: PhishLabs 2016 Phishing Trends & Intelligence Report
  12. 12. Methodology: HOG Features and Descriptors • Histogram of Oriented Gradients • Dalal & Triggs, 2005 • A good way to characterize and capture local object appearance or shape by utilizing the distribution of intensity gradients or edge directions. • Preferred because: (i) HOG descriptors are able to capture visual cues of the overall page layout; (ii) they provide a certain degree of rotation and translation invariance.
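
The idea of describing a page by its distribution of gradient orientations can be sketched as follows. This is a simplified, HOG-like illustration and not the authors' implementation: it builds one magnitude-weighted orientation histogram per cell and omits the block normalization of Dalal & Triggs. The function name `hog_cell_histograms`, the 9 orientation bins, and the L1 normalization of the final vector are assumptions made for this example.

```python
import numpy as np

def hog_cell_histograms(gray, cell=64, bins=9):
    """Simplified HOG-like descriptor (illustrative sketch, not the paper's code).

    gray: 2-D array holding a grayscale page screenshot.
    cell: cell width and height in pixels (the slides mention 64 and 128).
    bins: number of orientation bins over [0, 180) degrees (assumed value).
    """
    gray = np.asarray(gray, dtype=float)
    gy, gx = np.gradient(gray)              # gradients along rows and columns
    magnitude = np.hypot(gx, gy)            # gradient strength per pixel
    angle = np.rad2deg(np.arctan2(gy, gx)) % 180.0   # unsigned orientation

    rows, cols = gray.shape[0] // cell, gray.shape[1] // cell
    hist = np.zeros((rows, cols, bins))
    for r in range(rows):
        for c in range(cols):
            window = (slice(r * cell, (r + 1) * cell),
                      slice(c * cell, (c + 1) * cell))
            # Each pixel votes for its orientation bin, weighted by magnitude.
            hist[r, c], _ = np.histogram(angle[window], bins=bins,
                                         range=(0.0, 180.0),
                                         weights=magnitude[window])

    flat = hist.ravel()
    total = flat.sum()
    # Assumed L1 normalization so that descriptors of two pages are comparable.
    return flat / total if total > 0 else flat

# Usage: a 128 x 128 horizontal intensity ramp yields 2 x 2 cells of 9 bins each.
ramp = np.tile(np.arange(128.0), (128, 1))
descriptor = hog_cell_histograms(ramp, cell=64)
```

Larger cells summarize coarser layout regions, which may be why two cell sizes are compared in the experiments.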
  13. 13. Developed approach in detail • $Sim(H_M, H_N) = \sum_{i=1}^{T} \min\big(H_M(i), H_N(i)\big)$
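
The similarity measure on this slide is a histogram intersection: the sum of bin-wise minima of two histograms. A minimal sketch is given below, assuming that H_M and H_N are the HOG histograms of two pages with the same number of bins T and that both are normalized to sum to 1, so that the score falls between 0 and 1. The function name `histogram_intersection` is chosen here for illustration only.

```python
import numpy as np

def histogram_intersection(h_m, h_n):
    """Sim(H_M, H_N) = sum over i of min(H_M(i), H_N(i))."""
    h_m = np.asarray(h_m, dtype=float)
    h_n = np.asarray(h_n, dtype=float)
    if h_m.shape != h_n.shape:
        raise ValueError("histograms must have the same number of bins")
    return float(np.minimum(h_m, h_n).sum())

# Usage with two toy 3-bin histograms (each sums to 1).
score = histogram_intersection([0.5, 0.3, 0.2], [0.2, 0.3, 0.5])
print(round(score, 3))  # 0.7
```

Identical normalized histograms give a score of 1, and histograms with no overlapping mass give 0.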
  14. 14. Experiments • For the phishing web page dataset, 50 unique phishing pages reported to PhishTank between 14 December 2015 and 5 January 2016 were collected. • For the legitimate web page pairs, we collected 18 legitimate home pages from the Alexa top 500 web site directory. Afterwards, we shuffled the page URLs in order to obtain 100 distinct legitimate home page pairs. • Cells 64 pixels wide and 128 pixels wide were employed.
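
One way to obtain 100 distinct pairs from 18 pages is sketched below. The slides only state that the URLs were shuffled, so the exact procedure is an assumption: here all 153 unordered pairs are enumerated and 100 of them are sampled without replacement. The placeholder URLs and the function name `sample_unique_pairs` are hypothetical.

```python
import itertools
import random

def sample_unique_pairs(urls, n_pairs, seed=0):
    """Draw n_pairs distinct unordered pairs from a list of URLs."""
    all_pairs = list(itertools.combinations(urls, 2))  # 18 pages -> 153 pairs
    if n_pairs > len(all_pairs):
        raise ValueError("not enough distinct pairs available")
    # A seeded generator keeps the selection reproducible.
    return random.Random(seed).sample(all_pairs, n_pairs)

# Placeholder URLs standing in for the 18 legitimate home pages.
urls = [f"https://site{i}.example/" for i in range(18)]
pairs = sample_unique_pairs(urls, 100)
```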
  15. 15. Results - 1
      Similarity of pairs of phishing pages (50 pages)
      Statistic            HOG-64 px cells   HOG-128 px cells
      min                  51.873 %          49.910 %
      max                  98.861 %          98.390 %
      mean                 78.868 %          78.637 %
      standard deviation   12.147 %          10.963 %
      Statistics of phishing and their target page pairs in HOG-64 and HOG-128

      Similarity of pairs of legitimate pages (100 unique pairs)
      Statistic            HOG-64 px cells   HOG-128 px cells
      min                  38.420 %          45.683 %
      max                  74.459 %          77.092 %
      mean                 60.739 %          66.012 %
      standard deviation   11.026 %          9.492 %
      Statistics of unique legitimate page pairs in HOG-64 and HOG-128
  16. 16. Results - 2 Similarity scores of unique legitimate page pairs
  17. 17. Results - 3 Similarity scores of phishing pages and their legitimate targets
  18. 18. Discussion and Conclusion • This work is the first study that employs HOG in phishing detection. • It provides a robust method for phishing detection, as it is purely vision based and able to capture local visual cues on the web page surface. • However, we identified some shortcomings. • Image contents in phishing web pages generally differ from the legitimate ones, so invariance to image content must be provided in order to achieve better and more robust phishing detection. • The method must also be verified with a more comprehensive dataset.
  19. 19. References
      1. Y. Zhang, J. Hong, L. Cranor, CANTINA: A Content-Based Approach to Detecting Phishing Web Sites, WWW 2007.
      2. N. Chou, R. Ledesma, Y. Teraguchi, D. Boneh, J.C. Mitchell, Client-Side Defense against Web-Based Identity Theft, in Proceedings of the 11th Annual Network and Distributed System Security Symposium (NDSS '04).
      3. Netcraft, Netcraft Anti-Phishing Toolbar. Visited: April 20, 2016. http://toolbar.netcraft.com/
      4. E. Medvet, E. Kirda, C. Kruegel, Visual-Similarity-Based Phishing Detection, Securecomm ’08 International Conference on Security and Privacy in Communication Networks, 2008.
      5. W. Zhang, H. Lu, B. Xu, H. Yang, Web Phishing Detection Based on Page Spatial Layout Similarity, Informatica, vol. 37, pp. 231-244, 2013.
      6. A.Y. Fu, L. Wenyin, X. Deng, Detecting Phishing Web Pages with Visual Similarity Assessment Based on Earth Mover’s Distance (EMD), IEEE Transactions on Dependable and Secure Computing, pp. 301-311, 2006.
      7. M.E. Maurer, D. Herzner, Using visual website similarity for phishing detection and reporting, in CHI’12 Extended Abstracts on Human Factors in Computing Systems, 2012.
      8. G. Wang, H. Liu, S. Becerra, K. Wang, Verilogo: Proactive Phishing Detection via Logo Recognition, Technical Report CS2011-0669, UC San Diego, 2011.
      9. T. Chen, S. Dick, J. Miller, Detecting Visually Similar Web Pages: Application to Phishing Detection, ACM Transactions on Internet Technology, 10(2), 2010.