PhishAri: Automatic Realtime Phishing Detection on Twitter



With the advent of online social media, phishers have started using social networks such as Twitter, Facebook, and Foursquare to spread phishing scams. Twitter is an immensely popular micro-blogging network where people post short messages of up to 140 characters, called tweets. It has over 100 million active users who post about 200 million tweets every day. Because of this vast information dissemination, phishers have started using Twitter as a medium to spread phishing. Phishing is also harder to detect on Twitter than in email because phishing links spread quickly through the network, the content is short, and URLs are shortened (which obfuscates the destination) to fit within the 140-character tweet limit. Our technique, PhishAri, detects phishing on Twitter in realtime. We use Twitter-specific features along with URL features to detect whether a tweet posted with a URL is phishing or not. Some of the Twitter-specific features we use are the tweet content and its characteristics, such as length, hashtags, and mentions. Others are characteristics of the Twitter user posting the tweet, such as the age of the account, the number of tweets, and the follower-followee ratio. Coupled with URL-based features, these Twitter-specific features prove to be a strong mechanism for detecting phishing tweets. We use machine learning classification techniques and detect phishing tweets with an accuracy of 92.52%. We have deployed our system for end users as an easy-to-use Chrome browser extension. The extension works in realtime and classifies a tweet as phishing or safe when it appears in a user's Twitter timeline. In this research, we show that we can detect phishing tweets at zero hour with high accuracy, which is much faster than public blacklists and Twitter's own defense mechanism for detecting malicious content. We also performed a quick user evaluation of PhishAri in a laboratory study, which shows that users like PhishAri and are happy to use it in the real world. To the best of our knowledge, this is the first realtime, comprehensive, and usable system to detect phishing on Twitter.
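
The tweet, user, and URL features named in the abstract can be illustrated with a minimal Python sketch. This is not PhishAri's implementation: the `Tweet` record, the `extract_features` function, and the feature names are all hypothetical, and only a subset of the features (no WHOIS lookups or redirection counts) is covered.

```python
import re
from dataclasses import dataclass


@dataclass
class Tweet:
    """Hypothetical record holding a tweet and its author's account data."""
    text: str
    url: str
    account_age_days: int
    num_tweets: int
    followers: int
    followees: int


def extract_features(tweet: Tweet) -> dict:
    """Compute a few of the feature types named in the abstract (illustrative only)."""
    return {
        # Tweet features: length, hashtags, mentions
        "tweet_length": len(tweet.text),
        "num_hashtags": len(re.findall(r"#\w+", tweet.text)),
        "num_mentions": len(re.findall(r"@\w+", tweet.text)),
        # URL features: length and number of dots
        "url_length": len(tweet.url),
        "url_num_dots": tweet.url.count("."),
        # User features: account age, tweet count, follower-followee ratio
        "account_age_days": tweet.account_age_days,
        "num_tweets": tweet.num_tweets,
        # max(..., 1) avoids division by zero for accounts that follow nobody
        "follower_followee_ratio": tweet.followers / max(tweet.followees, 1),
    }


example = Tweet(
    text="Earn Money #help #money http://example.com/login.php",
    url="http://example.com/login.php",
    account_age_days=2,
    num_tweets=15,
    followers=10,
    followees=200,
)
features = extract_features(example)
```

A feature dictionary like this would then be turned into a numeric vector and passed to a trained classifier; that step is omitted here.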

Published in: Technology

PhishAri: Automatic Realtime Phishing Detection on Twitter

  1. Automatic Realtime Phishing Detection on Twitter. Anupama Aggarwal, Ashwin Rajadesingan, Ponnurangam Kumaraguru
  2. Motivation: Some Statistics • $520 million was lost worldwide to phishing attacks in 2011 alone (RSA Report) • In 2012, around 20% of all phishing attacks targeted Facebook • Social network phishing attacks jumped 221% during Q1 of 2012
  3. Phishing Detection on OSM: Current State of the Art • Offline spam characterization and detection studies • No characterization of phishing on OSM • Lack of realtime detection mechanisms • Absence of end-user deployed systems • Dependence on spam/phishing blacklists
  4. What Did We Do to Fill the Gap? • Built a mechanism to automatically detect phishing on Twitter in realtime • No dependency on blacklists • Deployed an end-user system for Twitter users: a Chrome extension
  5. Twitter 101 • Tweets are under 140 characters • Example tweets: "Hey, I am in Puerto Rico attending @APWG eCrime research", "Talking about #phishing on OSN", "Earn Money #help #money"
  6. Twitter 101 • @Tag: to mention or reply to a Twitter user • #Tag: to mention a topic • URL in tweet: to link external media
  7. Twitter 101 (diagram) • Followers: users who follow an account ("We'll follow Blue!") • Followees: accounts that a user follows ("I'll follow Grey1!", "I'll follow Grey2!") • Retweet (RT): sharing another user's tweet with one's own network ("Nice! I'll share this tweet in my network!")
  8. Twitter 101 (diagram) • The Twitter timeline of @Blue shows: tweets by followees, retweets by followees, tweets by self, retweets by self, and tweets with @Blue
  9. Challenges of Phishing Detection on Twitter • Only 140 characters: very little information • Use of short URLs in tweets • 100,000 tweets per minute: quick spread • Phishing blacklists are slow and not reliable
  10. Our Contribution • PhishAri: automatic realtime phishing detection mechanism for Twitter • More efficient than the plain blacklisting method • Better than Twitter's own phishing detection mechanism • Real-world implementation of the system: a Chrome extension for Twitter
  11. Methodology • Step 1: Classification model for phishing detection (data collection, feature extraction, classification) • Step 2: Realtime end-user interface (using the pre-trained classification model, Chrome browser extension)
  12. Data Collection • 1,589 phishing tweets • 903 unique phishing URLs • Wait for 3 days
  13. Features Used • URL features: length, number of dots, characters, redirections • WHOIS features: domain name, ownership period • Tweet features: number of #tags, @mentions, length, trending topics • Network features: follower/followee ratio, age of account, number of tweets
  14. Classification Results
      Evaluation Metric     | Naive Bayes | Decision Tree | Random Forest
      Accuracy              | 87.02%      | 89.28%        | 92.52%
      Precision (Phishing)  | 89.21%      | 88.05%        | 95.24%
      Precision (Safe)      | 92.12%      | 94.15%        | 97.23%
      Recall (Phishing)     | 68.32%      | 74.51%        | 92.21%
      Recall (Safe)         | 85.68%      | 89.20%        | 95.54%
  15. Evaluation • Comparison with blacklists: at zero hour, PhishAri detected 80.6% more phishing tweets, which blacklists caught only after 3 days • Comparison with Twitter's defense mechanism: at zero hour, PhishAri detected 84.6% more phishing tweets, which Twitter marked as suspicious only after 3 days
  16. Time Evaluation • Used an Intel Xeon 16-core Ubuntu server with a 2.67 GHz processor and 32 GB RAM • Multiprocessing modules for faster processing • Feature extraction and classification of a tweet take at most 0.522 seconds (min: 0.167 sec, avg: 0.425 sec, median: 0.384 sec)
  17. Text Analysis (figure): Phishing Tweets vs. Legitimate Tweets
  18. PhishAri: RESTful API • The above classification model is used to create a RESTful API • POST requests can be made to the API to query a tweet • A pre-trained classifier model is used to classify new tweets
  19. PhishAri Chrome Extension
  20. PhishAri Chrome Extension • Red / green indicators in front of tweets with URLs • Detects phishing tweets on: the user timeline, Twitter search results, profiles of other users, and DMs (limited for now)
  21. Demo
  22. How the Extension Works • Integration of the API with the browser extension
  23. PhishAri Extension: User Experience and Statistics • 78 active users • The user study shows that users found the extension useful, want support for other browsers and mobile apps, and desire more robustness
  24. Conclusion • "Phish" + "Ari" = Realtime Automatic Detection • 92.52% accuracy with the Random Forest classifier • Efficient: takes at most 0.522 seconds for the indicator to appear • No dependency on blacklists • Faster than blacklists • Faster than Twitter's own detection mechanism
  25. Future Work • Backend database for faster lookup • Increase the scope of PhishAri from public tweets to all tweets • Improve the response time of PhishAri and the appearance of indicators • Support for other browsers and mobile apps
  26. Thank You! Questions? Suggestions?
  27. For any further information, please write to
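
The RESTful API slide says that a tweet is queried by making a POST request to the API, and the extension slides say that a red or green indicator is shown next to tweets with URLs. The slides give no endpoint, payload format, or response format, so the following Python sketch rests entirely on assumptions: the URL `https://phishari.example/api/classify`, the JSON field names, the `"phishing"`/`"safe"` labels, and the label-to-color mapping are all hypothetical. It only builds the request and does not send it.

```python
import json
import urllib.request

# Hypothetical endpoint; the real API URL is not given in the slides.
API_URL = "https://phishari.example/api/classify"


def build_classify_request(tweet_id: str, tweet_text: str,
                           api_url: str = API_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request asking the API to classify a tweet.

    The JSON field names "tweet_id" and "text" are assumptions for illustration.
    """
    payload = json.dumps({"tweet_id": tweet_id, "text": tweet_text}).encode("utf-8")
    return urllib.request.Request(
        api_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def indicator_for(label: str) -> str:
    """Map an assumed classifier label to an indicator color (assumed mapping)."""
    return "red" if label == "phishing" else "green"


request = build_classify_request("12345", "Earn Money #help #money http://example.com/x")
```

A client would pass `request` to `urllib.request.urlopen`, parse the response, and call `indicator_for` on the returned label; the actual extension runs in the browser and may work quite differently.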