Warningbird


  1. WarningBird: Detecting Suspicious URLs in Twitter Stream. Sangho Lee and Jong Kim, Pohang University of Science and Technology. January 18, 2012.
  5. Threat: Attackers post URLs to attract traffic to their websites. These URLs can deliver various payloads: spam, phishing, and malicious software downloads.
  7. Twitter: An online micro-blogging service. Large (about 100 million accounts). URL shortener services are widely used. Tweets are broadcast to legitimate users. This makes Twitter a good vector for attackers to attract traffic: there are many potential targets, URL shorteners are common and mask the actual website, and many users view tweets based on content rather than authorship.
  10. Existing Detection Approaches and Limitations: (1) Detect accounts based on account information, e.g., the ratio of tweets with URLs to tweets without URLs. Limitation: easily fabricated by an attacker. (2) Detect accounts based on the social graph, e.g., connectivity measures for each node. Limitation: large amounts of Twitter data are hard to obtain and analyze. (3) Crawl URLs to classify them, e.g., detect malicious URLs based on HTML content. Limitation: attackers use redirection chains.
  11. Redirection Chains: A redirect chain starts by resolving the shortened URL. The user is then redirected through several hops of attacker-owned URLs. The attacker dynamically chooses which page a visitor ultimately reaches: crawlers go to a legitimate URL, while legitimate users go to the malicious URL.
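
The conditional redirection on slide 11 can be illustrated with a toy model. This is only a sketch under stated assumptions: the hop table, the host names, and the `is_crawler` flag are all hypothetical stand-ins for attacker-controlled servers, and no network access is performed.

```python
# Toy model of a conditional redirection chain. All hosts are hypothetical.
REDIRECTS = {
    "http://short.example/abc": "http://hop1.example/",
    "http://hop1.example/": "http://entry.example/",
}
ENTRY_POINT = "http://entry.example/"


def next_hop(url, is_crawler):
    """Return the next URL in the chain, or None if `url` is a landing page."""
    if url == ENTRY_POINT:
        # The attacker-controlled server picks the destination per visitor.
        if is_crawler:
            return "http://benign.example/"
        return "http://malicious.example/"
    return REDIRECTS.get(url)


def follow_chain(start, is_crawler, max_hops=10):
    """Follow redirects from `start` and return the list of visited URLs."""
    chain = [start]
    while len(chain) <= max_hops:
        nxt = next_hop(chain[-1], is_crawler)
        if nxt is None:
            break
        chain.append(nxt)
    return chain


crawler_chain = follow_chain("http://short.example/abc", is_crawler=True)
user_chain = follow_chain("http://short.example/abc", is_crawler=False)
```

In this model the two chains differ only in their final hop. The shared prefix, up to and including the entry point, is the part a crawler can still observe.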
  14. Problem: Given a URL posted on Twitter, determine whether a legitimate user would ultimately be directed to a malicious URL by visiting it. Assumptions: features easily fabricated by an attacker cannot be used; there is no access to the large Twitter graph; the part of the redirect chain available to crawlers is accessible; redirect chains cannot be fabricated. Solution overview: create a classifier, rely on the redirect chain for features, and validate accuracy and performance with Twitter data.
  15. WarningBird: Input: tweets. Output: suspicious URLs. A live website shows recent suspicious URLs.
  16. Data Collection: Use the Twitter Streaming API to collect tweets. Keep only tweets with URLs. Crawl and store the URL chain of each URL. Queue many tweets to be analyzed together.
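
The filter-and-queue steps on slide 16 might look roughly like the sketch below. It makes several assumptions: tweets arrive as plain dictionaries with hypothetical `user` and `text` fields, URLs are found with a simple regular expression, and the queue is a fixed-size window. The actual Streaming API client and the crawling step are left out.

```python
import re
from collections import deque

# Simplistic URL matcher; an assumption, not the talk's extraction method.
URL_RE = re.compile(r"https?://\S+")


def extract_urls(text):
    """Return all http(s) URLs found in a tweet's text."""
    return URL_RE.findall(text)


class TweetWindow:
    """Fixed-size queue of URL-bearing tweets to be analyzed together."""

    def __init__(self, size):
        # When full, the oldest tweet is dropped as a new one is appended.
        self.queue = deque(maxlen=size)

    def push(self, tweet):
        """Queue the tweet if it contains a URL; return whether it was kept."""
        urls = extract_urls(tweet["text"])
        if not urls:
            return False
        self.queue.append({"user": tweet["user"], "urls": urls})
        return True


window = TweetWindow(size=2)
kept_plain = window.push({"user": "a", "text": "no link here"})
kept_link = window.push({"user": "b", "text": "see http://t.example/1"})
```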
  17. Feature Extraction: Group domains that share an IP address (e.g., xyz.com = 20.30.40.50 = abc.com). Find entry point URLs. Extract 11 features based on URL chains and tweet context.
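
Domain grouping and entry-point detection from slide 17 can be sketched as follows. The DNS table is hypothetical sample data, and the rule used here (the entry point is the grouped URL that appears in the most chains) is an assumption made for illustration. The slide itself does not define the rule.

```python
from collections import Counter
from urllib.parse import urlsplit

# Hypothetical DNS lookups: xyz.com and abc.com share one IP address.
DOMAIN_TO_IP = {
    "xyz.com": "20.30.40.50",
    "abc.com": "20.30.40.50",
    "short.example": "10.0.0.1",
}


def grouped_url(url):
    """Replace the host with its IP so domains on one server compare equal."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    return DOMAIN_TO_IP.get(host, host) + parts.path


def entry_point(chains):
    """Return (grouped URL, count) for the URL shared by the most chains."""
    counts = Counter()
    for chain in chains:
        # Count each grouped URL at most once per chain.
        counts.update({grouped_url(u) for u in chain})
    return counts.most_common(1)[0]


chains = [
    ["http://short.example/1", "http://xyz.com/land"],
    ["http://short.example/2", "http://abc.com/land"],
]
result = entry_point(chains)
```

Without grouping, `xyz.com/land` and `abc.com/land` would each appear in only one chain. After grouping, both map to the same key and are counted as a single URL appearing in two chains.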
  18. Features
  19. Classifier: All features are normalized to between zero and one. Logistic regression was found experimentally to be the best classifier. Ground truth for supervised learning comes from Twitter account status.
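
The two ingredients named on slide 19, normalization to [0, 1] and logistic regression, can be sketched in plain Python. Min-max scaling is an assumed choice of normalization, and the weights and bias are hypothetical parameters. The slide does not give the trained model.

```python
import math


def min_max_normalize(column):
    """Scale one feature column linearly into [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant feature carries no information; map it to zero.
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def logistic_score(features, weights, bias):
    """Logistic regression output: a score in (0, 1) for one URL."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


normalized = min_max_normalize([2, 4, 6])
score = logistic_score([1.0, 1.0], weights=[2.0, 2.0], bias=0.0)
```

A URL would then be flagged as suspicious when its score exceeds a chosen threshold such as 0.5.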
  20. Experimentation: Used real Twitter data from the Twitter Streaming API, processed on the authors' own commodity hardware. Experiments on this data investigated accuracy, performance, and delay in detection.
  21. Accuracy Results: 60 days of training data (183k benign and 42k malicious URLs) and 30 days of test data (71k benign and 6.7k malicious URLs). Achieved a 3.67% false positive rate (FPR) and a 3.21% false negative rate (FNR): of 71k benign URLs, 2.6k were marked malicious; of 6.7k malicious URLs, 200 were not discovered.
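
As a sanity check on slide 21, the error rates can be recomputed from the rounded counts. Because the slide rounds its counts (71k, 2.6k, 6.7k, 200), the results only approximate the reported 3.67% and 3.21%, which presumably come from the exact figures.

```python
def rate(errors, total):
    """Fraction of `total` items that were misclassified."""
    return errors / total


# Rounded counts from the slide; the exact counts are not given there.
fpr = rate(2_600, 71_000)  # benign URLs wrongly marked malicious
fnr = rate(200, 6_700)     # malicious URLs that were missed

print(f"FPR ~ {fpr:.2%}, FNR ~ {fnr:.2%}")
# prints: FPR ~ 3.66%, FNR ~ 2.99%
```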
  22. Performance Results: Running time of the components: 24 ms to crawl redirections (100 concurrent crawls), 2 ms for domain grouping, 1.6 ms for feature extraction, and 0.5 ms for classification. The system can process 100,000 URLs in one hour. Redirection crawling can be distributed to improve this.
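
The timings on slide 22 can be related to the stated throughput with some quick arithmetic. This assumes the four figures are average per-URL costs and that the stages run one after another, neither of which the slide states outright.

```python
# Per-stage times from the slide, assumed to be average milliseconds per URL.
STAGE_MS = {
    "crawl": 24.0,
    "domain_grouping": 2.0,
    "feature_extraction": 1.6,
    "classification": 0.5,
}
MS_PER_HOUR = 3_600_000

total_ms = sum(STAGE_MS.values())          # about 28.1 ms per URL
urls_per_hour = MS_PER_HOUR / total_ms     # about 128,000 URLs per hour
budget_ms = MS_PER_HOUR / 100_000          # 36 ms per URL at the stated rate
crawl_share = STAGE_MS["crawl"] / total_ms # about 85% of the total
```

Under these assumptions the 28.1 ms total fits inside the 36 ms per-URL budget implied by 100,000 URLs per hour. Crawling accounts for roughly 85% of the time, which is consistent with the slide's suggestion to distribute it.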
  23. Delay Results: WarningBird can detect suspicious URLs faster than Twitter. The results shown cover only accounts suspended by Twitter within a day.
  24. Conclusion: Found an important feature that others have ignored. An attacker must either spend more on additional redirection servers or risk being caught.
