Warningbird

  1. WarningBird: Detecting Suspicious URLs in Twitter Stream. Sangho Lee and Jong Kim, Pohang University of Science and Technology. January 18, 2012.
  5. Threat: Attackers post URLs to attract traffic to a website. The URLs can deliver various payloads: spam, phishing, and malicious software downloads.
  7. Twitter: An online micro-blogging service. Large (about 100 million accounts). URL shortener services are widely used. Tweets are broadcast to legitimate users. It is a good vector for attackers to attract traffic: there are many potential targets; URL shorteners are common and mask the actual website; many users view tweets based on content and not authorship.
  10. Existing Detection Approaches and Limitations: (1) Detect accounts based on account information, e.g., the ratio of Tweets with URLs to Tweets without URLs. Limitation: easily fabricated by an attacker. (2) Detect accounts based on the social graph, e.g., connectivity measures for each node. Limitation: hard to obtain and analyze large amounts of Twitter data. (3) Crawl URLs to classify them, e.g., detect malicious URLs based on HTML content. Limitation: attackers use redirection chains.
  11. Redirection Chains: A redirect chain starts by resolving the shortened URL. Several hops of attacker-owned URLs then redirect the user. The chain dynamically chooses which page a visitor ultimately reaches: crawlers go to a legitimate URL, while legitimate users go to the malicious URL.
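
The conditional redirection described on this slide can be illustrated with a small offline simulation. This is only a sketch under stated assumptions: the redirect table, the `.example` URLs, and the `"crawler"` / `"user"` visitor labels are all hypothetical, and no real HTTP requests are made.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical redirect table: each URL maps to a function that picks the
# next hop based on the visitor type. URLs absent from the table are
# treated as landing pages.
RedirectTable = Dict[str, Callable[[str], Optional[str]]]


def make_table() -> RedirectTable:
    return {
        "http://short.example/abc": lambda visitor: "http://hop1.example/",
        "http://hop1.example/": lambda visitor: "http://hop2.example/",
        # The last attacker-owned hop chooses the landing page dynamically.
        "http://hop2.example/": lambda visitor: (
            "http://benign.example/"
            if visitor == "crawler"
            else "http://malicious.example/"
        ),
    }


def follow(url: str, visitor: str, table: RedirectTable, max_hops: int = 10) -> List[str]:
    """Return the list of URLs visited, starting from the shortened URL."""
    chain = [url]
    while url in table and len(chain) <= max_hops:
        url = table[url](visitor)
        chain.append(url)
    return chain


table = make_table()
crawler_chain = follow("http://short.example/abc", "crawler", table)
user_chain = follow("http://short.example/abc", "user", table)

# Hops that both visitors see, position by position.
shared_hops = [a for a, b in zip(crawler_chain, user_chain) if a == b]
```

In this toy setup the two chains differ only in the final landing page, so a crawler still observes the shared attacker-owned hops even though it never reaches the malicious page.
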
  14. Problem: Given a URL posted on Twitter, determine whether a legitimate user would ultimately be directed to a malicious URL by visiting the URL on Twitter. Assumptions: cannot use features easily fabricated by an attacker; no access to the large Twitter graph; have access to the part of the redirect chain available to crawlers; redirect chains cannot be fabricated. Solution overview: create a classifier; rely on the redirect chain for features; validate accuracy and performance with Twitter data.
  15. WarningBird: Input: Tweets. Output: suspicious URLs. A live website shows recent suspicious URLs.
  16. Data Collection: Use the Twitter Streaming API to collect Tweets. Keep only Tweets with URLs. Crawl and store the URL chain of each URL. Queue many Tweets to be analyzed together.
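
The filtering and queueing steps on this slide might look roughly like the following. This is a minimal sketch, not the authors' implementation: the tweet dictionaries, the `TweetWindow` class, the URL regex, and the window size are all assumptions made for illustration, and the crawling step is left out because it needs network access.

```python
import re
from collections import deque
from typing import Deque, Dict, List, Optional

# Simple pattern for http(s) URLs in tweet text (an illustrative assumption).
URL_PATTERN = re.compile(r"https?://\S+")


def extract_urls(text: str) -> List[str]:
    """Return every http(s) URL found in the tweet text."""
    return URL_PATTERN.findall(text)


class TweetWindow:
    """Queue tweets that contain URLs and release them in batches."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.queue: Deque[Dict[str, object]] = deque()

    def push(self, tweet: Dict[str, str]) -> Optional[List[Dict[str, object]]]:
        urls = extract_urls(tweet["text"])
        if not urls:
            return None  # keep only tweets with URLs
        self.queue.append({"user": tweet["user"], "urls": urls})
        if len(self.queue) >= self.size:
            batch = list(self.queue)
            self.queue.clear()
            return batch  # enough tweets queued to analyze together
        return None
```

A caller would feed each incoming tweet to `push` and run the later analysis stages whenever a batch is returned.
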
  17. Feature Extraction: Group domains that resolve to the same IP address (e.g., xyz.com = 20.30.40.50 = abc.com). Find entry point URLs. Extract 11 features based on URL chains and Tweet context.
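
A rough sketch of the domain grouping and entry point steps follows. Several things here are assumptions rather than details from the slides: chains are simplified to lists of domain names, the group representative is chosen alphabetically, and the entry point is taken to be the grouped domain that appears in the most chains. The helper names and the `short*.example` and `land*.example` domains are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def group_domains(domain_to_ip: Dict[str, str]) -> Dict[str, str]:
    """Map each domain to one representative per shared IP address."""
    by_ip: Dict[str, List[str]] = defaultdict(list)
    for domain, ip in domain_to_ip.items():
        by_ip[ip].append(domain)
    representative: Dict[str, str] = {}
    for domains in by_ip.values():
        chosen = min(domains)  # alphabetical choice, purely for determinism
        for domain in domains:
            representative[domain] = chosen
    return representative


def find_entry_point(chains: List[List[str]], representative: Dict[str, str]) -> str:
    """Return the grouped domain that occurs in the largest number of chains."""
    counts: Counter = Counter()
    for chain in chains:
        # Count each grouped domain at most once per chain.
        counts.update({representative.get(d, d) for d in chain})
    return counts.most_common(1)[0][0]


# xyz.com and abc.com share an IP address, as in the slide's example.
domain_to_ip = {"xyz.com": "20.30.40.50", "abc.com": "20.30.40.50"}
chains = [
    ["short1.example", "xyz.com", "land1.example"],
    ["short2.example", "abc.com", "land2.example"],
    ["short3.example", "xyz.com", "land3.example"],
]
rep = group_domains(domain_to_ip)
entry = find_entry_point(chains, rep)
```

Without grouping, xyz.com and abc.com would be counted separately; after grouping, the three chains are seen to pass through the same server.
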
  18. Features
  19. Classifier: All features are normalized between zero and one. Logistic regression was experimentally found to be the best classifier. Ground truth for supervised learning comes from Twitter account status.
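
The normalization and scoring steps on this slide can be sketched in plain Python. This is illustrative only: min-max scaling is assumed as the normalization method, training is not shown, and the weights and bias passed to `predict_proba` are hypothetical values rather than anything learned from the authors' data.

```python
import math
from typing import List, Sequence


def min_max_normalize(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Scale every feature column into the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [
            (value - lo) / (hi - lo) if hi > lo else 0.0  # constant column -> 0
            for value, lo, hi in zip(row, lows, highs)
        ]
        for row in rows
    ]


def predict_proba(features: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Logistic regression score: sigmoid of a weighted sum of the features."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


raw = [[2.0, 10.0], [4.0, 30.0], [6.0, 20.0]]
normalized = min_max_normalize(raw)
# Hypothetical weights and bias; a real model would learn these from labeled data.
scores = [predict_proba(row, weights=[1.5, -0.5], bias=-0.25) for row in normalized]
```

A URL would be flagged as suspicious when its score exceeds a chosen threshold, with labels for training taken from account status as the slide describes.
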
  20. Experimentation: Used real Twitter data from the Twitter Streaming API, processed on the authors' own commodity hardware. Experiments on this data investigated accuracy, performance, and delay in detection.
  21. Accuracy Results: 60 days of training data (183k benign and 42k malicious URLs). 30 days of test data (71k benign and 6.7k malicious URLs). Achieved a 3.67% false positive rate (FPR) and a 3.21% false negative rate (FNR): of 71k benign URLs, 2.6k were marked malicious; of 6.7k malicious URLs, 200 were not discovered.
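
As a quick sanity check, the two rates can be recomputed from the counts on this slide. The counts are rounded on the slide (2.6k, 6.7k, 200), so this sketch only reproduces the reported percentages approximately.

```python
# Rounded counts as given on the slide.
benign_total = 71_000
benign_marked_malicious = 2_600   # false positives
malicious_total = 6_700
malicious_missed = 200            # false negatives

# FPR = false positives / all benign; FNR = false negatives / all malicious.
fpr = benign_marked_malicious / benign_total
fnr = malicious_missed / malicious_total

print(f"FPR from rounded counts: {fpr:.2%}")
print(f"FNR from rounded counts: {fnr:.2%}")
```

The FPR lands very close to the reported 3.67%. The FNR comes out somewhat below the reported 3.21%, which is consistent with "200" being a rounded-down count of missed URLs.
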
  22. Performance Results: Running time of the components: 24 ms to crawl redirections (100 concurrent crawls), 2 ms for domain grouping, 1.6 ms for feature extraction, and 0.5 ms for classification. Can process 100,000 URLs in one hour. Redirection crawling can be distributed to improve this.
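
The throughput claim can be checked with simple arithmetic, assuming the component times on this slide are per-URL averages and that the stages run one after another. Both points are assumptions; the slide does not state them explicitly.

```python
# Component times in milliseconds, as listed on the slide.
component_ms = {
    "crawl redirections": 24.0,
    "domain grouping": 2.0,
    "feature extraction": 1.6,
    "classification": 0.5,
}

total_ms = sum(component_ms.values())          # time per URL under the assumption
urls_per_hour = 3_600_000 / total_ms           # 3,600,000 ms in one hour
crawl_share = component_ms["crawl redirections"] / total_ms

print(f"per-URL time: {total_ms:.1f} ms")
print(f"URLs per hour: {urls_per_hour:,.0f}")
print(f"share of time spent crawling: {crawl_share:.0%}")
```

Under these assumptions the pipeline clears 100,000 URLs in an hour with some headroom, and crawling dominates the per-URL cost, which matches the slide's remark that distributing the crawl is the way to improve throughput.
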
  23. Delay Results: WarningBird can detect faster than Twitter. Results are shown only for accounts that Twitter suspended within a day.
  24. Conclusion: Found an important feature that others have ignored. An attacker must either spend more on additional redirection servers or risk being caught.
