Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.
Exploration of gaps in Bitly's spam detection
and relevant counter measures
Neha Gupta
Advisor: Dr. Ponnurangam Kumaraguru...
Thesis Committee
 Mr. Sachin Gaur, MixORG
 Dr. Vinayak Naik, IIIT-Delhi
 Dr. PK (Chair), IIIT-Delhi
2
Achievements
 Gupta, N., Aggarwal, A., and Kumaraguru, P. bit.ly/can-do-better.
Poster at Security and Privacy Symposium ...
Presentation Outline
 Research Motivation and Aim
 Related Work
 Research Contribution
 Methodology
 Experiments and ...
What are URL shortening services?
Long URL Short URL
…
Others
http://bit.ly/1oL7gi5
https://www.youtube.com/watch?v=ukUL_I...
Use of URL shortening services
 Space gain (Twitter’s 140 character limit)
 More manageable
 Prevent line breaks
 Easy...
Abuse of URL shortening services
- Attack scenario
URL
shortening
service
One-level obfuscation
Long malicious URL Short m...
Abuse of URL shortening services
- Attack execution
Legitimate looking tweet Scam!!
8
Research Motivation and Aim
Major attacks
Year 2014
9
Research Motivation and Aim
Bitly's Spam Detection Policies
+
+
More filters..
10
‘‘ ’’
‘‘
’’
Research Motivation and Aim
Research Aim
A focused study on Bitly to:
 characterize malicious URLs
 examine Bitly’s security policies
 identify Bit...
Presentation Outline
 Research Motivation and Aim
 Related Work
 Research Contribution
 Methodology
 Experiments and ...
Related Work
2009
• Kandylas et al. Relative study of long and short Bitly URLs on Twitter
2010
• Benevenuto et al. + Grie...
Related Work
2013
• Click Traffic Analysis of Short URL Spam on Twitter
Wang et al. Classification of a link as spam / non...
Research Contribution
 Click traffic analysis and social network impact of malicious Bitly links
 Detailed inspection to...
Methodology - Acquiring Dataset
link_encoder_info
link_encoder_link_history
link_info
link_expand
link_clicks
link_referri...
Presentation Outline
 Research Motivation and Aim
 Related Work
 Research Contribution
 Methodology
 Experiments and ...
Metadata Analysis – Content Creation
Bitly Global Hash
Long URL
#Warnings
TLD
Domains
Status check after 5 months
(22,038)...
Experiments and Analysis
19
Network Analysis
 Referrer Network
 Connected Network
Experiments and Analysis
Network Analysis - Referrer Network
Referrer as Twitter
37,903
last <=200 tweets
788,759
Text + URL + Domain
Jaccard Simil...
21
Network Analysis – Connected Network
Experiments and Analysis
Twitter
3,415
(63.54%)
Facebook
951
(17.69%)
1,009
18.77%
5,375 users connected Twitter / Facebook profile
22
 Users can ...
23
Bitly profiles
(Link history)
Bitly warning check
(Connected Twitter
accounts)
(<=200 tweets)
Inter Twitter profile
Jac...
Community 1
2 Bitly users, 9 associated Twitter accounts each
All 18 Twitter accounts shared similar explicit
pornograph...
Experiments and Analysis
25
Experiments and Analysis
Security Analysis
 Efficiency Check
 Promptness Check
 Tractabilit...
(a) Malicious link identification
Security Analysis - Efficiency Check
International consortium that
brings together busin...
1
APWGs live
feed request
setup
142,660
6 months
216
2,656
2,872
+
Bitly warning check 382
(13.30%)
Inference:
 86% undet...
(b) Malicious user profile Identification
collectedlinksTotal
pagewarningBitlytogredirectinLinks
FactorSuspicion encoder
#...
2,018 (12,344 - 10,326) out of 12,344 encoders (16.35%) had a Suspicion Factor=1
i.e. they shortened only suspicious links...
Our blog Bitly’s reponse
Tweet to our blog
30
Experiments and Analysis
Security Analysis - Efficiency Check
Highly suspicious profiles: User has shortened at least 100 links + Suspicion Factor is 1 80 profiles
31
Experiments and A...
Result:
 35.2% identified malicious links in October 2013 are also being actively clicked in year 2014
Inference:
 By-pa...
 Bitly is not using the claimed detection services effectively
 Extreme delay in suspicious user identification (if at a...
Presentation Outline
 Research Motivation and Aim
 Related Work
 Research Contribution
 Methodology
 Experiments and ...
Malicious Bitly Link Detection - Data
Collection and Labeling
a repository of suspected phishing or malware pages maintain...
Malicious Bitly Link Detection – Feature
Selection
No. Feature Name Feature Description
1 Domain age Difference between do...
Malicious Bitly Link Detection -
Experimental Setup
1) Naive Bayes
2) Decision Tree
3) Random Forest
Machine Learning Algo...
Malicious Bitly Link Detection –
Evaluation Results
(a) Experiment 1
Mix dataset – Click and Non-click
All features
Malici...
Malicious Bitly Link Detection -
Evaluation Results
(c) Experiment 2
Mix dataset – Click and Non-click
WHOIS + Non-click b...
Malicious Bitly Link Detection –
Evaluation Results
(b) Experiment 3
Only Non-click data
WHOIS + Non-click based features
...
Malicious Bitly Link Detection -
Feature Ranks
Rank Feature
1 Type of referring domains
2 Link Creation domain creation di...
Malicious Bitly Link Detection - Result
Summary
 Increase in accuracy and F-measure on using only non-click based
feature...
Proposed Counter Measures - Summary
 Impose a check on the number of connected OSM accounts by a single
profile
 Either ...
 Domains created for a dedicated purpose of spamming eventually die out after
achieving a significant number of hits
 Sp...
 Since characteristics of spammers change over time , do a detailed comparative
analysis on a more exhaustive dataset
 B...
Acknowledgements
Dr. PK, IIIT-Delhi
Anupama Aggarwal, PhD ,IIIT-Delhi
Brian David Eoff (senior data scientist), Mark Josep...
References (I)
 Alexander Neumann, Johannes Barnickel, Ulrike Meyer. Security and Privacy Implications of URL Shortening
...
 Kurt Thomas, Chris Grier, Justin Ma, Vern Paxson, and Dawn Song. Design and Evaluation of a Real-Time URL
Spam Filtering...
 Saeed Abu-Nimeh, Dario Nappa, Xinlei Wang, and Suku Nair. A Comparison of Machine Learning Techniques
for Phishing Detec...
Thank You!
Questions?
50
Upcoming SlideShare
Loading in …5
×

Exploration of gaps in Bitly's spam detection and relevant countermeasures

1,667 views

Published on

Abstract: Existence of spam URLs over emails and Online Social Media (OSM) has become a growing phenomenon. To counter the dissemination issues associated with long complex URLs in emails and character limit imposed on various OSM (like Twitter), the concept of URL shortening gained a lot of traction. URL shorteners take as input a long URL and give a short URL with the same landing page in return. With its immense popularity over time, it has become a prime target for the attackers giving them an advantage to conceal malicious content. Bitly, a leading service in this domain is being exploited heavily to carry out phishing attacks, work from home scams, pornographic content propagation, etc. This imposes additional performance pressure on Bitly and other URL shorteners to be able to detect and take a timely action against the illegitimate content. In this study, we analyzed a dataset marked as suspicious by Bitly in the month of October 2013 to highlight some ground issues in their spam detection mechanism. In addition, we identified some short URL based features and coupled them with two domain specific features to classify a Bitly URL as malicious / benign and achieved a maximum accuracy of 86.41%. To the best our knowledge, this is the first large scale study to highlight the issues with Bitly’s spam detection policies and proposing a suitable countermeasure.

Published in: Engineering, Technology, Design
  • Be the first to comment

Exploration of gaps in Bitly's spam detection and relevant countermeasures

  1. 1. Exploration of gaps in Bitly's spam detection and relevant counter measures Neha Gupta Advisor: Dr. Ponnurangam Kumaraguru M.Tech Thesis Defense 23-April-2014
  2. 2. Thesis Committee  Mr. Sachin Gaur, MixORG  Dr. Vinayak Naik, IIIT-Delhi  Dr. PK (Chair), IIIT-Delhi 2
  3. 3. Achievements  Gupta, N., Aggarwal, A., and Kumaraguru, P. bit.ly/can-do-better. Poster at Security and Privacy Symposium (SPS), IIT-K, 2014. 3
  4. 4. Presentation Outline  Research Motivation and Aim  Related Work  Research Contribution  Methodology  Experiments and Analysis  Malicious Bitly Link Detection  Conclusion  Future Work  Questions 4 Presentation Outline
  5. 5. What are URL shortening services? Long URL Short URL … Others http://bit.ly/1oL7gi5 https://www.youtube.com/watch?v=ukUL_I14GPw URL shortening service hash The most popular 5 Research Motivation and Aim  Shortens close to 80 million links each day  Marks 2-3 million as suspicious every week  Twitter’s default URL shortening service before 2011
  6. 6. Use of URL shortening services  Space gain (Twitter’s 140 character limit)  More manageable  Prevent line breaks  Easy dissemination of content  Provides useful analytics (e.g. click data)  Online Social Media (OSM) connection  Complex link obfuscation Please like this picture http://3.bp.blogspot.com/_s5emCsFnEd E/TKUVi2BopBI/AAAAAAAADl8/lffvi7khF 7g/s1600/googl.png @abc Dr. ABC 10:44 PM Tue Aug 30, 2011 Please like this picture bit.ly/1hAVVaE @abc Dr. ABC 10:44 PM Tue Aug 30, 2011 versus 6 Research Motivation and Aim
  7. 7. Abuse of URL shortening services - Attack scenario URL shortening service One-level obfuscation Long malicious URL Short malicious URL Many short URL services detect and restrict long URLs at submission, but Bitly does not Not so popular URL shortening service Long malicious URL Short malicious URL Popular URL shortening service Multi-level obfuscation … 7 Research Motivation and Aim http://www.sexpixbox.com/kingnet/cute/index.html http://bit.ly/QCMW2S http://www.sexpixbox.com/kingnet/cute/index.html http://bit.ly/QCMW2Shttp://short.me/ABCD
  8. 8. Abuse of URL shortening services - Attack execution Legitimate looking tweet Scam!! 8 Research Motivation and Aim
  9. 9. Major attacks Year 2014 9 Research Motivation and Aim
  10. 10. Bitly's Spam Detection Policies + + More filters.. 10 ‘‘ ’’ ‘‘ ’’ Research Motivation and Aim
  11. 11. Research Aim A focused study on Bitly to:  characterize malicious URLs  examine Bitly’s security policies  identify Bitly specific features to detect spam 11 Research Motivation and Aim
  12. 12. Presentation Outline  Research Motivation and Aim  Related Work  Research Contribution  Methodology  Experiments and Analysis  Malicious Bitly Link Detection  Conclusion  Future Work  Questions 12 Presentation Outline
  13. 13. Related Work 2009 • Kandylas et al. Relative study of long and short Bitly URLs on Twitter 2010 • Benevenuto et al. + Grier et al. Identification of distinctive features to detect spammers on Twitter 2011 • Antoniades et al. Analysis of content, popularity, and impact of short URLs • Chhabra et al. Overview of evolving phishing attacks through short URLs on Twitter • Thomas et al. Classification of a long URL as malicious / benign in real time 2012 • Klien et al. Global usage pattern analysis of short URLs • Aggarwal et al. Real time phishing detection on Twitter using Twitter and URL based features • Lee et al. Real time suspicious URL detection technique on Twitter using conditional redirects 2013 • Maggi et al. Study of abuse of short URLs using 622 distinct shortening services 2014 • Nikiforakis et al. Study of ecosystem of ad-based URL shortening services 13 Related Work
  14. 14. Related Work 2013 • Click Traffic Analysis of Short URL Spam on Twitter Wang et al. Classification of a link as spam / non-spam using only click traffic based features 14 Related Work  No dedicated study to identify ground security issues specific to a URL shortener  Unexplored short URL based features to detect malicious content before it targets the audience Research Gaps
  15. 15. Research Contribution  Click traffic analysis and social network impact of malicious Bitly links  Detailed inspection to highlight weaknesses in Bitly’s spam detection techniques  Proposal to detect clicked / unclicked malicious Bitly links 15 Related Contribution
  16. 16. Methodology - Acquiring Dataset link_encoder_info link_encoder_link_history link_info link_expand link_clicks link_referring_domains link_encoders Bitly Global Hash Long URL #Warnings Link Dataset (763,160) Link Metric Dataset (413,119) Encoder/User Metric Dataset (12,344) (Bitly API) (Bitly API) Phase 1 Phase 2 Phase 3 (54.13%) (100%) 16 Methodology
  17. 17. Presentation Outline  Research Motivation and Aim  Related Work  Research Contribution  Methodology  Experiments and Analysis  Malicious Bitly Link Detection  Conclusion  Future Work  Questions 17 Presentation Outline
  18. 18. Metadata Analysis – Content Creation Bitly Global Hash Long URL #Warnings TLD Domains Status check after 5 months (22,038)Link Dataset (763,160) Non-existent Domains (18,966) Whitelist check Domains (21,982) Results:  83.06% suspicious domains non-existent before / after 5 months  Total number of click requests made to these dead domains (only in October) found to be 9,937,250 Inference: Created for a dedicated purpose of spamming and eventually die out after achieving significant number of hits!! 18 Experiments and Analysis
  19. 19. Experiments and Analysis 19 Network Analysis  Referrer Network  Connected Network Experiments and Analysis
  20. 20. Network Analysis - Referrer Network Referrer as Twitter 37,903 last <=200 tweets 788,759 Text + URL + Domain Jaccard Similarity |BA| |BA| B)J(A,   (Twitter API) 17 Twitter profiles with variance <=0.00012 … Status check after 5 months Manual annotation 21,679 20 4,336 (11.44%) 636 (1.68%) 5,444 (14.36%) 5,302 (13.99%) 22,185 (58.53%) Experiments and Analysis Hour of the day vs. Minute of the hour graph for Twitter user – (a) @dtitgp2. (b) @fujisakikaoru 3 profiles: pornographic content 1 profile: work from home scam 11 profiles: spam but <=3 tweets 2 profiles: suspended
  21. 21. 21 Network Analysis – Connected Network Experiments and Analysis
  22. 22. Twitter 3,415 (63.54%) Facebook 951 (17.69%) 1,009 18.77% 5,375 users connected Twitter / Facebook profile 22  Users can connect any number of Facebook / Twitter accounts Why more Twitter than Facebook?  Doesn't allow users to connect Facebook brand / fan pages for free Multiple connections  507 malicious users connected multiple Twitter accounts  28 malicious users connected at least 10 Twitter accounts Experiments and Analysis Network Analysis – Connected Network Connected OSM network of all encoders
  23. 23. 23 Bitly profiles (Link history) Bitly warning check (Connected Twitter accounts) (<=200 tweets) Inter Twitter profile Jaccard Similarity (Bitly user name) Inter Bitly profile Jaccard Similarity Manual annotation based on similarity scores 3 malicious communities detected Experiments and Analysis Network Analysis – Connected Network
  24. 24. Community 1 2 Bitly users, 9 associated Twitter accounts each All 18 Twitter accounts shared similar explicit pornographic content Dormant on Bitly, active on Twitter 24 Experiments and Analysis Network Analysis – Connected Network Counter measure 1 Bitly should impose a restriction on the number of OSM accounts a user can connect
  25. 25. Experiments and Analysis 25 Experiments and Analysis Security Analysis  Efficiency Check  Promptness Check  Tractability Check
  26. 26. (a) Malicious link identification Security Analysis - Efficiency Check International consortium that brings together businesses affected by phishing attacks Free checking of suspicious URLs using 52 different website / domain scanning engines and datasets Domain / IP level blacklist created from the occurrence of websites in unsolicited messages. 1 2 3 26 Experiments and Analysis
  27. 27. 1 APWGs live feed request setup 142,660 6 months 216 2,656 2,872 + Bitly warning check 382 (13.30%) Inference:  86% undetected malicious links by Bitly in 6 months  Such low detection rate looks alarming as APWG is a popular and trusted source to detect phishing! 27 Experiments and Analysis Security Analysis - Efficiency Check Bitly is not even using the claimed detection services effectively  Similar analysis when performed against Virustotal, 71.53% undetected links obtained  36.66% domains blacklisted by SURBL, but undetected by Bitly  Bitly claims to use SURBL (http://blog.bitly.com/post/138381844/spam-and-malware-protection)
  28. 28. (b) Malicious user profile Identification collectedlinksTotal pagewarningBitlytogredirectinLinks FactorSuspicion encoder # #  Suspicion Factor measures the credibility of a Bitly profile  Computed for all 12,344 encoders in our dataset 28 Experiments and Analysis Security Analysis - Efficiency Check
  29. 29. 2,018 (12,344 - 10,326) out of 12,344 encoders (16.35%) had a Suspicion Factor=1 i.e. they shortened only suspicious links 29 Experiments and Analysis Security Analysis - Efficiency Check Counter measure 2  Bitly should take some measures to detect and suspend such users  If not suspend, a credibility score can be added with a Bitly profile
  30. 30. Our blog Bitly’s reponse Tweet to our blog 30 Experiments and Analysis Security Analysis - Efficiency Check
  31. 31. Highly suspicious profiles: User has shortened at least 100 links + Suspicion Factor is 1 80 profiles 31 Experiments and Analysis Security Analysis – Promptness Check User: bamsesang, Month lag: 24  Also collected their recent link history (after 1 January 2014)  4 of these 80 users were still active and propagating malicious content Ease of penetration of spammers and delay in Bitly’s suspicious user detection process which it claims to follow User: iplayonlinegames, Month lag: 18
  32. 32. Result:  35.2% identified malicious links in October 2013 are also being actively clicked in year 2014 Inference:  By-passable Bitly warning page is alone not enough to curtail the dissemination of spam  No control over the access to already detected malicious links can heavily encourage spammers to use Bitly Popular malicious Bitly links Links with large number of warning pages displayed (URLs with high overall impact) Bitly Global Hash Long URL #Warnings Link Dataset (763,160) Reverse sort based on number of warnings Top 1000 links Bitly API Recent Click history 32 Experiments and Analysis Security Analysis – Tractability Check Counter measure 3 Bitly should not only throw a warning page but also block the visit on popular malicious Bitly links already detected
  33. 33.  Bitly is not using the claimed detection services effectively  Extreme delay in suspicious user identification (if at all)  By-passable Bitly warning page is alone not enough to control the problem of spam 33 Experiments and Analysis Security Analysis – Major Findings
  34. 34. Presentation Outline  Research Motivation and Aim  Related Work  Research Contribution  Methodology  Experiments and Analysis  Malicious Bitly Link Detection  Conclusion  Future Work  Questions 34 Presentation Outline
  35. 35. Malicious Bitly Link Detection - Data Collection and Labeling a repository of suspected phishing or malware pages maintained by Google Inc. a public crowdsourced database of phishing URLs 35 Malicious Bitly Link Detection Tweets from Twitter’s REST API (412,139) Blacklist + Bitly Warning Check Extract and expand bitly URLs (34,802) Malicious Benign labeled-datasetunlabeled-datasetCollect data 1. Google Safebrowsing 2. SURBL 3. PhishTank 4. VirusTotal Data Collection Data Labeling
  36. 36. Malicious Bitly Link Detection – Feature Selection No. Feature Name Feature Description 1 Domain age Difference between domain creation / updation date and expiration date 2 Link Creation domain creation difference Difference between domain creation date and bitly link creation date 3 Link creation hour Bitly link creation hour 4 Number of encoders Number of bitly users who encoded a particular link 5 Anonymous and API encoder ratio Ratio of encoders as ‘’anonymous’’ or from a Twitter based application (Twitterfeed, TweetDeck, Tweetbot) to the total number of encoders 6 Link creation first click difference Difference in days between bitly link creation date and date of first click received 7 Referring domains - direct by total Ratio of referring domains from a direct source to the total number of referring domains WHOIS specific Bitly specific Non-Click based Clickbased 36 Malicious Bitly Link Detection
  37. 37. Malicious Bitly Link Detection - Experimental Setup 1) Naive Bayes 2) Decision Tree 3) Random Forest Machine Learning Algorithms Training on pre-labeled dataset Predict labels of an unseen data  Used the classifiers implemented in Weka software package  Open source collection of machine learning classifiers for data mining tasks 37 Malicious Bitly Link Detection Training and Testing Data Testing 25% Training 75% 10 fold cross validation Performance evaluation on unlabeled data
  38. 38. Malicious Bitly Link Detection – Evaluation Results (a) Experiment 1 Mix dataset – Click and Non-click All features Malicious Benign 5,926 6,074 Malicious Benign 2,074 1,926 Training data (75%) Test Data (25%) Evaluation Metric Naive Bayes Decision Tree Random Forest Accuracy 72.15% 78.37% 80.43% Recall (malicious) 73.10% 82.40% 81.00% Recall (Benign) 71.10% 74.10% 79.90% Precision (malicious) 73.10% 77.40% 81.20% Precision (Benign) 71.10% 79.60% 79.60% F-measure (malicious) 73.10% 76.70% 81.10% F-measure (benign) 71.10% 76.70% 79.70% 38 TP FP FN TN Malicious Bitly Link Detection
  39. 39. Malicious Bitly Link Detection - Evaluation Results (c) Experiment 2 Mix dataset – Click and Non-click WHOIS + Non-click based features Evaluation Metric Naive Bayes Decision Tree Random Forest Accuracy 73.57% 81.93% 83.50% Recall (malicious) 73.40% 83.50% 84.20% Recall (Benign) 73.80% 80.30% 82.80% Precision (malicious) 75.10% 82.00% 84.00% Precision (Benign) 72.00% 81.80% 82.90% F-measure (malicious) 74.20% 82.70% 84.10% F-measure (benign) 72.90% 81.00% 82.80% Malicious Benign 5,926 6,074 Malicious Benign 2,074 1,926 Training data (75%) Test Data (25%) 39 TP FP FN TN Malicious Bitly Link Detection
  40. 40. Malicious Bitly Link Detection – Evaluation Results (b) Experiment 3 Only Non-click data WHOIS + Non-click based features Malicious Benign 2,743 2,796 Malicious Benign 950 897 Training data (75%) Test Data (25%) Evaluation Metric Naive Bayes Decision Tree Random Forest Accuracy 80.02% 85.06% 86.41% Recall (malicious) 79.60% 89.50% 89.60% Recall (Benign) 80.40% 80.80% 83.40% Precision (malicious) 79.30% 81.50% 83.60% Precision (Benign) 80.70% 89.10% 89.50% F-measure (malicious) 79.50% 85.30% 86.50% F-measure (benign) 80.50% 84.80% 86.30% 40 TP FP FN TN Malicious Bitly Link Detection
  41. 41. Malicious Bitly Link Detection - Feature Ranks Rank Feature 1 Type of referring domains 2 Link Creation domain creation difference 3 Domain age 4 Link creation hour 5 Type of encoders 6 Link creation-click lag 7 Number of encoders Rank Feature 1 Link creation hour 2 Link Creation domain creation difference 3 Domain age 4 Type of encoders 5 Number of encoders Complete labeled-dataset Non-Click labeled-dataset  Using Weka's InfoGainAttributeEval package for attribute selection  Evaluates the worth of an attribute by measuring the information gain with respect to the class 41 Malicious Bitly Link Detection
  42. 42. Malicious Bitly Link Detection - Result Summary  Increase in accuracy and F-measure on using only non-click based features  Not only efficient in detecting clicked malicious Bitly links, but can identify suspicious links even when no click is received  This solution can also capture the multi level obfuscation technique used by attackers 42 Malicious Bitly Link Detection Counter measure 4 In addition to the blacklists and other spam detection filters, Bitly specific feature set can also be used to detect malicious content
  43. 43. Proposed Counter Measures - Summary  Impose a check on the number of connected OSM accounts by a single profile  Either directly delete the identified malicious links or introduce a credibility score for each profile to warn users (of upcoming risks)  More than a warning page, should go ahead and block the popular malicious links to prevent its persistence over web  In addition to various blacklists and other filters, can also incorporate available Bitly analytics to better detect illegitimate content 43 Counter Measures
  44. 44.  Domains created for a dedicated purpose of spamming eventually die out after achieving a significant number of hits  Spammers exploit Bitly's policy of not imposing a cap on the number of connected OSM accounts  Existence of malicious communities which operate across Bitly and Twitter  Bitly is not using the claimed detection services effectively  Inability / extreme-delay in detection of malicious accounts  By-passable warning page does not restrict the overall problem of spam  High classification accuracy for non-click dataset; capable of identifying suspicious Bitly links much before they target their audience Conclusion 44 Conclusion
  45. 45.  Since characteristics of spammers change over time , do a detailed comparative analysis on a more exhaustive dataset  Broaden and generalize our feature set to detect spam from any short URL services  Develop a browser extension that can work in real time and classify any short link as malicious or benign Future Work 45 Future Work
  46. 46. Acknowledgements Dr. PK, IIIT-Delhi Anupama Aggarwal, PhD ,IIIT-Delhi Brian David Eoff (senior data scientist), Mark Josephson (CEO), Bitly CERC, IIIT-Delhi Precog members, friends and family 46
  47. 47. References (I)  Alexander Neumann, Johannes Barnickel, Ulrike Meyer. Security and Privacy Implications of URL Shortening Services. In proceedings of Web 2.0 Security and Privacy (W2SP) (2011).  Florian Klien, Markus Strohmaier. Short Links Under Attack: Geographical Analysis of Spam in a URL Shortener Network. In proceedings of the 23rd ACM conference on Hypertext and social media (2012), Pages 83-88.  Demetris Antoniades, Iasonas Polakis, Georgios Kontaxis. we.b: The web of short URLs. In proceedings of the 20th international conference on World wide web (2011), Pages 715-724.  Federico Maggi, Alessandro Frossi, Stefano Zanero, Gianluca Stringhini, Brett Stone-Gross, Christopher Kruegel, Giovanni Vigna. Two Years of Short URLs Internet Measurement: Security Threats and Countermeasures. In proceedings of the 22nd international conference on World Wide Web (2013), Pages 861- 872.  Sangho Lee and Jong Kim. WARNINGBIRD: Detecting Suspicious URLs in Twitter Stream. NDSS 2012 (2012).  De Wang, Shamkant B. Navathe, Ling Liu, Danesh Irani, Acar Tamersoy, and Calton Pu. Click Traffic Analysis of Short URL Spam on Twitter. Collaborative Computing: Networking, Applications and Worksharing (Collaboratecom), 2013 9th International Conference (2013), Pages 250-259.  Aditi Gupta and and Ponnurangam Kumaraguru. Credibility ranking of tweets during high impact events. In Proceedings of the 1st Workshop on Privacy and Security in Online Social Media (PSOSM), In conjunction with WWW'12 (2012). 47
  48. 48.  Kurt Thomas, Chris Grier, Justin Ma, Vern Paxson, and Dawn Song. Design and Evaluation of a Real-Time URL Spam Filtering Service. Security and Privacy (SP) IEEE Symposium (2011), Pages 447 - 462.  Hongyu Gao, Jun Hu, and Christo Wilson. Detecting and Characterizing Social Spam Campaigns. In proceedings of the 10th ACM SIGCOMM conference on Internet measurement (2010), Pages 35-47.  Fabricio Benevenuto, Gabriel Magno, Tiago Rodrigues, and Virgilio Almeida. Detecting Spammers on Twitter. In Collaboration, Electronic messaging, Anti-Abuse and Spam Conference (CEAS) (2010).  Anupama Aggarwal, Ashwin Rajadesingan, and Ponnurangam Kumaraguru. PhishAri: Automatic Realtime Phishing Detection on Twitter. In Seventh IEEE APWG eCrime researchers summit (eCRS) (2012). Master's thesis, IIIT-Delhi, http://precog.iiitd.edu.in/Publications les/Anupama_MTech_Thesis.pdf (2012).  Chris Grier, Kurt Thomas, Vern Paxson, and Michael Zhang. @spam: The Underground on 140 Characters or Less. In proceedings of the 17th ACM conference on Computer and communications security (2010), Pages 27- 37.  Sidharth Chhabra, Anupama Aggarwal, Fabricio Benevenuto, and Ponnurangam Kumaraguru. Phi.sh/$oCiaL: The Phishing Landscape through Short URLs. CEAS '11 Proceedings of the 8th Annual Collaboration, Electronic messaging, Anti-Abuse and Spam Conference (2011), Pages 92-101. References (II) 48
  49. 49.  Saeed Abu-Nimeh, Dario Nappa, Xinlei Wang, and Suku Nair. A Comparison of Machine Learning Techniques for Phishing Detection. In proceedings of eCrime researchers summit (2007), ACM, Pages 60-69.  Mashable. Warning: Bit.ly Is Eating Other URL Shorteners for Breakfast. http://mashable.com/2009/10/12/bitly-domination/, October 2009.  Symantec. Spam with .gov URLs. http://www.symantec.com/connect/blogs/spam-gov-urls, October 2012.  Symantec. Malicious Shortened URLS on Social Networking Sites. http://www.symantec.com/threatreport/topic.jsp?id=threat_activity_trends&aid=malicious_shortened_urls, 2010.  Mark Hall, Eibe Frank, Georey Holmes, Bernhard Pfahringer, Peter Reutemann, and Ian H. Witten. The WEKA Data Mining Software: An Update. ACM SIGKDD Explorations Newsletter (2009), vol. 11, no. 1, Pages 1018. References (III) 49
  50. 50. Thank You! Questions? 50

×