Exploration of gaps in Bitly's spam detection and relevant countermeasures

Cybersecurity Education and Research Centre
Apr. 23, 2014
Exploration of gaps in Bitly's spam detection and relevant countermeasures
Exploration of gaps in Bitly's spam detection and relevant countermeasures
Exploration of gaps in Bitly's spam detection and relevant countermeasures
Exploration of gaps in Bitly's spam detection and relevant countermeasures
Exploration of gaps in Bitly's spam detection and relevant countermeasures
Exploration of gaps in Bitly's spam detection and relevant countermeasures
Exploration of gaps in Bitly's spam detection and relevant countermeasures
Exploration of gaps in Bitly's spam detection and relevant countermeasures
Exploration of gaps in Bitly's spam detection and relevant countermeasures
Exploration of gaps in Bitly's spam detection and relevant countermeasures
Exploration of gaps in Bitly's spam detection and relevant countermeasures
Exploration of gaps in Bitly's spam detection and relevant countermeasures
Exploration of gaps in Bitly's spam detection and relevant countermeasures
Exploration of gaps in Bitly's spam detection and relevant countermeasures
Exploration of gaps in Bitly's spam detection and relevant countermeasures
Exploration of gaps in Bitly's spam detection and relevant countermeasures
Exploration of gaps in Bitly's spam detection and relevant countermeasures
Exploration of gaps in Bitly's spam detection and relevant countermeasures
Exploration of gaps in Bitly's spam detection and relevant countermeasures
Exploration of gaps in Bitly's spam detection and relevant countermeasures
Exploration of gaps in Bitly's spam detection and relevant countermeasures
Exploration of gaps in Bitly's spam detection and relevant countermeasures
Exploration of gaps in Bitly's spam detection and relevant countermeasures
Exploration of gaps in Bitly's spam detection and relevant countermeasures
Exploration of gaps in Bitly's spam detection and relevant countermeasures
Exploration of gaps in Bitly's spam detection and relevant countermeasures
Exploration of gaps in Bitly's spam detection and relevant countermeasures
Exploration of gaps in Bitly's spam detection and relevant countermeasures
Exploration of gaps in Bitly's spam detection and relevant countermeasures
Exploration of gaps in Bitly's spam detection and relevant countermeasures
Exploration of gaps in Bitly's spam detection and relevant countermeasures
Exploration of gaps in Bitly's spam detection and relevant countermeasures
Exploration of gaps in Bitly's spam detection and relevant countermeasures
Exploration of gaps in Bitly's spam detection and relevant countermeasures
Exploration of gaps in Bitly's spam detection and relevant countermeasures
Exploration of gaps in Bitly's spam detection and relevant countermeasures
Exploration of gaps in Bitly's spam detection and relevant countermeasures
Exploration of gaps in Bitly's spam detection and relevant countermeasures
Exploration of gaps in Bitly's spam detection and relevant countermeasures
Exploration of gaps in Bitly's spam detection and relevant countermeasures
Exploration of gaps in Bitly's spam detection and relevant countermeasures
Exploration of gaps in Bitly's spam detection and relevant countermeasures
Exploration of gaps in Bitly's spam detection and relevant countermeasures
Exploration of gaps in Bitly's spam detection and relevant countermeasures
Exploration of gaps in Bitly's spam detection and relevant countermeasures
Exploration of gaps in Bitly's spam detection and relevant countermeasures
Exploration of gaps in Bitly's spam detection and relevant countermeasures
Exploration of gaps in Bitly's spam detection and relevant countermeasures
Exploration of gaps in Bitly's spam detection and relevant countermeasures
Exploration of gaps in Bitly's spam detection and relevant countermeasures
1 of 50

More Related Content

What's hot

IRJET-Fake Product Review MonitoringIRJET-Fake Product Review Monitoring
IRJET-Fake Product Review MonitoringIRJET Journal
Social networks protection against  fake profiles and social bots attacksSocial networks protection against  fake profiles and social bots attacks
Social networks protection against fake profiles and social bots attacksAboul Ella Hassanien
Game Theory Approach for Identity Crime DetectionGame Theory Approach for Identity Crime Detection
Game Theory Approach for Identity Crime DetectionIOSR Journals
IRJET-  	  Fake Profile Identification using Machine LearningIRJET-  	  Fake Profile Identification using Machine Learning
IRJET- Fake Profile Identification using Machine LearningIRJET Journal
IRJET - An Automated System for Detection of Social Engineering Phishing Atta...IRJET - An Automated System for Detection of Social Engineering Phishing Atta...
IRJET - An Automated System for Detection of Social Engineering Phishing Atta...IRJET Journal
DETECTION OF FAKE ACCOUNTS IN INSTAGRAM USING MACHINE LEARNINGDETECTION OF FAKE ACCOUNTS IN INSTAGRAM USING MACHINE LEARNING
DETECTION OF FAKE ACCOUNTS IN INSTAGRAM USING MACHINE LEARNINGijcsit

What's hot(20)

Similar to Exploration of gaps in Bitly's spam detection and relevant countermeasures

Phishing detection & protection schemePhishing detection & protection scheme
Phishing detection & protection schemeMussavir Shaikh
IRJET- Detecting the Phishing Websites using Enhance Secure AlgorithmIRJET- Detecting the Phishing Websites using Enhance Secure Algorithm
IRJET- Detecting the Phishing Websites using Enhance Secure AlgorithmIRJET Journal
DETECTION OF MALICIOUS SOCIAL BOTS USING ML TECHNIQUE IN TWITTER NETWORKDETECTION OF MALICIOUS SOCIAL BOTS USING ML TECHNIQUE IN TWITTER NETWORK
DETECTION OF MALICIOUS SOCIAL BOTS USING ML TECHNIQUE IN TWITTER NETWORKIRJET Journal
Detecting Phishing Websites Using Machine LearningDetecting Phishing Websites Using Machine Learning
Detecting Phishing Websites Using Machine LearningIRJET Journal
PHISHING URL DETECTION USING LSTM BASED ENSEMBLE LEARNING APPROACHESPHISHING URL DETECTION USING LSTM BASED ENSEMBLE LEARNING APPROACHES
PHISHING URL DETECTION USING LSTM BASED ENSEMBLE LEARNING APPROACHESIJCNCJournal
Phishing URL Detection using LSTM Based Ensemble Learning ApproachesPhishing URL Detection using LSTM Based Ensemble Learning Approaches
Phishing URL Detection using LSTM Based Ensemble Learning ApproachesIJCNCJournal

Similar to Exploration of gaps in Bitly's spam detection and relevant countermeasures(20)

More from Cybersecurity Education and Research Centre

Novel Instruction Set Architecture Based Side Channels in popular SSL/TLS Imp...Novel Instruction Set Architecture Based Side Channels in popular SSL/TLS Imp...
Novel Instruction Set Architecture Based Side Channels in popular SSL/TLS Imp...Cybersecurity Education and Research Centre
Video Inpainting detection using inconsistencies in optical FlowVideo Inpainting detection using inconsistencies in optical Flow
Video Inpainting detection using inconsistencies in optical FlowCybersecurity Education and Research Centre
TASVEER : Tomography of India’s Internet InfrastructureTASVEER : Tomography of India’s Internet Infrastructure
TASVEER : Tomography of India’s Internet InfrastructureCybersecurity Education and Research Centre
Data-Driven Assessment of Cyber Risk: Challenges in Assessing and Migrating C...Data-Driven Assessment of Cyber Risk: Challenges in Assessing and Migrating C...
Data-Driven Assessment of Cyber Risk: Challenges in Assessing and Migrating C...Cybersecurity Education and Research Centre
A Strategy for Addressing Cyber Security Challenges A Strategy for Addressing Cyber Security Challenges
A Strategy for Addressing Cyber Security Challenges Cybersecurity Education and Research Centre
Clotho : Saving Programs from Malformed Strings and IncorrectClotho : Saving Programs from Malformed Strings and Incorrect
Clotho : Saving Programs from Malformed Strings and IncorrectCybersecurity Education and Research Centre

More from Cybersecurity Education and Research Centre(15)

Recently uploaded

Grumman TBF-1/TBM-1 Avenger I Pilot's Handbook.pdfGrumman TBF-1/TBM-1 Avenger I Pilot's Handbook.pdf
Grumman TBF-1/TBM-1 Avenger I Pilot's Handbook.pdfTahirSadikovi
Collection and transport of Solid Waste-SWM.pptxCollection and transport of Solid Waste-SWM.pptx
Collection and transport of Solid Waste-SWM.pptxvinodnejkar1
DCES_Classroom_Building_100_CDs.pdfDCES_Classroom_Building_100_CDs.pdf
DCES_Classroom_Building_100_CDs.pdfRoshelleStahl1
Airbus A330 Flight Crew Operating Manual PDF.pdfAirbus A330 Flight Crew Operating Manual PDF.pdf
Airbus A330 Flight Crew Operating Manual PDF.pdfTahirSadikovi
HYDRAULICS -  Gillesania.pdfHYDRAULICS -  Gillesania.pdf
HYDRAULICS - Gillesania.pdfPrinceQuimno
Cessna Model 406 Pilot Operating Handbook.pdfCessna Model 406 Pilot Operating Handbook.pdf
Cessna Model 406 Pilot Operating Handbook.pdfTahirSadikovi

Exploration of gaps in Bitly's spam detection and relevant countermeasures

Editor's Notes

  1. Dominated the market and gained major traction when Twitter started to use it as a default URL shortener in year 2009 before the launch of its own service, t.co in the year 2011.
  2. This snapshot on the right clearly shows the amount of space saved
  3. Thus task of an attacker becomes even more simple on bitly
  4. Here we can see some major attacks carried out by exploiting bitlyWe are interested in studying bitly not only because it is the most popular but also because attackers take undue advantage of the trust users have on Bitly. This makes them propagate a lot of spam using bitly as a medium. All this imposes additional performance pressure on Bitly to timely detect and handle illegitimate content
  5. Domain / IP level blacklist created from the occurrence of websites in unsolicited messages.Google Safebrowsing 2 is a repository of suspected phishing or malware pages maintained by Google Inc.
  6. APWG: International consortium that brings together businesses affected by phishing attacksFirst analysis that we did was to analyze the metadata to understand the content creation
  7. Referrer network is the network that is directing traffic to the maliciousbitly linksAll this makes it evident that the detection of malicious Bitly URLs can also help identify suspicious Twitter profiles. The results also highlights that only detection but no deletion of these Bitly links does not restrict the activities of spammers.After jaccard value 0.00012, the next value Explain jaccard similarity
  8. Connected OSM network of all encoders in link metric dataset
  9. *In order to infer a possible reason behind the connection of multiple twitter accounts with a single bitly account, we did a small experiment*We again computed the jaccard similarity scores and compared all these accounts with each otherNext we collected all their connected twitter accounts with their tweets and inter compared them based on their jaccard similarity scoresAll the bitly profiles were then also inter compared the same wayWith all this experiment and a little manual annotation, we could identify 3 malicious communities in our dataset
  10. Sharing pornographic content is banned according to Twitter’s policiesExistence of these malicious communities across Bitly and Twitter shows that spammers exploit Bitly’s policy of imposing no restriction on the number of connected OSM accountsThis community of 2 Bitly and 45 Twitter accounts appears to be an active spam campaign
  11. In order to inspect the efficiency of Bitly in identifying malicious links, we did a comparison against 3 popular blacklists
  12. Bitly does not claim to use APWG, but such low rate still looks alarming as APWG is a popular and trusted source to detect phishing. Since VirusTotal is a collated result set from 52 detection services, such high undetection rate again highlights how Bitly misses on a lot of spamWe also did similar kind of experiments using virustotal and SURBL, details of these experiments are documented in my thesis report
  13. To make users aware of the malicious profiles and uncoming risks* 2,558 encoders (20.72%) had at least 80% of their shortened URLs as malicious (Suspicion Factor >= 0.8)Highlights the malicious intent of these encoders on creating their Bitly accounts
  14. Our results in the above experiment clearly brings to light as to how Bitly users keep shortening only malicious URLs. Till this point of our study, it was unknown to us as to how (if at all) Bitly reacts to suspicious user profiles. To report our findings and get some insights, we made a blog entry on our initial data analysis
  15. This particular graph gives the month separated timeline for the bitly user bamsesang* Extracted Link creation time and Number of clicks* user bamsesang shortened all malicious links for 7 months, remained inactive for close to one year and then shortened the bad URLs again.
  16. * 7 of 10 malicious Bitly links with more than 70,000 warning pages are also active in terms of luring users and receiving clicks* important to study because it gives a clear picture about the persistence of already identified malignant URL propagation through Bitly network.* Restricted our study to only popular links because we wanted to capture the URLs with high overall impact.* Collected click history of top 1000 Bitly links from our link-dataset based on the number of warning pages reported in the month of October 2013 * Bitly detected these suspicious links months before, users are still getting trapped and visiting these links
  17. * In order to collect the data, we used Twitter Rest API 1 and its search method to get only tweets with a Bitly URL.* restricted our search query to bit.lySURBL 3 is a consolidated list of websites that appear in unsolicited messages. SURBL lookup feature allows a user to check a domain name against the ones blacklisted by SURBL. For this we used SURBL client library implemented in python.VirusTotal is an aggregated information warehouse of malicious links and domains as marked by 52 website scanning engines and contributed byusers.
  18. WHOIS is a query and response protocol that gives information like domain name, domain creation / updation date and domain expiration date for a particular URL.Non-Click based features are the ones which define general characteristics of a short Bitly URL and are independent of its click history. Clickbased features on the other hand depends on the click analytics of a Bitly link.
  19. Simple probabilistic classifier based on the Bayes' theorem, Maximum likelihood principle, Data point classified into the class with highest probability Most popular , Decision tree as a predictive model,Binary (yes / no) decisions at each level Multiple decision trees, Output class is the mode of classes output by individual trees, Can run efficiently on large datasets
  20. * Of the 8,000 malicious ground truth links in our labeled-dataset, we found that 3,693 (46.16%) links were never clicked by anyone. * Although this property itself serves as a feature in the above classication, but we believe that our classier should perform even better if we segregate all links with zero clicks.
  21. On using onlyclick-dataset, accuracy dropped to 74% -> though this is decent, but this shows that the click pattern is not very distinct for malicious / benign linksThus our algorithm is not only efficient in detecting malicious links after they receive clicks, but can also identify such malicious content much before it target its audience
  22. our data is only october 13, Since the characteristics of spammers change over time, we can do a detailed and comparative analysis on a more exhaustive datasetClassier currently works better on non-click dataset, because only 2 click based features used for the classificationThis would require a temporal analysis and can serve as another good feature for our classierProfile attributes not considered because it requires extracting encoder information and analyzing their click history for each link. We understand that such an evaluation is a time consuming process.As a part of our future work, we would like to include these parameters since incorporation of such discriminating features can help improve the overall performance ofour classifiers
  23. AnupamaAggarwal for shepherding me and spending her valuable time to review my thesis.Bitly and particularly Brian David Eo (senior data scientist) and Mark Josephson (CEO) for sharing the data with us.express my sincere gratitude to CERC at IIIT-Delhi for providing me an exposure to share my ideas with experts from different parts of the world. I thank my fellow lab-mates from Precog research group at IIIT- Delhi for all their encouragement and insightful comments. Last but not the least, I would like to thank all my family members and friends who encouraged and kept me motivated throughout the project.