The Anatomy of Comment Spam

Comment spammers are most often motivated by search engine optimization (SEO) in the service of advertising, click fraud, and malware distribution. By spamming many targets over long periods, spammers turn a profit and cause harm. Comment spam attacks can cripple a website, reducing uptime and degrading the user experience. Quickly identifying the source of an attack can greatly limit its effectiveness and minimize its impact on your website. This presentation will:

- Present an attack from both points of view: the attacker's and the victim's
- Identify the tools that comment spam attackers use
- Discuss mitigation techniques that stop comment spam in its early stages

The Anatomy of Comment Spam

  1. © 2014 Imperva, Inc. All rights reserved. The Anatomy of Comment Spam. Shelly Hershkovitz, Sr. Security Research Engineer, Imperva
  2. Agenda
     § Comment Spam - What & Why?
     § Comment Spam Attacks
     § Data Analysis
     § Mitigation Techniques
     § Case Studies
     § Conclusion
     § Q&A
  3. Shelly Hershkovitz, Sr. Security Research Engineer, Imperva
     § Leads the efforts to capture and analyze hacking activities
       • Authored several Hacker Intelligence Initiative (HII) reports
     § Experienced in machine learning and computer vision
     § Holds a B.A. in Computer Science and an M.Sc. in Biomedical Engineering
  4. Comment Spam - What & Why?
     § What?
       • Wikipedia: “Comment spam is a term used to refer to a broad category of spam bot postings which abuse web-based forms to post unsolicited advertisements as comments on forums, blogs, wikis and online guest books.”
     § Why?
       • Search engine optimization
       • Advertisements
       • Malware distribution
       • Click fraud
  5. Search Engine Optimization
     [Diagram: backlinks from other sites (OtherWebSite.com, OtherBlog.com, OtherNewsWebSite.com) pointing to MyWebSite.com]
  6. Comment Spam Attack
     Four stages: Target Acquisition → Comment Generation → Posting → Verification
  7. Comment Spam in Practice
     § Success relies on large scale
     § Automated tools are used
     § Inputs
       • The site to be promoted
       • Relevant keywords
  8. Target Acquisition
     § URL harvesting
       • Locate relevant websites
       • Locate suitable URLs for commenting
     § An alternative: buy ‘Quality URLs’ lists
       • A typical price is $40 for ~13,000 URLs
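For scale, dividing the slide's quoted price by the list size shows how cheap each target is; this is plain arithmetic on the two figures above, and since the list size is approximate, so is the result:

```latex
\frac{\$40}{13{,}000\ \text{URLs}} \approx \$0.0031\ \text{per URL} \quad (\text{about } 0.3\ \text{cents})
```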
  9. Selecting the Targets
     • Relevance: relevance to the promoted site
     • Quality: the URL's own search engine ranking
     • Difficulty: how hard it is to post comments (e.g., CAPTCHA)
     • Policy: the site's policy toward search engines (follow/nofollow attribute)
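A defender can apply the same four criteria to their own pages to see which ones look most attractive to a spammer. The sketch below is only an illustration: the `PageProfile` fields, the 0..1 inputs, and the `WEIGHTS` values are all hypothetical, since the presentation names the criteria but gives no scoring formula.

```python
from dataclasses import dataclass

@dataclass
class PageProfile:
    """Hypothetical description of one page that accepts comments."""
    url: str
    relevance: float      # 0..1, overlap with keywords spammers target (assumed input)
    quality: float        # 0..1, normalized search ranking of the page (assumed input)
    has_captcha: bool     # Difficulty criterion
    links_nofollow: bool  # Policy criterion: comment links carry rel="nofollow"

# Hypothetical weights; the slide lists the criteria but not their weights.
WEIGHTS = {"relevance": 0.3, "quality": 0.3, "difficulty": 0.2, "policy": 0.2}

def attractiveness(page: PageProfile) -> float:
    """Score in [0, 1]; higher means a more attractive comment spam target."""
    ease = 0.0 if page.has_captcha else 1.0            # no CAPTCHA: easy to post
    link_value = 0.0 if page.links_nofollow else 1.0   # followed links pass ranking
    return (WEIGHTS["relevance"] * page.relevance
            + WEIGHTS["quality"] * page.quality
            + WEIGHTS["difficulty"] * ease
            + WEIGHTS["policy"] * link_value)

pages = [
    PageProfile("/blog/post-1", 0.8, 0.9, has_captcha=False, links_nofollow=False),
    PageProfile("/guestbook", 0.5, 0.4, has_captcha=True, links_nofollow=True),
]
for p in sorted(pages, key=attractiveness, reverse=True):
    print(f"{p.url}: {attractiveness(p):.2f}")
```

Pages that score high are candidates for the anti-automation and demotivation controls discussed later in the deck.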
  10. Target Acquisition in Action
  11. Comment Generation
     § Verbal comments attached to the promoted site
       • Input keywords
  12. Comment Generation in Action
  13. Posting
     § Post comments on many URLs
     § Handle authentication, CAPTCHA, or user details
  14. Posting in Action
  15. Verification
     § Collect feedback on whether the comments were posted
  16. Verification in Action
  17. Comment Spam in Action
  18. Data Analysis
     § 17% of the attackers generated 58% of comment spam traffic
  19. Data Analysis
     § 80% of comment spam traffic is generated by 28% of attackers
     [Chart: comment spam traffic by source IP]
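Concentration figures like these can be computed from a request log once each comment spam request is attributed to a source IP. The sketch below assumes such a log already exists; the function names and the synthetic data (documentation-range IPs) are illustrative only and are not the study's data.

```python
from collections import Counter

def top_share(requests_per_ip: dict[str, int], fraction_of_sources: float) -> float:
    """Share of all requests produced by the busiest `fraction_of_sources` of source IPs."""
    counts = sorted(requests_per_ip.values(), reverse=True)
    k = max(1, round(len(counts) * fraction_of_sources))
    return sum(counts[:k]) / sum(counts)

def sources_for_share(requests_per_ip: dict[str, int], target_share: float) -> float:
    """Smallest fraction of source IPs whose combined requests reach `target_share`."""
    counts = sorted(requests_per_ip.values(), reverse=True)
    total = sum(counts)
    running = 0
    for i, c in enumerate(counts, start=1):
        running += c
        if running / total >= target_share:
            return i / len(counts)
    return 1.0

# Synthetic example: one source IP per flagged comment spam request.
log = (["192.0.2.1"] * 50 + ["192.0.2.2"] * 30 + ["192.0.2.3"] * 10
       + ["192.0.2.4"] * 5 + ["192.0.2.5"] * 5)
counts = Counter(log)
print(top_share(counts, 0.2))          # busiest 20% of sources -> 0.5
print(sources_for_share(counts, 0.8))  # fraction of sources behind 80% of traffic -> 0.4
```

A heavily skewed result is what makes source-based blocking attractive: stopping a small set of sources removes most of the traffic.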
  20. Mitigation Techniques
     § Content inspection
     § Source reputation
     § Anti-automation
     § Demotivation
     § Manual inspection
  21. Mitigation Techniques: Content Inspection
     § Inspecting the content of the posted comments
     § Rule-based
       • Large number of links
       • Logical sentences not related to the subject
     § Akismet
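The two rules on this slide can be approximated in a few lines. This is a minimal sketch, not Akismet's method: the `MAX_LINKS` threshold is hypothetical, and "not related to the subject" is approximated crudely as "contains a link but shares no words with the page's topic keywords", which a real filter would replace with something far more robust.

```python
import re

LINK_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)
WORD_RE = re.compile(r"[a-z']+")

MAX_LINKS = 2  # hypothetical threshold; tune against your own traffic

def count_links(comment: str) -> int:
    return len(LINK_RE.findall(comment))

def looks_like_spam(comment: str, topic_keywords: set[str]) -> bool:
    links = count_links(comment)
    # Rule 1: a large number of links.
    if links > MAX_LINKS:
        return True
    # Rule 2 (crude proxy for "not related to the subject"): the comment links
    # somewhere but shares no words with the page's topic keywords.
    words = set(WORD_RE.findall(LINK_RE.sub(" ", comment.lower())))
    return links > 0 and not (words & topic_keywords)

topic = {"python", "parsing", "html"}
print(looks_like_spam("Great post on HTML parsing, thanks!", topic))
print(looks_like_spam("Best casino bonuses here http://casino.example", topic))
```

Requiring a link before the relevance rule fires keeps short, link-free replies such as "Thanks!" from being flagged.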
  22. Mitigation Techniques: Source Reputation
     § Based on the reputation of the poster
     § Online repositories based on crowdsourcing
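Once a reputation list has been obtained, a source reputation check reduces to testing whether the poster's address falls inside any listed network. In the sketch below, `BAD_SOURCES` is a hypothetical stand-in for a crowdsourced feed (its entries are documentation address ranges); it uses only the standard `ipaddress` module.

```python
import ipaddress

# Hypothetical feed contents; a real deployment would load and refresh these
# from a reputation service.
BAD_SOURCES = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.7/32"),
]

def has_bad_reputation(ip: str, feed=BAD_SOURCES) -> bool:
    """True if `ip` falls inside any network listed in the reputation feed."""
    addr = ipaddress.ip_address(ip)
    # Membership tests across IPv4/IPv6 simply return False.
    return any(addr in net for net in feed)

print(has_bad_reputation("203.0.113.45"))
print(has_bad_reputation("2001:db8::1"))
```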
  23. Mitigation Techniques: Anti-Automation
     § Anti-automation tools
       • CAPTCHA
       • Check-box for posting the comment
       • Client type classification
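The check-box control can be enforced on the server with a simple form check. The sketch below pairs it with a hidden "honeypot" field, a common companion technique that the slides do not mention: humans never see the field and leave it empty, while naive form-filling bots often populate it. Both field names are hypothetical; `"on"` is the value browsers submit for a ticked check-box that has no `value` attribute.

```python
from typing import Mapping

# Hypothetical form field names.
CONSENT_FIELD = "confirm_human"   # visible check-box the user must tick
HONEYPOT_FIELD = "website2"       # hidden with CSS; humans leave it empty

def passes_form_checks(form: Mapping[str, str]) -> bool:
    """Accept a comment only if the check-box is ticked and the honeypot is empty."""
    ticked = form.get(CONSENT_FIELD) == "on"
    honeypot_empty = not form.get(HONEYPOT_FIELD, "").strip()
    return ticked and honeypot_empty

print(passes_form_checks({"confirm_human": "on", "comment": "Nice article"}))
print(passes_form_checks({"comment": "Buy now", "website2": "http://spam.example"}))
```

Neither check stops a tool written for a specific site, which is why the slide also lists CAPTCHA and client type classification.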
  24. Mitigation Techniques: Demotivation
     § Make comment spam useless
     § The nofollow value of the rel attribute of an HTML anchor <a>
       • Specifies whether a link should be followed by search engines
     § The Penguin update to Google's search ranking algorithm
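Demotivation works only if every link in user-submitted content actually carries `rel="nofollow"`. The following is a minimal sketch using the standard `html.parser` module: it re-serializes a comment fragment and adds `nofollow` to each `<a>` tag while keeping any existing `rel` tokens. It is deliberately simple. It drops HTML comments and duplicate attributes, and it is not a sanitizer, so it should run in addition to proper HTML sanitization, not instead of it.

```python
from html import escape
from html.parser import HTMLParser

class NofollowRewriter(HTMLParser):
    """Re-serializes comment HTML, forcing rel="nofollow" on every <a> tag."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.out: list[str] = []

    def _render_tag(self, tag, attrs, self_closing=False):
        attrs = dict(attrs)  # duplicate attributes collapse; fine for a sketch
        if tag == "a":
            rel = (attrs.get("rel") or "").split()
            if "nofollow" not in rel:
                rel.append("nofollow")
            attrs["rel"] = " ".join(rel)
        parts = [tag]
        for name, value in attrs.items():
            parts.append(name if value is None else f'{name}="{escape(value, quote=True)}"')
        return "<" + " ".join(parts) + (" />" if self_closing else ">")

    def handle_starttag(self, tag, attrs):
        self.out.append(self._render_tag(tag, attrs))

    def handle_startendtag(self, tag, attrs):
        self.out.append(self._render_tag(tag, attrs, self_closing=True))

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Character references were decoded by the parser, so re-escape text.
        self.out.append(escape(data, quote=False))

def add_nofollow(html_fragment: str) -> str:
    parser = NofollowRewriter()
    parser.feed(html_fragment)
    parser.close()
    return "".join(parser.out)

print(add_nofollow('<p>Buy <a href="http://example.com">now</a></p>'))
```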
  25. Mitigation Techniques: Manual Inspection
     § Effective but not scalable
     § Effective against manual comment spam
  26. Case Studies
     § Attack target: specific victim
     § Attack source: specific attacking IP
     § Google App Engine
  27. Specific Victim
     § A non-profit organization
     § A single host with many URLs
     § Our theory links popular phrases in the URL and page content to the attack rate
     [Chart: number of attacks]
  28. Specific Victim
     § 52% of source IPs produce 80% of the traffic
     [Chart: traffic by source IP]
  29. Specific Attacking IP
     § Comment spam posted from a specific IP
     § A rapid response (IP reputation feed) would have significantly reduced the impact of the attack
     [Chart: number of attacks]
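A rapid response means acting on a source as soon as it has been caught a few times, rather than waiting for a periodic feed update. One way to sketch this is a sliding-window counter that blocks an IP after `threshold` spam detections within `window` seconds. The class name, the default values, and the explicit `now` argument (used here for testability, where production code might pass `time.monotonic()`) are all assumptions.

```python
from collections import defaultdict, deque

class RapidBlocklist:
    """Block an IP once it has `threshold` spam detections within `window` seconds."""

    def __init__(self, threshold: int = 3, window: float = 600.0) -> None:
        self.threshold = threshold
        self.window = window
        self.hits: dict[str, deque] = defaultdict(deque)
        self.blocked: set[str] = set()

    def report_spam(self, ip: str, now: float) -> None:
        q = self.hits[ip]
        q.append(now)
        # Drop detections that have fallen out of the sliding window.
        while q and now - q[0] > self.window:
            q.popleft()
        if len(q) >= self.threshold:
            self.blocked.add(ip)

    def is_blocked(self, ip: str) -> bool:
        return ip in self.blocked

bl = RapidBlocklist(threshold=3, window=600)
for t in (0, 100, 200):            # three detections within ten minutes
    bl.report_spam("198.51.100.23", t)
for t in (0, 700, 1400):           # spread out, never three in one window
    bl.report_spam("198.51.100.99", t)
print(bl.is_blocked("198.51.100.23"), bl.is_blocked("198.51.100.99"))
```

Blocked addresses could then be shared with other sites, which is how a reputation feed propagates the response.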
  30. Specific Attacking IP
     § Five target websites were attacked from this source
     § Most had suffered a relatively high number of comment spam attacks
     Percentage of traffic per target: Target 1: 41%, Target 2: 25%, Target 3: 21%, Target 4: 11%, Target 5: 2%
  31. Specific Attacking IP
     § Hyperlinks in a single request point to different websites
     § Consecutive requests contain similar hyperlinks
     § Using different URLs for the same website helps avoid a bad reputation
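These patterns suggest a per-source check. Each request links many distinct sites, and consecutive requests link mostly the same sites even when the exact URLs differ, so comparing hostnames rather than full URLs catches the rotation. In the sketch below, the thresholds `min_domains` and `min_similarity`, the choice of Jaccard similarity, and the regex-based link extraction are assumptions for illustration.

```python
import re
from urllib.parse import urlsplit

LINK_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)

def link_domains(body: str) -> set[str]:
    """Hostnames linked from one request body (URLs may differ per request)."""
    return {urlsplit(u).hostname or "" for u in LINK_RE.findall(body)}

def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if (a or b) else 0.0

def suspicious_sequence(bodies: list[str], min_domains: int = 3,
                        min_similarity: float = 0.5) -> bool:
    """Flag a source whose requests each link many sites, and whose
    consecutive requests link largely the same sites."""
    domain_sets = [link_domains(b) for b in bodies]
    many_sites = all(len(d) >= min_domains for d in domain_sets)
    similar = all(jaccard(a, b) >= min_similarity
                  for a, b in zip(domain_sets, domain_sets[1:]))
    return len(bodies) >= 2 and many_sites and similar

bodies = [
    "Nice! http://a.example/1 http://b.example/2 http://c.example/3",
    "Great! http://a.example/9 http://b.example/8 http://d.example/7",
]
print(suspicious_sequence(bodies))
```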
  32. Case Studies: Google App Engine
     § Google App Engine can be used to spread comment spam through proxy services
     § This technique can be used to bypass IP-based mitigations
  33. Conclusion
     § Comment spam is a prosperous industry
       • Many tools and services are available for comment spam generation and distribution
     § Identifying the attacker as a comment spammer early on and blocking its requests prevents most of the malicious activity
       • Reputation-based controls are effective (IP / source application)
     § Reputation-based controls must be combined with content-based controls to avoid false positives
     § Anti-automation and bot-detection controls can reduce the likelihood of an application becoming a target
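One way to act on the advice to combine reputation with content controls is a small decision table: block only when both signals agree, and challenge the poster (for example, with a CAPTCHA) when just one signal fires. A reputation hit alone might come from a shared or proxied IP, as in the App Engine case. The `Verdict` names and this particular policy are illustrative assumptions, not a prescription from the presentation.

```python
from enum import Enum

class Verdict(Enum):
    ALLOW = "allow"
    CHALLENGE = "challenge"   # e.g., require a CAPTCHA before posting
    BLOCK = "block"

def decide(bad_reputation: bool, content_is_spammy: bool) -> Verdict:
    """Block only when reputation and content agree; a single signal is
    challenged instead, to limit false positives."""
    if bad_reputation and content_is_spammy:
        return Verdict.BLOCK
    if bad_reputation or content_is_spammy:
        return Verdict.CHALLENGE
    return Verdict.ALLOW

print(decide(True, False).value)
```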
  34. Webinar Materials
     Post-Webinar Discussions · Answers to Attendee Questions · Webinar Recording Link
     Join Imperva LinkedIn Group, Imperva Data Security Direct, for…
  35. www.imperva.com
