Spam Wars


Spam problems and antispam techniques for Drupal 6. Presented at the Stanford Design4Drupal conference, January 2010

Spam Wars

  1. 1. SPAM Wars Maurice Green, PhD Design 4 Drupal January 24, 2010
  2. 2. What is SPAM? (According to Wikipedia)
  3. 3. A canned meat product…
  4. 4. A “Weird Al” Yankovic song
  5. 5. A Monty Python comedy skit
  6. 6. Smooth Particle Applied Mechanics <ul><li>the use of smoothed particle hydrodynamics computation to study impact fractures in solids. </li></ul>
  7. 7. SPAM <ul><li>the abuse of electronic messaging systems (including most broadcast media, digital delivery systems) to send unsolicited bulk messages indiscriminately. </li></ul><ul><li>Named after the Monty Python sketch </li></ul><ul><li>May include: email, junk fax, instant messaging, internet forums, blogs and social networking. </li></ul><ul><li>Wikipedia </li></ul>
  8. 8. Why Website SPAM? <ul><li>User registration and login </li></ul><ul><li>Content spam in blog comments </li></ul><ul><li>Information harvesting (email scraping) </li></ul><ul><li>Hyperlinking for page rank </li></ul>
  9. 9. User Registration and Login <ul><li>Allows spammer access to email systems, especially trusted ones like Gmail, Hotmail and Yahoo. </li></ul><ul><li>May allow spammer to post a site link in the user profile. </li></ul><ul><li>Allows spammer to post comments and other content spam in forums. </li></ul>
  10. 11. Registration SPAM
  11. 12. Controlling User Registration <ul><li>Validate user email address </li></ul><ul><li>Require user confirmation link </li></ul><ul><li>CAPTCHA </li></ul><ul><li>Gotcha </li></ul><ul><li>Validate against SPAM database </li></ul>
  12. 13. User Registration Page
  13. 14. Failed Registration
  14. 15.
  15. 16. Confirmation Email <ul><li>RegistrationTest, </li></ul><ul><li>Thank you for registering at Silicon Valley User Group Alliance. You may now </li></ul><ul><li>log in to http:// /user using the following username and </li></ul><ul><li>password: </li></ul><ul><li>username: RegistrationTest </li></ul><ul><li>password: TQxaQxN9Kz </li></ul><ul><li>You may also log in by clicking on this link or copying and pasting it in </li></ul><ul><li>your browser: </li></ul><ul><li> </li></ul><ul><li>This is a one-time login, so it can be used only once. </li></ul><ul><li>After logging in, you will be redirected to </li></ul><ul><li> so you can change your password. </li></ul><ul><li>-- Silicon Valley User Group Alliance team </li></ul>
  16. 17. Confirmation Login
  17. 18. CAPTCHA <ul><li>A contrived acronym for &quot;Completely Automated Public Turing test to tell Computers and Humans Apart.&quot; </li></ul><ul><li>A type of challenge-response test used in computing to ensure that the response is not generated by a computer. The process usually involves one computer (a server) asking a user to complete a simple test which the computer is able to generate and grade. </li></ul><ul><li>A reverse Turing test (a computer testing a human). </li></ul>
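The "generate and grade" loop described on this slide can be sketched in a few lines. This is only an illustration, not how any particular CAPTCHA module works: `SECRET_KEY`, `generate_challenge`, and `grade_response` are hypothetical names, and rendering the answer as a distorted image is left out.

```python
import hashlib
import hmac
import secrets
import string
from typing import Tuple

SECRET_KEY = b"change-me"  # hypothetical server-side secret, never sent to the client


def _sign(answer: str) -> str:
    # Sign the case-folded answer so grading is case-insensitive.
    return hmac.new(SECRET_KEY, answer.lower().encode("utf-8"), hashlib.sha256).hexdigest()


def generate_challenge(length: int = 6) -> Tuple[str, str]:
    """Return (answer, token). The server would draw the answer as an image
    and embed only the token in the form."""
    alphabet = string.ascii_uppercase + string.digits
    answer = "".join(secrets.choice(alphabet) for _ in range(length))
    return answer, _sign(answer)


def grade_response(response: str, token: str) -> bool:
    """Grade the user's typed response against the token from the form."""
    return hmac.compare_digest(_sign(response.strip()), token)
```

A real deployment would also expire tokens so that a solved challenge cannot be replayed.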
  18. 19. CAPTCHA
  19. 20. CAPTCHA
  20. 21. CAPTCHA
  21. 22. CAPTCHA <ul><li>Newer forms include: </li></ul><ul><ul><li>Multiple captchas in a single challenge </li></ul></ul><ul><ul><li>Image recognition </li></ul></ul><ul><ul><li>Puzzle solving </li></ul></ul><ul><ul><li>Logical reasoning tests </li></ul></ul>
  22. 23. Non-character CAPTCHA <ul><li>Image CAPTCHA -- Scalable and simple to use, but if the letters are not distorted enough, they can be cracked by OCR techniques; if they are too distorted, visitors can't read them. Image CAPTCHA is slowly being replaced by other CAPTCHA types. </li></ul><ul><li>Math-based CAPTCHA -- Has a simple interface and is simple to use, but is not scalable because most spam bots can parse the CAPTCHA questions. </li></ul><ul><li>Other text-based CAPTCHA -- Most text-based CAPTCHAs are also quite simple, but they offer only about 4 candidate answers, giving brute-force spam a success rate of about 1 in 4. </li></ul><ul><li>(Open) Semantic CAPTCHA -- Scalable and intuitive, but rare. The CAPTCHA generation is unsupervised, so they are generally hard to build. (This is where Egglue belongs.) </li></ul>
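To see why a math-based CAPTCHA is weak, here is a rough sketch of how little code a bot needs to parse one. The question formats and the function name `solve_math_captcha` are assumptions made for illustration.

```python
import operator
import re

_OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}
# Matches "<number> <operator> <number>" anywhere in the question text.
_QUESTION = re.compile(r"(\d+)\s*([+\-*])\s*(\d+)")


def solve_math_captcha(question: str):
    """Return the numeric answer, or None if no simple expression is found."""
    match = _QUESTION.search(question)
    if match is None:
        return None
    left, op, right = match.groups()
    return _OPS[op](int(left), int(right))
```

For example, `solve_math_captcha("What is 3 + 4?")` returns `7`, while a question with no arithmetic in it returns `None`.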
  23. 24. CAPTCHA <ul><li>Difficult for visually impaired or illiterate users. Images cannot be read by screen reader programs. </li></ul><ul><ul><li>An “audio” captcha may be added to the test </li></ul></ul><ul><li>May be susceptible to improved OCR </li></ul><ul><li>Can be defeated by ‘relay’ attacks using cheap labor or malware. </li></ul>
  24. 25. CAPTCHA Legal Issues <ul><li>Circumvention of CAPTCHA violates the Digital Millennium Copyright Act (Ticketmaster vs. RMG Technologies). </li></ul><ul><li>CAPTCHA without an audio alternative may violate the Americans with Disabilities Act. </li></ul>
  25. 26. Image CAPTCHA <ul><li>Microsoft ‘Asirra’ free website system uses a dog and cat recognition captcha with a database of 3,000,000 images. </li></ul><ul><li>Stanford researchers have reported being able to crack this captcha with an algorithm that is 82.7% accurate in distinguishing between dogs and cats. </li></ul>
  26. 27. Image CAPTCHA
  27. 28. Google’s Socially Adjusted CAPTCHA
  28. 29. Another Image CAPTCHA
  29. 30. 3-Dimensional Image CAPTCHA
  30. 31. Humorous Image CAPTCHA
  31. 32. Egglue CAPTCHA Features <ul><li>CAPTCHA challenges require only basic intuitions about the world. </li></ul><ul><li>The distinct CAPTCHA challenges and layout cannot be handled by typical spam bots. </li></ul><ul><li>Different from conventional CAPTCHA, visitors are free to use any words deemed fit. </li></ul><ul><li>Unlike other CAPTCHA web services, Egglue CAPTCHA does not require registration. </li></ul>
  32. 33. Egglue Semantic CAPTCHA
  33. 34. The Battle Against the Captcha <ul><li>“There's no system foolproof enough to defeat a sufficiently great fool.” </li></ul><ul><li>Edward Teller </li></ul>
  34. 35. Classified Ads for Anti-CAPTCHA Services <ul><li>“ Our Team is very much interested in your project and we could easily handle more than 50,000 captcha entries per day.” </li></ul><ul><li>“ We having more then 10 teams,we are operating 24/7 data entry works and delivering 700k/day captchas daily.” </li></ul><ul><li>“ I have a team of 7 people, willing to do captchas at $2 per 1000 entries. Please consider my bid. We can definitely provide 50K captchas per day.” </li></ul>
  35. 36. Classified Ads for Anti-CAPTCHA Services <ul><li>“ I have 40 PCs and 55 Persons working in my office for data entry work. As 1 person can do 800 captcha entry per hour. We can deliver you good quantity with quality.” </li></ul><ul><li>“ My team is equipped to offer the services. 20 person team, T1 business speed internet with an on hand technical staff. We are able to start right away..” </li></ul>
  36. 37. Classified Ads for Anti-CAPTCHA Services <ul><li>“ Hello Sir, I will kindly introduce myself.. This is shivakumar.. we have a team to type capcthas 24/7 and we can type more than 200k captchas per day.” </li></ul><ul><li>“ WE ARE PROFESSIONAL CAPCHA ENTRY OPEATORS AND WE CAN DO EVEN 25000 ENTRIES PER DAY AS MY COMPANY IS A 25 SEATER FIRM SPEALISED IN DATA ENTRY.” </li></ul>
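The figures quoted in these ads are easy to turn into rough economics. The sketch below only multiplies numbers taken from the ads ($2 per 1000 entries, 50K captchas per day, 55 people at 800 entries per hour); the function names are made up for illustration.

```python
def cost_usd(captchas: int, price_per_thousand: float) -> float:
    """Cost of having a given number of captchas solved by hand."""
    return captchas / 1000 * price_per_thousand


def hourly_output(workers: int, per_worker_per_hour: int) -> int:
    """Captchas solved per hour by a team working in parallel."""
    return workers * per_worker_per_hour


print(cost_usd(50_000, 2.0))   # 100.0 dollars for one day at 50K captchas
print(hourly_output(55, 800))  # 44000 captchas per hour for the 55-person office
```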
  37. 38. Anti-CAPTCHA System
  38. 39. Indian Anti-CAPTCHA System <ul><li>“ Our captcha system is very complex and complicated. It is built to process up to one million captchas per day. We have several big teams and hundreds of active agents solving captchas, all at one time, especially during daytime in India. The backend of this project involves over 45 powerful, expensive servers communicating with the MySpace site to pull the captchas and then queue them up on this site, and then process the results to push back to MySpace all within 20 seconds per captcha..” </li></ul>
  39. 40. Test for Anti-CAPTCHA worker
  40. 42. Captchar_A Trojan (2007) <ul><li>Why pay anyone to solve the captcha when you can get someone to do it for FREE? </li></ul>
  41. 43. Captchar_A Trojan (2007)
  42. 44. Captchar_A Trojan (2007)
  43. 45. Drupal Anti-Spam Modules
  44. 46. AntiSpam <ul><li>The AntiSpam module is the successor of the Akismet module. It provides spam protection for your Drupal site using an external antispam service such as Akismet. </li></ul><ul><li>AntiSpam is fully compatible with Drupal 6.x (the Akismet module's Drupal 6.x release had many compatibility issues and was not usable as shipped). It extends external service support to TypePad AntiSpam and Defensio as well as Akismet, so you can choose whichever antispam service you wish to use. </li></ul>
  45. 47. AntiSpam <ul><li>Supported Anti-spam Services </li></ul><ul><li>Akismet http:// </li></ul><ul><li>TypePad AntiSpam http:// </li></ul><ul><li>Defensio http:// </li></ul>
  46. 48. Bad Behavior <ul><li>Spammers run automated scripts which: </li></ul><ul><ul><li>read everything on your web site, </li></ul></ul><ul><ul><li>harvest email addresses, </li></ul></ul><ul><ul><li>post spam directly to your site, and </li></ul></ul><ul><ul><li>put false referrers in your server log, trying to get their links posted through your stats page. </li></ul></ul><ul><li>For the operator of a web site, this causes several problems. The spammers are: </li></ul><ul><ul><li>wasting your bandwidth, </li></ul></ul><ul><ul><li>posting comments to any form they can find, </li></ul></ul><ul><ul><li>filling your web site with unwanted ads for their products, and </li></ul></ul><ul><ul><li>harvesting any email addresses they can find and selling them to other spammers. </li></ul></ul>
  47. 49. Bad Behavior <ul><li>Bad Behavior is a set of PHP scripts which prevents spambots from accessing your site by analyzing their actual HTTP requests and comparing them to profiles from known spambots. It goes far beyond User-Agent and Referer, however. </li></ul><ul><li>Bad Behavior intends to target any malicious software directed at a Web site, whether it be a spambot, ill-designed search engine bot, or system crackers. It blocks such access and then logs their attempts. </li></ul>
  48. 50. Block Anonymous Links <ul><li>This is a simple module which blocks comments from anonymous users that contain links. </li></ul><ul><li>It relies on the fact that most spam messages contain hyperlinks and also on the fact that (for now) (most) spambots don't register on the sites they want to spam. It tries to block comment-spam at an early stage. </li></ul><ul><li>If an anonymous user tries to post a comment which contains a link, he/she will get a message explaining why the comment has been blocked. </li></ul>
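The rule this module applies can be sketched as a single check. This is not the module's actual code (which is PHP); the link pattern and the name `allow_comment` are assumptions.

```python
import re

# Rough link detector: URL schemes, bare "www." hosts, or an HTML anchor tag.
_LINK = re.compile(r"(https?://|www\.|<a\s)", re.IGNORECASE)


def allow_comment(body: str, is_anonymous: bool) -> bool:
    """Registered users may post links; anonymous users may not."""
    if not is_anonymous:
        return True
    return _LINK.search(body) is None
```

When `allow_comment` returns `False`, the site would show the visitor a message explaining why the comment was blocked.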
  49. 51. BlogSpam <ul><li>BlogSpam provides a central location where comments can be checked for various spam indicators. </li></ul><ul><li>The BlogSpam service makes use of a plugin architecture to provide checking. If you are running your own blogspam server then the plugin list may vary. At present the following plugins are available (and running on the public BlogSpam server at </li></ul><ul><li>Checks available: </li></ul><ul><li>00blacklist Is the given IP blacklisted? </li></ul><ul><li>00whitelist Is the given IP whitelisted? </li></ul>
  50. 52. BlogSpam <ul><li>badip Block a comment if the IP address it has been submitted from has been locally blacklisted. The local blacklist is read from /etc/blogspam/badips and each line is assumed to be a Class C address. </li></ul><ul><li>bogusip Is this an internal IP? That might be fine for local use, but in the real world such IPs are not going to be seen and can be safely marked as spam. </li></ul><ul><li>dnsrbl Test whether the IP address submitting the comment is listed in the DNS RBL </li></ul><ul><li>dropme This plugin is a simple test one - if a comment mentions the IP address it is coming from in the subject along with the key then we'll always report it as spam. </li></ul>
  51. 53. BlogSpam <ul><li>emailtests Perform some simple tests on the submitted email address. </li></ul><ul><li>lotsaurls Block if we find more than a given number of links in message. The default is 10 links, but this may be changed by the caller. </li></ul><ul><li>requiremx If we've got an email address make sure that the domain: a. Has an MX record. </li></ul><ul><li>sfs Test whether the IP address submitting the comment is listed in the database. </li></ul>
  52. 54. BlogSpam <ul><li>size Is the given post too large, or too small? </li></ul><ul><li>stopwords Block if we find some particular stop-words in the body of the message. </li></ul><ul><li>surbl Lookup each URL in the body of the comment and test against </li></ul><ul><li>wordcount Block posts that are only a few words long. </li></ul>
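The plugin architecture described on the BlogSpam slides can be pictured as a chain of small checks run in order. The sketch below imitates three of the listed plugins; it is not BlogSpam's own code, the comment dictionary layout is invented, and the `min_words=4` threshold is an assumption (the slide only says "a few words").

```python
import ipaddress
import re


def bogusip(comment: dict) -> bool:
    """Flag internal (private or loopback) source addresses."""
    ip = ipaddress.ip_address(comment["ip"])
    return ip.is_private or ip.is_loopback


def lotsaurls(comment: dict, max_links: int = 10) -> bool:
    """Flag comments with more than max_links links (default 10, per the slide)."""
    return len(re.findall(r"https?://", comment["body"], re.IGNORECASE)) > max_links


def wordcount(comment: dict, min_words: int = 4) -> bool:
    """Flag comments that are only a few words long (threshold assumed)."""
    return len(comment["body"].split()) < min_words


PLUGINS = [bogusip, lotsaurls, wordcount]


def check_comment(comment: dict):
    """Return the name of the first plugin that flags the comment, or None."""
    for plugin in PLUGINS:
        if plugin(comment):
            return plugin.__name__
    return None
```

Adding a new check is then just a matter of appending another function to `PLUGINS`.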
  53. 55. Comment Lockdown <ul><li>Comment Lockdown is a drug of last resort in battling comment spam. You should not use this if you haven't tried something less likely to cause side effects like Mollom. You should continue use of Mollom with Comment Lockdown. </li></ul><ul><li>This module has some very specific rules for comments, and unlike Mollom, is incapable of learning, has no settings, does not care what kind of user you are, and rejects anything written in a language other than English. </li></ul>
  54. 56. Comment Lockdown <ul><li>Link (A) tags cannot account for more than 20% of all characters. </li></ul><ul><li>No more than 20% of all characters can be non-ASCII. This accounts for words like &quot;fiancé&quot; while preventing comments in other languages. </li></ul><ul><li>At least 10% of all words must be in the list of top 100 English words. </li></ul><ul><li>JavaScript must be enabled. This isn't foolproof by any means, but a spam robot would have to be customized to defeat it. </li></ul><ul><li>These rules aren't arbitrary; they're based on experience with The New York Observer's massive database of spam comments. This module won't help sites that accept comments in languages other than English. </li></ul>
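The three text rules above can be approximated in a short function. This is a sketch under stated assumptions, not the module's implementation: `COMMON_WORDS` is a small stand-in for the real top-100 list, whole anchor elements (tag plus link text) are counted as link characters, and the JavaScript requirement is a client-side check that is not modeled here.

```python
import re

# Assumed stand-in for the "top 100 English words" list.
COMMON_WORDS = {
    "the", "be", "to", "of", "and", "a", "in", "that", "have", "i",
    "it", "for", "not", "on", "with", "he", "as", "you", "do", "at",
}
_A_TAG = re.compile(r"<a\b[^>]*>.*?</a>", re.IGNORECASE | re.DOTALL)


def passes_lockdown(comment: str) -> bool:
    """Apply the 20% link, 20% non-ASCII, and 10% common-word rules."""
    if not comment.strip():
        return False
    total = len(comment)
    link_chars = sum(len(m.group(0)) for m in _A_TAG.finditer(comment))
    if link_chars / total > 0.20:
        return False
    non_ascii = sum(1 for ch in comment if ord(ch) > 127)
    if non_ascii / total > 0.20:
        return False
    # Count words outside of links only.
    words = re.findall(r"[a-z']+", _A_TAG.sub(" ", comment).lower())
    if not words:
        return False
    common = sum(1 for word in words if word in COMMON_WORDS)
    return common / len(words) >= 0.10
```

A sentence such as "My fiancé and I loved the article in the paper." passes, while a comment that is mostly one link, or written entirely in Cyrillic, is rejected.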
  55. 57. Gotcha <ul><li>Gotcha is sort of a takeoff on &quot;captcha.&quot; </li></ul><ul><li>Basically you place a bogus input field on a contact form, and use CSS to not display it. </li></ul><ul><li>On submission you check for a value. If there is a value entered, then that means a non-human has been blanketing form fields, and the form post can be ignored as spam. </li></ul><ul><li>Gotcha adds a field labeled &quot;Subject&quot; at the top of the contact form. </li></ul><ul><ul><li>It uses a &quot;div&quot; tag to render the field as &quot;display: none&quot; so human users shouldn't see it, and won't enter any data there. </li></ul></ul><ul><ul><li>The suspected spam bot will see &quot;Subject&quot; and be enticed to enter something there. </li></ul></ul><ul><ul><li>There is descriptive text to encourage a human (whose browser might be set to display it anyway) to ignore this field. </li></ul></ul><ul><li>Requires the Spam module (for content filters) </li></ul>
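The server-side half of this hidden-field trick is a single test. A minimal sketch, assuming the submitted form arrives as a dictionary; `HONEYPOT_FIELD` and `is_spam_submission` are illustrative names, not the module's.

```python
HONEYPOT_FIELD = "subject"  # the bogus field; humans never see it because CSS hides it


def is_spam_submission(form: dict) -> bool:
    """Any non-blank value in the hidden field means a bot filled in the form."""
    return bool(form.get(HONEYPOT_FIELD, "").strip())
```

The same check applies to any module that hides a decoy field with CSS; only the field name changes.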
  56. 58. InvisiMail <ul><li>Invisimail provides a content filter to hide email addresses from spam-bots. </li></ul><ul><li>Email addresses are converted to ASCII character codes and optionally written to the page using a concatenated JavaScript &quot;write&quot; command. The email addresses will appear on the page normally, but their HTML source will be obscured so as not to appear as an email address to email harvesting robots. </li></ul><ul><li>Invisimail also provides an option to automatically create mailto links for email addresses. </li></ul><ul><li>Obviously, the best protection is not to publish email addresses at all. But on a community site, some users are going to do this regardless. Invisimail provides protection for these email addresses. </li></ul>
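The character-code conversion can be illustrated as follows. This is a sketch of the general idea using decimal HTML character references; whether Invisimail emits exactly this form is an assumption, and `obfuscate_email` is a made-up name.

```python
import html


def obfuscate_email(address: str) -> str:
    """Encode every character as a decimal HTML character reference."""
    return "".join(f"&#{ord(ch)};" for ch in address)


encoded = obfuscate_email("user@example.com")
# A browser (or html.unescape) turns the references back into the address,
# but the page source no longer contains a literal "user@example.com".
assert html.unescape(encoded) == "user@example.com"
assert "@" not in encoded
```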
  57. 59. IP Anonymize <ul><li>Stale IP addresses clog up your database with useless data and may be subject to subpoena by legal authorities in some jurisdictions. </li></ul><ul><li>The IP anonymize module helps ensure users' privacy by establishing a retention policy for IP addresses logged in Drupal's database tables. IP addresses are scrubbed on each cron run according to a configurable retention period. For example, you may wish to preserve IP addresses for a short while for purposes of identifying spam. </li></ul>
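A retention policy of this kind boils down to a cutoff date. The sketch below works on in-memory rows rather than Drupal's tables; the row layout, the 7-day default, and the `0.0.0.0` placeholder are all assumptions for illustration.

```python
from datetime import datetime, timedelta


def scrub_stale_ips(rows, now, retention=timedelta(days=7), placeholder="0.0.0.0"):
    """Return copies of rows, replacing IPs logged before the retention cutoff."""
    cutoff = now - retention
    return [
        {**row, "ip": placeholder} if row["logged_at"] < cutoff else dict(row)
        for row in rows
    ]
```

Run on each cron pass, this keeps recent addresses available for spam investigation and discards the rest.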
  58. 60. Mollom <ul><li>Mollom provides a one-stop solution for spam problems on Drupal forms. It intelligently combines: </li></ul><ul><ul><li>CAPTCHAs -- both image and audio CAPTCHAs </li></ul></ul><ul><ul><li>text analysis </li></ul></ul><ul><ul><li>user reputations </li></ul></ul><ul><li>and can: </li></ul><ul><ul><li>block comment form spam </li></ul></ul><ul><ul><li>block contact form spam </li></ul></ul><ul><ul><li>protect the user registration form against fake user accounts </li></ul></ul><ul><ul><li>protect the password request form </li></ul></ul><ul><ul><li>block spam on any node form, such as forum topics, articles, stories, pages, and more </li></ul></ul>
  59. 61. reCAPTCHA <ul><li>Uses the reCAPTCHA web service to improve the CAPTCHA system and protect email addresses. </li></ul>
  60. 62. Riddler <ul><li>While modules like Akismet and Spam offer a great way of filtering spam after it has been submitted, the purpose of Riddler is to complement these modules by catching it before it gets submitted. </li></ul><ul><li>What exactly does it do? </li></ul><ul><ul><li>This module will add a 'riddle' to a form of your choice (using Captcha) requiring guests to answer the 'riddle' before being allowed to submit the form. </li></ul></ul><ul><li>This is the default question: </li></ul><ul><ul><li>Q: Do you hate spam? (yes or no) </li></ul></ul><ul><ul><li>A: yes </li></ul></ul><ul><ul><li>Questions and answers can be customized through the '/admin/user/captcha/riddler' page. </li></ul></ul>
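Grading a riddle is a lookup and a comparison. A minimal sketch using the default question from the slide; the `RIDDLES` table, the function name, and the case-insensitive matching are assumptions rather than Riddler's documented behavior.

```python
RIDDLES = {"Do you hate spam? (yes or no)": "yes"}


def check_riddle(question: str, response: str) -> bool:
    """Accept the response if it matches the stored answer, ignoring case and padding."""
    expected = RIDDLES.get(question)
    return expected is not None and response.strip().lower() == expected.lower()
```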
  61. 63. Spamicide <ul><li>The purpose of Spamicide is to prevent spam submission to any form on your Drupal web site. </li></ul><ul><li>Spamicide adds an input field to each form, then hides it with CSS. The field and its matching .css file are named so as not to reveal that they are a spam-defeating device, and admins can set the name to almost anything they like (machine readable, please). </li></ul><ul><li>When spam bots fill in the field, the form is discarded. </li></ul><ul><li>If logging is set, the log will show if and when a particular form has been compromised, and the admin can change the form's field name (and corresponding .css file) to something else. </li></ul>
  62. 64. SpamSpan <ul><li>The SpamSpan module obfuscates email addresses to help prevent spambots from collecting them. It implements the technique at the SpamSpan website. </li></ul><ul><li>The problem with most email address obfuscators is that they rely upon JavaScript being enabled on the client side. This makes the technique inaccessible to people with screen readers. </li></ul><ul><li>SpamSpan however will produce clickable links if JavaScript is enabled, and will show the email address as example [at] example [dot] com if the browser does not support JavaScript or if JavaScript is disabled. </li></ul><ul><li>This technique is unlikely to be absolutely foolproof. It is possible in theory for a determined spambot to harvest addresses from your site no matter how you disguise them. </li></ul><ul><li>The great majority of spambots do not bother to attempt to collect addresses which have been hidden using JavaScript. Indeed, most spambots cannot currently read JavaScript at all. </li></ul>
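The no-JavaScript fallback text shown on this slide can be produced with a small helper. This sketch covers only the readable fallback, not the JavaScript that rebuilds a clickable link, and leaving dots in the user part untouched is an assumption; `spamspan_fallback` is a hypothetical name.

```python
def spamspan_fallback(address: str) -> str:
    """Render user@host.tld as 'user [at] host [dot] tld'."""
    user, _, domain = address.partition("@")
    return f"{user} [at] {domain.replace('.', ' [dot] ')}"
```

For example, `spamspan_fallback("example@example.com")` gives `example [at] example [dot] com`.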
  63. 65. http:BL <ul><li>Implementation of http:BL for Drupal. http:BL can prevent email address harvesters and comment spammers from visiting your site by using a centralized DNS blacklist. It requires a free Project Honey Pot membership. This module provides efficient blacklist lookups and blocks malicious visitors effectively. </li></ul><ul><li>Note that the module can also function without http:BL functionality -- its use will then be limited to the placement of a Honeypot link in the footer. </li></ul>
  64. 66. http:BL <ul><li>Features: </li></ul><ul><ul><li>http:BL lookups for visitor IPs </li></ul></ul><ul><ul><li>Blocking of requests coming from blacklisted IPs </li></ul></ul><ul><ul><li>Database caching, decreasing the number of DNS lookups </li></ul></ul><ul><ul><li>Honeypot link placement on ban page and optionally in footer </li></ul></ul><ul><ul><li>Custom ban message </li></ul></ul><ul><ul><li>Whitelisting through the access table (admin/user/access) </li></ul></ul><ul><ul><li>Greylisting: grants the user access if he passes a simple test </li></ul></ul><ul><ul><li>Checking only when a comment is placed, queueing the comment for moderation if the lookup is positive </li></ul></ul><ul><ul><li>Basic statistics on the number of blocked visits </li></ul></ul>
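An http:BL lookup is an ordinary DNS query on a specially built name. The sketch below follows my understanding of Project Honey Pot's published query and response format (access key, reversed IP octets, the `dnsbl.httpbl.org` zone, and a `127.days.threat.type` answer); treat those details as assumptions and check them against the official API description. No network call is made here, and the function names are illustrative.

```python
def httpbl_query_name(access_key: str, ip: str, zone: str = "dnsbl.httpbl.org") -> str:
    """Build the DNS name to look up for a visitor's IPv4 address."""
    reversed_octets = ".".join(reversed(ip.split(".")))
    return f"{access_key}.{reversed_octets}.{zone}"


def decode_httpbl_answer(answer: str):
    """Decode a dotted-quad answer; return None if it is not a valid http:BL reply."""
    first, days, threat, visitor_type = (int(part) for part in answer.split("."))
    if first != 127:
        return None
    return {
        "days_since_last_activity": days,
        "threat_score": threat,
        "suspicious": bool(visitor_type & 1),
        "harvester": bool(visitor_type & 2),
        "comment_spammer": bool(visitor_type & 4),
    }
```

Caching decoded answers per IP, as the feature list mentions, avoids repeating the DNS lookup on every request.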
  65. 67. Honeypots <ul><li>In computer terminology, a honeypot is a trap set to detect, deflect, or in some manner counteract attempts at unauthorized use of information systems. </li></ul><ul><li>Generally it consists of a computer, data, or a network site that appears to be part of a network, but is actually isolated, (un)protected, and monitored, and which seems to contain information or a resource of value to attackers. </li></ul>
  66. 68. Honeypots <ul><li>The software installed on, and run by, victim hosts serves two purposes: </li></ul><ul><ul><li>First, these dummy programs keep a network intruder occupied looking for valuable information where none exists. </li></ul></ul><ul><ul><li>Second, the victim host gathers intelligence. Once an intruder has broken into the victim host, the machine or a network administrator can examine the intrusion methods the intruder used. </li></ul></ul>
  67. 69.