Spam Filter

  1. Spam Filter - Apeksha Agarwal, Kashika Srivatava
  2. What is spam?
     • Spam is the use of electronic messaging systems to send unsolicited bulk messages, especially advertising, indiscriminately.
     (Slides dated 11/6/2012.)
  3. Types of Spam
     • Email spam (the best known, and the topic for today)
     • Comment spam (probably why we have CAPTCHAs)
     • Instant messaging spam (e.g., unknown contacts on Yahoo Messenger sending strange URLs)
     • Junk fax (your machine prints hundreds of spam messages and you cannot delete them; thankfully now a horror of the past)
     • Unsolicited text messages (the offers make me think I am the luckiest girl alive)
     • Social networking spam (sent by a friend who clicked on a similar message sent by one of their friends)
  4. Geographical Origins of Spam
     • The origin or source of spam refers to the geographical location of the computer from which the spam is sent; it is not the country where the spammer resides, nor the country that hosts the spamvertised site.
     • Interesting fact: as much as 80% of spam received by Internet users in North America and Europe can be traced to fewer than 200 spammers.
  5. Spam Topics in Q3 2012
  6. Other Fast Facts
     • Spam accounts for 14.5 billion messages globally per day. In other words, spam makes up 45% of all emails.
     • A 2004 survey estimated that lost productivity costs Internet users in the United States $21.58 billion annually.
     • People switched from Yahoo to Gmail because of its better spam filter.
     • Spam fills up your mailbox, so users ask for more free space; offering it was another technique Gmail used to lure users.
  7. Current Work: Bayesian Model
     • Based on the document filtering concept.
     • Pr(S|W) = Pr(W|S)·Pr(S) / (Pr(W|S)·Pr(S) + Pr(W|H)·Pr(H)), where:
       Pr(S|W) is the probability that a message is spam, knowing that the word "replica" is in it;
       Pr(S) is the overall probability that any given message is spam;
       Pr(W|S) is the probability that the word "replica" appears in spam messages;
       Pr(H) is the overall probability that any given message is not spam (is "ham");
       Pr(W|H) is the probability that the word "replica" appears in ham messages.
     • Combining words: p = p1·p2·…·pN / (p1·p2·…·pN + (1 − p1)·(1 − p2)·…·(1 − pN)), where:
       p is the probability that the suspect message is spam;
       p1 is the probability that it is spam knowing it contains a first word (for example, "replica"), and so on up to pN.
     • Problem: Bayesian poisoning.
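
The Bayesian model on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the presenters' implementation: the function names, the 0.5 prior, and the example word frequencies are all assumptions made up for the example. It applies Bayes' theorem per word using the quantities defined on the slide, then combines the per-word probabilities under the usual naive independence assumption.

```python
from math import prod


def word_spamicity(p_w_spam, p_w_ham, p_spam=0.5):
    """Pr(S|W): probability a message is spam given that it contains word W.

    p_w_spam is Pr(W|S), p_w_ham is Pr(W|H), p_spam is Pr(S).
    The 0.5 default prior is an assumption for this sketch.
    """
    p_ham = 1.0 - p_spam  # Pr(H)
    numerator = p_w_spam * p_spam
    return numerator / (numerator + p_w_ham * p_ham)


def combine(word_probs):
    """Combine per-word probabilities p1..pN into one message probability p."""
    spam_side = prod(word_probs)
    ham_side = prod(1.0 - p for p in word_probs)
    return spam_side / (spam_side + ham_side)


# Hypothetical word frequencies, invented for illustration only.
p_replica = word_spamicity(p_w_spam=0.20, p_w_ham=0.01)  # spam-leaning word
p_meeting = word_spamicity(p_w_spam=0.02, p_w_ham=0.10)  # ham-leaning word
p_message = combine([p_replica, p_meeting])

print(f"Pr(S|'replica') = {p_replica:.3f}")
print(f"Pr(S|'meeting') = {p_meeting:.3f}")
print(f"combined p      = {p_message:.3f}")
```

Bayesian poisoning, the problem named on the slide, attacks exactly the `combine` step: padding a spam message with many ham-leaning words pulls the combined probability down.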
  8. Other Models (Machine Learning Based)
     • Neural networks
     • Graphical models
     • Logistic regression
     • Support vector machines (SVMs)
     • All make fewer assumptions.
     • They model these kinds of relationships between words, implicitly or explicitly, at the expense of more complexity.
  9. MSR: Challenge-Response System
     • An idea of Cynthia Dwork (now at Microsoft Research, Silicon Valley) and Moni Naor (at the Weizmann Institute of Science in Israel).
     • First determine whether a message is ham or spam, and take action.
     • Aim: also catch false positives, i.e., messages wrongly flagged as spam.
     • Idea: increase the recall of ham messages.
     • So you send a small puzzle as a challenge to the sender, who will answer it if the message is genuine.
     • Spammers do not have the time.
  10. My Idea: Collaborative Intelligence
     • Classify a message as spam or ham using the previous techniques.
     • Try to warn the user of probable spam among mails classified as ham, based on the responses of other readers.
     • Suppose a mail is sent to 50 people and is classified as ham.
     • Check the rate at which other recipients mark it as spam.
     • When a new user opens it, tell them it is in the inbox but is probably spam, with some confidence.
     • The user is warned in advance of possible spam in their inbox.
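
The collaborative check described on this slide could look roughly like the sketch below. Everything specific in it is an assumption for illustration: the function name, the minimum of 3 reports, and the 20% threshold are not from the slides. The report rate among recipients doubles as a crude confidence value shown to the next reader.

```python
def spam_warning(recipients, spam_reports, min_reports=3, threshold=0.2):
    """Return the spam-report rate if it justifies a warning, else None.

    recipients   -- how many people received the mail (e.g. 50)
    spam_reports -- how many of them marked it as spam so far
    min_reports  -- hypothetical floor so one or two clicks do not trigger a warning
    threshold    -- hypothetical minimum report rate for showing the warning
    """
    if recipients <= 0 or spam_reports < min_reports:
        return None
    rate = spam_reports / recipients
    return rate if rate >= threshold else None


# A mail classified as ham was sent to 50 people; 15 marked it as spam.
confidence = spam_warning(recipients=50, spam_reports=15)
if confidence is not None:
    print(f"In your inbox, but probably spam ({confidence:.0%} of recipients reported it).")
```

The mail stays in the inbox either way; the filter's ham verdict is not overridden, the reader is only warned.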
  11. References
     • Commtouch: Internet Threats Trend Report, October 2012
     • Symantec: Internet security report (us.pdf)
     • Cisco: Security Report
     • Wikipedia
     • 84272.aspx
     • http://www.
     • MSR: spam_edited2-times.pdf