Spam

Spam Filter
-Apeksha Agarwal
-Kashika Srivatava

What is spam?
• Spam is the use of electronic messaging systems to send unsolicited bulk messages, especially advertising, indiscriminately.

11/6/2012

Types of Spam
• Email Spam (most well known, and the topic for today)
• Comment Spam (probably why we have CAPTCHAs)
• Instant Messaging Spam (e.g. in Yahoo Messenger, unknown contacts sending weird URLs)
• Junk Fax (your machine prints hundreds of spam messages and you can't delete them; thankfully now a horror of the past)
• Unsolicited Text Messages (the offers make me think I am the luckiest girl alive)
• Social Networking Spam (sent by a friend who clicked on a similar message sent by their friend)

Geographical Origins of Spam
The origin, or source, of spam refers to the geographical location of the computer from which the spam is sent; it is not the country where the spammer resides, nor the country that hosts the spamvertised site.

Interesting Fact:
As much as 80% of spam received by Internet users in North America and Europe can be traced to fewer than 200 spammers.

Spam Topics in Q3 2012

Other Fast Facts
• Spam accounts for 14.5 billion messages globally per day, about 45% of all email.

• A 2004 survey estimated that lost productivity due to spam costs Internet users in the United States $21.58 billion annually.

• Many people switched from Yahoo Mail to Gmail because of Gmail's better spam filter.

• Spam fills up mailbox space and makes users ask for more free storage; offering more free space was another technique Gmail used to lure users.

Current Work: Bayesian Model
• Based on the document filtering concept

For a single word W (here, "replica"), Bayes' theorem relates the following quantities:
Pr(S|W) is the probability that a message is spam, knowing that the word "replica" is in it;
Pr(S) is the overall probability that any given message is spam;
Pr(W|S) is the probability that the word "replica" appears in spam messages;
Pr(H) is the overall probability that any given message is not spam (is "ham");
Pr(W|H) is the probability that the word "replica" appears in ham messages.
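Written out with these quantities, Bayes' theorem for a single word W takes the standard form (as in the Wikipedia article listed in the References):

```latex
\Pr(S \mid W) = \frac{\Pr(W \mid S)\,\Pr(S)}{\Pr(W \mid S)\,\Pr(S) + \Pr(W \mid H)\,\Pr(H)}
```

Many filters assume Pr(S) = Pr(H) = 0.5, in which case this simplifies to Pr(W|S) / (Pr(W|S) + Pr(W|H)).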

Combining Words:

p is the probability that the suspect message is spam;
p1 is the probability that it is spam knowing it contains a first word (for example "replica");
p2, ..., pN are the corresponding probabilities for the second through N-th words.
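A minimal Python sketch of both steps, assuming the usual naive-Bayes simplifications: words are treated as independent, and the combining step assumes Pr(S) = Pr(H). The function names and the probabilities in the example are hypothetical, chosen only for illustration:

```python
from math import prod


def word_spamicity(p_w_given_s: float, p_w_given_h: float, p_s: float = 0.5) -> float:
    """Pr(S|W) by Bayes' theorem, taking Pr(H) = 1 - Pr(S)."""
    p_h = 1.0 - p_s
    spam_term = p_w_given_s * p_s
    return spam_term / (spam_term + p_w_given_h * p_h)


def combine(probs: list[float]) -> float:
    """Combine per-word probabilities p1..pN into p for the whole message.

    p = (p1 * ... * pN) / (p1 * ... * pN + (1 - p1) * ... * (1 - pN)),
    which assumes the words are independent and Pr(S) = Pr(H).
    """
    spam = prod(probs)
    ham = prod(1.0 - p for p in probs)
    return spam / (spam + ham)


# Hypothetical numbers: "replica" appears in 5% of spam and 0.5% of ham.
p1 = word_spamicity(0.05, 0.005)
# Hypothetical second word with p2 = 0.8.
p = combine([p1, 0.8])
print(f"p1 = {p1:.3f}, p = {p:.3f}")
```

Two moderately spammy words push the combined probability higher than either word alone, which is why this independence-based combination is both effective and easy to game.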

Problem:
Bayesian poisoning (spammers pad messages with innocuous, ham-like words so that the combined spam probability drops)
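Bayesian poisoning works because the combined probability multiplies in every word's contribution. The toy sketch below uses hypothetical per-word probabilities and a filter that counts every word (real filters may limit which words they use); it shows a clearly spammy message dropping below 0.5 once a few ham-leaning words are appended:

```python
from math import prod


def combine(probs: list[float]) -> float:
    """Naive-Bayes combination of per-word spam probabilities (equal priors)."""
    spam = prod(probs)
    ham = prod(1.0 - p for p in probs)
    return spam / (spam + ham)


# Hypothetical spam probabilities for words such as "replica", "watches", "cheap".
spammy_words = [0.95, 0.90, 0.85]
# Hypothetical innocuous words appended by the spammer, each leaning toward ham.
padding = [0.2] * 5

before = combine(spammy_words)
after = combine(spammy_words + padding)
print(f"before padding: {before:.3f}")
print(f"after padding:  {after:.3f}")
```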

Other Models (Machine Learning Based)
• Neural Networks
• Graphical Models
• Logistic Regression
• Support Vector Machines (SVMs)
• All make fewer assumptions than naive Bayes (e.g. that words occur independently)
• They can model relationships between words, implicitly or explicitly, at the expense of more complexity
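As one example from this list, a logistic-regression classifier over a bag of words can be sketched in plain Python. This is a toy illustration under stated assumptions (binary word features, stochastic gradient descent, a hypothetical four-message training set, hypothetical function names), not a production spam model:

```python
import math
from collections import defaultdict


def features(text: str) -> set[str]:
    """Binary bag-of-words: each distinct lowercase word is one feature."""
    return set(text.lower().split())


def sigmoid(z: float) -> float:
    z = max(-30.0, min(30.0, z))  # clip to avoid overflow in math.exp
    return 1.0 / (1.0 + math.exp(-z))


def train(examples, epochs: int = 200, lr: float = 0.5):
    """Fit logistic-regression weights with stochastic gradient descent.

    examples: list of (text, label) pairs, label 1 = spam, 0 = ham.
    """
    weights = defaultdict(float)
    bias = 0.0
    for _ in range(epochs):
        for text, label in examples:
            feats = features(text)
            p = sigmoid(bias + sum(weights[f] for f in feats))
            grad = p - label  # derivative of the log-loss w.r.t. the logit
            bias -= lr * grad
            for f in feats:
                weights[f] -= lr * grad
    return weights, bias


def predict(weights, bias: float, text: str) -> float:
    """Estimated probability that `text` is spam."""
    return sigmoid(bias + sum(weights.get(f, 0.0) for f in features(text)))


# Hypothetical toy training set.
train_data = [
    ("cheap replica watches buy now", 1),
    ("win a free prize now", 1),
    ("meeting moved to friday", 0),
    ("lunch with the project team", 0),
]
weights, bias = train(train_data)
print(predict(weights, bias, "free replica watches"))
print(predict(weights, bias, "team meeting on friday"))
```

Unlike naive Bayes, the weights are fitted jointly, so words that always appear together share credit instead of each being counted as independent evidence.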

MSR: Challenge-Response System
• Idea of Cynthia Dwork (now at Microsoft Research, Silicon Valley) and Moni Naor (at the Weizmann Institute of Science in Israel)
• First determine whether a message is ham or spam, then take action
• Aim: recover even false positives (ham messages wrongly flagged as spam)
• Idea: increase the recall of ham messages
• So the system sends the sender a challenge, a small puzzle, which a genuine sender will answer
• Spammers do not have time to solve a puzzle for every message they send
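One way to realize the "small puzzle" is a computational proof of work: cheap for the receiver to check, costly for a sender to solve at bulk scale. The sketch below uses a hashcash-style SHA-256 puzzle; this is a swapped-in construction for illustration, not the specific function proposed by Dwork and Naor, and the function names and difficulty values are hypothetical:

```python
import hashlib
import os


def make_challenge() -> bytes:
    """Receiver side: a fresh random challenge for one suspect message."""
    return os.urandom(16)


def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the start of a hash digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def _hash(challenge: bytes, counter: int) -> bytes:
    return hashlib.sha256(challenge + counter.to_bytes(8, "big")).digest()


def solve(challenge: bytes, difficulty: int) -> int:
    """Sender side: brute-force a counter; about 2**difficulty hashes on average."""
    counter = 0
    while leading_zero_bits(_hash(challenge, counter)) < difficulty:
        counter += 1
    return counter


def verify(challenge: bytes, counter: int, difficulty: int) -> bool:
    """Receiver side: checking an answer costs a single hash."""
    return leading_zero_bits(_hash(challenge, counter)) >= difficulty


challenge = make_challenge()
answer = solve(challenge, difficulty=16)
print(verify(challenge, answer, difficulty=16))
```

Each extra bit of difficulty doubles the sender's expected work while verification stays at one hash; a genuine sender pays the cost once, whereas a spammer would pay it for every message.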

My Idea: Collaborative Intelligence

• Classify each message as spam or ham using the previous techniques
• Use the responses of other readers to warn the user about probable spam among mails classified as ham
• Suppose a mail is sent to 50 people and is classified as ham
• Check the rate at which the other recipients mark it as spam
• When another recipient opens it, the message stays in the inbox but is flagged as probable spam, with some confidence
• The user is pre-warned of possible spam in their inbox
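A minimal sketch of the proposed warning, assuming the server tracks, per message, how many other recipients opened it and how many marked it as spam. It uses the Wilson score lower bound as the "confidence" so that a handful of reports does not trigger a warning; the threshold, the minimum report count, and all names are hypothetical choices:

```python
import math
from typing import Optional


def wilson_lower_bound(successes: int, trials: int, z: float = 1.96) -> float:
    """Lower end of the Wilson score interval for a proportion (z = 1.96 ~ 95%)."""
    if trials == 0:
        return 0.0
    phat = successes / trials
    denom = 1 + z * z / trials
    centre = phat + z * z / (2 * trials)
    margin = z * math.sqrt(phat * (1 - phat) / trials + z * z / (4 * trials * trials))
    return (centre - margin) / denom


def spam_warning(spam_reports: int, opened: int,
                 threshold: float = 0.3, min_reports: int = 3) -> Optional[str]:
    """Warning text for a message that the filter classified as ham.

    spam_reports: other recipients who marked the message as spam.
    opened: other recipients who have opened it so far.
    """
    if spam_reports < min_reports:
        return None  # too little evidence yet
    rate = wilson_lower_bound(spam_reports, opened)
    if rate >= threshold:
        return (f"In your inbox, but probably spam: lower-bound estimate "
                f"{rate:.0%} of readers report it")
    return None


# Hypothetical: the mail went to 50 people; 20 have opened it, 12 reported it.
print(spam_warning(12, 20))
print(spam_warning(1, 20))
```

Using the lower bound rather than the raw report rate means two reports out of two readers do not count as strong evidence, while a steady stream of reports from many readers does.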

References

• Commtouch: Internet Threats Trend Report, October 2012 (http://www.commtouch.com/download/2389)
• Symantec: Internet Security Report (http://www.symantec.com/content/en/us/enterprise/other_resources/b-istr_main_report_2011_21239364.en-us.pdf)
• Cisco: Security Report (http://www.cisco.com/en/US/prod/collateral/vpndevc/security_annual_report_2011.pdf)
• Wikipedia: Email spam (http://en.wikipedia.org/wiki/Email_spam)
• DestinationCRM (http://www.destinationcrm.com/Articles/Editorial/Magazine-Features/Avoid-the-Spam-Folder-84272.aspx)
• Tech Support Alert (http://www.techsupportalert.com/content/how-why-switch-yahoo-mail-gmail.htm)
• Spamhaus (http://www.spamhaus.org/statistics/countries/)
• MSR (http://research.microsoft.com/en-us/um/people/joshuago/significance-spam_edited2-times.pdf)
