Spam Filter
      -Apeksha Agarwal
      -Kashika Srivatava
What is spam?
• Spam is the use of electronic messaging systems to send
  unsolicited bulk messages, especially advertising,
  indiscriminately.

                                                            11/6/2012
Types of Spam
• Email spam (the best known, and the topic for today)
• Comment spam (probably why we have CAPTCHAs)
• Instant messaging spam (e.g. in Yahoo Messenger, unknown
  contacts sending weird URLs)
• Junk fax (your machine prints hundreds of spam messages
  and you can't delete them; thankfully now a horror of the
  past)
• Unsolicited text messages (offers that make me think I am
  the luckiest girl alive)
• Social networking spam (sent by a friend who clicked on a
  similar message sent by their friend)
Geographical Origins of Spam
 The origin or source of spam refers to the geographical
 location of the computer from which the spam is sent; it is
 not the country where the spammer resides, nor the country
 that hosts the spamvertised site.

 Interesting fact:
 As much as 80% of spam received by Internet users in
 North America and Europe can be traced to fewer than
 200 spammers.
Spam Topics in Q3 2012
 [Figure: spam topics in Q3 2012]
Other Fast Facts
• Spam accounts for 14.5 billion messages globally per day. In
  other words, spam makes up 45% of all emails.
• A 2004 survey estimated that lost productivity costs Internet
  users in the United States $21.58 billion annually.
• People switched from Yahoo to Gmail because of its better spam
  filter.
• Spam fills up your mailbox, so users ask for more free space;
  offering that space is another technique Gmail uses to lure
  users.
Current Work: Bayesian Model
 • Based on the document filtering concept

Pr(S|W) is the probability that a message is spam, knowing that the word "replica"
is in it;
Pr(S)   is the overall probability that any given message is spam;
Pr(W|S) is the probability that the word "replica" appears in spam messages;
Pr(H)   is the overall probability that any given message is not spam (is "ham");
Pr(W|H) is the probability that the word "replica" appears in ham messages.
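
These quantities are related by Bayes' theorem:
Pr(S|W) = Pr(W|S)·Pr(S) / ( Pr(W|S)·Pr(S) + Pr(W|H)·Pr(H) ).
A minimal sketch in Python follows. The function name, the default prior Pr(S) = 0.5, and the sample numbers are assumptions for illustration, not figures from the slides.

```python
def spam_given_word(p_w_given_s, p_w_given_h, p_s=0.5):
    """Pr(S|W) by Bayes' theorem; Pr(H) is taken as 1 - Pr(S)."""
    p_h = 1.0 - p_s
    numerator = p_w_given_s * p_s
    denominator = numerator + p_w_given_h * p_h
    if denominator == 0:
        # Word never seen in either class: no evidence, fall back to the prior.
        return p_s
    return numerator / denominator


# Hypothetical rates: "replica" appears in 40% of spam and 1% of ham.
p = spam_given_word(0.40, 0.01)
print(f"Pr(S | 'replica') = {p:.4f}")
```

With these made-up rates, a word that is 40 times more frequent in spam than in ham pushes the estimate close to 1.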

Combining Words:

 p  is the probability that the suspect message is spam;
 p1 is the probability that it is spam knowing it contains a first word (for example
 "replica");

Problem:
Bayesian Poisoning
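
A common way to combine the per-word probabilities, assuming the words occur independently, is
p = p1·p2·…·pN / ( p1·p2·…·pN + (1−p1)·(1−p2)·…·(1−pN) ).
The sketch below applies that rule and then illustrates the poisoning problem: padding a message with words that are typical of ham pulls the combined score down. All per-word values are made up for illustration.

```python
from math import prod


def combine(probabilities):
    """p = prod(p_i) / (prod(p_i) + prod(1 - p_i)), assuming independent words."""
    spam_side = prod(probabilities)
    ham_side = prod(1.0 - p for p in probabilities)
    return spam_side / (spam_side + ham_side)


spam_words = [0.95, 0.90]      # hypothetical scores for words such as "replica"
padding = [0.10, 0.10, 0.15]   # hypothetical scores for innocuous padding words

before = combine(spam_words)
after = combine(spam_words + padding)
print(f"before padding: {before:.3f}, after padding: {after:.3f}")
```

Three innocuous words are enough here to move the message from clearly spam to below an even score, which is why poisoning is listed as a problem for this model.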
Other Models (Machine Learning Based)
•   Neural Networks
•   Graphical Models
•   Logistic Regression
•   Support Vector Machines (SVMs)
•   All make fewer assumptions than the Bayesian model
•   They capture relationships between words implicitly or
    explicitly, at the expense of more complexity
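
As a minimal sketch of one of these models, logistic regression learns the word weights jointly from labelled messages instead of estimating each word in isolation. The toy messages, learning rate, and epoch count below are assumptions chosen only to keep the example small.

```python
import math


def featurize(text, vocab):
    """Binary bag-of-words vector: 1.0 if the vocabulary word occurs in the text."""
    words = set(text.lower().split())
    return [1.0 if w in words else 0.0 for w in vocab]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(samples, vocab, epochs=500, lr=0.5):
    """Plain stochastic gradient descent on the logistic loss."""
    weights = [0.0] * len(vocab)
    bias = 0.0
    for _ in range(epochs):
        for text, label in samples:
            x = featurize(text, vocab)
            pred = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            err = pred - label
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def predict(text, vocab, weights, bias):
    """Estimated probability that the text is spam."""
    x = featurize(text, vocab)
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


# Toy training data (label 1 = spam, 0 = ham); purely illustrative.
train_set = [
    ("cheap replica watches", 1),
    ("replica watches offer", 1),
    ("cheap offer today", 1),
    ("meeting notes today", 0),
    ("project meeting schedule", 0),
    ("lunch schedule today", 0),
]
vocab = sorted({w for text, _ in train_set for w in text.split()})
weights, bias = train(train_set, vocab)

spam_score = predict("cheap replica offer", vocab, weights, bias)
ham_score = predict("project lunch meeting", vocab, weights, bias)
print(f"spam-like: {spam_score:.3f}, ham-like: {ham_score:.3f}")
```

The extra cost mentioned on the slide shows up even here: the model needs an iterative training loop, whereas the Bayesian model only needs word counts.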
MSR: Challenge-Response System
• Idea of Cynthia Dwork (now at Microsoft Research, Silicon
  Valley) and Moni Naor (at the Weizmann Institute of Science
  in Israel).
• First determine whether a message is ham or spam and take action.
• Aim: also find the false positives, i.e. genuine messages
  flagged as spam.
• Idea: increase the recall of ham messages.
• So you send a small puzzle as a challenge to the sender, who
  will answer it if the message is genuine.
• Spammers do not have the time to answer.
My idea: Collaborative intelligence

• First classify each message as spam or ham using the previous
  techniques.
• Then try to warn the user of probable spam among mails
  classified as ham, based on the responses of other readers.
• Suppose a mail is sent to 50 people and is classified as ham.
• Check the rate at which other recipients mark it as spam.
• When a new user opens it, say that it is in the inbox but is
  probably spam, with some confidence.
• The user is forewarned of possible spam in the inbox.
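
The collaborative check could look like the sketch below. The minimum number of readers, the warning threshold, and the use of the report rate as the confidence value are all assumptions, since the slides do not fix them.

```python
def spam_warning(recipients, opened, marked_spam, min_opened=5, threshold=0.3):
    """Return (warn, confidence) for a mail that the filter classified as ham.

    recipients  -- number of people the mail was sent to
    opened      -- how many of them have opened it so far
    marked_spam -- how many of those readers marked it as spam
    """
    if not 0 <= marked_spam <= opened <= recipients:
        raise ValueError("expected 0 <= marked_spam <= opened <= recipients")
    if opened < min_opened:
        # Too few readers so far to say anything useful.
        return False, 0.0
    rate = marked_spam / opened
    return rate >= threshold, rate


# A mail sent to 50 people: 20 have opened it and 8 of them marked it as spam.
warn, confidence = spam_warning(recipients=50, opened=20, marked_spam=8)
if warn:
    print(f"In your inbox, but probably spam (confidence {confidence:.0%})")
```

The mail stays in the inbox either way; only the warning shown to the next reader changes.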
References

• Commtouch: Internet Threats Trend Report, October 2012
    (http://www.commtouch.com/download/2389)
• Symantec: Internet Security Report
    (http://www.symantec.com/content/en/us/enterprise/other_resources/b-istr_main_report_2011_21239364.en-us.pdf)
• Cisco: Security Report
    (http://www.cisco.com/en/US/prod/collateral/vpndevc/security_annual_report_2011.pdf)
• Wikipedia: http://en.wikipedia.org/wiki/Email_spam
• http://www.destinationcrm.com/Articles/Editorial/Magazine-Features/Avoid-the-Spam-Folder-84272.aspx
• http://www.techsupportalert.com/content/how-why-switch-yahoo-mail-gmail.htm
• http://www.spamhaus.org/statistics/countries/
• MSR: http://research.microsoft.com/en-us/um/people/joshuago/significance-spam_edited2-times.pdf