Shades of Grey:
A Closer Look at Emails in the Gray Area
Jelena Isacenkova
Davide Balzarotti
June 23, 2014 Eurecom
Evolution of Spam
[Timeline figure: spam rate on a 0%-100% axis, annotated with events per period]
1994, 1997, 1998:
- Abuse of dynamic dial-up IP addresses
- Lawyers Canter and Siegel commercial spam scandal
- Message classifiers (Bayesian)
- RBLs
2002, 2003 (spam rate labels: 9%, 40%):
- Release of “Ratware” spamming tools: DarkMailer, SenderSafe
- Open relays for proxying spam
- Appearance of viruses automatically downloading email lists
- Directive 2002/58 on Privacy and Electronic Communications
- CAN-SPAM Act of 2003
2004, 2007, 2008, 2009-2012 (spam rate labels: 72%, 85%, 68%):
- First botnets: Bagle, Bobax
- Distributed spamming tool: Reactor Mailer
- Spammers got sentenced
- Srizbi takedown
- 7 botnet takedowns
Email Categories
SPAM:
- Botnet spam
- 419 scam
- Phishing
- Targeted Email Attacks
- Spear Phishing
- Blackhole Spam
- Snowshoe Spam
GRAY:
- Newsletters
- Notifications
- Customer Prospecting
- Commercial ads
HAM:
- Personal User Emails
Gmail Spam folder
- Within our study, users checked 5-6 messages per day
- 1.5% of harmful spam emails had a malicious attachment
How significant is the gray category?
Gray Category in 2007
“Most misclassified ham messages are advertising, news digests, …
[that] represent a small fraction of incoming mail, ... [which] filters
find more difficult to classify.”
- Cormack & Lynam, “Online Supervised Spam Filter
Evaluation”, 2007
Gray Category in 2012
“49% of consumers subscribe to 1-10 brands”
- Direct Marketing Association
“70% of 'this is spam' are actually
legitimate newsletters, offers or
notifications”
- 2012, ReturnPath
“Graymail emails represent 50% of all
inbox traffic”
- 2012, Hotmail
“Graymail – the source of 75% of all
spam complaints”
- 2012, Hotmail
Selecting a gray email dataset
Challenge-Response (CR) filtering
[Figure: CR filtering diagram, labels: Ham, Spam]
Gray email analysis
Identification and classification
of campaigns
Grouping emails into campaigns: N-grams, exact string matching
Evaluation of email headers similarity per campaign:
- Campaign sender consistency
and geo-distribution
- Delivery statistics (delivery rejections)
- CAPTCHAs solved
- Bulk headers
Classification: LEGITIMATE / SPAM
Limitation: only email
header information
was used
― False Positives: 0.9%
― False Negatives: 8.6%
― Classifier uncertainty zone: 6.4%
18% 82%
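The slides name N-grams as the way emails are grouped into campaigns but do not give the procedure. The sketch below is one possible reading, not the authors' implementation: it assumes subject lines are compared by Jaccard similarity over word trigrams and greedily assigned to the first sufficiently similar campaign. The function names, the trigram size, and the 0.4 threshold are all illustrative assumptions.

```python
from typing import List, Set, Tuple


def word_ngrams(text: str, n: int = 3) -> Set[tuple]:
    """Return the set of word n-grams of a lowercased subject line."""
    words = text.lower().split()
    if len(words) < n:
        # Short subjects collapse to one gram, which amounts to exact matching.
        return {tuple(words)}
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a: Set[tuple], b: Set[tuple]) -> float:
    """Jaccard similarity of two n-gram sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def group_into_campaigns(subjects: List[str],
                         threshold: float = 0.4) -> List[List[str]]:
    """Greedily put each subject into the first campaign whose seed is similar.

    The threshold is a hypothetical value chosen for illustration only.
    """
    campaigns: List[Tuple[Set[tuple], List[str]]] = []
    for subject in subjects:
        grams = word_ngrams(subject)
        for seed, members in campaigns:
            if jaccard(seed, grams) >= threshold:
                members.append(subject)
                break
        else:
            # No existing campaign is close enough: start a new one.
            campaigns.append((grams, [subject]))
    return [members for _, members in campaigns]


if __name__ == "__main__":
    demo = [
        "Save 20% on summer shoes today only",
        "Save 30% on summer shoes today only",
        "Your invoice is ready",
    ]
    for campaign in group_into_campaigns(demo):
        print(campaign)
```

A greedy single pass keeps the sketch short; a real pipeline would more likely index n-grams so that each email is not compared against every campaign seed.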
Refinement with Graph Analysis
SPAM: 16%
UNCERTAIN: 7%
LEGITIMATE: 77%
- Decompose into groups with a
community finding algorithm
- Propagate labels in homogeneous groups
- Extract graph metrics
- Compare them with known clusters
False positives drop from 0.9% to 0.2%
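The refinement step (decompose the graph into groups, then propagate labels in homogeneous groups) can be sketched as follows. This is a minimal illustration under stated assumptions: the slides do not say which community finding algorithm was used, so plain connected components stand in for it here, and the graph, node names, and label strings are hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional, Set


def connected_components(graph: Dict[str, Set[str]]) -> List[Set[str]]:
    """Split an undirected graph into connected components with BFS.

    Stand-in for a community finding algorithm (an assumption of this sketch).
    Every node must appear as a key of `graph`.
    """
    seen: Set[str] = set()
    groups: List[Set[str]] = []
    for start in graph:
        if start in seen:
            continue
        group: Set[str] = set()
        queue = deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            group.add(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        groups.append(group)
    return groups


def propagate_labels(graph: Dict[str, Set[str]],
                     labels: Dict[str, Optional[str]]) -> Dict[str, Optional[str]]:
    """Give unlabeled campaigns their group's label, but only in homogeneous groups."""
    refined = dict(labels)
    for group in connected_components(graph):
        known = {labels[n] for n in group if labels.get(n) is not None}
        if len(known) == 1:
            # All labeled members agree, so the label is safe to spread.
            label = next(iter(known))
            for node in group:
                refined[node] = label
    return refined


if __name__ == "__main__":
    # Hypothetical campaign graph: an edge could mean, e.g., a shared sender.
    graph = {
        "c1": {"c2"}, "c2": {"c1", "c3"}, "c3": {"c2"},
        "c4": {"c5"}, "c5": {"c4"},
    }
    labels = {"c1": "legitimate", "c2": None, "c3": "legitimate",
              "c4": "spam", "c5": "legitimate"}
    print(propagate_labels(graph, labels))
```

Mixed groups are left untouched on purpose: spreading a label there would risk turning an uncertain campaign into a false positive.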
Campaign types
Campaign Categories
Snowshoe spammers?
The owners' websites stress that
“they are not spammers” and that they
provide other companies with a way to send
marketing emails within the boundaries of
the current legislation
Gray Email Campaign Categories
― Commercial campaigns (42% of total)
─ Use wide IP address ranges to run the campaigns
─ Provide a pre-compiled list of categorized email addresses
─ Distributed, but consistent campaign sending patterns
― Newsletters and notifications
― Botnet-generated campaigns
― Scam and phishing campaigns
─ Behavior similar to commercial campaigns
─ Hide behind webmail accounts
User Behavior
Users are pro-active
towards newsletters
But they are also curious to check
on malicious/illegal content
- 20% of the users have opened botnet-generated emails
- Each user viewed 5 messages on average
Summary
― Presented a first empirical study of gray emails and of commercial and
newsletter campaigns
― Classified 50% of the gray emails (15% of all incoming email) and
categorized them into 4 categories
― Lessons learned:
─ Email classification cannot stay binary anymore
─ By neglecting gray emails and placing them in the spam folder, we raise
the security threat level for users instead of helping to lower it
─ Scam campaigns, especially those sent from webmail accounts, were the most
challenging to deal with
Questions

Unveiling the gray emails: A Closer Look at Emails in the Gray Area
