Unveiling the gray emails: A Closer Look at Emails in the Gray Area

Shades of Grey:
A Closer Look at Emails in the Gray Area
Jelena Isacenkova
Davide Balzarotti

June 23, 2014, Eurecom
Evolution of Spam
[Timeline figure: spam rate (0% to 100%) plotted against the years 1994, 1997, 1998, 2002, 2003, 2004, 2007, 2008, 2009-2012. Rates marked on the curve: 9%, 40%, 72%, 85%, 68%.]
Events annotated on the timeline:
- Lawyers Canter and Siegel commercial spam scandal
- Abuse of dynamic dial-up IP addresses
- Message classifiers (Bayesian)
- RBLs
- Release of “Ratware” spamming tools: DarkMailer, SenderSafe
- Open-relay for proxying spam
- Appearance of viruses automatically downloading email lists
- Directive 2002/58 on Privacy and Electronic Communications
- CAN-SPAM Act of 2003
- First botnets: Bagle, Bobax
- Distributed spamming tool: Reactor Mailer
- Spammers got sentenced
- Srizbi takedown
- 7 botnet takedowns

Email Categories
SPAM: Botnet spam, 419 scam, Phishing, Targeted Email Attacks, Spear Phishing, Blackhole Spam, Snowshoe Spam
GRAY: Newsletters, Notifications, Customer Prospecting, Commercial ads
HAM: Personal User Emails

Gmail Spam folder
In our study, users checked 5-6 messages per day
1.5% of harmful spam emails had a malicious attachment

How significant is the gray category?

Gray Category in 2007
SPAM: Botnet spam, 419 scam, Phishing, Spear Phishing, Blackhole Spam, Snowshoe Spam
GRAY: Newsletters, Notifications, Commercial ads
HAM
“Most misclassified ham messages are advertising, news digests, … [that] represent a small fraction of incoming mail, ... [which] filters find more difficult to classify.”
- Cormack & Lynam, “Online Supervised Spam Filter Evaluation”, 2007

Gray Category in 2012
SPAM: Botnet spam, 419 scam, Phishing, Spear Phishing, Blackhole Spam, Snowshoe Spam
GRAY: Newsletters, Notifications, Commercial ads
HAM
“49% of consumers subscribe to 1-10 brands”
- Direct Marketing Association
“70% of 'this is spam' are actually legitimate newsletters, offers or notifications”
- 2012, ReturnPath
“Graymail emails represent 50% of all inbox traffic”
- 2012, Hotmail
“Graymail – the source of 75% of all spam complaints”
- 2012, Hotmail

Selecting a gray email dataset

Challenge-Response (CR) filtering

Ham
Spam
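
As a rough illustration of how a challenge-response filter sorts incoming mail, the sketch below assumes a simple per-user whitelist and blacklist. Mail from known senders is delivered as ham, mail from blocked senders is treated as spam, and everything else is held in a gray queue until the sender answers a challenge. The names `CRFilter`, `receive`, and `challenge_solved` are hypothetical and do not come from the slides; a deployed system would also combine this with other filters.

```python
from dataclasses import dataclass, field


@dataclass
class CRFilter:
    """Toy challenge-response filter for one recipient (illustrative only)."""
    whitelist: set = field(default_factory=set)
    blacklist: set = field(default_factory=set)
    # Messages held until the sender answers the challenge, keyed by sender.
    quarantine: dict = field(default_factory=dict)

    def receive(self, sender: str, message: str) -> str:
        """Return 'ham', 'spam', or 'gray' for an incoming message."""
        if sender in self.whitelist:
            return "ham"
        if sender in self.blacklist:
            return "spam"
        # Unknown sender: hold the message. A real system would send
        # a challenge (for example a CAPTCHA link) back at this point.
        self.quarantine.setdefault(sender, []).append(message)
        return "gray"

    def challenge_solved(self, sender: str) -> list:
        """Whitelist the sender and release the messages that were on hold."""
        self.whitelist.add(sender)
        return self.quarantine.pop(sender, [])


cr = CRFilter(whitelist={"alice@example.org"}, blacklist={"bot@example.net"})
verdicts = [
    cr.receive("alice@example.org", "lunch?"),
    cr.receive("bot@example.net", "cheap pills"),
    cr.receive("news@example.com", "weekly digest"),
]
released = cr.challenge_solved("news@example.com")
```

In this toy setup the quarantined messages are the candidates for the gray dataset: they come from senders the user has neither approved nor blocked.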

Gray email analysis

Identification and classification of campaigns
Pipeline:
- Grouping emails into campaigns
- Evaluation of email headers similarity per campaign
- Classification: LEGITIMATE or SPAM
Techniques: n-grams, exact string matching
Features:
- Campaign sender consistency and geo-distribution
- Delivery statistics (rejections)
- CAPTCHAs solved
- Bulk headers
Limitation: only email header information was used
Results:
- False positives: 0.9%
- False negatives: 8.6%
- Classifier uncertainty zone: 6.4%
- Split between the two classes: 18% / 82%
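
The grouping step can be pictured with a small sketch. The slides name n-grams and header-only data but give no algorithm, so everything specific here is an assumption: subjects are compared by Jaccard similarity over word bigrams, clustering is greedy against the first subject of each campaign, and the 0.5 threshold is arbitrary. The function names are hypothetical.

```python
def word_ngrams(text: str, n: int = 2) -> set:
    """Set of word n-grams of a subject line (lowercased, whitespace-split)."""
    words = text.lower().split()
    if len(words) < n:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def group_into_campaigns(subjects, threshold=0.5):
    """Greedily assign each subject to the first campaign it resembles."""
    campaigns = []  # each entry keeps the n-grams of its first subject
    for subject in subjects:
        grams = word_ngrams(subject)
        for campaign in campaigns:
            if jaccard(grams, campaign["ngrams"]) >= threshold:
                campaign["subjects"].append(subject)
                break
        else:
            # No existing campaign is similar enough: start a new one.
            campaigns.append({"ngrams": grams, "subjects": [subject]})
    return [c["subjects"] for c in campaigns]


subjects = [
    "Summer sale 50% off all shoes",
    "Summer sale 50% off all bags",
    "Your invoice is ready: March",
    "Your invoice is ready: April",
    "Meeting notes",
]
campaigns = group_into_campaigns(subjects)
```

With these inputs the two sale subjects and the two invoice subjects each form a campaign, and the last subject stays on its own.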

Refinement with Graph Analysis
SPAM: 16%
UNCERTAIN: 7%
LEGITIMATE: 77%
- Decompose into groups with a community finding algorithm
- Propagate labels in homogeneous groups
- Extract graph metrics
- Compare them with known clusters
False positives drop from 0.9% to 0.2%
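
The first two refinement steps can be sketched as follows. The slides do not name the community finding algorithm or say what the graph edges represent, so this sketch swaps in plain connected components as a stand-in and treats a group as homogeneous when all of its already labeled members agree. Node names, edges, and function names are hypothetical.

```python
from collections import defaultdict


def connected_components(nodes, edges):
    """Stand-in for a community finding algorithm: plain connected components."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, groups = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        group, stack = [], [start]
        while stack:
            node = stack.pop()
            group.append(node)
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        groups.append(group)
    return groups


def propagate_labels(labels, groups):
    """Copy the shared label to uncertain nodes of homogeneous groups."""
    refined = dict(labels)
    for group in groups:
        known = {labels[n] for n in group
                 if labels.get(n, "uncertain") != "uncertain"}
        if len(known) == 1:  # homogeneous: every labeled member agrees
            label = known.pop()
            for n in group:
                refined[n] = label
    return refined


labels = {
    "c1": "spam", "c2": "uncertain",
    "c3": "legitimate", "c4": "legitimate", "c5": "uncertain",
    "c6": "spam", "c7": "legitimate", "c8": "uncertain",
}
edges = [("c1", "c2"), ("c3", "c4"), ("c4", "c5"), ("c6", "c7"), ("c7", "c8")]
groups = connected_components(labels, edges)
refined = propagate_labels(labels, groups)
```

Here `c2` and `c5` inherit the label of their group, while `c8` stays uncertain because its group mixes spam and legitimate campaigns.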

Campaign types

Campaign Categories
Snowshoe spammers?

Campaign Categories
The owners' websites stress that “they are not spammers” and that
they give other companies a way to send marketing emails within
the boundaries of current legislation

Gray Email Campaign Categories
― Commercial campaigns (42% of total)
─ Use wide IP address ranges to run the campaigns
─ Provide a pre-compiled list of categorized email addresses
─ Distributed, but consistent campaign sending patterns
― Newsletters and notifications
― Botnet-generated campaigns
― Scam and phishing campaigns
─ Behavior similar to commercial campaigns
─ Hide behind webmail accounts
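
One way to make "wide IP address ranges" measurable is to count how many distinct sender addresses and networks each campaign uses. The sketch below is only an assumption about how such a feature could be computed: the /24 prefix length, the input format of `(campaign, sender IP)` pairs, and the function name `ip_spread` are all hypothetical, and the addresses are documentation-range examples.

```python
import ipaddress
from collections import defaultdict


def ip_spread(emails, prefix_len=24):
    """Map each campaign to (distinct sender IPs, distinct /prefix_len networks)."""
    ips = defaultdict(set)
    nets = defaultdict(set)
    for campaign, ip in emails:
        ips[campaign].add(ip)
        # strict=False lets a host address stand for its enclosing network.
        nets[campaign].add(ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False))
    return {c: (len(ips[c]), len(nets[c])) for c in ips}


emails = [
    ("promo", "198.51.100.7"),
    ("promo", "198.51.100.9"),
    ("promo", "203.0.113.5"),
    ("promo", "192.0.2.44"),
    ("newsletter", "192.0.2.10"),
    ("newsletter", "192.0.2.10"),
]
spread = ip_spread(emails)
```

A campaign spread over many networks, like `promo` here, would look distributed, whereas `newsletter` comes from a single address.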

User Behavior
Users are proactive towards newsletters
But they are also curious to check malicious/illegal content
- 20% of the users have opened botnet-generated emails
- Each user viewed 5 messages on average

Summary
― Presented a first empirical study of gray emails and of commercial and newsletter campaigns
― Classified 50% of the gray emails (15% of all incoming email) and grouped them into 4 categories
― Lessons learned:
─ Email classification can no longer stay binary
─ Neglecting gray emails and placing them in the spam folder raises the security threat to users instead of lowering it
─ Scam campaigns, especially those sent from webmail accounts, were the most challenging to deal with

Questions
