Every day, millions of users spend a considerable amount of time browsing through the messages in their spam folders. In this work, we look into an often overlooked category of email: gray emails, i.e., messages that automated spam filters cannot clearly classify one way or the other. We analyze real-world emails by grouping them into clusters belonging to 4 categories of bulk email campaigns. Some of these campaigns contain potentially harmful content and some do not, so they pose different levels of security risk to users.
Unveiling the gray emails: A Closer Look at Emails in the Gray Area
8. June 23, 2014 Eurecom 8
Gmail Spam folder
Within our study, users checked 5-6 messages per day.
1.5% of harmful spam emails had a malicious attachment.
How significant is the gray category?
Gray Category in 2007
SPAM: Botnet spam, 419 scam, Phishing, Targeted Email Attacks, Spear Phishing, Blackhole Spam, Snowshoe Spam
GRAY: Newsletters, Notifications, Customer Prospecting, Commercial ads
HAM: Personal User Emails
“Most misclassified ham messages are advertising, news digests, … [that] represent a small fraction of incoming mail, ... [which] filters find more difficult to classify.”
- Cormack & Lynam, “Online Supervised Spam Filter Evaluation”, 2007
Gray Category in 2012
SPAM: Botnet spam, 419 scam, Phishing, Targeted Email Attacks, Spear Phishing, Blackhole Spam, Snowshoe Spam
GRAY: Newsletters, Notifications, Customer Prospecting, Commercial ads
HAM: Personal User Emails
“49% of consumers subscribe to 1-10 brands”
- Direct Marketing Association
“70% of 'this is spam' are actually legitimate newsletters, offers or notifications”
- 2012, ReturnPath
“Graymail emails represent 50% of all inbox traffic”
- 2012, Hotmail
“Graymail – the source of 75% of all spam complaints”
- 2012, Hotmail
Selecting a gray email dataset
Challenge-Response (CR) filtering
(diagram labels: Ham, Spam)
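For readers unfamiliar with challenge-response filtering, the sketch below shows the generic mechanism: mail from whitelisted senders is delivered, mail from unknown senders is held until the sender answers a challenge (typically a CAPTCHA), and solving it whitelists the sender. This is a minimal sketch of the general technique, not the system used in the study. The class and method names (`ChallengeResponseFilter`, `receive`, `challenge_solved`) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ChallengeResponseFilter:
    """Generic CR filter sketch (hypothetical names, not the study's system)."""
    whitelist: set = field(default_factory=set)
    blacklist: set = field(default_factory=set)
    # sender -> messages held while the challenge is unanswered
    pending: dict = field(default_factory=dict)

    def receive(self, sender: str, message: str) -> str:
        if sender in self.whitelist:
            return "delivered"
        if sender in self.blacklist:
            return "dropped"
        # Unknown sender: hold the message; a challenge would be sent here.
        self.pending.setdefault(sender, []).append(message)
        return "challenged"

    def challenge_solved(self, sender: str) -> list:
        """The sender solved the challenge: whitelist them and release held mail."""
        self.whitelist.add(sender)
        return self.pending.pop(sender, [])
```

In this design, messages from senders who never answer the challenge simply stay in `pending`, neither delivered nor explicitly rejected.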
Identification and classification of campaigns
― Grouping emails into campaigns: exact string matching
― Evaluation of email header similarity per campaign: N-grams
― Classification into LEGITIMATE or SPAM, based on:
─ Campaign sender consistency and geo-distribution
─ Delivery statistics (rejections)
─ CAPTCHAs solved
─ Bulk headers
Limitation: only email header information was used
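The first two steps of the pipeline (exact-match grouping, then n-gram similarity of headers within each campaign) can be sketched as follows. Several details here are assumptions: the choice of the Subject header as the exact-match key, the use of the From header for similarity, character trigrams, and mean pairwise Jaccard similarity. All function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def group_into_campaigns(emails):
    """Group emails whose subject lines match exactly.

    Using the Subject header as the key is an assumption for illustration.
    """
    campaigns = defaultdict(list)
    for email in emails:
        campaigns[email["subject"]].append(email)
    return dict(campaigns)


def char_ngrams(text, n=3):
    """Set of lowercase character n-grams of a header value."""
    text = text.lower()
    return {text[i:i + n] for i in range(max(len(text) - n + 1, 1))}


def jaccard(a, b):
    """Jaccard similarity of two sets (1.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def header_similarity(campaign, header="from"):
    """Mean pairwise n-gram similarity of one header field within a campaign."""
    grams = [char_ngrams(email[header]) for email in campaign]
    pairs = list(combinations(grams, 2))
    if not pairs:  # a single-message campaign is trivially consistent
        return 1.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)
```

A campaign whose messages all come from near-identical sender strings scores close to 1.0, while a campaign sent from many unrelated addresses scores lower.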
Classification results:
― False positives: 0.9%
― False negatives: 8.6%
― Classifier uncertainty zone: 6.4%
― Split between the two classes: 18% / 82%
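A classifier with an "uncertainty zone" can be sketched as a spam score with two thresholds. Scores between them are left unlabeled instead of being forced into a binary decision. The sketch also shows one way to compute false-positive, false-negative, and uncertain rates. The thresholds (0.3 and 0.7), the rate denominators, and the function names are assumptions, not values or definitions taken from the study.

```python
def classify(score, low=0.3, high=0.7):
    """Map a spam score in [0, 1] to a label.

    Scores strictly between the two (hypothetical) thresholds fall into the
    uncertainty zone.
    """
    if score >= high:
        return "SPAM"
    if score <= low:
        return "LEGITIMATE"
    return "UNCERTAIN"


def evaluate(scores, truth, low=0.3, high=0.7):
    """Return (false-positive rate, false-negative rate, uncertain fraction).

    Assumed convention: the FP rate is computed over truly legitimate messages,
    the FN rate over true spam, and the uncertain fraction over all messages.
    Uncertain messages count as neither FP nor FN.
    """
    preds = [classify(s, low, high) for s in scores]
    fp = sum(p == "SPAM" and t == "LEGITIMATE" for p, t in zip(preds, truth))
    fn = sum(p == "LEGITIMATE" and t == "SPAM" for p, t in zip(preds, truth))
    uncertain = sum(p == "UNCERTAIN" for p in preds)
    n_legit = truth.count("LEGITIMATE")
    n_spam = truth.count("SPAM")
    return (
        fp / n_legit if n_legit else 0.0,
        fn / n_spam if n_spam else 0.0,
        uncertain / len(preds) if preds else 0.0,
    )
```

Widening the gap between `low` and `high` trades fewer misclassifications for a larger uncertain set. That uncertain set is what a later refinement step has to resolve.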
Refinement with Graph Analysis
SPAM: 16%
UNCERTAIN: 7%
LEGITIMATE: 77%
- Decompose into groups with a
community finding algorithm
- Propagate labels in homogeneous groups
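Label propagation within homogeneous groups can be sketched on a graph whose nodes are campaigns. The edges here are assumed to link campaigns that share infrastructure, for example a sending IP. The slides do not name the community-finding algorithm, so this sketch substitutes plain connected components as a stand-in. A group counts as homogeneous when all of its already-labeled nodes agree. Function names are hypothetical.

```python
from collections import defaultdict


def connected_components(nodes, edges):
    """Stand-in for a community-finding algorithm: plain connected components."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, groups = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack, group = [start], []
        while stack:  # iterative depth-first search
            node = stack.pop()
            group.append(node)
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        groups.append(group)
    return groups


def propagate_labels(nodes, edges, labels):
    """Give UNCERTAIN nodes their group's label when every labeled node in
    that group agrees (i.e. the group is homogeneous)."""
    result = dict(labels)
    for group in connected_components(nodes, edges):
        known = {labels[n] for n in group if labels[n] != "UNCERTAIN"}
        if len(known) == 1:
            label = known.pop()
            for n in group:
                if result[n] == "UNCERTAIN":
                    result[n] = label
    return result
```

Mixed groups are left untouched, so their uncertain nodes need a different strategy.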
- Extract graph metrics
- Compare them with known clusters
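Comparing an uncertain group against known clusters can be sketched as a nearest-neighbour match on a small vector of graph metrics. The metric choice (node count, edge density, mean degree), the Euclidean distance, and the function names are all assumptions. In practice the metrics would need normalizing, because node count otherwise dominates the distance.

```python
import math


def graph_metrics(nodes, edges):
    """Hypothetical metric vector: (node count, edge density, mean degree)."""
    n, m = len(nodes), len(edges)
    density = 2 * m / (n * (n - 1)) if n > 1 else 0.0
    mean_degree = 2 * m / n if n else 0.0
    return (n, density, mean_degree)


def nearest_known_cluster(metrics, known):
    """Return the label of the known cluster whose metric vector is closest.

    `known` maps a label to a metric tuple of the same shape as `metrics`.
    """
    return min(known, key=lambda label: math.dist(metrics, known[label]))
```

For example, a three-node triangle yields `(3, 1.0, 2.0)`. Given illustrative reference vectors such as `{"SPAM": (50, 0.9, 44.1), "LEGITIMATE": (5, 0.2, 0.8)}`, it is matched to `"LEGITIMATE"`.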
False positives drop from 0.9% to 0.2%
Campaign Categories
The owners' websites stress that “they are not spammers” and that they provide other companies with a way to send marketing emails within the boundaries of current legislation.
Gray Email Campaign Categories
― Commercial campaigns (42% of total)
─ Use wide IP address ranges to run the campaigns
─ Provide a pre-compiled list of categorized email addresses
─ Distributed but consistent campaign sending patterns
― Newsletters and notifications
― Botnet-generated campaigns
― Scam and phishing campaigns
─ Behavior similar to commercial campaigns
─ Hide behind webmail accounts
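Two of the sender traits listed above can be turned into simple per-campaign features: how many distinct networks the campaign sends from, and how steady its daily volume is. This is a minimal sketch. Counting /24 IPv4 prefixes and using the coefficient of variation of daily counts are assumed proxies, not the study's feature definitions, and the function names are hypothetical.

```python
import ipaddress
from statistics import mean, pstdev


def distinct_networks(ips, prefix=24):
    """Number of distinct /prefix IPv4 networks used by a campaign's senders."""
    return len({ipaddress.ip_network(f"{ip}/{prefix}", strict=False) for ip in ips})


def sending_consistency(daily_counts):
    """Coefficient of variation of daily message volume (lower = steadier)."""
    avg = mean(daily_counts)
    return pstdev(daily_counts) / avg if avg else 0.0
```

Under these proxies, a commercial campaign as described above would show a high `distinct_networks` value together with a low `sending_consistency` value.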
User Behavior
Users are proactive towards newsletters.
But users are also curious enough to check malicious or illegal content:
- 20% of the users opened botnet-generated emails
- On average, each user viewed 5 messages
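Figures like the two above can be derived from a per-user count of opened botnet-generated emails. The slide does not say whether the average of 5 messages is taken over all users or only over users who opened at least one such email. This sketch assumes the latter, and the function name is hypothetical.

```python
def botnet_view_stats(views):
    """views: mapping user -> number of botnet-generated emails opened.

    Returns (fraction of users who opened at least one, mean number of views
    among those users). Averaging only over openers is an assumption.
    """
    openers = [v for v in views.values() if v > 0]
    fraction = len(openers) / len(views) if views else 0.0
    avg_views = sum(openers) / len(openers) if openers else 0.0
    return fraction, avg_views
```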
Summary
― Presented a first empirical study of gray emails and of commercial and newsletter campaigns
― Classified 50% of the gray emails (15% of all incoming email) and grouped them into 4 categories
― Lessons learned:
─ Email classification can no longer stay binary
─ By neglecting gray emails and placing them in the spam folder, we raise the security threat level for users instead of lowering it
─ Scam campaigns, especially those sent from webmail accounts, were the most challenging to deal with