Analyzing Social and Stylometric Features to 
Identify Spearphishing Emails 
Prateek Dewan, Anand Kashyap, 
Ponnurangam Kumaraguru 
Indraprastha Institute of Information Technology – Delhi (IIITD), India 
Unifying the 
Global Response 
to Cybercrime
Overview 
• What is spearphishing? 
• Spearphishing and Online Social Media 
• Challenges and dataset 
• Feature extraction 
• Classification results 
• Discussion 
What is spearphishing? 
• Targeted phishing attack 
• Contains contextual content instead of random 
messages 
• Harder to detect, since spearphishing emails look 
more genuine 
• Victims are asked to 
• Download malicious attachments 
• Reply with sensitive information 
• Click on URLs 
• … 
Why study spearphishing? 
• Victims are 4.5 times more likely to fall for spear
phishing than for normal phishing [1].
• One of the main entry points for Advanced 
Persistent Threats. 
• Causes losses worth millions. 
[1] M. Jakobsson. Modeling and preventing phishing attacks. In Financial Cryptography, volume 5. Citeseer, 2005. 
Spearphishing and social media 
• Social media profiles can be a good source for 
the “context” part of spear phishing emails 
• FBI warning on July 4, 2013¹
• “…emails typically contain accurate information about
victims obtained from data posted on social networking
sites…”
¹ http://www.computerweekly.com/news/2240187487/FBI-warns-of-increased-spear-phishing-attacks
Data 
• Emails 
• Spear phishing emails (Symantec) 
• Spam / phishing emails (Symantec) 
• Benign emails (Enron) 
• LinkedIn profiles 
• Recipients of emails in the three datasets mentioned 
above 
• LinkedIn People Search API 
Challenges (social features) 
• Limited information available to identify the victim on social media
• Only first name, last name, and organization can be derived from the victim’s email address
• Hard to find the victim on Facebook, Twitter, Google+
• Too many profiles with the same first and last name
• Work field not searchable
Challenges (social features) contd. 
• LinkedIn: the only network that supports searching by work field
• People Search API access is restricted
• We requested access under LinkedIn’s Vetted API access scheme
• Rate limited
• Only 100 requests per day per app
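To give a sense of scale for this rate limit, the arithmetic below uses the victim counts from the Dataset slide. It assumes exactly one People Search request per victim (ambiguous names could need more) and a single app; both are assumptions, not details from the slides.

```python
import math

# Victim / LinkedIn profile counts from the Dataset slide.
VICTIMS = {"SPEAR": 2434, "SPAM": 5912, "BENIGN": 1240}

# Assumption: one People Search request per victim, one app,
# and the stated limit of 100 requests per day.
REQUESTS_PER_DAY = 100


def days_needed(num_requests: int, per_day: int = REQUESTS_PER_DAY) -> int:
    """Whole days required to issue num_requests under a daily quota."""
    return math.ceil(num_requests / per_day)


total = sum(VICTIMS.values())
print(total, days_needed(total))  # 9586 96
```

Under these assumptions a single app would need roughly three months of daily quota to cover every recipient once.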
Dataset
• Emails sent to employees of 14 international organizations
• SPEAR (targeted spear phishing emails from Symantec)
• 4,742 emails → 2,434 victims / LinkedIn profiles
• SPAM (spam / phishing emails from Symantec)
• 9,353 emails → 5,912 victims / LinkedIn profiles
• BENIGN (sample from the Enron email corpus)
• 6,601 emails → 1,240 victims / LinkedIn profiles
Feature set creation
• SPAM, SPEAR, BENIGN emails → stylometric features from emails
• Recipient email address → 1. firstName, 2. lastName, 3. organization → LinkedIn People Search API (http://api.linkedin.com/v1/people-search) → LinkedIn profile(s) → social features from LinkedIn
• Stylometric features + social features → final feature vector
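The slides show the recipient address being turned into firstName, lastName and organization but do not say how. A minimal sketch, assuming addresses of the form first.last@organization.tld; the names `parse_recipient` and `SearchKeys` are hypothetical, and the People Search call itself is not sketched.

```python
from typing import NamedTuple, Optional


class SearchKeys(NamedTuple):
    first_name: str
    last_name: str
    organization: str


def parse_recipient(address: str) -> Optional[SearchKeys]:
    """Split an address of the (assumed) form first.last@org.tld.

    Returns None when the local part does not look like first.last,
    since such recipients cannot be searched by name.
    """
    local, sep, domain = address.strip().lower().partition("@")
    if not sep or not domain:
        return None
    # Treat "_" like "." so first_last@... is also accepted.
    parts = [p for p in local.replace("_", ".").split(".") if p]
    if len(parts) != 2:
        return None
    labels = domain.split(".")
    if len(labels) < 2:
        return None
    # Heuristic: the label just before the top-level domain,
    # e.g. "example" in "mail.example.com". This breaks for
    # suffixes such as ".co.uk".
    organization = labels[-2]
    first, last = parts
    return SearchKeys(first.capitalize(), last.capitalize(), organization)


print(parse_recipient("jane.doe@example.com"))
# SearchKeys(first_name='Jane', last_name='Doe', organization='example')
```

Returning None for addresses like jdoe@example.com reflects the challenge noted earlier: without a full name there is nothing reliable to search for.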
Stylometric Features 
• Subject based (7) 
• Num. words, Num. characters, Richness 
• Has words: “bank”, “verify” 
• isReply, isForwarded 
• Attachment based (2) 
• Length of attachment name 
• Attachment size 
• Body based (9) 
• Num. words, Num. characters, Num. unique words 
• Has words: “attach”, “suspension”, “verify your account” 
• Num. newlines, Richness, function words 
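The feature list above can be sketched as plain functions. This is an illustration, not the authors' extractor: it assumes "richness" means unique words divided by total words (type-token ratio), treats "function words" as a count over a small stand-in list, and detects replies and forwards from "Re:" / "Fw:" / "Fwd:" subject prefixes. All function and key names are hypothetical.

```python
import re

# Keywords from the slide; FUNCTION_WORDS is an illustrative stand-in.
SUBJECT_KEYWORDS = ("bank", "verify")
BODY_KEYWORDS = ("attach", "suspension", "verify your account")
FUNCTION_WORDS = {"a", "an", "the", "and", "or", "but", "of", "to",
                  "in", "on", "for", "with", "is", "are", "be"}

WORD_RE = re.compile(r"[a-z0-9']+")


def words(text: str) -> list[str]:
    return WORD_RE.findall(text.lower())


def richness(tokens: list[str]) -> float:
    # Assumption: richness = unique words / total words.
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def subject_features(subject: str) -> dict:
    tokens = words(subject)
    lowered = subject.lower().lstrip()
    feats = {
        "subj_num_words": len(tokens),
        "subj_num_chars": len(subject),
        "subj_richness": richness(tokens),
        "subj_is_reply": lowered.startswith("re:"),
        "subj_is_forwarded": lowered.startswith(("fw:", "fwd:")),
    }
    for kw in SUBJECT_KEYWORDS:  # substring match, so "banking" counts
        feats[f"subj_has_{kw}"] = kw in lowered
    return feats  # 7 features


def body_features(body: str) -> dict:
    tokens = words(body)
    lowered = body.lower()
    feats = {
        "body_num_words": len(tokens),
        "body_num_chars": len(body),
        "body_num_unique_words": len(set(tokens)),
        "body_num_newlines": body.count("\n"),
        "body_richness": richness(tokens),
        "body_num_function_words": sum(t in FUNCTION_WORDS for t in tokens),
    }
    for kw in BODY_KEYWORDS:
        feats["body_has_" + kw.replace(" ", "_")] = kw in lowered
    return feats  # 9 features


def attachment_features(name: str, size_bytes: int) -> dict:
    return {"att_name_length": len(name), "att_size": size_bytes}  # 2 features


print(subject_features("Re: Please verify your bank details"))
```

The per-group counts (7, 9, 2) match the numbers in parentheses on the slide, which keeps the vectors aligned with the feature sets used in the results tables.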
Social Features 
• Location 
• Connections 
• Summary based (5) 
• Num. words, Num. characters, Num. unique words
• Length, Richness 
• Profession based (2) 
• Job Level (0-7) 
• Job Type (0-9) 
Results (SPEAR v/s SPAM) 
Feature set (num. features)   Metric         Random Forest   J48 Decision Tree   Naïve Bayes
Subject (7)                   Accuracy (%)   83.91           83.10               58.87
                              FP rate        0.208           0.227               0.371
Attachment (2)                Accuracy (%)   97.86           96.69               69.15
                              FP rate        0.035           0.046               0.218
All email (9)                 Accuracy (%)   98.28           97.32               68.69
                              FP rate        0.024           0.035               0.221
Social (9)                    Accuracy (%)   81.73           76.63               65.85
                              FP rate        0.229           0.356               0.445
Email + Social (18)           Accuracy (%)   96.47           95.90               69.35
                              FP rate        0.052           0.054               0.232
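The tables report accuracy (fraction of emails classified correctly) and an FP rate, FP / (FP + TN). The slides do not say which class is treated as positive. J48 is the name of Weka's C4.5 decision-tree implementation, which suggests (but does not confirm) that Weka was used; Weka reports a per-class FP rate plus a weighted average whose weights are the class proportions. The sketch computes both on toy labels. It is an illustration, not the authors' evaluation code.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def fp_rate(y_true, y_pred, positive):
    """FP / (FP + TN) with `positive` treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    fp = sum(1 for t, p in pairs if p == positive and t != positive)
    tn = sum(1 for t, p in pairs if p != positive and t != positive)
    return fp / (fp + tn) if fp + tn else 0.0


def weighted_fp_rate(y_true, y_pred):
    """Per-class FP rates averaged with weights equal to class frequency."""
    n = len(y_true)
    return sum(count / n * fp_rate(y_true, y_pred, cls)
               for cls, count in Counter(y_true).items())


# Toy labels for illustration only (not data from the study).
y_true = ["SPEAR"] * 4 + ["SPAM"] * 6
y_pred = ["SPEAR", "SPEAR", "SPEAR", "SPAM"] + ["SPAM"] * 5 + ["SPEAR"]
print(accuracy(y_true, y_pred))                     # 0.8
print(round(fp_rate(y_true, y_pred, "SPEAR"), 3))   # 0.167
print(round(weighted_fp_rate(y_true, y_pred), 3))   # 0.217
```

The weighted variant matters when classes are imbalanced: a single accuracy figure can hide a high FP rate on the smaller class.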
Results (SPEAR v/s SPAM) contd. 
• Most informative features 
• Attachment size 
• Length of attachment name 
• Subject Richness 
• No. of characters in subject 
• Location (from LinkedIn profile) 
• No. of words in subject 
• LinkedIn connections 
• … 
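The slides do not say how the "most informative features" were ranked. One common choice is information gain, H(class) - H(class | feature). The sketch below uses a simplified single-threshold split for numeric features; it is a swapped-in illustration of the idea, not the ranking procedure from the study, and all names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """H(labels) - H(labels | values) for a discrete feature."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def best_threshold_gain(values, labels):
    """Best gain over binary splits `value <= t` (for numeric features)."""
    best = 0.0
    for t in sorted(set(values))[:-1]:
        best = max(best, information_gain([v <= t for v in values], labels))
    return best


def rank_features(columns, labels):
    """columns: {feature name: list of numeric values}; highest gain first."""
    scores = {name: best_threshold_gain(vals, labels)
              for name, vals in columns.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy example: "att_size" separates the classes perfectly, "noise" does not.
labels = ["SPEAR", "SPEAR", "SPAM", "SPAM"]
print(rank_features({"noise": [1, 2, 1, 2], "att_size": [10, 12, 1, 2]}, labels))
```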
Results (SPEAR v/s SPAM) contd. 
SPEAR v/s SPAM subjects 
ß Spam / phishing 
Spear phishing à 
15
Results (SPEAR v/s BENIGN) 
Feature set (num. features)   Metric         Random Forest   J48 Decision Tree   Naïve Bayes
Subject (7)                   Accuracy (%)   81.19           81.11               61.75
                              FP rate        0.210           0.217               0.489
Body (9)                      Accuracy (%)   97.17           95.62               53.81
                              FP rate        0.031           0.048               0.338
All email (16)                Accuracy (%)   97.39           95.84               54.14
                              FP rate        0.029           0.044               0.334
Social (9)                    Accuracy (%)   94.48           91.79               69.76
                              FP rate        0.067           0.103               0.278
Email + Social (25)           Accuracy (%)   97.04           95.28               57.27
                              FP rate        0.032           0.052               0.316
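The feature counts in parentheses across the results tables add up from the group sizes on the feature slides (subject 7, attachment 2, body 9, social 9). The sketch records which groups each feature set appears to contain; this membership is inferred from the counts, not stated on the slides. It then concatenates per-group values into one vector. All names are hypothetical.

```python
# Group sizes as listed on the Stylometric Features and Social Features slides.
GROUP_SIZES = {"subject": 7, "attachment": 2, "body": 9, "social": 9}

# Membership inferred from the counts, e.g. "All email (9)" for
# SPEAR v/s SPAM = subject (7) + attachment (2).
FEATURE_SETS = {
    "SPEAR v/s SPAM": {
        "Subject": ("subject",),
        "Attachment": ("attachment",),
        "All email": ("subject", "attachment"),
        "Social": ("social",),
        "Email + Social": ("subject", "attachment", "social"),
    },
    "SPEAR v/s BENIGN": {
        "Subject": ("subject",),
        "Body": ("body",),
        "All email": ("subject", "body"),
        "Social": ("social",),
        "Email + Social": ("subject", "body", "social"),
    },
    "SPEAR v/s SPAM + BENIGN": {
        "Subject": ("subject",),
        "Social": ("social",),
        "Email + Social": ("subject", "social"),
    },
}


def feature_count(groups):
    return sum(GROUP_SIZES[g] for g in groups)


def build_vector(features_by_group, groups):
    """Concatenate per-group feature lists in the given order."""
    vector = []
    for g in groups:
        values = features_by_group[g]
        if len(values) != GROUP_SIZES[g]:
            raise ValueError(f"{g}: expected {GROUP_SIZES[g]} values, got {len(values)}")
        vector.extend(values)
    return vector


for comparison, sets in FEATURE_SETS.items():
    for name, groups in sets.items():
        print(f"{comparison:<24} {name:<15} {feature_count(groups)}")
```

Checking the sizes inside `build_vector` guards against silently misaligned columns when groups are combined.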
Results (SPEAR v/s BENIGN) contd. 
• Most informative features 
• Body richness 
• No. of characters in body 
• No. of words in body 
• No. of unique words in body 
• Location (from LinkedIn) 
• No. of newlines in body 
• Subject richness 
• … 
Results (SPEAR v/s SPAM + BENIGN) 
Feature set (num. features)   Metric         Random Forest   J48 Decision Tree   Naïve Bayes
Subject (7)                   Accuracy (%)   86.48           86.35               77.99
                              FP rate        0.333           0.352               0.681
Social (9)                    Accuracy (%)   88.04           84.69               74.46
                              FP rate        0.241           0.371               0.454
Email + Social (16)           Accuracy (%)   89.86           88.38               73.97
                              FP rate        0.202           0.248               0.381
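Accuracy on an imbalanced comparison is easier to read next to a trivial baseline: always predicting "not spear phishing". The arithmetic below uses the email counts from the Dataset slide. It assumes every email was used with no class rebalancing, which the slides do not state.

```python
# Email counts from the Dataset slide.
COUNTS = {"SPEAR": 4742, "SPAM": 9353, "BENIGN": 6601}


def majority_baseline(negative_classes):
    """Accuracy of always predicting "not SPEAR" against the given negatives.

    Assumes every email from the Dataset slide is used, with no rebalancing.
    """
    negatives = sum(COUNTS[c] for c in negative_classes)
    return negatives / (negatives + COUNTS["SPEAR"])


for negs in (["SPAM"], ["BENIGN"], ["SPAM", "BENIGN"]):
    print("SPEAR v/s " + " + ".join(negs), f"{majority_baseline(negs):.2%}")
# SPEAR v/s SPAM 66.36%
# SPEAR v/s BENIGN 58.19%
# SPEAR v/s SPAM + BENIGN 77.09%
```

Under this assumption the SPAM + BENIGN comparison has the highest trivial baseline of the three, which is also why its FP rates are worth reading alongside the accuracy column.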
Results (SPEAR v/s SPAM + BENIGN) contd. 
• Most informative features 
• Subject richness 
• No. of characters in subject 
• Location (from LinkedIn) 
• LinkedIn connections 
• No. of words in subject 
• Email forwarded? (true / false)
• Email is a reply? (true / false)
• … 
Discussion 
• Social features (from LinkedIn) did not help in
distinguishing spear phishing emails from non-spear
phishing emails.
• Stylometric features from the emails suffice to do so.
• Real-world scenarios may differ substantially
• Attackers may use information from other sources / social
networks, viz. Facebook, Twitter, etc.
• Dataset limitation
• It is possible that none of the spear phishing emails in our dataset were
crafted using information from LinkedIn
• We cannot conclude that such behavior would not
be found outside our dataset, or in the future.
Thanks! 
Prateek Dewan 
E: prateekd@iiitd.ac.in 
W: http://precog.iiitd.edu.in/people/prateek 
Backup slides…
Results (SPEAR v/s SPAM) contd. 
Attachment names
Results (SPEAR v/s BENIGN) contd. 
ß Benign emails 
Spear phishing à
Unifying the 
Global Response 
to Cybercrime 
Attachment types
Details of organizations 
Unifying the 
Global Response 
to Cybercrime

More Related Content

What's hot

Phishing detection in ims using domain ontology and cba an innovative rule ...
Phishing detection in ims using domain ontology and cba   an innovative rule ...Phishing detection in ims using domain ontology and cba   an innovative rule ...
Phishing detection in ims using domain ontology and cba an innovative rule ...
ijistjournal
 

What's hot (20)

How to Spot and Combat a Phishing Attack - Cyber Security Webinar | ControlScan
How to Spot and Combat a Phishing Attack - Cyber Security Webinar | ControlScanHow to Spot and Combat a Phishing Attack - Cyber Security Webinar | ControlScan
How to Spot and Combat a Phishing Attack - Cyber Security Webinar | ControlScan
 
Phishing attack seminar presentation
Phishing attack seminar presentation Phishing attack seminar presentation
Phishing attack seminar presentation
 
Phishing detection in ims using domain ontology and cba an innovative rule ...
Phishing detection in ims using domain ontology and cba   an innovative rule ...Phishing detection in ims using domain ontology and cba   an innovative rule ...
Phishing detection in ims using domain ontology and cba an innovative rule ...
 
Prevent phishing scams
Prevent phishing scamsPrevent phishing scams
Prevent phishing scams
 
Phishing ppt
Phishing pptPhishing ppt
Phishing ppt
 
Anti phishing presentation
Anti phishing presentationAnti phishing presentation
Anti phishing presentation
 
Phishing - A modern web attack
Phishing -  A modern web attackPhishing -  A modern web attack
Phishing - A modern web attack
 
Phishing
PhishingPhishing
Phishing
 
Phishing technology
Phishing technologyPhishing technology
Phishing technology
 
Phishing awareness
Phishing awarenessPhishing awareness
Phishing awareness
 
Phishing
PhishingPhishing
Phishing
 
Phishing and hacking
Phishing and hackingPhishing and hacking
Phishing and hacking
 
A presentation on Phishing
A presentation on PhishingA presentation on Phishing
A presentation on Phishing
 
PPT on Phishing
PPT on PhishingPPT on Phishing
PPT on Phishing
 
Phishing Attacks
Phishing AttacksPhishing Attacks
Phishing Attacks
 
Phishing technique tanish khilani
Phishing technique tanish  khilani Phishing technique tanish  khilani
Phishing technique tanish khilani
 
Phishing & Pharming
Phishing & PharmingPhishing & Pharming
Phishing & Pharming
 
Protecting Corporete Credentials Against Threats 4 48159 wgw03071_usen
Protecting Corporete Credentials Against Threats 4 48159 wgw03071_usenProtecting Corporete Credentials Against Threats 4 48159 wgw03071_usen
Protecting Corporete Credentials Against Threats 4 48159 wgw03071_usen
 
Phishing detection & protection scheme
Phishing detection & protection schemePhishing detection & protection scheme
Phishing detection & protection scheme
 
Phishing Attack : A big Threat
Phishing Attack : A big ThreatPhishing Attack : A big Threat
Phishing Attack : A big Threat
 

Similar to Analyzing Social and Stylometric Features to Identify Spear phishing Emails

Attacking the Privacy of Social Network users (HITB 2011)
Attacking the Privacy of Social Network users (HITB 2011)Attacking the Privacy of Social Network users (HITB 2011)
Attacking the Privacy of Social Network users (HITB 2011)
Marco Balduzzi
 
Emerging Phishing Trends and Effectiveness of the Anti-Phishing Landing Page
Emerging Phishing Trends and Effectiveness of the Anti-Phishing Landing PageEmerging Phishing Trends and Effectiveness of the Anti-Phishing Landing Page
Emerging Phishing Trends and Effectiveness of the Anti-Phishing Landing Page
IIIT Hyderabad
 
Emerging Phishing Trends and Effectiveness of the Anti-Phishing Landing Page
Emerging Phishing Trends and Effectiveness of the Anti-Phishing Landing PageEmerging Phishing Trends and Effectiveness of the Anti-Phishing Landing Page
Emerging Phishing Trends and Effectiveness of the Anti-Phishing Landing Page
Cybersecurity Education and Research Centre
 
SOD-Presentation-Des-Moines-10.19.21-v2.pptx
SOD-Presentation-Des-Moines-10.19.21-v2.pptxSOD-Presentation-Des-Moines-10.19.21-v2.pptx
SOD-Presentation-Des-Moines-10.19.21-v2.pptx
TamaOlan1
 

Similar to Analyzing Social and Stylometric Features to Identify Spear phishing Emails (20)

IS Awareness in practice, isaca moscow 2019 10
IS Awareness in practice, isaca moscow 2019 10IS Awareness in practice, isaca moscow 2019 10
IS Awareness in practice, isaca moscow 2019 10
 
Attacking the Privacy of Social Network users (HITB 2011)
Attacking the Privacy of Social Network users (HITB 2011)Attacking the Privacy of Social Network users (HITB 2011)
Attacking the Privacy of Social Network users (HITB 2011)
 
Ethical Hacking by Krutarth Vasavada
Ethical Hacking by Krutarth VasavadaEthical Hacking by Krutarth Vasavada
Ethical Hacking by Krutarth Vasavada
 
Emerging Phishing Trends and Effectiveness of the Anti-Phishing Landing Page
Emerging Phishing Trends and Effectiveness of the Anti-Phishing Landing PageEmerging Phishing Trends and Effectiveness of the Anti-Phishing Landing Page
Emerging Phishing Trends and Effectiveness of the Anti-Phishing Landing Page
 
Emerging Phishing Trends and Effectiveness of the Anti-Phishing Landing Page
Emerging Phishing Trends and Effectiveness of the Anti-Phishing Landing PageEmerging Phishing Trends and Effectiveness of the Anti-Phishing Landing Page
Emerging Phishing Trends and Effectiveness of the Anti-Phishing Landing Page
 
Phishing Attacks: Trends, Detection Systems and Computer Vision as a Promisin...
Phishing Attacks: Trends, Detection Systems and Computer Vision as a Promisin...Phishing Attacks: Trends, Detection Systems and Computer Vision as a Promisin...
Phishing Attacks: Trends, Detection Systems and Computer Vision as a Promisin...
 
Voight-Kampff for Email Addresses: Quantifying Email Address Reputation to Id...
Voight-Kampff for Email Addresses: Quantifying Email Address Reputation to Id...Voight-Kampff for Email Addresses: Quantifying Email Address Reputation to Id...
Voight-Kampff for Email Addresses: Quantifying Email Address Reputation to Id...
 
Technical track chris calvert-1 30 pm-issa conference-calvert
Technical track chris calvert-1 30 pm-issa conference-calvertTechnical track chris calvert-1 30 pm-issa conference-calvert
Technical track chris calvert-1 30 pm-issa conference-calvert
 
How to Use Artificial Intelligence to Minimize your Cybersecurity Attack Surface
How to Use Artificial Intelligence to Minimize your Cybersecurity Attack SurfaceHow to Use Artificial Intelligence to Minimize your Cybersecurity Attack Surface
How to Use Artificial Intelligence to Minimize your Cybersecurity Attack Surface
 
Why Do Some People Fall for Phishing Scams and What Do I Do About it?
Why Do Some People Fall for Phishing Scams and What Do I Do About it?Why Do Some People Fall for Phishing Scams and What Do I Do About it?
Why Do Some People Fall for Phishing Scams and What Do I Do About it?
 
Working Together to Build a Cyber Security Program
Working Together to Build a Cyber Security ProgramWorking Together to Build a Cyber Security Program
Working Together to Build a Cyber Security Program
 
SOD-Presentation-Des-Moines-10.19.21-v2.pptx
SOD-Presentation-Des-Moines-10.19.21-v2.pptxSOD-Presentation-Des-Moines-10.19.21-v2.pptx
SOD-Presentation-Des-Moines-10.19.21-v2.pptx
 
Unveiling the dark web. The importance of your cybersecurity posture
Unveiling the dark web. The importance of your cybersecurity postureUnveiling the dark web. The importance of your cybersecurity posture
Unveiling the dark web. The importance of your cybersecurity posture
 
Application Security-Understanding The Horizon
Application Security-Understanding The HorizonApplication Security-Understanding The Horizon
Application Security-Understanding The Horizon
 
Artificial Intelligence – Time Bomb or The Promised Land?
Artificial Intelligence – Time Bomb or The Promised Land?Artificial Intelligence – Time Bomb or The Promised Land?
Artificial Intelligence – Time Bomb or The Promised Land?
 
Email: still the favourite route of attack
Email: still the favourite route of attackEmail: still the favourite route of attack
Email: still the favourite route of attack
 
An Introduction To IT Security And Privacy In Libraries & Anywhere
An Introduction To IT Security And Privacy In Libraries & AnywhereAn Introduction To IT Security And Privacy In Libraries & Anywhere
An Introduction To IT Security And Privacy In Libraries & Anywhere
 
The Future Of Threat Intelligence Platforms
The Future Of Threat Intelligence PlatformsThe Future Of Threat Intelligence Platforms
The Future Of Threat Intelligence Platforms
 
protecting your digital personal life
protecting your digital personal lifeprotecting your digital personal life
protecting your digital personal life
 
Data mining in security: Ja'far Alqatawna
Data mining in security: Ja'far AlqatawnaData mining in security: Ja'far Alqatawna
Data mining in security: Ja'far Alqatawna
 

More from Cybersecurity Education and Research Centre

Automated Methods for Identity Resolution across Online Social Networks
Automated Methods for Identity Resolution across Online Social NetworksAutomated Methods for Identity Resolution across Online Social Networks
Automated Methods for Identity Resolution across Online Social Networks
Cybersecurity Education and Research Centre
 
Video Inpainting detection using inconsistencies in optical Flow
Video Inpainting detection using inconsistencies in optical FlowVideo Inpainting detection using inconsistencies in optical Flow
Video Inpainting detection using inconsistencies in optical Flow
Cybersecurity Education and Research Centre
 
Identification and Analysis of Malicious Content on Facebook: A Survey
Identification and Analysis of Malicious Content on Facebook: A SurveyIdentification and Analysis of Malicious Content on Facebook: A Survey
Identification and Analysis of Malicious Content on Facebook: A Survey
Cybersecurity Education and Research Centre
 
Clotho : Saving Programs from Malformed Strings and Incorrect
Clotho : Saving Programs from Malformed Strings and IncorrectClotho : Saving Programs from Malformed Strings and Incorrect
Clotho : Saving Programs from Malformed Strings and Incorrect
Cybersecurity Education and Research Centre
 
Clotho: Saving Programs from Malformed Strings and Incorrect String-handling
Clotho: Saving Programs from Malformed Strings and Incorrect String-handling�Clotho: Saving Programs from Malformed Strings and Incorrect String-handling�
Clotho: Saving Programs from Malformed Strings and Incorrect String-handling
Cybersecurity Education and Research Centre
 
Securing the Digital Enterprise
Securing the Digital EnterpriseSecuring the Digital Enterprise
Securing the Digital Enterprise
Cybersecurity Education and Research Centre
 
Broker Bots: Analyzing automated activity during High Impact Events on Twitter
Broker Bots: Analyzing automated activity during High Impact Events on TwitterBroker Bots: Analyzing automated activity during High Impact Events on Twitter
Broker Bots: Analyzing automated activity during High Impact Events on Twitter
Cybersecurity Education and Research Centre
 
Twitter and Polls: What Do 140 Characters Say About India General Elections 2014
Twitter and Polls: What Do 140 Characters Say About India General Elections 2014Twitter and Polls: What Do 140 Characters Say About India General Elections 2014
Twitter and Polls: What Do 140 Characters Say About India General Elections 2014
Cybersecurity Education and Research Centre
 
Exploration of gaps in Bitly's spam detection and relevant countermeasures
Exploration of gaps in Bitly's spam detection and relevant countermeasuresExploration of gaps in Bitly's spam detection and relevant countermeasures
Exploration of gaps in Bitly's spam detection and relevant countermeasures
Cybersecurity Education and Research Centre
 

More from Cybersecurity Education and Research Centre (16)

Automated Methods for Identity Resolution across Online Social Networks
Automated Methods for Identity Resolution across Online Social NetworksAutomated Methods for Identity Resolution across Online Social Networks
Automated Methods for Identity Resolution across Online Social Networks
 
Novel Instruction Set Architecture Based Side Channels in popular SSL/TLS Imp...
Novel Instruction Set Architecture Based Side Channels in popular SSL/TLS Imp...Novel Instruction Set Architecture Based Side Channels in popular SSL/TLS Imp...
Novel Instruction Set Architecture Based Side Channels in popular SSL/TLS Imp...
 
Video Inpainting detection using inconsistencies in optical Flow
Video Inpainting detection using inconsistencies in optical FlowVideo Inpainting detection using inconsistencies in optical Flow
Video Inpainting detection using inconsistencies in optical Flow
 
TASVEER : Tomography of India’s Internet Infrastructure
TASVEER : Tomography of India’s Internet InfrastructureTASVEER : Tomography of India’s Internet Infrastructure
TASVEER : Tomography of India’s Internet Infrastructure
 
Data-Driven Assessment of Cyber Risk: Challenges in Assessing and Migrating C...
Data-Driven Assessment of Cyber Risk: Challenges in Assessing and Migrating C...Data-Driven Assessment of Cyber Risk: Challenges in Assessing and Migrating C...
Data-Driven Assessment of Cyber Risk: Challenges in Assessing and Migrating C...
 
A Strategy for Addressing Cyber Security Challenges
A Strategy for Addressing Cyber Security Challenges A Strategy for Addressing Cyber Security Challenges
A Strategy for Addressing Cyber Security Challenges
 
Identification and Analysis of Malicious Content on Facebook: A Survey
Identification and Analysis of Malicious Content on Facebook: A SurveyIdentification and Analysis of Malicious Content on Facebook: A Survey
Identification and Analysis of Malicious Content on Facebook: A Survey
 
Clotho : Saving Programs from Malformed Strings and Incorrect
Clotho : Saving Programs from Malformed Strings and IncorrectClotho : Saving Programs from Malformed Strings and Incorrect
Clotho : Saving Programs from Malformed Strings and Incorrect
 
National Critical Information Infrastructure Protection Centre (NCIIPC): Role...
National Critical Information Infrastructure Protection Centre (NCIIPC): Role...National Critical Information Infrastructure Protection Centre (NCIIPC): Role...
National Critical Information Infrastructure Protection Centre (NCIIPC): Role...
 
Clotho: Saving Programs from Malformed Strings and Incorrect String-handling
Clotho: Saving Programs from Malformed Strings and Incorrect String-handling�Clotho: Saving Programs from Malformed Strings and Incorrect String-handling�
Clotho: Saving Programs from Malformed Strings and Incorrect String-handling
 
Securing the Digital Enterprise
Securing the Digital EnterpriseSecuring the Digital Enterprise
Securing the Digital Enterprise
 
Broker Bots: Analyzing automated activity during High Impact Events on Twitter
Broker Bots: Analyzing automated activity during High Impact Events on TwitterBroker Bots: Analyzing automated activity during High Impact Events on Twitter
Broker Bots: Analyzing automated activity during High Impact Events on Twitter
 
Twitter and Polls: What Do 140 Characters Say About India General Elections 2014
Twitter and Polls: What Do 140 Characters Say About India General Elections 2014Twitter and Polls: What Do 140 Characters Say About India General Elections 2014
Twitter and Polls: What Do 140 Characters Say About India General Elections 2014
 
Web Application Security 101
Web Application Security 101Web Application Security 101
Web Application Security 101
 
Exploration of gaps in Bitly's spam detection and relevant countermeasures
Exploration of gaps in Bitly's spam detection and relevant countermeasuresExploration of gaps in Bitly's spam detection and relevant countermeasures
Exploration of gaps in Bitly's spam detection and relevant countermeasures
 
The future of interaction & its security challenges
The future of interaction & its security challengesThe future of interaction & its security challenges
The future of interaction & its security challenges
 

Recently uploaded

Hospital management system project report.pdf
Hospital management system project report.pdfHospital management system project report.pdf
Hospital management system project report.pdf
Kamal Acharya
 
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
ssuser89054b
 
Verification of thevenin's theorem for BEEE Lab (1).pptx
Verification of thevenin's theorem for BEEE Lab (1).pptxVerification of thevenin's theorem for BEEE Lab (1).pptx
Verification of thevenin's theorem for BEEE Lab (1).pptx
chumtiyababu
 

Recently uploaded (20)

Generative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTGenerative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPT
 
Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...
Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...
Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...
 
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptxS1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
 
Hospital management system project report.pdf
Hospital management system project report.pdfHospital management system project report.pdf
Hospital management system project report.pdf
 
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
 
AIRCANVAS[1].pdf mini project for btech students
AIRCANVAS[1].pdf mini project for btech studentsAIRCANVAS[1].pdf mini project for btech students
AIRCANVAS[1].pdf mini project for btech students
 
School management system project Report.pdf
School management system project Report.pdfSchool management system project Report.pdf
School management system project Report.pdf
 
A Study of Urban Area Plan for Pabna Municipality
A Study of Urban Area Plan for Pabna MunicipalityA Study of Urban Area Plan for Pabna Municipality
A Study of Urban Area Plan for Pabna Municipality
 
Design For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startDesign For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the start
 
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced LoadsFEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
 
Online food ordering system project report.pdf
Online food ordering system project report.pdfOnline food ordering system project report.pdf
Online food ordering system project report.pdf
 
Wadi Rum luxhotel lodge Analysis case study.pptx
Wadi Rum luxhotel lodge Analysis case study.pptxWadi Rum luxhotel lodge Analysis case study.pptx
Wadi Rum luxhotel lodge Analysis case study.pptx
 
Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...
Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...
Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...
 
Moment Distribution Method For Btech Civil
Moment Distribution Method For Btech CivilMoment Distribution Method For Btech Civil
Moment Distribution Method For Btech Civil
 
Verification of thevenin's theorem for BEEE Lab (1).pptx
Verification of thevenin's theorem for BEEE Lab (1).pptxVerification of thevenin's theorem for BEEE Lab (1).pptx
Verification of thevenin's theorem for BEEE Lab (1).pptx
 
Employee leave management system project.
Employee leave management system project.Employee leave management system project.
Employee leave management system project.
 
kiln thermal load.pptx kiln tgermal load
kiln thermal load.pptx kiln tgermal loadkiln thermal load.pptx kiln tgermal load
kiln thermal load.pptx kiln tgermal load
 
A CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptx
A CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptxA CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptx
A CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptx
 
PE 459 LECTURE 2- natural gas basic concepts and properties
PE 459 LECTURE 2- natural gas basic concepts and propertiesPE 459 LECTURE 2- natural gas basic concepts and properties
PE 459 LECTURE 2- natural gas basic concepts and properties
 
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptxHOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
 

Analyzing Social and Stylometric Features to Identify Spear phishing Emails

  • 1. Analyzing Social and Stylometric Features to Identify Spearphishing Emails Prateek Dewan, Anand Kashyap, Ponnurangam Kumaraguru Indraprastha Institute of Information Technology – Delhi (IIITD), India Unifying the Global Response to Cybercrime
  • 2. Unifying the Global Response to Cybercrime Overview • What is spearphishing? • Spearphishing and Online Social Media • Challenges and dataset • Feature extraction • Classification results • Discussion 1
  • 3. What is spearphishing? Unifying the Global Response to Cybercrime • Targeted phishing attack • Contains contextual content instead of random messages • Harder to detect, since spearphishing emails look more genuine • Victims are asked to • Download malicious attachments • Reply with sensitive information • Click on URLs • … 2
  • 4. Why study spearphishing? • Victims are 4.5 times more likely to fall for spear phishing, than normal phishing [1]. • One of the main entry points for Advanced Persistent Threats. • Causes losses worth millions. [1] M. Jakobsson. Modeling and preventing phishing attacks. In Financial Cryptography, volume 5. Citeseer, 2005. Unifying the Global Response to Cybercrime 3
  • 5. Spearphishing and social media • Social media profiles can be a good source for the “context” part of spear phishing emails • FBI warning on July 04, 20131 • “…emails typically contain accurate information about victims obtained from data posted on social networking sites…” 1 http://www.computerweekly.com/news/2240187487/FBI-warns-of-increased-spear-phishing-attacks Unifying the Global Response to Cybercrime 4
  • 6. Unifying the Global Response to Cybercrime Data • Emails • Spear phishing emails (Symantec) • Spam / phishing emails (Symantec) • Benign emails (Enron) • LinkedIn profiles • Recipients of emails in the three datasets mentioned above • LinkedIn People Search API 5
  • 7. Challenges (social features) • Limited information about victim to identify her on social media • Only first name, last name, organization available from Unifying the Global Response to Cybercrime victim’s email ID • Hard to find victim on Facebook, Twitter, Google+ • Too many profiles with same first name, last name • Work field not searchable. 6
  • 8. Challenges (social features) contd. • LinkedIn – Only network which provides searching using work field • People search API access restricted. • We requested for access under their Vetted API access Unifying the Global Response to Cybercrime scheme. • Rate limited • Only 100 requests per day per app 7
  • 9. Unifying the Global Response to Cybercrime Dataset • Emails sent to employees of 14 international organizations • SPEAR (Targeted spear phishing emails from Symantec) • 4,742 emails à 2,434 victims / LinkedIn profiles • SPAM (Spam / phishing emails from Symantec) • 9,353 emails à 5,912 victims / LinkedIn profiles • BENIGN (Sample from Enron email corpus) • 6,601 emails à 1,240 victims / LinkedIn profiles 8
  • 10. Feature set creation Final feature vector Unifying the Global Response to Cybercrime SPAM SPEAR BENIGN Stylometric features from emails 1. firstName 2. lastName 3. organization http://api.linkedin.com/v1/people-search: LinkedIn Profile(s) Social features from LinkedIn Recipient email address 9
  • 11. Stylometric Features
    • Subject based (7)
      • Num. words, Num. characters, Richness
      • Has words: “bank”, “verify”
      • isReply, isForwarded
    • Attachment based (2)
      • Length of attachment name
      • Attachment size
    • Body based (9)
      • Num. words, Num. characters, Num. unique words
      • Has words: “attach”, “suspension”, “verify your account”
      • Num. newlines, Richness, function words
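The seven subject-based features can be sketched as follows. The slides do not define "Richness"; this sketch assumes it is the ratio of word count to character count. Detecting replies and forwards from "Re:" / "Fw:" / "Fwd:" subject prefixes is likewise an assumption. Body-based features would follow the same pattern over the message body.

```python
import re

# Simple word tokenizer (an assumption; the slides do not specify one).
WORD_RE = re.compile(r"[A-Za-z0-9']+")


def richness(text: str) -> float:
    """Assumed definition: number of words divided by number of characters."""
    return len(WORD_RE.findall(text)) / len(text) if text else 0.0


def subject_features(subject: str) -> dict[str, float]:
    """The seven subject-based stylometric features (sketch)."""
    s = subject.strip()
    words = [w.lower() for w in WORD_RE.findall(s)]
    return {
        "num_words": float(len(words)),
        "num_chars": float(len(s)),
        "richness": richness(s),
        "has_bank": float("bank" in words),
        "has_verify": float("verify" in words),
        # Prefix-based detection; headers such as In-Reply-To would be
        # more reliable when available.
        "is_reply": float(bool(re.match(r"re\s*:", s, re.I))),
        "is_forwarded": float(bool(re.match(r"fwd?\s*:", s, re.I))),
    }
```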
  • 12. Social Features
    • Location
    • Connections
    • Summary based (5)
      • Num. words, Num. characters, Num. unique words
      • Length, Richness
    • Profession based (2)
      • Job Level (0-7)
      • Job Type (0-9)
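"Job Level (0-7)" is an ordinal encoding of a profile's seniority. The slides give only the range, not how job titles were mapped to levels, so the keyword table below is entirely hypothetical; the sketch only illustrates one way such an ordinal feature could be derived from a LinkedIn headline or title.

```python
import re
from typing import Optional

# Hypothetical seniority keywords, one tuple per level 0..7.
LEVEL_KEYWORDS: list[tuple[str, ...]] = [
    ("intern", "trainee"),                               # 0
    ("assistant",),                                      # 1
    ("associate", "analyst"),                            # 2
    ("senior", "lead"),                                  # 3
    ("manager",),                                        # 4
    ("director", "head"),                                # 5
    ("vice president", "vp"),                            # 6
    ("chief", "ceo", "cto", "cfo", "founder", "owner"),  # 7
]


def job_level(title: str) -> Optional[int]:
    """Return the highest level whose keyword appears in the title, else None."""
    # Normalise to space-separated lowercase words so that keywords,
    # including multi-word ones like "vice president", match whole words.
    text = " " + " ".join(re.findall(r"[a-z]+", title.lower())) + " "
    matches = [level for level, keywords in enumerate(LEVEL_KEYWORDS)
               if any(f" {kw} " in text for kw in keywords)]
    return max(matches) if matches else None
```

Taking the highest matching level means "Senior Manager" is encoded as a manager (4) rather than as senior staff (3); titles with no known keyword are left as missing values.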
  • 13. Results (SPEAR v/s SPAM)

    | Feature set (num. features) | Metric | Random Forest | J48 Decision Tree | Naïve Bayes |
    |---|---|---|---|---|
    | Subject (7) | Accuracy (%) | 83.91 | 83.10 | 58.87 |
    | | FP rate | 0.208 | 0.227 | 0.371 |
    | Attachment (2) | Accuracy (%) | 97.86 | 96.69 | 69.15 |
    | | FP rate | 0.035 | 0.046 | 0.218 |
    | All email (9) | Accuracy (%) | 98.28 | 97.32 | 68.69 |
    | | FP rate | 0.024 | 0.035 | 0.221 |
    | Social (9) | Accuracy (%) | 81.73 | 76.63 | 65.85 |
    | | FP rate | 0.229 | 0.356 | 0.445 |
    | Email + Social (18) | Accuracy (%) | 96.47 | 95.90 | 69.35 |
    | | FP rate | 0.052 | 0.054 | 0.232 |
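The tables report accuracy and a false positive (FP) rate for each classifier. J48 is the name of Weka's C4.5 decision tree implementation, which suggests the experiments used Weka; Weka prints both a per-class FP rate and a class-frequency-weighted average, and the slides do not say which one is shown. The sketch below computes both from predicted and true labels; the function names are hypothetical.

```python
from typing import Hashable, Sequence


def accuracy_and_fp_rate(y_true: Sequence[Hashable], y_pred: Sequence[Hashable],
                         positive: Hashable = 1) -> tuple[float, float]:
    """Return (accuracy in %, FP rate of the `positive` class).

    FP rate = FP / (FP + TN): the share of emails not in the positive
    class that were wrongly predicted as positive.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two equal-length, non-empty label sequences")
    pairs = list(zip(y_true, y_pred))
    correct = sum(t == p for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return 100.0 * correct / len(pairs), fpr


def weighted_fp_rate(y_true: Sequence[Hashable],
                     y_pred: Sequence[Hashable]) -> float:
    """Per-class FP rates averaged with weights = true class frequencies."""
    n = len(y_true)
    total = 0.0
    for c in set(y_true):
        _, fpr_c = accuracy_and_fp_rate(y_true, y_pred, positive=c)
        total += (sum(t == c for t in y_true) / n) * fpr_c
    return total
```

For spear phishing detection the per-class FP rate of the SPEAR class is the operationally relevant one: it measures how many legitimate or ordinary spam emails would be flagged as targeted attacks.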
  • 14. Results (SPEAR v/s SPAM) contd.
    • Most informative features
      • Attachment size
      • Length of attachment name
      • Subject richness
      • No. of characters in subject
      • Location (from LinkedIn profile)
      • No. of words in subject
      • LinkedIn connections
      • …
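The slides do not say how the "most informative features" were ranked. Information gain is one common criterion (Weka offers it as `InfoGainAttributeEval`); the sketch below, which is an illustration and not necessarily the authors' method, ranks discrete or pre-binned features by information gain with respect to the class label.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (bits) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values: Sequence[Hashable],
                     labels: Sequence[Hashable]) -> float:
    """H(labels) - H(labels | feature) for a discrete or pre-binned feature."""
    n = len(labels)
    groups: dict[Hashable, list] = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(columns: dict[str, Sequence[Hashable]],
                  labels: Sequence[Hashable]) -> list[tuple[str, float]]:
    """Feature names sorted by information gain, highest first."""
    scores = [(name, information_gain(col, labels))
              for name, col in columns.items()]
    return sorted(scores, key=lambda s: s[1], reverse=True)
```

Continuous features such as attachment size would need to be discretized into bins before being passed to `information_gain`.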
  • 15. Results (SPEAR v/s SPAM) contd.
  • 16. SPEAR v/s SPAM subjects (figure: ← Spam / phishing | Spear phishing →)
  • 17. Results (SPEAR v/s BENIGN)

    | Feature set (num. features) | Metric | Random Forest | J48 Decision Tree | Naïve Bayes |
    |---|---|---|---|---|
    | Subject (7) | Accuracy (%) | 81.19 | 81.11 | 61.75 |
    | | FP rate | 0.210 | 0.217 | 0.489 |
    | Body (9) | Accuracy (%) | 97.17 | 95.62 | 53.81 |
    | | FP rate | 0.031 | 0.048 | 0.338 |
    | All email (16) | Accuracy (%) | 97.39 | 95.84 | 54.14 |
    | | FP rate | 0.029 | 0.044 | 0.334 |
    | Social (9) | Accuracy (%) | 94.48 | 91.79 | 69.76 |
    | | FP rate | 0.067 | 0.103 | 0.278 |
    | Email + Social (25) | Accuracy (%) | 97.04 | 95.28 | 57.27 |
    | | FP rate | 0.032 | 0.052 | 0.316 |
  • 18. Results (SPEAR v/s BENIGN) contd.
    • Most informative features
      • Body richness
      • No. of characters in body
      • No. of words in body
      • No. of unique words in body
      • Location (from LinkedIn)
      • No. of newlines in body
      • Subject richness
      • …
  • 19. Results (SPEAR v/s SPAM + BENIGN)

    | Feature set (num. features) | Metric | Random Forest | J48 Decision Tree | Naïve Bayes |
    |---|---|---|---|---|
    | Subject (7) | Accuracy (%) | 86.48 | 86.35 | 77.99 |
    | | FP rate | 0.333 | 0.352 | 0.681 |
    | Social (9) | Accuracy (%) | 88.04 | 84.69 | 74.46 |
    | | FP rate | 0.241 | 0.371 | 0.454 |
    | Email + Social (16) | Accuracy (%) | 89.86 | 88.38 | 73.97 |
    | | FP rate | 0.202 | 0.248 | 0.381 |
  • 20. Results (SPEAR v/s SPAM + BENIGN) contd.
    • Most informative features
      • Subject richness
      • No. of characters in subject
      • Location (from LinkedIn)
      • LinkedIn connections
      • No. of words in subject
      • Email forwarded? (true / false)
      • Email is a reply? (true / false)
      • …
  • 21. Discussion
    • Social features (from LinkedIn) did not help in distinguishing spear phishing emails from non-spear-phishing emails.
    • Stylometric features from emails suffice to do so.
    • Real-world scenarios may be very different
      • Attackers may use information from other sources / social networks, e.g. Facebook, Twitter
    • Dataset limitation
      • It is possible that none of the spear phishing emails in our dataset were crafted using information from LinkedIn
      • We cannot conclude that such behavior would not be found outside our dataset, or in the future.
  • 22. Thanks!
    Prateek Dewan
    E: prateekd@iiitd.ac.in
    W: http://precog.iiitd.edu.in/people/prateek
  • 23. Backup slides…
  • 24. Results (SPEAR v/s SPAM) contd.
  • 25. Attachment names
  • 26. Results (SPEAR v/s BENIGN) contd. (figure: ← Benign emails | Spear phishing →)
  • 27. Attachment types
  • 28. Details of organizations