Shady Paths: Leveraging Surfing Crowds to Detect Malicious Web Pages
Gianluca Stringhini, Christopher Kruegel, and Giovanni Vigna
University of California, Santa Barbara
The Web is a Dangerous Place

• Drive-by downloads
• Social engineering

Current Detection Techniques
Static Analysis
Suspicious elements in
• URLs
• JavaScript
• Flash
Evaded by: obfuscation

Dynamic Analysis
Visit the web page (honeyclients)
• Signs of exploitation
Evaded by: cloaking

Can only detect attacks that exploit vulnerabilities!
Our Technique

Redirection Graphs

No need to analyze the final page!

By analyzing the characteristics of the set of visitors and of the redirection
graph, we can determine if the destination page is malicious
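A minimal sketch of the data model, assuming each observation is an ordered list of URLs (referrer first, final page last) plus the visitor's IP address. `RedirectionChain`, `RedirectionGraph`, and `build_redirection_graphs` are hypothetical names for illustration, not SpiderWeb's code: chains that end at the same final page are merged into one graph, and the edge counts and the set of visitors are kept alongside.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class RedirectionChain:
    """One observed navigation (hypothetical structure)."""
    client_ip: str
    urls: Tuple[str, ...]  # referrer first, final page last


@dataclass
class RedirectionGraph:
    """All chains that end at the same final page."""
    final_page: str
    edges: Dict[Tuple[str, str], int] = field(default_factory=dict)
    visitors: Set[str] = field(default_factory=set)


def build_redirection_graphs(chains: List[RedirectionChain]) -> Dict[str, RedirectionGraph]:
    """Merge chains into one graph per final page, counting each redirection edge."""
    graphs: Dict[str, RedirectionGraph] = {}
    for chain in chains:
        final_page = chain.urls[-1]
        graph = graphs.setdefault(final_page, RedirectionGraph(final_page))
        graph.visitors.add(chain.client_ip)
        # Every consecutive pair of URLs is one redirection step (edge).
        for src, dst in zip(chain.urls, chain.urls[1:]):
            graph.edges[(src, dst)] = graph.edges.get((src, dst), 0) + 1
    return graphs
```

Keeping the visitor set next to the edges matters because the features described later look at both the visitors and the shape of the graph.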
Legitimate Uses of Redirections
• Indicating that a web page has moved
• Login functionality
• Advertisements

We cannot flag all redirections as malicious
Luckily, malicious redirection graphs look different

Malicious Redirection Graphs
Uniform software configuration
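One possible way to quantify a uniform software configuration, assuming each visitor who reached the final page is described by a (browser, OS) pair: the share of visitors using the most common configuration. The function name and the choice of this statistic are illustrative assumptions; the slides do not give SpiderWeb's exact client features.

```python
from collections import Counter
from typing import Iterable, Tuple


def configuration_uniformity(configs: Iterable[Tuple[str, str]]) -> float:
    """Share of visitors running the most common (browser, OS) pair.

    1.0 means every visitor that reached the final page used the same
    software configuration; lower values mean more diverse visitors.
    """
    counts = Counter(configs)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return counts.most_common(1)[0][1] / total
```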

Malicious Redirection Graphs
Cross-domain redirections (example domains in the figure: evil.co.cc, malicious.ru)

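A sketch of a cross-domain feature: the fraction of redirection steps in one chain that move to a different host. For simplicity it compares full host names (an assumption); a more careful version would compare registered domains using the Public Suffix List.

```python
from typing import Sequence
from urllib.parse import urlsplit


def cross_domain_ratio(urls: Sequence[str]) -> float:
    """Fraction of redirection steps in one chain that change the host name."""
    hosts = [urlsplit(url).hostname for url in urls]
    steps = list(zip(hosts, hosts[1:]))
    if not steps:
        return 0.0
    return sum(1 for src, dst in steps if src != dst) / len(steps)
```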
Malicious Redirection Graphs
“Hubs” to aggregate traffic
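A sketch of hub detection on a graph's edge list, assuming a hub is an intermediate node that receives redirections from several distinct sources and forwards traffic onward. The `min_in_degree` threshold is a hypothetical parameter, not a value from the talk.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def find_hubs(edges: Iterable[Tuple[str, str]], min_in_degree: int = 3) -> List[str]:
    """Nodes reached from >= min_in_degree distinct sources that also redirect onward."""
    predecessors: Dict[str, Set[str]] = defaultdict(set)
    successors: Dict[str, Set[str]] = defaultdict(set)
    for src, dst in edges:
        predecessors[dst].add(src)
        successors[src].add(dst)
    return sorted(
        node
        for node, preds in predecessors.items()
        # The final page has no outgoing edge, so it is never reported as a hub.
        if len(preds) >= min_in_degree and successors.get(node)
    )
```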

Malicious Redirection Graphs
“Infected” websites
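When many unrelated (compromised) sites funnel visitors toward the same final page, the entry points of the chains are diverse. A minimal sketch, assuming each chain is a list of URLs whose first element is the entry point; the ratio below is an illustrative feature, not necessarily one of SpiderWeb's.

```python
from typing import Sequence
from urllib.parse import urlsplit


def entry_host_diversity(chains: Sequence[Sequence[str]]) -> float:
    """Distinct entry-point hosts divided by the number of chains (0.0 if none)."""
    entries = [urlsplit(chain[0]).hostname for chain in chains if chain]
    if not entries:
        return 0.0
    return len(set(entries)) / len(entries)
```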

System Overview
Our System: SpiderWeb
We leverage the differences between legitimate and malicious redirection graphs for detection
Three components:
• Data collection
• Creation of redirection graphs
• Classification component
Data Collection
SpiderWeb needs a set of navigation data from a diverse population of users
Dataset obtained from a large AV vendor
• Users of a browser security tool
• Data collection was opt-in only
• Data was anonymized
Creation of Redirection Graphs

(Figure: example redirection chains through a.com, b.com, c.com, and d.com, combined into a redirection graph)

When we specify the final page, we allow wildcards (e.g., malicious.com/*) → Groupings
We need to discard groupings that are too general
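A sketch of host-level groupings, assuming a grouping maps every final URL on a host to a pattern such as `malicious.com/*`. The rule used here to discard groupings that are too general (more than `max_distinct_urls` distinct final URLs) is a hypothetical criterion; the slides do not state SpiderWeb's actual rule.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set
from urllib.parse import urlsplit


def host_wildcard(url: str) -> str:
    """Map a final URL to a host-level pattern such as 'malicious.com/*'."""
    return f"{urlsplit(url).hostname}/*"


def group_final_urls(final_urls: Iterable[str], max_distinct_urls: int = 100) -> Dict[str, Set[str]]:
    """Group final URLs by host wildcard and drop groupings judged too general."""
    groups: Dict[str, Set[str]] = defaultdict(set)
    for url in final_urls:
        groups[host_wildcard(url)].add(url)
    # Hypothetical criterion: a grouping covering too many distinct URLs is too general.
    return {pattern: urls for pattern, urls in groups.items() if len(urls) <= max_distinct_urls}
```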
Classification Component
Five categories of features
• Client features (3 features)
• Referrer features (4 features)
• Landing page features (4 features)
• Final page features (5 features)
  These four categories measure how diverse these elements are (e.g., distinct URLs, parameters, TLDs, whether the domain is an IP address)
• Redirection graph features (12 features)
  Length of chains, same country across referrer and final page, intra-domain redirections, hubs

We use Support Vector Machines for classification
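The slides do not say which SVM implementation or kernel SpiderWeb uses. As a self-contained stand-in, the sketch below trains a linear SVM from scratch with the Pegasos stochastic sub-gradient method (a swapped-in solver). Each sample is a feature vector, such as the 28 features listed above, labeled +1 (malicious) or -1 (legitimate); the function names and default hyperparameters are hypothetical.

```python
import random
from typing import List, Sequence


def _augment(x: Sequence[float]) -> List[float]:
    # Append a constant 1.0 so the last weight acts as a (regularized) bias.
    return list(x) + [1.0]


def train_linear_svm(
    samples: Sequence[Sequence[float]],
    labels: Sequence[int],  # +1 = malicious, -1 = legitimate
    lam: float = 0.1,
    epochs: int = 300,
    seed: int = 0,
) -> List[float]:
    """Pegasos: stochastic sub-gradient descent on the L2-regularized hinge loss."""
    rng = random.Random(seed)
    data = [(_augment(x), y) for x, y in zip(samples, labels)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = y * sum(wj * xj for wj, xj in zip(w, x))
            # Gradient step for the regularizer shrinks every weight.
            w = [(1.0 - eta * lam) * wj for wj in w]
            if margin < 1.0:
                # Hinge-loss sub-gradient step for a sample inside the margin.
                w = [wj + eta * y * xj for wj, xj in zip(w, x)]
    return w


def predict(w: Sequence[float], x: Sequence[float]) -> int:
    """Return +1 (malicious) or -1 (legitimate) for one feature vector."""
    score = sum(wj * xj for wj, xj in zip(w, _augment(x)))
    return 1 if score >= 0.0 else -1
```

In practice a library SVM (for example scikit-learn's `SVC`) would usually be used instead; the from-scratch version only makes the hinge-loss objective explicit.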
Evaluation
Evaluation Dataset
388,098 redirection chains, collected over two months
• 34,011 final URLs
• 13,780 distinct user IP addresses per week
• 145 countries

Labeled dataset for training
• 2,533 redirection chains leading to 1,854 malicious URLs
• 2,466 redirection chains leading to 510 legitimate URLs

Analysis of the Classifier
SpiderWeb’s performance depends on the redirection graph complexity
• With complexity ≥ 6, there are no FPs and no FNs
• Our dataset is limited → we discard graphs with complexity < 4
We need to accept a certain amount of FPs and FNs
Full URL grouping: 1.2% FP rate, 17% FN rate
Redirection-graph-specific features are the most important:
without them, the FN rate rises to 67%

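The FP and FN rates above can be computed from per-graph predictions as sketched below (1 = malicious, 0 = legitimate). The slides do not define complexity; this sketch assumes, hypothetically, that it is the number of distinct redirection chains in a graph, and drops graphs below the threshold of 4 used in the evaluation.

```python
from typing import Dict, List, Set, Tuple


def filter_by_complexity(
    graphs: Dict[str, Set[Tuple[str, ...]]], min_complexity: int = 4
) -> Dict[str, Set[Tuple[str, ...]]]:
    """Keep graphs with at least min_complexity distinct chains (assumed definition)."""
    return {name: chains for name, chains in graphs.items() if len(chains) >= min_complexity}


def error_rates(predictions: List[int], labels: List[int]) -> Tuple[float, float]:
    """Return (FP rate, FN rate) with 1 = malicious and 0 = legitimate."""
    pairs = list(zip(predictions, labels))
    fp = sum(1 for p, y in pairs if p == 1 and y == 0)
    tn = sum(1 for p, y in pairs if p == 0 and y == 0)
    fn = sum(1 for p, y in pairs if p == 0 and y == 1)
    tp = sum(1 for p, y in pairs if p == 1 and y == 1)
    # FP rate: legitimate graphs wrongly flagged; FN rate: malicious graphs missed.
    fp_rate = fp / (fp + tn) if fp + tn else 0.0
    fn_rate = fn / (fn + tp) if fn + tp else 0.0
    return fp_rate, fn_rate
```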
Detection in the Wild
3,549 redirection graphs with complexity ≥ 4

564 flagged as malicious → 3,368 URLs
778 URLs undetected by the AV vendor
• We could not confirm 1.5% of them
• SpiderWeb effectively complements the state of the art

Comparison with Previous Work
A few previous systems leverage redirection information to detect malicious web pages
These systems also use other types of information
• WarningBird: uses Twitter profile information
• SURF: specific to SEO attacks
If this additional information is not present, SpiderWeb outperforms previous systems

Possible Use Cases
Offline detection (blacklist)
Online detection
Users get infected until the required “complexity” is reached
We performed a chronological experiment
SpiderWeb would have protected 93% of users

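A sketch of the idea behind the chronological experiment, under simplifying assumptions: visits are processed in time order, the graph is flagged as soon as it reaches the complexity threshold (assumed here to be the number of distinct chains, with the classifier assumed to label it malicious at that point), and every user whose first visit comes after that point counts as protected. The function and its inputs are hypothetical.

```python
from typing import Iterable, Set, Tuple

# (timestamp, user id, redirection chain) for one visit to the final page.
Visit = Tuple[float, str, Tuple[str, ...]]


def protected_user_fraction(visits: Iterable[Visit], min_complexity: int = 4) -> float:
    """Share of users whose first visit happens after the graph is flagged.

    Simplification: the graph is flagged as soon as it contains
    min_complexity distinct chains, and the classifier is assumed to label
    it malicious at that point.
    """
    seen_chains: Set[Tuple[str, ...]] = set()
    exposed: Set[str] = set()  # users who reached the page before flagging
    users: Set[str] = set()
    flagged = False
    for _, user, chain in sorted(visits):  # chronological order
        users.add(user)
        if not flagged:
            exposed.add(user)
            seen_chains.add(chain)
            flagged = len(seen_chains) >= min_complexity
    if not users:
        return 0.0
    return (len(users) - len(exposed)) / len(users)
```

The users who arrive before the threshold is reached are exactly the ones the slide describes as getting infected in the online setting.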
Discussion
Limitations
• Graphs with high complexity are required
• Groupings are not perfect
• Attackers might redirect users to legitimate pages

Attackers might make their redirections look legitimate
• Stop using cloaking (which makes them easier for previous work to detect)
• Stop using hubs (raises the bar)

Conclusions
• We showed that malicious and legitimate
redirection graphs differ
• We presented a system that analyzes redirection
graphs to detect malicious web pages
• We showed that our system is effective, and
complements existing systems

Questions?
gianluca@cs.ucsb.edu
@gianlucaSB