Shady Paths: Leveraging Surfing Crowds to Detect Malicious Web Pages
Gianluca Stringhini, Christopher Kruegel, and Giovann...
Slides of my talk at CCS 2013


  1. Shady Paths: Leveraging Surfing Crowds to Detect Malicious Web Pages
     Gianluca Stringhini, Christopher Kruegel, and Giovanni Vigna
     University of California, Santa Barbara
  2. The Web is a Dangerous Place
     • Drive-by downloads
     • Social engineering
  3. Current Detection Techniques
     Static analysis: suspicious elements in
     • URLs
     • JavaScript
     • Flash
     Dynamic analysis: visit the web page (honeyclients)
     • Signs of exploitation
     Evasion techniques: obfuscation, cloaking
     Can only detect attacks that exploit vulnerabilities!
  4. Our Technique
  5. Redirection Graphs
     No need to analyze the final page!
     By analyzing the characteristics of the set of visitors and of the redirection graph, we can determine if the destination page is malicious
  6. Legitimate Uses of Redirections
     • Inform that a web page has moved
     • Login functionalities
     • Advertisements
     We cannot flag all redirections as malicious
     Luckily, malicious redirection graphs look different
  7. Malicious Redirection Graphs
     Uniform software configuration
  8. Malicious Redirection Graphs
     Cross-domain redirections (e.g., evil.co.cc, malicious.ru)
  9. Malicious Redirection Graphs
     “Hubs” to aggregate traffic
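The "hubs" idea can be illustrated with a small sketch: a node that receives redirections from many distinct source domains is a candidate hub. This is only an illustration, not the talk's actual hub feature; the `min_sources` threshold, the function name, and the `hub.example` host are assumptions made for the example.

```python
from collections import defaultdict

def hub_candidates(edges, min_sources=3):
    """Return hosts that receive redirections from at least
    min_sources distinct other hosts (threshold is hypothetical)."""
    sources = defaultdict(set)
    for src, dst in edges:
        if src != dst:  # ignore intra-domain hops
            sources[dst].add(src)
    return sorted(h for h, s in sources.items() if len(s) >= min_sources)

# Toy redirection edges (source host, destination host); hosts are made up.
toy_edges = [
    ("a.com", "hub.example"),
    ("b.com", "hub.example"),
    ("c.com", "hub.example"),
    ("hub.example", "evil.co.cc"),
]
toy_hubs = hub_candidates(toy_edges)
```

Here `hub.example` is the only host fed by three distinct sources, so it is the only candidate returned.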
  10. Malicious Redirection Graphs
      “Infected” websites
  11. System Overview
  12. Our System: SpiderWeb
      We leverage the differences between legitimate and malicious redirection graphs for detection
      Three components:
      • Data collection
      • Creation of redirection graphs
      • Classification component
  13. Data Collection
      SpiderWeb needs a set of navigation data from a diverse population of users
      Dataset obtained from a large AV vendor
      • Users of a browser security tool
      • Data collection was opt-in only
      • Data was anonymized
  14. Creation of Redirection Graphs
      [Figure: example redirection graph over the nodes a.com, b.com, c.com, and d.com]
      When we specify the final page, we allow wildcards (e.g., malicious.com/*) → Groupings
      We need to discard groupings that are too general
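A minimal sketch of the grouping step, assuming each redirection chain is a list of full URLs and that a grouping is a shell-style wildcard matched against the host and path of the final URL. The function names and the toy chains are hypothetical; the talk does not specify how SpiderWeb implements this.

```python
from collections import defaultdict
from fnmatch import fnmatchcase
from urllib.parse import urlsplit

def host_and_path(url):
    """Drop scheme and query so 'http://d.com/x?id=1' becomes 'd.com/x'."""
    parts = urlsplit(url)
    return f"{parts.hostname}{parts.path}"

def build_graph(chains, final_pattern):
    """Merge all chains whose final page matches final_pattern into one
    graph, stored as {(src_host, dst_host): number_of_traversals}."""
    edges = defaultdict(int)
    for chain in chains:
        if not chain or not fnmatchcase(host_and_path(chain[-1]), final_pattern):
            continue
        for src, dst in zip(chain, chain[1:]):
            edges[(urlsplit(src).hostname, urlsplit(dst).hostname)] += 1
    return dict(edges)

# Toy chains over the hosts shown on the slide.
toy_chains = [
    ["http://a.com/", "http://c.com/r", "http://d.com/land?id=1"],
    ["http://b.com/", "http://c.com/r", "http://d.com/land?id=2"],
    ["http://a.com/", "http://b.com/other"],  # different final page, ignored
]
toy_graph = build_graph(toy_chains, "d.com/*")
```

A pattern as broad as `*` would merge every chain into one graph, which is the kind of overly general grouping the slide says must be discarded.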
  15. Classification Component
      Five categories of features
      • Client features (3 features)
      • Referrer features (4 features)
      • Landing page features (4 features)
      • Final page features (5 features)
        How diverse are these elements: distinct URLs, parameters, TLD, domain is an IP
      • Redirection graph features (12 features)
        Length of chains, same country across referrer and final page, intra-domain redirections, hubs
      We use Support Vector Machines for classification
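To make the feature categories more concrete, here is a sketch that computes a handful of features of the kind named on the slide (distinct URLs, TLD diversity, domain is an IP, chain length, intra-domain redirections). It is not the talk's actual 28-feature set; the feature names, their exact definitions, and the toy chains are assumptions. The resulting values would then be fed to a classifier such as an SVM.

```python
import ipaddress
from urllib.parse import urlsplit

def is_ip(hostname):
    """True if the hostname is a literal IPv4/IPv6 address."""
    try:
        ipaddress.ip_address(hostname)
        return True
    except ValueError:
        return False

def sketch_features(chains):
    """Compute a few illustrative features for a group of redirection chains."""
    final_hosts = [urlsplit(c[-1]).hostname for c in chains]
    hops = [(urlsplit(a).hostname, urlsplit(b).hostname)
            for c in chains for a, b in zip(c, c[1:])]
    return {
        "distinct_final_urls": len({c[-1] for c in chains}),
        "distinct_final_tlds": len({h.rsplit(".", 1)[-1]
                                    for h in final_hosts if not is_ip(h)}),
        "final_domain_is_ip": any(is_ip(h) for h in final_hosts),
        "max_chain_length": max(len(c) for c in chains),
        "intra_domain_redirections": sum(1 for a, b in hops if a == b),
    }

# Toy chains; 192.0.2.7 is a reserved documentation address.
feature_chains = [
    ["http://a.com/", "http://a.com/go", "http://evil.co.cc/x"],
    ["http://b.com/", "http://malicious.ru/y"],
    ["http://b.com/", "http://192.0.2.7/z"],
]
toy_features = sketch_features(feature_chains)
```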
  16. Evaluation
  17. Evaluation Dataset
      388,098 redirection chains, collected over two months
      • 34,011 final URLs
      • 13,780 distinct user IP addresses per week
      • 145 countries
      Labeled dataset for training
      • 2,533 redirection chains leading to 1,854 malicious URLs
      • 2,466 redirection chains leading to 510 legitimate URLs
  18. Analysis of the Classifier
      SpiderWeb’s performance depends on the redirection graph complexity
      • Complexity ≥ 6 causes no FPs and no FNs
      • Our dataset is limited → we discard graphs with complexity < 4
      We need to accept a certain amount of FPs and FNs
      Full URL grouping: 1.2% FP rate, 17% FN rate
      Redirection-graph-specific features are the most important: without them, FNs rise to 67%
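For reference, the FP and FN rates quoted on this slide are the standard ratios FP / (FP + TN) and FN / (FN + TP). The sketch below computes them on made-up labels (1 = malicious, 0 = legitimate); the data is purely illustrative and unrelated to the talk's results.

```python
def fp_fn_rates(y_true, y_pred):
    """Return (false positive rate, false negative rate) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp_rate = fp / (fp + tn) if (fp + tn) else 0.0
    fn_rate = fn / (fn + tp) if (fn + tp) else 0.0
    return fp_rate, fn_rate

# Toy ground truth and classifier output.
toy_true = [1, 1, 1, 0, 0, 0, 0, 1]
toy_pred = [1, 0, 1, 0, 1, 0, 0, 1]
toy_fp_rate, toy_fn_rate = fp_fn_rates(toy_true, toy_pred)
```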
  19. Detection in the Wild
      3,549 redirection graphs with complexity ≥ 4
      564 flagged as malicious → 3,368 URLs
      778 URLs undetected by the AV vendor
      • We could not confirm 1.5% of them
      • Effectively complements state of the art
  20. Comparison with Previous Work
      A few previous systems leverage redirection information to detect malicious web pages
      These systems also use other types of information
      • WarningBird: uses Twitter profile information
      • SURF: SEO-specific
      If this additional information is not present, SpiderWeb outperforms previous systems
  21. Possible Use Cases
      Offline detection (blacklist)
      Online detection
      Users get infected until the required “complexity” is reached
      We performed a chronological experiment
      SpiderWeb would have protected 93% of users
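The online scenario can be sketched as a replay of visits in chronological order: a visitor counts as protected only if the graph observed before their visit already meets the complexity threshold. The talk does not define complexity here, so the default below (number of distinct chains seen so far) is a stand-in assumption, as are the function name and the toy visits.

```python
def protected_fraction(visits, threshold,
                       complexity=lambda chains: len(set(chains))):
    """Replay chronologically ordered visits and return the fraction that
    arrive after the graph has reached the required complexity."""
    seen = []
    protected = 0
    for chain in visits:
        if complexity(seen) >= threshold:
            protected += 1  # detection was already possible for this visitor
        seen.append(tuple(chain))
    return protected / len(visits) if visits else 0.0

# Six toy visits to the same final page "d", oldest first.
toy_visits = [("a", "d"), ("b", "d"), ("a", "d"),
              ("c", "d"), ("a", "d"), ("b", "d")]
toy_protected = protected_fraction(toy_visits, threshold=2)
```

In this toy replay the first two visitors arrive before two distinct chains have been seen, so four of the six visits count as protected.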
  22. Discussion
      Limitations
      • Graphs with high complexity are required
      • Groupings are not perfect
      • Attackers might redirect users to legitimate pages
      Attackers might make their redirections look legitimate
      • Stop using cloaking (easier to detect by previous work)
      • Stop using hubs (raises the bar)
  23. Conclusions
      • We showed that malicious and legitimate redirection graphs differ
      • We presented a system that analyzes redirection graphs to detect malicious web pages
      • We showed that our system is effective, and complements existing systems
  24. Questions?
      gianluca@cs.ucsb.edu
      @gianlucaSB