Detection of Malware Downloads via Graph Mining (AsiaCCS '16)

Mastino is a novel defense system that detects malware download events. A download event is a 3-tuple that identifies the action of downloading a file from a URL, triggered by a client (machine). Mastino relies on global situational awareness: it continuously monitors various network- and system-level events on clients' machines across the Internet and classifies both files and URLs in real time whenever a client submits a new, unknown file or URL to the system. To detect download events, Mastino builds a large download graph that captures the subtle relationships among the entities of download events, i.e., files, URLs, and machines. We implemented a prototype version of Mastino and evaluated it in a large-scale real-world deployment. Our experimental evaluation shows that Mastino can accurately classify malware download events with an average of 95.5% true positives (TP), while incurring less than 0.5% false positives (FP). In addition, we show that Mastino can classify a new download event as either benign or malware in just a fraction of a second, and is therefore suitable as a real-time defense system.
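
To make the download graph more concrete, here is a minimal Python sketch of how 3-tuple download events could be folded into a graph over URLs, files, and machines. It is only an illustration under stated assumptions, not the authors' implementation: the names `DownloadEvent`, `DownloadGraph`, `add_event`, and `neighbors_of` are hypothetical, and linking all three node pairs of an event (including URL to machine) is an assumption made for simplicity.

```python
from collections import defaultdict
from typing import NamedTuple


class DownloadEvent(NamedTuple):
    """One download event d = (u, f, m); field names are illustrative."""
    url: str
    file_hash: str  # e.g. a SHA1 digest of the downloaded file
    machine: str    # an anonymized machine identifier


class DownloadGraph:
    """Hypothetical in-memory graph over URL, file, and machine nodes."""

    def __init__(self) -> None:
        # Maps a (kind, name) node to the set of nodes seen in the same event.
        self.neighbors = defaultdict(set)

    def add_event(self, event: DownloadEvent) -> None:
        u = ("url", event.url)
        f = ("file", event.file_hash)
        m = ("machine", event.machine)
        # Assumption: every pair of entities in an event is connected.
        for a, b in ((u, f), (f, m), (u, m)):
            self.neighbors[a].add(b)
            self.neighbors[b].add(a)

    def neighbors_of(self, kind: str, name: str, neighbor_kind: str) -> list:
        """Return the sorted names of neighbors of one kind for a given node."""
        linked = self.neighbors.get((kind, name), ())
        return sorted(n for k, n in linked if k == neighbor_kind)


# Made-up example events; the URLs and hashes are placeholders.
events = [
    DownloadEvent("http://a.example/x.exe", "sha1-aaa", "m1"),
    DownloadEvent("http://b.example/y.exe", "sha1-aaa", "m2"),
    DownloadEvent("http://a.example/x.exe", "sha1-bbb", "m2"),
]
graph = DownloadGraph()
for event in events:
    graph.add_event(event)

print(graph.neighbors_of("file", "sha1-aaa", "url"))
print(graph.neighbors_of("file", "sha1-aaa", "machine"))
```

With a structure like this, an unknown file can be described by what is already known about the URLs that served it and the machines that downloaded it.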



  1. Real-Time Detection of Malware Downloads via Large-Scale URL→File→Machine Graph Mining. Babak Rahbarinia; Marco Balduzzi; Roberto Perdisci. AsiaCCS 2016, June 02, Xi’an, China.
  2. Introduction. Traditional AV is dead? Signature-based vs. statistical-based. Traditional AVs are inefficient (they don’t work!): polymorphism, code obfuscation, packers, ... URL blacklisting: static, lags behind, time-consuming analysis of individual URLs. Local vs. global. Local: looks at one potential malware at a time. Global: leverages global situational awareness.
  3. Introduction. Large-scale analysis of behavioral patterns: the “who - where - what” relationship. Global situation awareness. Graph-based machine learning. Combination of system- and network-level info. Mastino: real-time and concurrent detection of download events. Real-world deployment on millions of machines (Internet-scale).
  4. Approach
  5. Approach
  6. (Quick) Related Work. Static+dynamic detection [Many]. Graph mining detection: Polonium [KDD10]: offline approach vs. real-time; files-only classification vs. files + URLs (download event); bipartite vs. tripartite graph; proprietary reputation function vs. open. AMICO [Esorics13]: HTTP-centric vs. protocol-independent; only works in LANs vs. “move across networks”. Google’s CAMP [NDSS13]: browser-centric vs. system-centric.
  7. Download Graph. Node types: URLs, Files, Machines.
  8. Annotations. URLs: age of URL, domain, path, IP. Files: size; lifetime, prevalence; packed, signed. Machines: download behavior; client processes.
  9. Labeling. URLs: B: Alexa (-hosting); M: GSB + WRS. Files: B: Grid + VT; M: VT. Machines: reputations based on their download/activity history.
  10. Features and classifier. Files features: for a file f downloaded from URLs (url1 ... url4) by machines (machine1 ... machine3), f's behavior-based features = {URL stats, machine stats}. URL stats: compute min, max, med, avg, and std of each URL’s R + R of [FQD, e2LD, path, path pattern, query string, query pattern]. Machine stats: compute min, max, med, avg, and std of each machine’s R.
  11. Features and classifier. Files features: the behavior-based features of slide 10 + f's intrinsic features = {file size, prevalence, packed, signed, ...}.
  12. Features and classifier. Files features as in slide 11. URLs features: for a URL u + {all URLs sharing a component with u}, serving files (file1 ... file4) downloaded by machines (machine1 ... machine3), u's behavior-based features = {files stats, machine stats}. Files stats: compute min, max, med, avg, and std of each file’s R. Machine stats: compute min, max, med, avg, and std of each machine’s R.
  13. Features and classifier. Files features as in slide 11. URLs features: the behavior-based features of slide 12 + u's intrinsic features = {URL, FQD, e2LD recency}.
  14. Example #1. Graph over URLs, files, and machines with nodes U1, U2, F1, F2, F3, G1, G2. What could be said about F1 and F2?
  15. Example #1. URLs, files (F1, F2), machines. What could be said about F1 and F2?
  16. Example #1. URLs, files (F1, F2), machines. What could be said about F1 and F2?
  17. Example #2. URL u, file F1, machines. What could be said about F1? All neighbors are unknown.
  18. Example #2. URL u, file F1, machines. All URLs that share the same components as u (FQD, path). All URL components: FQD, e2LD, path, path pattern, query string, query string pattern, IP, IP/24.
  19. Example #2. URL u, file F1, machines. All URLs that share the same components as u (FQD, path).
  20. Example #2. URL u, file F1, machines. All URLs that share the same components as u (FQD, path).
  21. Deployment. Timeline: Day 1, Day 2, ..., Yesterday, Today. Time window of 10 days.
  22. Deployment. Timeline: Day 1, Day 2, ..., Yesterday, Today. Trained classifiers: URL classifier, SHA1 classifier. Real-time classification of URLs & SHA1s. Detection of malicious download events.
  23. Data Collection. 7 months of data (Jan to Aug 2014). d = (u; f; m). Hundreds of thousands of machines, files, URLs. Millions of nodes. Labeling: files: VirusTotal, GRID [Trend]; URLs: Alexa, Google Safe Browsing, WRS [Trend]. Annotations: file census and GUID census [Trend]; VirusTotal (signed, ...).
  24. Train & test for new download events. Detection results for new events over 7 periods of 5 days (35 days total). Results shown for: Files, URLs.
  25. Combined detection of download events. (u = m) ∨ (f = m) → d = m. 1 day experiment (5 months). Efficiency: requests are served in ~0.16 sec. 84% of detections: 0-days (unknown).
  26. Case Study #1. Wuachos.A Dropper. Filename: file_saw.exe. URLs with _no_ reputation. Low prevalence. Invalid signature. Path pattern with R of 0.72 (malicious) [*]. 1,445 URLs serving 182 polymorphic malware samples. [*] /f/1392240240/1255385580/2, /f/1392240120/4165299987/2 -> /H1/I10/I10/I1
  27. Case Study #2. Somoto Adware. Filename: FreeZipSetup-[d].exe. Packed, short lifetime, prevalence = 0. 1 related machine downloaded 1 known sample during our time window T = 10 days. Detected a campaign of 695 samples: 616 were unknown to VirusTotal, 61 unknown +6 months.
  28. Case Study #3. TTAWinCDM Spyware. Machine and URL with _no_ reputation. Low lifetime, prevalence, and countries. Mismatch on downloading process: Acrobat process vs. unauthoritative domain. Flash 0-day (+2 months).
  29. Bonus #1. Analysis of Window T.
  30. Bonus #2. Features analysis: files analysis, URLs analysis.
  31. Conclusions. Mastino: real-time detection of malware downloads by passive monitoring of clients. Content-agnostic, behavioral analysis. Large-scale real-world deployment. Over 95% TP / 0.5% FP. 0-days.
  32. Thank you! Questions? @embyte, http://www.madlab.it. Babak Rahbarinia; Marco Balduzzi; Roberto Perdisci.
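
The behavior-based features on the "Features and classifier" slides (min, max, med, avg, and std over the reputations R of a node's neighbors) and the combined rule (u = m) ∨ (f = m) → d = m can be sketched in Python as follows. This is a rough illustration, not the authors' code: the function names are hypothetical, reputations are assumed to be floats in [0, 1], the population standard deviation is an arbitrary choice, and all-zero statistics for a node without scored neighbors is an assumption.

```python
import statistics


def summarize(reputations):
    """Return min, max, median, mean, and std of a list of reputation scores."""
    if not reputations:
        # Assumption: a node with no scored neighbors gets all-zero statistics.
        return {"min": 0.0, "max": 0.0, "med": 0.0, "avg": 0.0, "std": 0.0}
    return {
        "min": min(reputations),
        "max": max(reputations),
        "med": statistics.median(reputations),
        "avg": statistics.fmean(reputations),
        # Assumption: population standard deviation; the slides do not say which.
        "std": statistics.pstdev(reputations),
    }


def file_behavior_features(url_reputations, machine_reputations):
    """Build {URL stats, machine stats} for one file as a flat feature dict."""
    features = {}
    for prefix, scores in (("url", url_reputations), ("machine", machine_reputations)):
        for stat, value in summarize(scores).items():
            features[f"{prefix}_{stat}"] = value
    return features


def is_malicious_download(url_is_malicious, file_is_malicious):
    """Combined rule: the event is malicious if its URL or its file is."""
    return url_is_malicious or file_is_malicious


# Made-up reputations for the URLs and machines linked to one file.
features = file_behavior_features([0.2, 0.8, 0.5], [0.9])
print(features)
print(is_malicious_download(url_is_malicious=False, file_is_malicious=True))
```

The URL-side features would follow the same pattern, with file reputations in place of URL reputations. A real feature vector would also carry the intrinsic features listed on the slides.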
