Finding Diversity In Remote Code Injection Exploits




  1. 1. Finding Diversity in Remote Code Injection Exploits Justin Ma , John Dunagan , Helen J. Wang , Stefan Savage , Geoffrey M. Voelker University of California, San Diego Microsoft Research Internet Measurement Conference 2006
  2. 2. Outline <ul><li>Introduction </li></ul><ul><li>Background and Related Work </li></ul><ul><li>Methodology </li></ul><ul><li>Exploit Diversity </li></ul><ul><li>Discussion and Conclusion </li></ul>
  3. 3. Introduction <ul><li>Internet users are increasingly victimized by an online criminal enterprise that spans denial-of-service extortion, identity theft, piracy, and unsolicited bulk email </li></ul><ul><li>At the core of these activities is malware </li></ul><ul><ul><li>Software used to remotely compromise and harness the resources of millions of hosts </li></ul></ul><ul><li>There is little research describing the malware ecosystem itself </li></ul><ul><ul><li>How does one piece of malware relate to another? </li></ul></ul><ul><ul><li>What pressures drive its structural and functional evolution? </li></ul></ul><ul><li>This paper focuses on how to identify and measure the diversity among remote code injection exploits </li></ul>
  4. 4. Introduction (cont’d) <ul><li>Typically, a host is compromised via a software vulnerability (e.g. a buffer overflow) that allows network-based input to be “injected” into a running program and executed. </li></ul><ul><ul><li>Subsequently, the exploit payload may </li></ul></ul><ul><ul><ul><li>Download additional software </li></ul></ul></ul><ul><ul><ul><li>Reconfigure the OS to evade detection, etc. </li></ul></ul></ul><ul><li>This paper focuses on the exploit and its initial payload, the so-called shellcode. </li></ul><ul><ul><li>Shellcodes: </li></ul></ul><ul><ul><ul><li>Are the first code executed on a newly compromised machine </li></ul></ul></ul><ul><ul><ul><li>Are typically small, simple, hand-coded machine-code programs </li></ul></ul></ul><ul><ul><ul><li>Are well-suited to automated analysis </li></ul></ul></ul><ul><ul><li>Understand how much variation exists among the shellcodes for an exploit </li></ul></ul><ul><ul><li>Measure shellcode diversity in order to </li></ul></ul><ul><ul><ul><li>Better understand how malware is created </li></ul></ul></ul><ul><ul><ul><li>Infer the paternity of different samples </li></ul></ul></ul><ul><ul><ul><li>Construct a shellcode phylogeny </li></ul></ul></ul>
  5. 5. Outline <ul><li>Introduction </li></ul><ul><li>Background and Related Work </li></ul><ul><li>Methodology </li></ul><ul><li>Exploit Diversity </li></ul><ul><li>Discussion and Conclusion </li></ul>
  6. 6. Background <ul><li>Remote code injection attacks are a combination of vulnerability, exploit and shellcode </li></ul><ul><ul><li>The vulnerability is the particular software structure that allows data provided over the network to subvert and redirect execution control flow </li></ul></ul><ul><ul><ul><li>Example: an unchecked buffer </li></ul></ul></ul><ul><ul><ul><li>Overflowing it can overwrite the return address of the calling stack frame </li></ul></ul></ul><ul><ul><li>An exploit is a particular formulation of an attack against a vulnerability </li></ul></ul><ul><ul><li>The shellcode is the payload carried by the exploit; it is the first code to execute </li></ul></ul>
  7. 7. Stack Buffer Overflow Simple example of a remote stack-based buffer overflow. The shaded regions represent the shellcode of the exploit as sent in network packets, then as injected into the vulnerable buffer of the target host. The return address has been overwritten with injected data, thereby redirecting the execution flow to the shellcode residing in the vulnerable buffer.
  8. 8. Background (cont’d) <ul><li>Shellcodes: </li></ul><ul><ul><li>Are frequently limited by </li></ul></ul><ul><ul><ul><li>The size of the buffer being processed </li></ul></ul></ul><ul><ul><ul><li>The need for the buffer to contain “NOP sleds”, i.e. long regions of consecutive “do nothing” instructions </li></ul></ul></ul><ul><ul><li>Can be quite sophisticated in their construction: </li></ul></ul><ul><ul><ul><li>The creation of pseudo-random NOP sleds </li></ul></ul></ul><ul><ul><ul><li>Polymorphic payloads that are encrypted (and potentially compressed) in transit and only decrypted just before execution </li></ul></ul></ul><ul><ul><ul><li>Some polymorphic shellcode generators also create random decryptors </li></ul></ul></ul>
  9. 9. Background (cont’d) <ul><li>Early attempts to defeat polymorphism: </li></ul><ul><ul><li>X-ray analysis </li></ul></ul><ul><ul><ul><li>Heuristically decode polymorphic code, using a known portion of a decoded instance to recover the encryption key </li></ul></ul></ul><ul><ul><li>Generic decryption </li></ul></ul><ul><ul><ul><li>Emulate execution while the shellcode decrypts itself </li></ul></ul></ul><ul><ul><ul><li>Typically uses a heuristic to guess when this process terminates </li></ul></ul></ul><ul><li>Once a malware shellcode has been decoded, comparing it to other shellcode is another key problem. Approaches: </li></ul><ul><ul><li>Model each shellcode as a binary string and use traditional lexical distance measures </li></ul></ul><ul><ul><li>Use structural distance measures that capture variation in the control flow and values at instruction, basic block, or function levels [11] </li></ul></ul>[11] C. Kruegel, E. Kirda, D. Mutz, W. Robertson, and G. Vigna. Polymorphic Worm Detection Using Structural Information of Executables. In Proceedings of Symposium on Recent Advances in Intrusion Detection (RAID), Seattle, WA, Sept. 2005.
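The X-ray idea can be illustrated for the simplest case, a single-byte XOR encoding. This is only a minimal sketch under stated assumptions: the function name `xray_single_byte_xor`, the toy payload, and the known fragment are all hypothetical, and real X-ray tools handle richer encodings than this.

```python
def xray_single_byte_xor(encoded: bytes, known_fragment: bytes) -> list[int]:
    """Return every single-byte XOR key under which known_fragment
    appears somewhere in the decoded buffer."""
    keys = []
    for key in range(256):
        decoded = bytes(b ^ key for b in encoded)
        if known_fragment in decoded:
            keys.append(key)
    return keys


# Toy data (hypothetical): a payload XOR-ed with 0x99, and a fragment
# we expect to see in any decoded instance.
plain = b"\x90\x90cmd.exe /c tftp.exe"
encoded = bytes(b ^ 0x99 for b in plain)
candidate_keys = xray_single_byte_xor(encoded, b"cmd.exe")
print([hex(k) for k in candidate_keys])
```

Trying all 256 keys is cheap for a one-byte key; the known fragment is what lets the search pick out plausible keys without running the code.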
  10. 10. Outline <ul><li>Introduction </li></ul><ul><li>Background and Related Work </li></ul><ul><li>Methodology </li></ul><ul><li>Exploit Diversity </li></ul><ul><li>Discussion and Conclusion </li></ul>
  11. 11. Methodology-Exploit Collection <ul><li>The primary means of collecting exploits is examining network traces of traffic sent to and from active responders. </li></ul><ul><ul><li>Active responders: hosts that respond to unsolicited probes (exploit attempts) </li></ul></ul><ul><ul><li>Emulating end-host behavior allows us to collect more session data </li></ul></ul><ul><ul><ul><li>In particular, completing the infection handshake will suffice to cause the attacker to transmit the shellcode </li></ul></ul></ul><ul><ul><li>For example: the ISystemActivator and RemoteActivation exploits </li></ul></ul><ul><ul><ul><li>Require active responders to capture the RPC Bind and Request </li></ul></ul></ul>
  12. 12. Methodology-Extracting Shellcodes <ul><li>Use Shield[29] to extract the shellcode for each exploit session from the traces </li></ul><ul><li>However, not all of the collected data corresponds to executable code. </li></ul><ul><ul><li>Execution starts at an offset within the vulnerable buffer </li></ul></ul><ul><ul><li>The buffer may contain random padding </li></ul></ul>[29] H. Wang, C. Guo, D. Simon, and A. Zugenmaier. Shield: Vulnerability-Driven Network Filters for Preventing Known Vulnerability Exploits. In Proceedings of the ACM SIGCOMM Conference , Portland, Oregon, Sept. 2004.
  13. 13. Methodology-Exploit Emulation <ul><li>Decoding the exploits is often necessary to reveal most of the actual executable code </li></ul><ul><li>The easiest way to deal with the variety of decoding routines is to use binary emulation </li></ul><ul><ul><li>We implement the emulator using Intel’s Pin [13] on Linux </li></ul></ul><ul><ul><li>Given an encoded shellcode, we first declare it as a statically allocated buffer in C source code that treats the buffer as a function </li></ul></ul><ul><ul><li>We iteratively retry failed emulations at subsequent offsets </li></ul></ul><ul><ul><ul><li>This overcomes any issues with non-executable prefixes </li></ul></ul></ul><ul><ul><li>As Pin successfully emulates the binary, we mark the executed instruction bytes for later analysis </li></ul></ul>
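The retry-at-the-next-offset loop can be sketched as follows. This is not the authors' Pin-based tool: `EmulationFailed`, `find_entry_and_trace`, and `toy_emulate` are hypothetical names, and the toy emulator only understands two x86 opcodes (0x90 NOP, 0xC3 RET) so that the example stays self-contained.

```python
from typing import Callable, Optional


class EmulationFailed(Exception):
    """Raised by the (hypothetical) emulator when execution faults."""


def find_entry_and_trace(
    buf: bytes,
    emulate: Callable[[bytes, int], set[int]],
) -> Optional[tuple[int, set[int]]]:
    """Try each start offset in turn; return the first offset that
    emulates successfully together with the executed byte indexes."""
    for offset in range(len(buf)):
        try:
            executed = emulate(buf, offset)
        except EmulationFailed:
            continue  # non-executable prefix: retry at the next offset
        return offset, executed
    return None


def toy_emulate(buf: bytes, offset: int) -> set[int]:
    """Stand-in emulator: NOP (0x90) advances, RET (0xC3) finishes,
    any other byte is treated as a fault."""
    executed: set[int] = set()
    pc = offset
    while pc < len(buf):
        op = buf[pc]
        executed.add(pc)
        if op == 0x90:
            pc += 1
        elif op == 0xC3:
            return executed
        else:
            raise EmulationFailed(f"unsupported byte {op:#x} at {pc}")
    raise EmulationFailed("ran off the end of the buffer")
```

The returned index set plays the role of the "executed instruction bytes" marked for later analysis.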
  14. 14. Methodology-Clustering <ul><li>Agglomerative clustering </li></ul><ul><ul><li>A form of hierarchical clustering </li></ul></ul><ul><ul><li>Begins with each unique shellcode belonging to its own cluster </li></ul></ul><ul><ul><li>Merges the closest pair of clusters (by distance) </li></ul></ul><ul><ul><li>Builds up a hierarchy of similarity among exploit samples by iteratively merging the closest pair of clusters at each step </li></ul></ul><ul><ul><li>Distance between clusters: the distance between the furthest samples in the two respective clusters </li></ul></ul>
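The clustering procedure on this slide (start with singletons, repeatedly merge the closest pair, measure cluster distance by the furthest pair of samples, i.e. complete linkage) can be sketched as below. The function name and the return format are assumptions made for illustration, and the quadratic search is kept naive for readability.

```python
def complete_linkage_clustering(samples, distance):
    """Agglomerative clustering with complete linkage.

    Returns the merge history as (left_members, right_members, distance)
    tuples, where members are indexes into `samples`.
    """
    clusters = [[i] for i in range(len(samples))]
    merges = []

    def cluster_distance(a, b):
        # Complete linkage: distance between the furthest pair of samples.
        return max(distance(samples[i], samples[j]) for i in a for j in b)

    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = cluster_distance(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        merges.append((sorted(clusters[x]), sorted(clusters[y]), d))
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
    return merges
```

The merge history is what a dendrogram draws: each tuple is one join, placed at the height given by its distance.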
  15. 15. Methodology-Clustering (cont’d) <ul><li>Distance Metrics </li></ul><ul><ul><li>Exedit Distance </li></ul></ul><ul><ul><li>Edit Distance </li></ul></ul><ul><ul><ul><li>Does not distinguish code from data </li></ul></ul></ul><ul><ul><ul><li>Random padding generates further noise </li></ul></ul></ul><ul><ul><li>Structural Distance </li></ul></ul><ul><ul><ul><li>Control flow graph (CFG) [11] </li></ul></ul></ul><ul><ul><ul><li>Does not capture subtle variation between related exploits because entire basic blocks are summarized </li></ul></ul></ul>[11] C. Kruegel, E. Kirda, D. Mutz, W. Robertson, and G. Vigna. Polymorphic Worm Detection Using Structural Information of Executables. In Proceedings of Symposium on Recent Advances in Intrusion Detection (RAID), Seattle, WA, Sept. 2005.
  16. 16. Methodology-Clustering (cont’d) <ul><li>Exedit distance metric </li></ul><ul><ul><li>Edit distance over executed parts of shellcode </li></ul></ul><ul><ul><ul><li>Distinguishes code from data </li></ul></ul></ul><ul><ul><ul><li>Maintains instruction-level details </li></ul></ul></ul>Canonical string for shellcode
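A small sketch of how exedit distance differs from plain edit distance: the same Levenshtein routine is applied, but only to the bytes that were actually executed. The helper names, the representation of the execution trace as a set of byte indexes, and the normalization by the longer string are assumptions for illustration; the slides do not specify these details.

```python
def edit_distance(a: bytes, b: bytes) -> int:
    """Classic Levenshtein distance over two byte strings."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution
        prev = cur
    return prev[-1]


def executed_bytes(buf: bytes, executed: set[int]) -> bytes:
    """Keep only the bytes whose indexes were marked as executed."""
    return bytes(buf[i] for i in sorted(executed))


def exedit_distance(buf_a: bytes, exec_a: set[int],
                    buf_b: bytes, exec_b: set[int]) -> float:
    """Edit distance over executed bytes, normalized to [0, 1]
    (the normalization is an assumption of this sketch)."""
    a = executed_bytes(buf_a, exec_a)
    b = executed_bytes(buf_b, exec_b)
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0
```

Two samples with identical executed code but different padding get an exedit distance of zero, while their plain edit distance stays positive.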
  17. 17. Outline <ul><li>Introduction </li></ul><ul><li>Background and Related Work </li></ul><ul><li>Methodology </li></ul><ul><li>Exploit Diversity </li></ul><ul><li>Discussion and Conclusion </li></ul>
  18. 18. Exploit Diversity <ul><li>Four well-known vulnerabilities </li></ul><ul><ul><li>SQL Name Resolution (Slammer) </li></ul></ul><ul><ul><li>LSASS (Sasser) </li></ul></ul><ul><ul><li>MS RPC ISystemActivator (Blaster) </li></ul></ul><ul><ul><li>MS RPC RemoteActivation (Blaster) </li></ul></ul><ul><li>Use the methodology from Section 3 to </li></ul><ul><ul><li>Cluster the shellcodes according to their variability and thus identify shellcode families </li></ul></ul><ul><ul><li>Provide a detailed characterization of each family that conveys both the structure of shellcode families and the subtle functional variations among them </li></ul></ul><ul><ul><li>Show the prevalence of each shellcode family in the trace </li></ul></ul><ul><li>Trace 1 </li></ul><ul><ul><li>Captures exploit attempts on a residential DSL network for 2 days (2005 9/6) </li></ul></ul><ul><ul><li>Fully patched Windows XP SP2 </li></ul></ul><ul><ul><li>29 IP addresses </li></ul></ul><ul><ul><li>Responds to incoming requests </li></ul></ul><ul><li>Trace 2 </li></ul><ul><ul><li>From a honeyfarm at the Lawrence Berkeley National Laboratory </li></ul></ul>
  19. 19. SQL Name Resolution <ul><li>The Slammer worm (Jan. 2003) </li></ul><ul><li>The outlier </li></ul><ul><ul><li>Its payload was likely corrupted on the network before being captured. </li></ul></ul><ul><ul><li>The last 91 bytes: </li></ul></ul><ul><ul><ul><li>22 unidentified bytes </li></ul></ul></ul><ul><ul><ul><li>A 20-byte IP header </li></ul></ul></ul><ul><ul><ul><li>A UDP header </li></ul></ul></ul><ul><ul><ul><li>The first 41 bytes of the Slammer exploit </li></ul></ul></ul><ul><li>No exploit diversity </li></ul>
  20. 20. LSASS (Local Security Authority Subsystem Service) <ul><li>The original Sasser worm: Apr. 2004 </li></ul><ul><li>A handful of variants were responsible for a large number of occurrences </li></ul>
  21. 21. LSASS (cont’d) Figure panels: exedit, edit, and structural distance. Annotations: “Not fundamental to the code”; “Ignores subtle differences between shellcodes”.
  22. 22. LSASS (cont’d) <ul><li>Inter-family analysis (manual analysis) </li></ul><ul><ul><li>The variations differ by 2–20 bytes, which correspond to phone-home/connect-back IP addresses, hostnames, and ports encoded in the payload. </li></ul></ul><ul><ul><li>LSASS-1 </li></ul></ul><ul><ul><ul><li>The main body of the exploit followed immediately after the decoding loop </li></ul></ul></ul><ul><ul><ul><li>The main body and data section were XOR-ed one byte at a time with key 0x99 </li></ul></ul></ul><ul><ul><li>LSASS-0 </li></ul></ul><ul><ul><ul><li>An unencoded main body followed by an encoded data section (byte-wise XOR with key 0xff) </li></ul></ul></ul><ul><ul><ul><li>There are embedded URL strings </li></ul></ul></ul><ul><ul><ul><ul><li>These belonged to previously classified malware </li></ul></ul></ul></ul><ul><ul><li>LSASS-2,3,4 </li></ul></ul><ul><ul><ul><li>Share the same encoding scheme and roughly the same flow of execution </li></ul></ul></ul>
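The byte-wise XOR layouts described for LSASS-0 and LSASS-1 (an unencoded prefix followed by an XOR-encoded region) can be undone with a few lines. This is a sketch for illustration only: `xor_decode_region` and the toy payload are hypothetical, and the real offsets of the encoded regions are not given in the slides.

```python
from typing import Optional


def xor_decode_region(payload: bytes, start: int, key: int,
                      end: Optional[int] = None) -> bytes:
    """XOR every byte in payload[start:end] with a one-byte key,
    leaving the bytes outside that region untouched."""
    end = len(payload) if end is None else end
    decoded = bytes(b ^ key for b in payload[start:end])
    return payload[:start] + decoded + payload[end:]


# Toy layout (hypothetical): a 4-byte stand-in for the decoding loop,
# followed by a body and data section encoded with key 0x99.
encoded_part = bytes(b ^ 0x99 for b in b"body+data")
payload = b"LOOP" + encoded_part
print(xor_decode_region(payload, start=4, key=0x99))
```

Because XOR is its own inverse, applying the same call to the decoded buffer reproduces the encoded one.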
  23. 23. LSASS (cont’d) - Prevalence
  24. 24. ISystemActivator <ul><li>The Blaster worm (Aug 2003) </li></ul><ul><ul><li>Originally exploited the RemoteActivation vulnerability </li></ul></ul>Is this the result of polymorphism? The results indicate that exploits within a family are similar, but that ISys families differ more substantially from each other than the LSASS exploit families do.
  25. 25. ISystemActivator <ul><li>We confirmed that there were six different code bases </li></ul><ul><li>There was no code polymorphism </li></ul><ul><ul><li>The differences were due to variations in data constants, such as encodings of phone-home addresses and hostnames, as well as names of executables </li></ul></ul><ul><li>ISys-0 used a 4-byte, non-overlapping XOR to encode its payload, whereas all other exploits used a byte-by-byte XOR </li></ul><ul><li>ISys-4 had the largest payload length and its flow of execution was the most complicated. </li></ul><ul><li>The ISys-1 family had a moderate intra-family exedit distance (9%) </li></ul><ul><ul><li>Some instructions differ; otherwise the exploits are very similar. </li></ul></ul><ul><li>ISys-5 exploits had a characteristic execution flow </li></ul><ul><ul><li>They performed consecutive jumps over two text sections </li></ul></ul><ul><ul><ul><li>“tftp.exe -i <address> get <executable name>” </li></ul></ul></ul><ul><ul><ul><li><address> and <executable name> accounted for 6.5% distance </li></ul></ul></ul>
  26. 26. ISystemActivator Figure labels: 4-byte decoding key; kernel-address loading function; function-finding block; 4-byte encoding key; kernel base loader; function finder.
  27. 27. ISystemActivator largest payload length and its flow of execution was the most complicated
  28. 28. ISystemActivator <ul><ul><li>performed consecutive jumps over two text sections </li></ul></ul><ul><ul><ul><ul><li>“ tftp.exe -i <address> get <executable name>” </li></ul></ul></ul></ul><ul><ul><ul><ul><li><address> <executable name> accounted for 6.5% distance </li></ul></ul></ul></ul>
  29. 29. ISystemActivator Different instructions in parts, otherwise very similar
  30. 30. ISystemActivator The “Bind” version required the newly-infected host to bind on a socket and wait for a connection attempt from the infecting host. The “Connect-back” version required the newly-infected host to connect back to the infecting host. Interestingly, the number of iterations in ISys-3’s loop overshoots the exploit payload. Thus, it seems that either ISys-2 was a refinement of ISys-3, or that ISys-3 was a poor imitation of ISys-2.
  31. 31. ISystemActivator
  32. 32. RemoteActivation <ul><li>Unlike the other exploits, RemoteActivation exploits exhibited a high amount of exploit diversity per host </li></ul>
  33. 33. RemoteActivation (cont’d) Exedit distance is very small. <ul><li>The byte-wise encoding scheme only covered the main bodies of the exploits, but different exploits used different keys. </li></ul><ul><li>Manual inspection confirmed that variable encoding of the exploit’s main body contributed to the jump in average intra-family distance. </li></ul><ul><li>Changing keys and random filler characters are commonly described techniques for polymorphism, and the RemoteActivation exploits had both of these features. </li></ul>0: “Bind” version. 1: “Connect-back” version. Manual inspection: the last third (roughly 300 bytes) of the payload contained randomly generated characters.
  34. 34. Diversity Across Vulnerabilities <ul><li>The trace is a full-payload 4.5-day trace from a Windows honeyfarm running at the Lawrence Berkeley National Laboratory starting on April 19, 2006. </li></ul><ul><li>Hosts in this honeyfarm served as active responders to incoming requests </li></ul>
  35. 35. Diversity Across Vulnerabilities (cont’d) Dendrogram for the LBL trace exploits using exedit distance. The 1st set of hash marks just below 0% represents ISystemActivator, the 2nd represents LSASS, the 3rd represents PNP, and the 4th represents RemoteActivation.
  36. 36. Diversity Across Vulnerabilities (cont’d) Multi-vector family
  37. 37. Discussion - Polymorphism <ul><li>We generated a small set of signatures that exhaustively covered all exploits we observed for each vulnerability in the DSL residential trace. </li></ul><ul><li>Each signature was a contiguous sequence of 100 bytes. </li></ul><ul><ul><li>For each individual vulnerability except LSASS, one signature sufficed to cover the set of exploits. LSASS required two: one covered 1645/1769 exploits, and the other covered the rest. </li></ul></ul><ul><ul><li>Manual investigation of these signatures showed that they primarily focused on the portions of the shellcode that were mostly (but not entirely) NOPs. </li></ul></ul><ul><li>We then tested the signatures against a 5-GB trace on our internal network for false positives. None of the signatures yielded false positives in the internal trace. </li></ul><ul><li>The polymorphism was not effective for evasion. What else might it be for? </li></ul><ul><ul><li>Functional variation </li></ul></ul><ul><ul><li>Increasing the difficulty of reverse engineering </li></ul></ul>
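The signature evaluation on this slide boils down to two checks: how many observed exploits contain at least one signature, and whether any benign traffic does. A minimal sketch follows; the function names and toy byte strings are hypothetical, and the toy signature is far shorter than the 100-byte signatures described above.

```python
def coverage(signatures: list[bytes], payloads: list[bytes]) -> tuple[int, int]:
    """Return (number of payloads matched by at least one signature, total)."""
    covered = sum(any(sig in p for sig in signatures) for p in payloads)
    return covered, len(payloads)


def false_positives(signatures: list[bytes], benign: list[bytes]) -> list[bytes]:
    """Return the benign payloads that contain any signature."""
    return [p for p in benign if any(sig in p for sig in signatures)]


# Toy data (hypothetical): one contiguous-substring signature.
signatures = [b"\x90\x90\x90\x41\x90\x90"]
exploits = [b"hdr" + signatures[0] + b"tail-1", b"xx" + signatures[0] + b"tail-2"]
benign = [b"GET /index.html HTTP/1.1", b"\x00\x01\x02\x03"]

print(coverage(signatures, exploits))
print(false_positives(signatures, benign))
```

Substring containment is the simplest possible matcher; it is enough to express the coverage figure (such as 1645/1769) and the false-positive test in code.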
  38. 38. Conclusion <ul><li>This paper presents a methodology for constructing the phylogeny of remote code injection exploits. </li></ul><ul><li>It evaluates this methodology on network traces taken from several vantage points. </li></ul><ul><ul><li>The methodology is robust to the observed polymorphism </li></ul></ul><ul><ul><li>The techniques reveal non-trivial code sharing among different exploit families, and the resulting phylogenies accurately capture the subtle variations among exploits within each family. </li></ul></ul><ul><li>Analyzing both the emergence of polymorphism and the phylogeny of remote code injection exploits is important </li></ul>