Binary Analysis for Grammar and Model Extraction: Techniques and ...

  • Dynamic binary analysis
  • This should probably be a reply
  • Field = contiguous sequence of bytes in a message; buffer = contiguous sequence of bytes in memory
  • stat is a well-known Unix function, part of the IEEE Std 1003.1, 2004 Edition standard
  • MegaD is a prevalent spam botnet that accounted for 35.4% of all spam on the Internet in a December 2008 study, and still accounts for 4-5 [Marshal86e]. Grammar available in Appendix A. Field semantics: IP addresses, ports, hostnames, length, sleep timers, error codes, keywords, cookies, stored data, padding, and host information

    1. Binary Analysis for Botnet Reverse Engineering & Defense
        Dawn Song, UC Berkeley
    2. Binary Analysis Is Important for Botnet Defense
        • Botnet programs: no source code, only binary
        • Botnet defense needs internal understanding of botnet programs
        • C&C reverse engineering
        • Different possible commands, encryption/decryption
        • Botnet traffic rewriting
        • Botnet infiltration
        • Botnet vulnerability discovery
    3. BitBlaze Binary Analysis Infrastructure: Architecture
        • The first infrastructure:
        • Novel fusion of static, dynamic, formal analysis methods
        • Loop extended symbolic execution
        • Grammar-aware symbolic execution
        • Whole system analysis (including OS kernel)
        • Analyzing packed/encrypted/obfuscated code
        • Components of the BitBlaze Binary Analysis Infrastructure:
        • Vine: Static Analysis Component
        • TEMU: Dynamic Analysis Component
        • Rudder: Mixed Execution Component
    4. BitBlaze: Security Solutions via Program Binary Analysis
        • Unified platform to accurately analyze security properties of binaries
        • Security evaluation & audit of third-party code
        • Defense against morphing threats
        • Faster & deeper analysis of malware
        • Diagram labels: Dissecting Malware, Detecting Vulnerabilities, Generating Filters, BitBlaze Binary Analysis Infrastructure
    8. The BitBlaze Approach & Research Foci
        • Semantics based, focus on root cause: automatically extracting security-related properties from binary code for effective vulnerability detection & defense
        • Build a unified binary analysis platform for security
        • Identify & cater to common needs of different security applications
        • Leverage recent advances in program analysis, formal methods, and binary instrumentation/analysis techniques for new capabilities
        • Solve real-world security problems via binary analysis
        • Extracting security-related models for vulnerability detection
        • Generating vulnerability signatures to filter out exploits
        • Dissecting malware for real-time diagnosis & offense, e.g., botnet infiltration
        • More than a dozen security applications & publications
    11. Plans
        • Building on BitBlaze to develop new techniques
        • Automatic reverse engineering of C&C protocols of botnets
        • Automatic rewriting of botnet traffic to facilitate botnet infiltration
        • Vulnerability discovery in botnets
    12. Preliminary Work
        • Dispatcher: Enabling Active Botnet Infiltration using Automatic Protocol Reverse-Engineering
        • Binary code extraction and interface identification for botnet traffic rewriting
        • Botnet analysis for vulnerability discovery
    13. Dispatcher: Enabling Active Botnet Infiltration using Automatic Protocol Reverse-Engineering
        Juan Caballero, Pongsin Poosankam, Christian Kreibich, Dawn Song
    14. Automatic Protocol Reverse-Engineering
        • Process of extracting the application-level protocol used by a program, without the specification
        • Automatic process
        • Many undocumented protocols (C&C, Skype, Yahoo)
        • Encompasses extracting the Protocol Grammar and the Protocol State Machine
        • Message format extraction is a prerequisite
    15. Challenges for Active Botnet Infiltration
        • Goal: rewrite C&C messages on either side of the dialog
        • Understand both sides of the C&C protocol: message structure and field semantics
        • Access to one side of the dialog only
        • Handle encryption/obfuscation
    17. Technical Contributions
        • Buffer deconstruction, a technique to extract the format of sent messages (earlier work only handles received messages)
        • Field semantics inference techniques, for messages sent and received
        • Designing and developing Dispatcher
        • Extending a technique to handle encryption
        • Rewriting a botnet dialog using information extracted by Dispatcher
    18. Message Format Extraction
        • Extract format of a single message
        • Required by Grammar and State Machine extraction
        • Example messages: GET / HTTP/1.1 (received, [Polyglot]) and HTTP/1.1 200 OK (sent, [Dispatcher])
    19. Message Field Tree
        • Example message: HTTP/1.1 200 OK
        • MSG [0:18] has two children: StatusLine [0:16] and Delimiter [17:18]
        • StatusLine [0:16] has six children: Version [0:7], Delimiter [8:8], Status-Code [9:11], Delimiter [12:12], Reason [13:14], Delimiter [15:16]
        • Example attributes of one field: Field Range: [3:3]; Field Boundary: Fixed; Field Semantics: Delimiter; Field Keywords: <none>; Target: Version
        • Message format extraction has 2 steps: extract tree structure, then extract field attributes
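The message field tree on the slide above can be written down as a small data structure. The following is a minimal sketch, not Dispatcher's representation: the `FieldNode` class and helper names are hypothetical, ranges are inclusive as in the slide's `[start:end]` notation, and the 19-byte message is assumed to be the status line followed by an empty line (CRLF CRLF), which is what the ranges [0:18] imply.

```python
from dataclasses import dataclass, field
from typing import List

# Assumption: the slide's 19-byte message is the status line plus a blank line.
MESSAGE = b"HTTP/1.1 200 OK\r\n\r\n"


@dataclass
class FieldNode:
    """One node of a message field tree (hypothetical representation)."""
    name: str
    start: int  # first byte offset, inclusive
    end: int    # last byte offset, inclusive, as in the slide's [start:end]
    children: List["FieldNode"] = field(default_factory=list)

    def data(self, msg: bytes) -> bytes:
        """Return the bytes of the message covered by this field."""
        return msg[self.start:self.end + 1]

    def leaves(self) -> List["FieldNode"]:
        """Return leaf fields in message order."""
        if not self.children:
            return [self]
        out: List["FieldNode"] = []
        for child in self.children:
            out.extend(child.leaves())
        return out


def build_tree() -> FieldNode:
    """Build the tree shown on the slide for the HTTP response."""
    status_line = FieldNode("StatusLine", 0, 16, [
        FieldNode("Version", 0, 7),
        FieldNode("Delimiter", 8, 8),
        FieldNode("Status-Code", 9, 11),
        FieldNode("Delimiter", 12, 12),
        FieldNode("Reason", 13, 14),
        FieldNode("Delimiter", 15, 16),
    ])
    return FieldNode("MSG", 0, 18, [status_line, FieldNode("Delimiter", 17, 18)])


def children_tile_parent(node: FieldNode) -> bool:
    """Check that every node's children cover its range exactly, with no gaps."""
    if not node.children:
        return True
    pos = node.start
    for child in node.children:
        if child.start != pos:
            return False
        pos = child.end + 1
    return pos == node.end + 1 and all(children_tile_parent(c) for c in node.children)
```

Calling `build_tree().leaves()` and reading each leaf's `data(MESSAGE)` yields the version, status code, reason, and delimiter bytes in message order.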
    20. Sent vs. Received
        • Both protocol directions from a single binary
        • Different problems
        • Taint information harder to leverage
        • Focus on how the message is constructed, not processed
        • Different techniques needed: tree structure → buffer deconstruction; field attributes → new heuristics
    21. Outline
        • Introduction
        • Problem
        • Techniques: Buffer Deconstruction, Field Semantics Inference, Handling encryption
        • Evaluation
    22. Buffer Deconstruction
        • Intuition: programs keep fields in separate memory buffers and combine those buffers to construct the sent message
        • Output buffer: holds the message when the "send" function is invoked, or holds the unencrypted message before encryption
        • Recursive process: decompose a buffer into the buffers used to fill it; starts with the output buffer; stops when there is nothing left to recurse into
    23. Buffer Deconstruction
        • Example message: HTTP/1.1 200 OK
        • Output Buffer (19) holds MSG [0:18] and is filled from A(17) and B(2)
        • A(17) holds Status Line [0:16]; B(2) holds Delimiter [17:18]
        • A(17) is filled from C(8), D(1), E(3), F(1), G(2), H(2), covering [0:7], [8:8], [9:11], [12:12], [13:14], [15:16]: Version, Delimiter, StatusCode, Delimiter, Reason, Delimiter
        • Message field tree = inverse of output buffer structure
        • Output is structure of message field tree
        • No field attributes, except range
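The recursion described on the two Buffer Deconstruction slides can be illustrated with a toy model. This is a sketch under stated assumptions, not Dispatcher's implementation: the real technique works on an execution trace of the binary, whereas here the `WRITES` table is a hypothetical, hand-written log of which buffers were copied into which, using the buffer names and sizes from the slide.

```python
from typing import Dict, List, Tuple

# Hypothetical write log: destination buffer -> list of
# (offset in destination, source buffer, number of bytes copied).
WRITES: Dict[str, List[Tuple[int, str, int]]] = {
    "OUT": [(0, "A", 17), (17, "B", 2)],
    "A": [(0, "C", 8), (8, "D", 1), (9, "E", 3),
          (12, "F", 1), (13, "G", 2), (15, "H", 2)],
}

# A tree node is (buffer name, start, end, children), offsets inclusive
# and expressed relative to the start of the sent message.
Node = Tuple[str, int, int, list]


def deconstruct(buf: str, base: int, length: int,
                writes: Dict[str, List[Tuple[int, str, int]]]) -> Node:
    """Decompose a buffer into the buffers used to fill it, recursively.

    Recursion stops at buffers that have no recorded source buffers.
    """
    children = [
        deconstruct(src, base + offset, size, writes)
        for offset, src, size in sorted(writes.get(buf, []))
    ]
    return (buf, base, base + length - 1, children)


def leaf_ranges(node: Node) -> List[Tuple[str, int, int]]:
    """Flatten the tree into leaf buffers with their message ranges."""
    buf, start, end, children = node
    if not children:
        return [(buf, start, end)]
    out: List[Tuple[str, int, int]] = []
    for child in children:
        out.extend(leaf_ranges(child))
    return out
```

Starting from the 19-byte output buffer, `deconstruct("OUT", 0, 19, WRITES)` reproduces the ranges on the slide: the leaves carry only a range, which matches the slide's remark that no other field attributes are available at this stage.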
    24. Field Attributes Inference
        • Attributes capture extra information, e.g., inter-field relationships
        • Techniques identify keywords, length fields, delimiters, variable-length fields, and arrays
    29. Field Semantics
        • A field attribute in the message field tree
        • Captures the type of data in the field
        • Programs contain much semantic info → leverage it!
        • Semantics in well-defined functions and instructions
        • Prototype
        • Similar to type inference
        • Differs for received and sent messages
    33. Field Semantic Inference
        • Received message: GET /index.html HTTP/1.1 (the requested path is a file path)
        • Sent message: HTTP/1.1 200 OK, Content-Length: 25, <html>Hello world!</html> (the Content-Length value is a file length)
        • Observed call: stat("index.html", &file_info);
        • Prototype: int stat(const char *path, struct stat *buf); path is an input (IN); buf and the return value are outputs (OUT)
        • struct stat { … off_t st_size; /* total size in bytes */ … }
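The `stat` example above suggests how known prototypes can label fields in both directions. The sketch below is a simplified illustration, not Dispatcher's algorithm: the `PROTOTYPES` table, the function names, and the way taint is modeled (as sets of received-message byte offsets reaching an argument, and as a recorded source for each sent field) are all assumptions made for the example.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical table of well-known prototypes:
# function -> {argument: (direction, semantics)}.
PROTOTYPES: Dict[str, Dict[str, Tuple[str, str]]] = {
    "stat": {
        "path": ("IN", "file path"),
        "buf.st_size": ("OUT", "file length"),
    },
}


def infer_received(fields: Dict[str, Tuple[int, int]],
                   call_args: List[Tuple[str, str, Set[int]]]) -> Dict[str, str]:
    """Label received fields whose bytes flow into a known IN argument.

    fields maps a field name to its inclusive (start, end) range.
    call_args lists (function, argument, received offsets reaching it).
    """
    result: Dict[str, str] = {}
    for func, arg, offsets in call_args:
        direction, semantics = PROTOTYPES.get(func, {}).get(arg, ("", ""))
        if direction != "IN" or not offsets:
            continue
        for name, (start, end) in fields.items():
            if offsets <= set(range(start, end + 1)):
                result[name] = semantics
    return result


def infer_sent(field_sources: Dict[str, Tuple[str, str]]) -> Dict[str, str]:
    """Label sent fields whose value came from a known OUT argument.

    field_sources maps a sent field name to the (function, argument)
    that produced the value written into that field.
    """
    result: Dict[str, str] = {}
    for name, (func, arg) in field_sources.items():
        direction, semantics = PROTOTYPES.get(func, {}).get(arg, ("", ""))
        if direction == "OUT":
            result[name] = semantics
    return result


# Received "GET /index.html HTTP/1.1": the path occupies offsets 4..14,
# and the string passed to stat ("index.html") derives from offsets 5..14.
received = infer_received(
    {"Method": (0, 2), "Path": (4, 14), "Version": (16, 23)},
    [("stat", "path", set(range(5, 15)))],
)
# Sent response: the Content-Length value was derived from st_size.
sent = infer_sent({"Content-Length value": ("stat", "buf.st_size")})
```

With these inputs, `received` labels the `Path` field as a file path and `sent` labels the Content-Length value as a file length, mirroring the slide.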
    34. Detecting Encoding Functions
        • Encoding functions = (de)compression, (en/de)cryption, (de)obfuscation…
        • High ratio of arithmetic & bitwise instructions
        • Use read/write set to identify buffers
        • Work in progress on extracting and reusing encoding functions
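The "high ratio of arithmetic & bitwise instructions" heuristic can be sketched as a simple count over the instructions a function executes. This is an illustration only: the mnemonic set, the function names, and both threshold values (`min_ratio`, `min_instructions`) are assumptions chosen for the example, not values taken from the slides.

```python
from typing import Sequence

# Assumed set of x86 mnemonics counted as arithmetic or bitwise.
ARITH_BITWISE = frozenset({
    "add", "sub", "mul", "imul", "div", "idiv", "inc", "dec", "neg",
    "and", "or", "xor", "not", "shl", "shr", "sar", "rol", "ror",
})


def arith_ratio(mnemonics: Sequence[str]) -> float:
    """Fraction of executed instructions that are arithmetic or bitwise."""
    if not mnemonics:
        return 0.0
    hits = sum(1 for m in mnemonics if m.lower() in ARITH_BITWISE)
    return hits / len(mnemonics)


def looks_like_encoding(mnemonics: Sequence[str],
                        min_ratio: float = 0.55,
                        min_instructions: int = 20) -> bool:
    """Flag a function as a candidate encoding function.

    Both thresholds are illustrative assumptions. The minimum length
    avoids flagging tiny helpers that happen to be mostly arithmetic.
    """
    return (len(mnemonics) >= min_instructions
            and arith_ratio(mnemonics) >= min_ratio)
```

A function flagged this way is only a candidate: per the slide, its read and write sets would then be used to identify the buffers holding the encoded and unencoded data.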
    35. MegaD C&C protocol
        type MegaD_Message = record {
            msg_len : uint16;
            encrypted_payload : bytestring &length = 8*msg_len;
        } &byteorder = bigendian;
        type encrypted_payload = record {
            version : uint16;
            mtype : uint16;
            data : MegaD_data (mtype);
        };
        type MegaD_data (msg_type: uint16) =
            case msg_type of {
                0x00 -> m00 : msg_0;
                […]
                default -> unknown : bytestring &restofdata;
            };
        • C&C on tcp/443 using proprietary encryption
        • Use Dispatcher's output to generate grammar
        • 15 different messages seen (7 recv, 8 sent)
        • 11 field semantics
    38. MegaD Dialog
        • Diagram labels: C&C Server, SMTP Test Server; messages: Cmd?, TestSMTP, EHLO, Failed
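The outer framing in the MegaD grammar above (a big-endian `uint16` length counted in 8-byte units, followed by the encrypted payload) is simple enough to parse by hand. The following is a minimal sketch: the function names are hypothetical, decryption is deliberately left out because the slides only say the encryption is proprietary, and reading the inner `version` and `mtype` fields as big-endian is an assumption, since the grammar states the byte order only on the outer record.

```python
import struct
from typing import Dict, Tuple, Union


def parse_megad_message(raw: bytes) -> Tuple[int, bytes, bytes]:
    """Split one framed message off the front of a byte stream.

    Returns (msg_len, encrypted_payload, remaining bytes). The payload
    is 8 * msg_len bytes long, as in the grammar's &length expression.
    """
    if len(raw) < 2:
        raise ValueError("need at least 2 bytes for msg_len")
    (msg_len,) = struct.unpack_from(">H", raw, 0)
    payload_len = 8 * msg_len
    if len(raw) < 2 + payload_len:
        raise ValueError("truncated payload")
    return msg_len, raw[2:2 + payload_len], raw[2 + payload_len:]


def parse_decrypted_payload(plain: bytes) -> Dict[str, Union[int, bytes]]:
    """Parse an already-decrypted payload into version, mtype, and data.

    Assumption: both uint16 fields are big-endian. The data field is
    returned unparsed, like the grammar's default 'unknown' case.
    """
    if len(plain) < 4:
        raise ValueError("need at least 4 bytes for version and mtype")
    version, mtype = struct.unpack_from(">HH", plain, 0)
    return {"version": version, "mtype": mtype, "data": plain[4:]}
```

A caller would apply its own decryption step between the two functions; the sketch keeps them separate so that the framing logic does not depend on how the payload is decrypted.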
    39. MegaD Rewriting
        • Diagram labels: C&C Server, Template Server, SMTP Test Server, Grammar; messages: Cmd?, TestSMTP, EHLO, Failed, Success, Template?, Get Template
    40. Summary
        • Buffer deconstruction, a technique to extract the format of sent messages
        • Field semantics inference techniques, for messages sent and received
        • Designed and developed Dispatcher
        • Extended technique to handle encryption
        • Rewrote MegaD dialog using information extracted by Dispatcher
