Taint scope

Published in: Technology

  1. A Checksum-Aware Directed Fuzzing Tool for Automatic Software Vulnerability Detection. Tielei Wang (1), Tao Wei (1), Guofei Gu (2), Wei Zou (1). (1) Peking University, China; (2) Texas A&M University, US.
  2. Checksum: a way to check the integrity of data, used in network protocols and files. A checksum function is applied to the data, and the result is stored with the data in a checksum field.
     • Fuzzing: generating malformed inputs and feeding them to the application.
     • Dynamic taint analysis: runs a program and observes which computations are affected by predefined taint sources (e.g. the input).
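To make the checksum definition concrete, here is a minimal sketch of a checksum function and a checksum field. The byte-sum algorithm, the 4-byte big-endian field placed right after the data, and all function names are assumptions made for illustration; the slides do not name a specific checksum.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical checksum function: sum of all data bytes modulo 2^32.
 * Real formats use CRC32, Adler-32 and similar; this is only an illustration. */
static uint32_t simple_checksum(const uint8_t *data, size_t len) {
    uint32_t sum = 0;
    for (size_t i = 0; i < len; i++) {
        sum += data[i];
    }
    return sum;
}

/* Store the checksum big-endian in the 4-byte field right after the data.
 * The buffer must have room for len + 4 bytes. */
static void write_checksum_field(uint8_t *buf, size_t len) {
    assert(buf != NULL);
    uint32_t sum = simple_checksum(buf, len);
    buf[len]     = (uint8_t)(sum >> 24);
    buf[len + 1] = (uint8_t)(sum >> 16);
    buf[len + 2] = (uint8_t)(sum >> 8);
    buf[len + 3] = (uint8_t)sum;
}

/* Return 1 if the stored field matches the recomputed checksum, else 0. */
static int verify_checksum_field(const uint8_t *buf, size_t len) {
    uint32_t stored = ((uint32_t)buf[len] << 24) | ((uint32_t)buf[len + 1] << 16)
                    | ((uint32_t)buf[len + 2] << 8) | (uint32_t)buf[len + 3];
    return stored == simple_checksum(buf, len);
}
```

Changing any data byte after `write_checksum_field` makes `verify_checksum_field` fail unless the field is updated as well, which is why naive fuzzing tends to be rejected early.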
  3. The input mutation space is enormous. Most malformed inputs are dropped at an early stage if the program employs a checksum mechanism.
  4. Example of a checksum check in front of image-decoding code:
          1  void decode_image(FILE* fd){
          2    ...
          3    int length = get_length(fd);
          4    int recomputed_chksum = checksum(fd, length);
          5    int chksum_in_file = get_checksum(fd);
               //line 6 is used to check the integrity of inputs
          6    if(chksum_in_file != recomputed_chksum)
          7      error();
          8    int Width = get_width(input_file);
          9    int Height = get_height(input_file);
         10    int size = Width*Height*sizeof(int);
         11    int* p = malloc(size);
         12    ...
         13    for(i=0; i<Height; i++){// read ith row to p
         14      read_row(p+Width*i, i, fd);
  5. Infer whether and where a program checks the integrity of its input. Identify which input bytes can flow into sensitive points, using byte-level taint analysis that monitors how the application uses the input data. Create malformed inputs that focus on the “hot bytes”. Repair checksum fields in the input to expose vulnerabilities. The process is fully automatic and found 27 new vulnerabilities, in Acrobat Reader, Google Picasa and more.
  6. (1) Dynamic taint tracing; (2) detecting checksum; (3) directed fuzzing; (4) repairing crashed samples.
  7. [Figure: system overview. Components: Execution Monitor, Checksum Locator, Directed Fuzzer, Checksum Repairer. Labels on the connections: Modified Program, Crashed Samples, Instruction Profile, Hot Bytes Info, Reports.]
  8. The program is run with well-formed input. The execution monitor records:
     • which input bytes are related to arguments of API functions (e.g. malloc, strcpy): the “hot bytes” report;
     • which input bytes each conditional jump instruction (e.g. JZ, JE, JB) depends on: the checksum report.
     Only data flow is considered (no control flow).
  9. Instruments data movement instructions (e.g. MOV, PUSH), arithmetic instructions (e.g. SUB, ADD) and logic instructions (e.g. AND, XOR). All values written by an instruction are tainted with the union of the taint labels associated with the values used by that instruction. The eflags register is considered as well.
     Example: before, eax {0x6, 0x7} and ebx {0x8, 0x9}; after executing add eax, ebx, eax {0x6, 0x7, 0x8, 0x9}, and eflags is tainted too.
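The union rule for `add eax, ebx` can be sketched as follows. This is a minimal illustration, not the tool's implementation: the 64-bit mask used as a label set, the `shadow_regs` structure and the function names are all assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* A taint label set: bit i is set when the value depends on input byte i.
 * A 64-bit mask (input offsets 0..63) is an assumption made to keep the
 * sketch short; a real tool needs sets that cover the whole input. */
typedef uint64_t taint_set;

/* Hypothetical shadow state holding the taint of three registers. */
struct shadow_regs {
    taint_set eax;
    taint_set ebx;
    taint_set eflags;
};

/* Build a set containing the single input offset `off` (off < 64). */
static taint_set taint_of(unsigned off) {
    assert(off < 64);
    return (taint_set)1 << off;
}

/* Propagation rule for `add eax, ebx`: every value the instruction writes
 * (eax and eflags) gets the union of the labels of the values it reads. */
static void propagate_add_eax_ebx(struct shadow_regs *r) {
    taint_set u = r->eax | r->ebx;
    r->eax = u;
    r->eflags = u;
}
```

Starting from eax {0x6, 0x7} and ebx {0x8, 0x9}, one call leaves both eax and eflags labelled with {0x6, 0x7, 0x8, 0x9}, while ebx is unchanged because the instruction does not write it.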
  10. The input size is 1024 bytes.
           8  int Width = get_width(input_file);
           9  int Height = get_height(input_file);
          10  int size = Width*Height*sizeof(int);
          11  int* p = malloc(size);
      “Hot bytes” report:
          …
          0x8048d5b: invoking malloc: [0x8,0xf]
          …
      That is, input bytes 0x8 to 0xf flow into the argument of malloc.
  11. The input size is 1024 bytes.
           6  if(chksum_in_file != recomputed_chksum)
           7    error();
      Checksum report:
          …
          0x8048d4f: JZ: 1024: [0x0,0x3ff]
          …
      That is, the JZ at 0x8048d4f depends on 1024 input bytes, 0x0 to 0x3ff.
  12. Checksum detector: identify potential checksum check points. The recomputed checksum value depends on many input bytes, so the detector instruments conditional jumps and, before each one executes, checks whether the number of marks associated with the eflags register exceeds a threshold. Problem: values derived from decompressed bytes also depend on many input bytes, so this test alone can flag jumps that are not checksum checks.
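A minimal sketch of the threshold test, assuming the execution monitor has already counted how many input bytes taint eflags at each conditional jump. The `jump_record` layout and the function name are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record for one executed conditional jump: its address and
 * the number of distinct input bytes that taint eflags at that point. */
struct jump_record {
    uint32_t addr;
    size_t   tainted_bytes;
};

/* Copy the addresses of jumps whose taint count exceeds `threshold` into
 * `out` (capacity `out_cap`) and return how many were written. */
static size_t find_candidate_checks(const struct jump_record *jumps, size_t n,
                                    size_t threshold,
                                    uint32_t *out, size_t out_cap) {
    assert(jumps != NULL && out != NULL);
    size_t found = 0;
    for (size_t i = 0; i < n && found < out_cap; i++) {
        if (jumps[i].tainted_bytes > threshold) {
            out[found++] = jumps[i].addr;
        }
    }
    return found;
}
```

With a threshold of 16, a jump such as the JZ at 0x8048d4f that depends on 1024 input bytes is reported as a candidate, while a jump that depends on only a few bytes is not.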
  16. Refinement: well-formed inputs can pass the checksum test, but most malformed inputs cannot.
      • Run well-formed inputs and identify the always-taken and always-not-taken conditional jump instructions.
      • Run malformed inputs and identify the always-taken and always-not-taken instructions in the same way.
      • Identify the conditional jump instructions that behave completely differently when processing well-formed and malformed inputs.
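The refinement can be sketched as a comparison of two branch profiles. The `jump_profile` structure, the enum and the function name are assumptions; the sketch only captures the rule that a check point must behave consistently on well-formed inputs and in the opposite way on malformed ones.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-jump profile: how often a conditional jump was taken and
 * not taken, summed over all runs with well-formed inputs and over all runs
 * with malformed inputs. */
struct jump_profile {
    uint32_t addr;
    unsigned well_taken, well_not_taken;
    unsigned mal_taken,  mal_not_taken;
};

enum bypass_rule { NO_RULE = 0, ALWAYS_TAKEN, ALWAYS_NOT_TAKEN };

/* A jump is reported as a checksum check point only if it behaves
 * consistently on well-formed inputs and in exactly the opposite way on
 * malformed inputs. The returned rule is the well-formed behaviour, i.e.
 * the direction a fuzzer would force later. */
static enum bypass_rule classify_jump(const struct jump_profile *p) {
    assert(p != NULL);
    int well_always_taken     = p->well_taken > 0 && p->well_not_taken == 0;
    int well_always_not_taken = p->well_not_taken > 0 && p->well_taken == 0;
    int mal_always_taken      = p->mal_taken > 0 && p->mal_not_taken == 0;
    int mal_always_not_taken  = p->mal_not_taken > 0 && p->mal_taken == 0;

    if (well_always_taken && mal_always_not_taken) return ALWAYS_TAKEN;
    if (well_always_not_taken && mal_always_taken) return ALWAYS_NOT_TAKEN;
    return NO_RULE;
}
```

A jump that is taken on some malformed inputs and not on others gets `NO_RULE`, which filters out many of the jumps that passed the threshold test only because they depend on a lot of input.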
  17. Checksum detector: creates bypass rules (always-taken, always-not-taken).
           6  if(chksum_in_file != recomputed_chksum)
           7    error();
      Checksum report:
          …
          0x8048d4f: JZ: 1024: [0x0,0x3ff]
          …
      Bypass rule:
          0x8048d4f: JZ: always-taken
  18. Checksum detector: checksum field identification.
           6  if(chksum_in_file != recomputed_chksum)
           7    error();
      The input bytes that affect chksum_in_file are the checksum field.
  19. The directed fuzzer generates malformed test cases and feeds them to the original or the instrumented program. According to the bypass rules, it alters the execution trace at check points by setting the eflags register.
  20. All malformed test cases are constructed from the “hot bytes” information, using attack heuristics:
      • bytes that influence memory allocation are set to small, large or negative values;
      • bytes that flow into string functions are replaced by characters such as %n, %p.
      Output: test cases that could cause the program to crash or consume 100% CPU.
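As a minimal sketch of the memory-allocation heuristic, the function below overwrites one 4-byte hot field with a small, large or negative value. The concrete attack values, the 4-byte little-endian field width and the function name are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical attack values for a 32-bit size field: small, large, and
 * negative when read as a signed int. */
static const uint32_t ATTACK_VALUES[] = { 0x00000000u, 0x7fffffffu, 0xffffffffu };
#define NUM_ATTACK_VALUES (sizeof ATTACK_VALUES / sizeof ATTACK_VALUES[0])

/* Copy `input` into `out` and overwrite the 4 hot bytes starting at
 * `hot_off` with attack value number `which` (little-endian, as on x86).
 * All other bytes are left untouched. */
static void mutate_hot_bytes(const uint8_t *input, size_t len, size_t hot_off,
                             size_t which, uint8_t *out) {
    assert(hot_off + 4 <= len && which < NUM_ATTACK_VALUES);
    memcpy(out, input, len);
    uint32_t v = ATTACK_VALUES[which];
    for (size_t i = 0; i < 4; i++) {
        out[hot_off + i] = (uint8_t)(v >> (8 * i));
    }
}
```

For a hot range such as [0x8,0xf], which would cover two 4-byte fields, a caller could apply the function at offsets 0x8 and 0xc. Only the hot bytes change, so the rest of the well-formed input keeps the program on its normal path.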
  21. Putting the reports together:
           6  if(chksum_in_file != recomputed_chksum)
           7    error();
           8  int Width = get_width(input_file);
           9  int Height = get_height(input_file);
          10  int size = Width*Height*sizeof(int);
          11  int* p = malloc(size);
      Checksum report:
          … 0x8048d4f: JZ: 1024: [0x0,0x3ff] …
      Bypass info:
          0x8048d4f: JZ: always-taken
      “Hot bytes” report:
          … 0x8048d5b: invoking malloc: [0x8,0xf] …
  22. Same code and reports as on the previous slide (checksum report 0x8048d4f: JZ: 1024: [0x0,0x3ff]; bypass info 0x8048d4f: JZ: always-taken; “hot bytes” report 0x8048d5b: invoking malloc: [0x8,0xf]). Before executing the instruction at 0x8048d4f, the fuzzer sets the ZF flag in eflags to the opposite value, so the jump follows the bypass rule even though the checksum in the malformed input does not match.
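The flag manipulation can be sketched on a plain integer that stands in for eflags. On x86, ZF is bit 6 of eflags and JZ jumps when ZF is 1; the enum and function names below are hypothetical, and a real fuzzer would change the flag in the running process rather than in a variable.

```c
#include <assert.h>
#include <stdint.h>

#define EFLAGS_ZF 0x40u  /* zero flag is bit 6 of the x86 eflags register */

/* Hypothetical bypass rule for a JZ instruction. */
enum jz_rule { JZ_ALWAYS_TAKEN, JZ_ALWAYS_NOT_TAKEN };

/* JZ jumps when ZF is 1. Return the eflags value to install just before a
 * JZ so that it follows `rule`, whatever the preceding comparison computed. */
static uint32_t force_jz(uint32_t eflags, enum jz_rule rule) {
    assert(rule == JZ_ALWAYS_TAKEN || rule == JZ_ALWAYS_NOT_TAKEN);
    if (rule == JZ_ALWAYS_TAKEN) {
        return eflags | EFLAGS_ZF;
    }
    return eflags & ~EFLAGS_ZF;
}

/* Model of the JZ decision itself: 1 if the jump is taken. */
static int jz_taken(uint32_t eflags) {
    return (eflags & EFLAGS_ZF) != 0;
}
```

For the rule `0x8048d4f: JZ: always-taken`, a failed comparison leaves ZF clear, and `force_jz` sets it so that execution skips `error()` and continues to the `malloc` call.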
  23. Fixing is expensive, so checksum fields are fixed only in the test cases that caused a crash. How?
      • Cr: raw data in the checksum field
      • D: input data protected by the checksum field
      • Checksum(): the complete checksum algorithm
      • T: transformation
      We want to pass the constraint: Checksum(D) == T(Cr)
  24. Symbolic execution is used to solve Checksum(D) == T(Cr). Checksum(D) is a runtime-determinable constant c, so the constraint becomes c == T(Cr), in which only Cr is a symbolic value. Common transformations (e.g. converting from hex/oct to decimal) can be solved by existing solvers (STP).
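To illustrate the constraint c == T(Cr), the sketch below assumes a hypothetical T that parses 8 ASCII hex digits into an integer. Because this T is trivially invertible, the sketch inverts it directly instead of calling a solver; it is not the symbolic-execution approach with STP described on the slide, only an example of what a satisfying Cr looks like.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical transformation T: the checksum field holds 8 ASCII hex
 * digits, and the program converts them to an integer before comparing. */
static uint32_t transform_T(const char field[8]) {
    char tmp[9];
    memcpy(tmp, field, 8);
    tmp[8] = '\0';
    return (uint32_t)strtoul(tmp, NULL, 16);
}

/* Given c = Checksum(D), observed at run time, write a field Cr such that
 * T(Cr) == c. Here T is inverted by hand; no solver is involved. */
static void repair_field(char field[8], uint32_t c) {
    char tmp[9];
    int n = snprintf(tmp, sizeof tmp, "%08x", (unsigned)c);
    assert(n == 8);
    (void)n;  /* silence unused-variable warnings when asserts are disabled */
    memcpy(field, tmp, 8);
}
```

The value c never has to be derived from the checksum algorithm itself: it is read from the running program at the check point, which is why only Cr needs to be treated as unknown.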
  25. If the new test case causes the original program to crash, a potential vulnerability is detected!
  26. An incomplete list of applications:
  27. “Hot bytes” identification results (memory allocation):
  28. Checksum identification results (threshold = 16):
  29. Correct checksum fields:
  30. 27 previously unknown vulnerabilities: MS Paint, Google Picasa, IrfanView, GStreamer, Amaya, Dillo, Adobe Acrobat, ImageMagick, Winamp, XEmacs, wxWidgets, PDFlib.
  31. Vulnerabilities detected by TaintScope:
  32. TaintScope cannot deal with secure integrity check schemes (e.g. cryptographic hash algorithms, digital signatures), because it is impossible to generate valid test cases for them. Its effectiveness is limited when all input data are encrypted (tracking decrypted data). Checksum check point identification can be affected by the quality of the inputs. Control-flow propagation is not tracked. Not all x86 instructions are instrumented by the execution monitor.
  33. TaintScope can perform:
      • Directed fuzzing: identify which bytes flow into system/library calls, and dramatically reduce the mutation space.
      • Checksum-aware fuzzing: disable checksum checks by control flow alteration, and generate correct checksum fields in invalid inputs.