Reverse Engineering automation


  1. by Anton Dorfman, PHDAYS 2014, Moscow
  2. • Fan of & fun with assembly language
     • Researcher
     • Scientist
     • Teaching reverse engineering since 2001
     • Candidate of Technical Sciences
     • Lecturer at Samara State Technical University and Samara State Aerospace University
  3. • Intro
     • Simple
     • Trace & Coverage
     • Graph
     • Program Slicing
     • All Together
  4. • Iterative process
     • Understand a small piece of code and form an abstraction of it in your mind
     • Understand all the pieces of code in a procedure, unite the abstractions, and form an abstraction of the function
     • And so on
     • Good visualization is important
     • Many routine tasks
  5. • Code localization
     • Data flow dependencies
     • Code flow dependencies
     • Local variable checking
     • Checking of procedure input and output parameters
     • Variable range checking
     • Label naming
     • Function naming
     • Function prototyping
  6. • Biggest research school: Professor Thomas W. Reps, University of Wisconsin-Madison, http://pages.cs.wisc.edu/~reps/
     • In Russia: Institute for System Programming of the Russian Academy of Sciences, http://www.ispras.ru
  7. • Dynamic Binary Instrumentation (DBI)
     • Intermediate representation (IR)
     • System emulators
  8. • Function
     • Variable
     • Label
  9. • Also called an execution trace
     • Trace of program execution
     • Simple case: just a list of the addresses that the instruction pointer takes on a single run
  10. • First used as a measure of the degree to which the source code of a program is tested by a particular test suite
      • List of instructions that executed during a single run
      • List of unique addresses from the program trace
  11. • The difference between code coverages can help locate the code that implements some functionality
      • Common code coverage means common functionality
      • More runs give more differences between code coverages and more precise code localization
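
The coverage-difference idea from the slides above can be sketched in a few lines of Python. This is only an illustration under assumptions: a trace is taken to be a plain list of instruction addresses, the function names `coverage` and `localize` are hypothetical, and the addresses are invented rather than taken from a real binary.

```python
def coverage(trace):
    """Code coverage: the set of unique addresses seen in one trace."""
    return set(trace)


def localize(feature_traces, baseline_traces):
    """Addresses executed in every run that uses the feature
    and in no run that avoids it."""
    with_feature = set.intersection(*(coverage(t) for t in feature_traces))
    without_feature = set.union(*(coverage(t) for t in baseline_traces))
    return with_feature - without_feature


# Hypothetical traces; every address here is invented for illustration.
startup = [0x401000, 0x401005, 0x40100A]
run_idle = startup + [0x401100, 0x401105]
run_save_1 = startup + [0x402000, 0x402004, 0x401100]
run_save_2 = startup + [0x402000, 0x402004, 0x402010, 0x401105]

candidates = localize([run_save_1, run_save_2], [run_idle])
print(sorted(hex(a) for a in candidates))  # ['0x402000', '0x402004']
```

Adding more feature runs shrinks the intersection and adding more baseline runs grows the set that is subtracted, which is why more runs tend to give a more precise localization. Both lists must be non-empty in this sketch.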
  12. • The collection of all memory accesses performed by an application in a single run
      • Includes both writes and reads
  13. • Includes the code trace
      • Includes all register values and memory values at every execution point
      • May be absolute: saves all values
      • Or relative: saves only the values that changed at this execution point
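
The difference between an absolute and a relative trace can be illustrated with a small Python sketch. It assumes each execution point is a dictionary mapping a location name (register or memory cell) to its value and that locations never disappear between points; the helper names and the register values are hypothetical.

```python
def to_relative(states):
    """Turn absolute snapshots into relative records (changed values only)."""
    previous = {}
    records = []
    for state in states:
        delta = {k: v for k, v in state.items() if previous.get(k) != v}
        records.append(delta)
        previous = state
    return records


def to_absolute(records):
    """Rebuild the full snapshots by replaying the relative records."""
    current = {}
    states = []
    for delta in records:
        current = {**current, **delta}
        states.append(current)
    return states


# Hypothetical three-step trace; values are invented for illustration.
absolute = [
    {"eip": 0x1000, "eax": 0, "ebx": 7},
    {"eip": 0x1002, "eax": 1, "ebx": 7},
    {"eip": 0x1004, "eax": 1, "ebx": 8},
]
relative = to_relative(absolute)
print(relative[1])  # {'eip': 4098, 'eax': 1}
```

The relative form is smaller on disk, but reading the state at an arbitrary execution point requires replaying all earlier records.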
  14. • Directed graph that shows control dependencies between blocks of instructions
      • Each node represents a basic block
      • Basic block: a piece of code that starts at a jump target and ends with a jump, with no jump or jump target inside the block
      • Two special blocks: the entry block and the exit block
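
The basic block definition above corresponds to the classic "leader" rule, sketched here in Python. The instruction format (address, mnemonic, jump target) and the rule that any mnemonic starting with `j` is a jump are simplifying assumptions for this toy example, not a real disassembler interface.

```python
def basic_blocks(instructions):
    """Split a list of (address, mnemonic, jump_target_or_None) tuples,
    given in address order, into basic blocks."""
    leaders = {instructions[0][0]}           # the first instruction starts a block
    for index, (addr, mnemonic, target) in enumerate(instructions):
        if mnemonic.startswith("j"):         # toy rule: jmp, jz, jnz, ...
            if target is not None:
                leaders.add(target)          # a jump target starts a block
            if index + 1 < len(instructions):
                leaders.add(instructions[index + 1][0])  # so does the fall-through
    blocks, current = [], []
    for ins in instructions:
        if ins[0] in leaders and current:
            blocks.append(current)
            current = []
        current.append(ins)
    if current:
        blocks.append(current)
    return blocks


# Hypothetical countdown loop; addresses are invented for illustration.
program = [
    (0x00, "mov", None),   # mov ecx, 10
    (0x05, "dec", None),   # dec ecx
    (0x06, "jnz", 0x05),   # jnz 0x05
    (0x08, "ret", None),
]
blocks = basic_blocks(program)
print([[hex(addr) for addr, _, _ in block] for block in blocks])
# [['0x0'], ['0x5', '0x6'], ['0x8']]
```

Control flow graph edges would then connect each block to the block at its jump target and, for conditional jumps, to the fall-through block.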
  15. • Directed graph that represents calling relationships between subroutines in a computer program
      • Each node represents a procedure
      • Each edge (a, b) indicates that procedure a calls procedure b
      • A cycle in the graph indicates recursive procedure calls
      • A static call graph represents every possible run of the program
      • A dynamic call graph is a record of one execution of the program
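
The observation that a cycle in the call graph indicates recursion can be checked mechanically. The following Python sketch assumes the call graph is an adjacency dictionary from caller to callees; the procedure names are hypothetical.

```python
def recursive_procedures(call_graph):
    """Return the procedures that lie on a cycle, i.e. can reach themselves."""
    def reachable(start):
        seen, stack = set(), list(call_graph.get(start, ()))
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(call_graph.get(node, ()))
        return seen

    return {p for p in call_graph if p in reachable(p)}


# Hypothetical static call graph: each key calls the procedures in its list.
call_graph = {
    "main": ["parse", "report"],
    "parse": ["parse_expr"],
    "parse_expr": ["parse_term"],
    "parse_term": ["parse_expr"],   # mutual recursion with parse_expr
    "report": [],
}
print(sorted(recursive_procedures(call_graph)))  # ['parse_expr', 'parse_term']
```

The same function works on a dynamic call graph; it then reports only the recursion that actually happened in the recorded run.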
  16. • Directed graph that represents data dependencies between a number of operations
      • Each node represents an operation
      • Each edge represents a variable
  17. • Ottenstein & Ottenstein: PDG, 1984
      • Actually a procedure dependence graph, because it was introduced for programs with one procedure
      • Each node represents a statement
      • Two types of edges
      • Control dependence: between a predicate and the statements it controls
      • Data dependence: between statements modifying a variable and those that may reference it
      • A special “Entry” node is connected to all nodes that are not control dependent
  18. • Horwitz, Reps & Binkley: SDG, 1990
      • Includes a PDG for each procedure
      • New nodes: Call Site, Procedure Entry, Actual-in argument, Actual-out argument, Formal-in parameter, Formal-out parameter
      • 3 new edge types
      • Call edge: connects “call site” and “procedure entry”
      • Parameter-in edge: connects “Actual-in” with “Formal-in”
      • Parameter-out edge: connects “Formal-out” with “Actual-out”
  19. • Large programs must be decomposed for understanding and manipulation.
      • Usually, the decomposition is into procedures and abstract data types.
      • Program slicing is a decomposition based on data flow and control flow analysis.
      • A study showed that experienced programmers slice mentally while debugging.
      • “The mental abstraction people make when they are debugging a program” [Weiser]
  20. • All the statements of a program that may affect the values of some variables in a set V at some point of interest i.
      • A slicing criterion of a program P is a tuple (i, V), where i is a statement in P and V is a subset of variables in P.
      • Slicing criterion: C = (i, V)
  21. • Direction of slicing
        ◦ Backward
        ◦ Forward
      • Slicing techniques
        ◦ Static
        ◦ Dynamic
        ◦ Conditioned
      • Levels of slices
        ◦ Intraprocedural slicing
        ◦ Interprocedural slicing
  22. • Original slicing method
      • A backward slice of a program with respect to a program point i and a set of program variables V consists of all statements and predicates in the program that may affect the value of variables in V at i
      • Answers the question “what program components might affect a selected computation?”
      • Preserves the meaning of the variable(s) in the slicing criterion for all possible inputs to the program
  23. • Slice criterion <12, i>
         1  main( )
         2  {
         3  int i, sum;
         4  sum = 0;
         5  i = 1;
         6  while(i <= 10)
         7  {
         8  sum = sum + 1;
         9  ++ i;
        10  }
        11  cout << sum;
        12  cout << i;
        13  }
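
A backward slice for the criterion <12, i> can be computed as backward reachability over a dependence graph. The Python sketch below is a minimal illustration, not the method of any particular tool: the `depends_on` table was written by hand from the listing (data and control dependences of each executable line), and the declaration on line 3 and the braces are left out as a simplifying assumption.

```python
def backward_slice(depends_on, start):
    """All statements that `start` transitively depends on, including itself."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(depends_on.get(node, ()))
    return sorted(seen)


# Hand-built dependences: line -> lines it depends on (assumption, see above).
depends_on = {
    4: [],            # sum = 0
    5: [],            # i = 1
    6: [5, 9],        # while (i <= 10): reads i from line 5 or 9
    8: [4, 8, 6],     # sum = sum + 1: reads sum, controlled by the loop test
    9: [5, 9, 6],     # ++i: reads i, controlled by the loop test
    11: [4, 8],       # cout << sum
    12: [5, 9],       # cout << i
}
print(backward_slice(depends_on, 12))  # [5, 6, 9, 12]
```

Lines 4, 8 and 11 only touch sum, so they drop out of the slice on i. A compilable slice would additionally keep the declaration and the enclosing braces.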
  24. • A forward slice of a program with respect to a program point i and a set of program variables V consists of all statements and predicates in the program that may be affected by the value of variables in V at i
      • Answers the question “what program components might be affected by a selected computation?”
      • Can show the code affected by a modification to a single statement
  25. • Slice criterion <3, sum>
         1  main( )
         2  {
         3  int i, sum;
         4  sum = 0;
         5  i = 1;
         6  while(i <= 10)
         7  {
         8  sum = sum + 1;
         9  ++ i;
        10  }
        11  cout << sum;
        12  cout << i;
        13  }
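
A forward slice follows the same dependences in the opposite direction. In the Python sketch below the `affects` table is hand-written from the listing, and the slice starts at line 4 (`sum = 0`) rather than at the declaration on line 3; that substitution is an assumption made here because a bare declaration assigns no value in this simple model.

```python
def forward_slice(affects, start):
    """All statements transitively affected by `start`, including itself."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(affects.get(node, ()))
    return sorted(seen)


# Hand-built edges: line -> lines whose behaviour it may affect (assumption).
affects = {
    4: [8, 11],       # sum = 0 feeds sum = sum + 1 and cout << sum
    5: [6, 9, 12],    # i = 1 feeds the loop test, ++i and cout << i
    6: [8, 9],        # the loop test controls the loop body
    8: [8, 11],       # sum = sum + 1 feeds itself and cout << sum
    9: [6, 9, 12],    # ++i feeds the loop test, itself and cout << i
    11: [],
    12: [],
}
print(forward_slice(affects, 4))  # [4, 8, 11]
```

Starting from line 5 instead reaches every executable line, because i decides how often the loop body runs and therefore also influences sum.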
  26. • Static slicing does not make any assumptions regarding the input.
      • Slices are derived from the source code for all possible input values
      • May lead to relatively big slices
      • Contains all statements that may affect a variable for every possible execution
      • Current static methods can only compute approximations
  27. • Slice criterion (12, i)
         1  main( )
         2  {
         3  int i, sum;
         4  sum = 0;
         5  i = 1;
         6  while(i <= 10)
         7  {
         8  sum = sum + 1;
         9  ++ i;
        10  }
        11  cout << sum;
        12  cout << i;
        13  }
  28. • First introduced by Korel and Laski
      • Dynamic slicing assumes a fixed input for a program.
      • Only the dependences that occur in a specific execution of the program are taken into account
      • Computed on a given input
      • A dynamic slicing criterion is a triple (input, occurrence of a statement, variable): it specifies the input and distinguishes between different occurrences of a statement in the execution history
  29.  1. read (n)
       2. for i := 1 to n do
       3. a := 2
       4. if c1==1 then
       5. if c2==1 then
       6. a := 4
       7. else
       8. a := 6
       9. z := a
      10. write (z)
      • Assumptions
        ◦ Input n is 1
        ◦ c1 and c2 are both true
        ◦ Execution history is 1^1, 2^1, 3^1, 4^1, 5^1, 6^1, 9^1, 2^2, 10^1 (statement^occurrence)
        ◦ Slice criterion <1, 10^1, z>
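
The dynamic slice for this execution history can be sketched in Python by recording, for every executed statement occurrence, which earlier occurrence last defined each variable it reads. The `Event` layout, the per-event defs/uses sets, and the control-dependence indices are assumptions written by hand from the listing (statement 9 is taken to be inside the loop, as the history suggests); c1 and c2 are treated as inputs with no defining statement.

```python
from collections import namedtuple

# stmt: statement number; defs/uses: variable sets;
# ctrl: index in the history of the controlling occurrence, or None.
Event = namedtuple("Event", "stmt defs uses ctrl")


def dynamic_slice(history, criterion_index):
    """Statements that the chosen occurrence depends on in this one run."""
    last_def, deps = {}, []
    for index, event in enumerate(history):
        event_deps = {last_def[v] for v in event.uses if v in last_def}
        if event.ctrl is not None:
            event_deps.add(event.ctrl)
        deps.append(event_deps)
        for v in event.defs:
            last_def[v] = index
    seen, stack = set(), [criterion_index]
    while stack:
        index = stack.pop()
        if index not in seen:
            seen.add(index)
            stack.extend(deps[index])
    return sorted({history[i].stmt for i in seen})


history = [
    Event(1, {"n"}, set(), None),        # 1^1  read(n)
    Event(2, {"i"}, {"n"}, None),        # 2^1  for i := 1 to n
    Event(3, {"a"}, set(), 1),           # 3^1  a := 2
    Event(4, set(), {"c1"}, 1),          # 4^1  if c1==1
    Event(5, set(), {"c2"}, 3),          # 5^1  if c2==1
    Event(6, {"a"}, set(), 4),           # 6^1  a := 4
    Event(9, {"z"}, {"a"}, 1),           # 9^1  z := a
    Event(2, {"i"}, {"i", "n"}, None),   # 2^2  loop test fails
    Event(10, set(), {"z"}, None),       # 10^1 write(z)
]
slice_statements = dynamic_slice(history, 8)   # criterion occurrence 10^1
print(slice_statements)  # [1, 2, 4, 5, 6, 9, 10]
```

Statement 3 is left out because its value of a is overwritten by statement 6 before z reads it, and statement 8 is left out because it never executed on this input.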
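  30. • Assumption: input ‘a’ is a positive number
       1. read(a)
       2. if (a < 0)
       3. a = -a
       4. x = 1/a
      • Under this assumption the test on line 2 is never true, so lines 2 and 3 can be dropped from the slice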
  31. • Computes a slice within one procedure
      • Consists basically of two steps:
      • A single slice of the procedure containing the slicing criterion is made.
      • Procedure calls from within this procedure are sliced using new criteria.
  32. • Computes a slice over an entire program
      • Two ways of crossing a procedure boundary
      • Up: going from the sliced procedure into the calling procedure
      • Down: going from the sliced procedure into the called procedure
      • Must be context sensitive
  33. • Chopping
      • Value Set Analysis
  34. • CodeSurfer
        ◦ Commercial product by GrammaTech Inc.
        ◦ GUI based
        ◦ Scripting language: Tk
      • Unravel
        ◦ Static program slicer developed at NIST
        ◦ Slices ANSI C programs
        ◦ Limitations are in the treatment of unions, forks and pointers to functions
  35. • Slicing of a register on code coverage
      • Graph-based view of file reading and moves between memory blocks
  36. • dorfmananton@gmail.com
