random test

Published in: Technology

  1. Program Analysis using Random Interpretation. Sumit Gulwani, UC-Berkeley, March 2005
  2. Program Analysis
     • Applications in all aspects of software development, e.g.
     • Program correctness
         • Software bugs are expensive!
     • Compiler optimizations
         • Give people the freedom to write code the way they want (leaving performance issues to compilers).
     • Translation validation
         • Semantic equivalence of programs before and after compilation
         • (It is difficult to trust the output of a compiler for safety-critical systems.)
  3. Design choices in Program Analysis
     • Completeness (precision, # of false positives)
     • Computational complexity
     • Ease of implementation
     • Soundness = If the analysis says “no bugs”, it means “no bugs”.
     • What if we allow “probabilistic soundness”?
         • We get more precise, more efficient and even simpler algorithms.
         • Probabilistic algorithms were used earlier in other areas such as networks, but not in program analysis.
         • We obtain a new class of analyses: random interpretation.
  4. Random Interpretation
     • = Random Testing + Abstract Interpretation
     • Random Testing:
         • Test program on random inputs
         • Simple, efficient but unsound (can’t prove absence of bugs)
     • Abstract Interpretation:
         • Class of deterministic program analyses
         • Interpret (analyze) an abstraction (approximation) of program
         • Sound but usually complicated, expensive
     • Random Interpretation:
         • Class of randomized program analyses
         • Almost as simple, efficient as random testing
         • Almost as sound as abstract interpretation
  5. Example 1
     Program (flowchart; * marks a non-deterministic conditional):
         if (*) { a := 0; b := i; } else { a := i-2; b := 2; }
         if (*) { c := b - a; d := i - 2b; } else { c := 2a + b; d := b - 2i; }
         assert(c+d = 0); assert(c = a+i)
  6. Example 1: Random Testing
     Program (flowchart; * marks a non-deterministic conditional; one path is highlighted in blue on the slide):
         if (*) { a := 0; b := i; } else { a := i-2; b := 2; }
         if (*) { c := b - a; d := i - 2b; } else { c := 2a + b; d := b - 2i; }
         assert(c+d = 0); assert(c = a+i)
     • The blue path must be tested to falsify the second assertion.
     • The chance of choosing the blue path from the set of all 4 paths is small.
     • Hence, random testing is unsound.
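The path argument on this slide can be checked by brute force. The sketch below enumerates the four paths of Example 1 for a given input and reports which ones falsify the second assertion. The branch labels "A" and "B" and the function names are illustrative choices, not taken from the slides, and the transcript does not say which path was drawn in blue.

```python
from itertools import product

def run_path(i, first, second):
    """Execute Example 1 along one path.

    `first` and `second` select a branch of each conditional:
    "A" is the branch listed first in the program, "B" the other one
    (labels are hypothetical, chosen for this sketch).
    Returns the truth values of the two assertions.
    """
    if first == "A":
        a, b = 0, i
    else:
        a, b = i - 2, 2
    if second == "A":
        c, d = b - a, i - 2 * b
    else:
        c, d = 2 * a + b, b - 2 * i
    return (c + d == 0), (c == a + i)

def falsifying_paths(i):
    """Paths on which the second assertion c = a+i fails for input i."""
    return [p for p in product("AB", repeat=2) if not run_path(i, *p)[1]]

if __name__ == "__main__":
    for i in (2, 3, 7):
        print(i, falsifying_paths(i))
```

Under these conventions only one of the four paths exposes the failure, and even that path hides it for the single input i = 2, which is why a test run that happens to take another path reports nothing.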
  7. Example 1: Abstract Interpretation
     Program annotated with the invariant computed at each program point:
         if (*) { a := 0; b := i; }                [a=0, b=i]
         else   { a := i-2; b := 2; }              [a=i-2, b=2]
         after the join                            [a+b=i]
         if (*) { c := b - a; d := i - 2b; }       [a+b=i, c=b-a, d=i-2b]
         else   { c := 2a + b; d := b - 2i; }      [a+b=i, c=2a+b, d=b-2i]
         after the join                            [a+b=i, c=-d]
         assert(c+d = 0); assert(c = a+i)
     • Computes invariant at each program point.
     • Operations are usually complicated and expensive.
  8. Example 1: Random Interpretation
     Program (flowchart; * marks a non-deterministic conditional):
         if (*) { a := 0; b := i; } else { a := i-2; b := 2; }
         if (*) { c := b - a; d := i - 2b; } else { c := 2a + b; d := b - 2i; }
         assert(c+d = 0); assert(c = a+i)
     • Choose random values for input variables.
     • Execute both branches of a conditional.
     • Combine values of variables at join points.
     • Test the assertion.
  9. Outline
     • Random Interpretation
         • Linear arithmetic (POPL 2003)
         • Uninterpreted functions (POPL 2004)
         • Inter-procedural analysis (POPL 2005)
         • Other applications
  10. Linear relationships in programs with linear assignments
     • Linear relationships (e.g., x=2y+5) are useful for
         • Program correctness (e.g., buffer overflows)
         • Compiler optimizations (e.g., constant and copy propagation, CSE, induction variable elimination, etc.)
     • Restricting attention to “programs with linear assignments” does not make the analysis inapplicable to “real” programs
         • “Abstract” other program statements as non-deterministic assignments (standard practice in program analysis)
  11. Basic idea in random interpretation
     • Generic algorithm:
         • Choose random values for input variables.
         • Execute both branches of a conditional.
         • Combine the values of variables at join points.
         • Test the assertion.
  12. Idea #1: The Affine Join operation
     • Affine join of $v_1$ and $v_2$ w.r.t. weight $w$
         • $\phi_w(v_1, v_2) \equiv w\,v_1 + (1-w)\,v_2$
     • Affine join preserves common linear relationships (a+b=5)
     • It does not introduce false relationships w.h.p.
     Example: joining the states (a = 2, b = 3) and (a = 4, b = 1)
         w = 7:  a = $\phi_7(2,4)$ = -10,  b = $\phi_7(3,1)$ = 15
  13. Idea #1: The Affine Join operation
     • Affine join of $v_1$ and $v_2$ w.r.t. weight $w$
         • $\phi_w(v_1, v_2) \equiv w\,v_1 + (1-w)\,v_2$
     • Affine join preserves common linear relationships (a+b=5)
     • It does not introduce false relationships w.h.p.
     • Unfortunately, non-linear relationships are not preserved (e.g. $a \times (1+b) = 8$)
     Example: joining the states (a = 2, b = 3) and (a = 4, b = 1)
         w = 5:  a = $\phi_5(2,4)$ = -6,   b = $\phi_5(3,1)$ = 11
         w = 7:  a = $\phi_7(2,4)$ = -10,  b = $\phi_7(3,1)$ = 15
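The numbers on the two affine-join slides can be reproduced with a few lines of Python. This is a minimal sketch: the names `affine_join` and `join_states` and the dictionary representation of a program state are assumptions made for illustration.

```python
def affine_join(w, v1, v2):
    """Affine join of two values with weight w: w*v1 + (1-w)*v2."""
    return w * v1 + (1 - w) * v2

def join_states(w, s1, s2):
    """Join two program states variable by variable, using one weight."""
    return {name: affine_join(w, s1[name], s2[name]) for name in s1}

# The two states from the slide; both satisfy a + b = 5 and a*(1+b) = 8.
left = {"a": 2, "b": 3}
right = {"a": 4, "b": 1}

for w in (5, 7):
    s = join_states(w, left, right)
    linear_kept = (s["a"] + s["b"] == 5)          # common linear relationship
    nonlinear_kept = (s["a"] * (1 + s["b"]) == 8)  # common non-linear relationship
    print(w, s, linear_kept, nonlinear_kept)
```

Using the same weight for every variable at a join is what keeps a + b = 5 intact; the product a·(1+b) is not affine in the state, so nothing forces it to survive.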
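  14. Geometric Interpretation of Affine Join
     Figure: the (a, b) plane with the lines a + b = 5 and b = 2, the two states before the join, (a = 2, b = 3) and (a = 4, b = 1), and the state after the join.
     • The state after the join satisfies all the affine relationships that are satisfied by both states before the join (e.g. a + b = 5).
     • Given any relationship that is not satisfied by at least one of the states before the join (e.g. b = 2), the state after the join also does not satisfy it with high probability.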
  15. Example 1
     Random interpretation of the program with i=3 and join weights $w_1 = 5$, $w_2 = 2$ (state after each step in brackets):
         start                                     [i=3]
         if (*) { a := 0; b := i; }                [i=3, a=0, b=3]
         else   { a := i-2; b := 2; }              [i=3, a=1, b=2]
         join with w1 = 5                          [i=3, a=-4, b=7]
         if (*) { c := b - a; d := i - 2b; }       [i=3, a=-4, b=7, c=11, d=-11]
         else   { c := 2a + b; d := b - 2i; }      [i=3, a=-4, b=7, c=-1, d=1]
         join with w2 = 2                          [i=3, a=-4, b=7, c=23, d=-23]
         assert (c+d = 0); assert (c = a+i)
     • Choose a random weight for each join independently.
     • All choices of random weights verify the first assertion.
     • Almost all choices contradict the second assertion.
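The worked run above can be replayed in code. The sketch below executes both branches of each conditional of Example 1, combines the resulting states with an affine join, and then tests the two assertions. The function names and the dictionary state representation are assumptions for illustration, and the weight range used in the demo loop is arbitrary.

```python
import random

def affine_join(w, s1, s2):
    """Combine two states at a join point with weight w."""
    return {v: w * s1[v] + (1 - w) * s2[v] for v in s1}

def interpret_example1(i, w1, w2):
    """Random interpretation of Example 1 for input i and join weights w1, w2."""
    # First conditional: execute both branches, then join.
    first = {"i": i, "a": 0, "b": i}
    other = {"i": i, "a": i - 2, "b": 2}
    s = affine_join(w1, first, other)
    # Second conditional: execute both branches on the joined state, then join.
    first = dict(s, c=s["b"] - s["a"], d=s["i"] - 2 * s["b"])
    other = dict(s, c=2 * s["a"] + s["b"], d=s["b"] - 2 * s["i"])
    return affine_join(w2, first, other)

def check_assertions(state):
    """Truth values of assert(c+d = 0) and assert(c = a+i) in the final state."""
    return (state["c"] + state["d"] == 0,
            state["c"] == state["a"] + state["i"])

if __name__ == "__main__":
    print(interpret_example1(3, 5, 2))
    rng = random.Random(0)
    runs = [check_assertions(interpret_example1(rng.randrange(2**16),
                                                rng.randrange(2**16),
                                                rng.randrange(2**16)))
            for _ in range(1000)]
    print(sum(ok1 for ok1, _ in runs), "of 1000 runs verify the first assertion")
    print(sum(ok2 for _, ok2 in runs), "of 1000 runs verify the second assertion")
```

A single run costs about as much as executing the program once, yet it covers all four paths at the same time.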
  16. Example 2
     Program (flowchart):
         a := x + y
         if (x = y) { b := a } else { b := 2x }
         assert (b = 2x)
     • We need to make use of the conditional x=y on the true branch to prove the assertion.
  17. Idea #2: The Adjust Operation
     • Execute multiple runs of the program in parallel.
     • Sample S = collection of states at a program point
     • Adjust(S, e=0) is the sample obtained by taking linear combinations of the states in S such that
         • the equality conditional e=0 is satisfied;
         • the original relationships are preserved.
     • Use Adjust(S, e=0) on the true branch of the conditional e=0
  18. Geometric Interpretation of Adjust
     • Program states = points
     • Adjust = projection onto the hyperplane
     • Adjust operation loses one point.
     Figure: algorithm to obtain S’ = Adjust(S, e=0). The states $S_1, S_2, S_3, S_4$ are mapped to the states $S'_1, S'_2, S'_3$ on the hyperplane e = 0.
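One way to realize Adjust is sketched below. The transcript does not preserve the exact construction, so the details here are assumptions: one state with a non-zero value of e is picked as a pivot, and every other state is replaced by the affine combination of itself and the pivot that lands on the hyperplane e = 0. That uses up the pivot, which matches the remark that Adjust loses one point. All names are hypothetical, and exact rational arithmetic is used for clarity.

```python
from fractions import Fraction

def combine(w, s1, s2):
    """Affine combination w*s1 + (1-w)*s2 of two states."""
    return {v: w * s1[v] + (1 - w) * s2[v] for v in s1}

def adjust(sample, e):
    """Return states satisfying e(state) == 0, built from `sample`.

    `e` maps a state to the value of an affine expression over its variables.
    Because each new state is an affine combination of old ones, every affine
    relationship shared by all states in `sample` still holds afterwards.
    """
    pivot_index = next((k for k, s in enumerate(sample) if e(s) != 0), None)
    if pivot_index is None:
        return list(sample)  # every state already satisfies e = 0
    pivot = sample[pivot_index]
    e_p = Fraction(e(pivot))
    adjusted = []
    for k, s in enumerate(sample):
        if k == pivot_index:
            continue
        if e(s) == e_p:
            raise ValueError("degenerate sample: choose fresh random values")
        w = e_p / (e_p - e(s))  # solves w*e(s) + (1-w)*e(pivot) = 0
        adjusted.append(combine(w, s, pivot))
    return adjusted

def example2_sample(points):
    """States of Example 2 after a := x + y, for given (x, y) pairs."""
    return [{"x": x, "y": y, "a": x + y} for x, y in points]

if __name__ == "__main__":
    sample = example2_sample([(1, 3), (4, 0), (2, 7)])
    on_true_branch = adjust(sample, lambda s: s["x"] - s["y"])
    for s in on_true_branch:
        b = s["a"]  # true branch of Example 2: b := a
        print(s, b == 2 * s["x"])
```

On the true branch of Example 2 the adjusted states satisfy both x = y and the preserved relationship a = x + y, so b := a yields b = 2x as the assertion requires.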
  19. Correctness of Random Interpreter R
     • Completeness: If $e_1 = e_2$, then $R \Rightarrow e_1 = e_2$
         • assuming non-det conditionals
     • Soundness: If $e_1 \neq e_2$, then $R \not\Rightarrow e_1 = e_2$
         • error prob. $\leq$ [bound not preserved in the transcript]
             • b, j: number of branches and joins
             • d: size of set from which random values are chosen
             • k: number of points in the sample
         • If j = b = 10, k = 15, $d \approx 2^{32}$, then error $\leq$ [value not preserved in the transcript]
  20. Proof Methodology
     • Proving correctness was the most complicated part of this work. We used the following methodology.
     • Design an appropriate deterministic algorithm (it need not be efficient).
     • Prove (by induction) that the randomized algorithm simulates each step of the deterministic algorithm with high probability.
  21. Outline
     • Random Interpretation
         • Linear arithmetic (POPL 2003)
         • Uninterpreted functions (POPL 2004)
         • Inter-procedural analysis (POPL 2005)
         • Other applications
  22. Problem: Global value numbering
     Original program:
         a := 5; x := a*b; y := 5*b; z := b*a;
     • x=y and x=z
     • Reasoning about multiplication is undecidable
     After abstraction:
         a := 5; x := F(a,b); y := F(5,b); z := F(b,a);
     • only x=y
     • Reasoning is decidable but tricky in presence of joins
     • Axiom: If $x_1 = y_1$ and $x_2 = y_2$, then $F(x_1, x_2) = F(y_1, y_2)$
     • Goal: Detect expression equivalence when program operators are abstracted using “uninterpreted functions”
     • Application: Compiler optimizations, Translation validation
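For the straight-line program on this slide, equivalence under the uninterpreted-function abstraction can be found with classic hash-based value numbering. The sketch below is that textbook technique, not the randomized algorithm of the talk, and it does not handle join points, which is where the slide says the reasoning gets tricky. The program encoding and function name are assumptions for illustration.

```python
def value_numbers(program):
    """Assign value numbers to variables of a straight-line program.

    `program` is a list of (target, rhs) pairs, where rhs is an int constant,
    a variable name, or a tuple (operator, operand, ...). Two variables get
    the same number exactly when their expressions are structurally equal
    once operators are treated as uninterpreted.
    """
    table = {}  # structural key -> value number
    env = {}    # variable name -> value number

    def number_of(key):
        return table.setdefault(key, len(table))

    def operand(x):
        if isinstance(x, int):
            return number_of(("const", x))
        if x not in env:  # variable never assigned: a program input
            env[x] = number_of(("input", x))
        return env[x]

    for target, rhs in program:
        if isinstance(rhs, tuple):
            op, *args = rhs
            env[target] = number_of((op, *map(operand, args)))
        else:
            env[target] = operand(rhs)
    return env

# The abstracted program from the slide.
program = [
    ("a", 5),
    ("x", ("F", "a", "b")),
    ("y", ("F", 5, "b")),
    ("z", ("F", "b", "a")),
]

if __name__ == "__main__":
    vn = value_numbers(program)
    print("x = y:", vn["x"] == vn["y"])
    print("x = z:", vn["x"] == vn["z"])
```

Because F is uninterpreted, F(a,b) and F(b,a) receive different numbers, which is why only x=y is detected after abstraction.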
  23. Example
     Program (flowchart; * marks a non-deterministic conditional):
         if (*) { x := a; y := a; z := F(a); } else { x := b; y := b; z := F(b); }
         after the join: $x = \phi(a,b)$, $y = \phi(a,b)$, $z = \phi(F(a),F(b))$, $F(y) = F(\phi(a,b))$
         assert(x = y); assert(z = F(y));
     • Typical algorithms treat $\phi$ as uninterpreted
         • Hence cannot verify the second assertion
     • The randomized algorithm interprets $\phi$
         • as affine join operation $\phi_w$
  24. How to “execute” uninterpreted functions?
     • Expression language: $e := y \mid F(e_1, e_2)$
     • Choose a random interpretation for F
     • Non-linear interpretation
         • E.g. $F(e_1, e_2) = r_1 e_1^2 + r_2 e_2^2$
         • Preserves all equivalences in straight-line code
         • But not across join points
     • Let’s try linear interpretation
  25. Random Linear Interpretation
     • Encode $F(e_1, e_2) = r_1 e_1 + r_2 e_2$
     • Preserves all equivalences across a join point
     • Introduces false equivalences in straight-line code.
     • E.g. $e$ and $e'$ have same encodings even though $e \neq e'$
         $e = F(F(a,b), F(c,d))$ and $e' = F(F(a,c), F(b,d))$
         Encodings:
         $e = r_1(r_1 a + r_2 b) + r_2(r_1 c + r_2 d) = r_1^2 a + r_1 r_2 b + r_2 r_1 c + r_2^2 d$
         $e' = r_1^2 a + r_1 r_2 c + r_2 r_1 b + r_2^2 d$
     • Problem: Scalar multiplication is commutative.
     • Solution: Choose $r_1$ and $r_2$ to be random matrices and evaluate expressions to vectors
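The false equivalence and its fix can be demonstrated directly. The sketch below encodes the two expressions from the slide, first with scalar coefficients and then with 2×2 matrices acting on length-2 vectors. The modulus `P`, the vector length, the tuple encoding of expressions and all function names are assumptions chosen for illustration; the transcript does not fix these parameters.

```python
import random

P = 2_147_483_647  # a prime modulus (assumption made for this sketch)

def encode_scalar(expr, env, r1, r2):
    """Evaluate expr with F(e1, e2) interpreted as r1*e1 + r2*e2 over scalars."""
    if isinstance(expr, str):
        return env[expr] % P
    _, e1, e2 = expr
    return (r1 * encode_scalar(e1, env, r1, r2)
            + r2 * encode_scalar(e2, env, r1, r2)) % P

def mat_vec(m, v):
    """Matrix-vector product modulo P."""
    return [sum(m[r][c] * v[c] for c in range(len(v))) % P for r in range(len(m))]

def encode_vector(expr, env, m1, m2):
    """Evaluate expr with F(e1, e2) interpreted as M1*e1 + M2*e2 over vectors."""
    if isinstance(expr, str):
        return [x % P for x in env[expr]]
    _, e1, e2 = expr
    u = mat_vec(m1, encode_vector(e1, env, m1, m2))
    v = mat_vec(m2, encode_vector(e2, env, m1, m2))
    return [(x + y) % P for x, y in zip(u, v)]

# The two inequivalent expressions from the slide.
E1 = ("F", ("F", "a", "b"), ("F", "c", "d"))
E2 = ("F", ("F", "a", "c"), ("F", "b", "d"))

if __name__ == "__main__":
    rng = random.Random(1)
    scalars = {v: rng.randrange(P) for v in "abcd"}
    r1, r2 = rng.randrange(P), rng.randrange(P)
    print("scalar encodings equal:",
          encode_scalar(E1, scalars, r1, r2) == encode_scalar(E2, scalars, r1, r2))

    vectors = {v: [rng.randrange(P), rng.randrange(P)] for v in "abcd"}
    m1 = [[rng.randrange(P) for _ in range(2)] for _ in range(2)]
    m2 = [[rng.randrange(P) for _ in range(2)] for _ in range(2)]
    print("vector encodings equal:",
          encode_vector(E1, vectors, m1, m2) == encode_vector(E2, vectors, m1, m2))
```

The scalar encodings always coincide because r1·r2 = r2·r1, whereas matrix products generally do not commute, so the two encodings differ unless the random choices are unlucky.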
  26. Outline
     • Random Interpretation
         • Linear arithmetic (POPL 2003)
         • Uninterpreted functions (POPL 2004)
         • Inter-procedural analysis (POPL 2005)
         • Other applications
  27. Example
     Program (flowchart; * marks a non-deterministic conditional):
         if (*) { a := 0; b := i; } else { a := i-2; b := 2; }
         if (*) { c := b - a; d := i - 2b; } else { c := 2a + b; d := b - 2i; }
         assert (c + d = 0); assert (c = a + i)
     • The second assertion is true in the context i=2.
     • Interprocedural analysis requires computing procedure summaries.
  28. Idea #1: Keep input variables symbolic
     Random interpretation with symbolic input i and join weights $w_1 = 5$, $w_2 = 2$ (state after each step in brackets):
         if (*) { a := 0; b := i; }                [a=0, b=i]
         else   { a := i-2; b := 2; }              [a=i-2, b=2]
         join with w1 = 5                          [a=8-4i, b=5i-8]
         if (*) { c := b - a; d := i - 2b; }       [a=8-4i, b=5i-8, c=9i-16, d=16-9i]
         else   { c := 2a + b; d := b - 2i; }      [a=8-4i, b=5i-8, c=8-3i, d=3i-8]
         join with w2 = 2                          [a=8-4i, b=5i-8, c=21i-40, d=40-21i]
         assert (c+d = 0); assert (c = a+i)
         instantiated in the context i=2           [a=0, b=2, c=2, d=-2]
     • Do not choose random values for input variables (so that the summary can later be instantiated by any context).
     • The resulting program state at the end is a random procedure summary.
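The symbolic run above can be reproduced by letting each variable hold a linear polynomial in i instead of a number. The sketch below is one way to do that; the `Lin` class, its method names and the `summary` function are assumptions made for illustration, not the representation used in the talk.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Lin:
    """The linear polynomial const + coeff*i in the symbolic input i."""
    const: int = 0
    coeff: int = 0

    def __add__(self, other):
        return Lin(self.const + other.const, self.coeff + other.coeff)

    def __sub__(self, other):
        return Lin(self.const - other.const, self.coeff - other.coeff)

    def scale(self, k):
        return Lin(k * self.const, k * self.coeff)

    def at(self, i):
        """Instantiate the polynomial in the calling context i."""
        return self.const + self.coeff * i

I = Lin(0, 1)  # the symbolic input variable i

def num(k):
    return Lin(k, 0)

def join(w, u, v):
    """Affine join of two symbolic values with weight w."""
    return u.scale(w) + v.scale(1 - w)

def summary(w1, w2):
    """Random procedure summary of the example for join weights w1, w2."""
    a = join(w1, num(0), I - num(2))
    b = join(w1, I, num(2))
    c = join(w2, b - a, a.scale(2) + b)
    d = join(w2, I - b.scale(2), b - I.scale(2))
    return {"a": a, "b": b, "c": c, "d": d}

if __name__ == "__main__":
    s = summary(5, 2)
    print(s)
    print("c + d is identically 0:", s["c"] + s["d"] == Lin(0, 0))
    residual = s["c"] - (s["a"] + I)  # zero exactly where c = a+i holds
    for context in (2, 3):
        print("context i =", context, "-> c = a+i holds:", residual.at(context) == 0)
```

The summary is computed once with fixed random weights; each calling context then only has to plug in its own value of i.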
  29. Experiments
  30. Experiments
     Figure: chart comparing the randomized and the deterministic algorithm.
     • Randomized algorithm discovers 10-70% more facts.
     • Randomized algorithm is slower by a factor of 2.
  31. Experimental measure of error
     • The % of incorrect relationships decreases with increase in
         • S = size of set from which random values are chosen.
         • N = # of random summaries used.
     % of incorrect relationships:
         N \ S     2^10     2^16     2^31
         2         95.5     95.5     95.5
         3         64.3      3.2      0
         4          0.2      0        0
         5          0        0        0
         6          0        0        0
     The experimental results are better than what is predicted by theory.
  32. Outline
     • Random Interpretation
         • Linear arithmetic (POPL 2003)
         • Uninterpreted functions (POPL 2004)
         • Inter-procedural analysis (POPL 2005)
         • Other applications
  33. Other applications of random interpretation
     • Model Checking
         • Randomized equivalence testing algorithm for FCEDs, which represent conditional linear expressions and are a generalization of BDDs. (SAS 04)
     • Theorem Proving
         • Randomized decision procedure for linear arithmetic and uninterpreted functions. It runs an order of magnitude faster than the deterministic algorithm. (CADE 03)
     • Ideas for deterministic algorithms
         • PTIME algorithm for global value numbering, thereby solving a 30-year-old open problem. (SAS 04)
  34. Summary
     Key idea for each analysis:
         Linear Arithmetic:      Affine Join, Adjust
         Uninterpreted Fns.:     Vectors
         Interproc. Analysis:    Symbolic i/p variables
     • Lessons Learned
         • Randomization buys efficiency and simplicity at the cost of probabilistic soundness.
         • Randomization suggests ideas for deterministic algorithms.
         • Combining randomized and symbolic techniques is powerful.
