Yoyak:
static analysis framework
Heejong Lee
ScalaDays 2015
Yoyak : Scala Experience
• Type class in Yoyak
trait StdObjectModel[A<:Galois,D<:Galois,This<:StdObjectModel[A,D,This]] ex...
Yoyak : Scala Experience
• Algebraic data type support
Natural way to express an abstract syntax tree of a program
;
if(x)...
Yoyak : Scala Experience
• Algebraic data type support
Easy to navigate the abstract syntax tree
def eval(v: Value.t, inpu...
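The algebraic-data-type point can be made concrete with the Javar-3 expression grammar (E → n | x | E op E) from the talk. The slide's own `eval(v: Value.t, ...)` is cut off, so the sketch below does not reproduce Yoyak's code. `Javar3`, `Expr`, `Num`, `Var` and `BinOp` are hypothetical names, and `eval` follows ⟦n⟧M = n, ⟦x⟧M = M(x) and ⟦E1 op E2⟧M = ⟦E1⟧M op ⟦E2⟧M.

```scala
object Javar3 {
  // Hypothetical AST for Javar-3 expressions: E -> n | x | E op E
  sealed trait Expr
  final case class Num(n: Int) extends Expr
  final case class Var(name: String) extends Expr
  final case class BinOp(op: Char, lhs: Expr, rhs: Expr) extends Expr

  type Memory = Map[String, Int] // Memory = Var -> Value

  // [[E]]M: one case per constructor; the compiler warns if a constructor of the sealed trait is missing
  def eval(e: Expr, m: Memory): Int = e match {
    case Num(n) => n
    case Var(x) => m(x)
    case BinOp(op, l, r) =>
      val (a, b) = (eval(l, m), eval(r, m))
      op match {
        case '+' => a + b
        case '-' => a - b
        case '*' => a * b
        case '/' => a / b // integer division; any other op character throws a MatchError
      }
  }

  // [[x := E]]M = M{x -> [[E]]M}
  def assign(x: String, e: Expr, m: Memory): Memory = m.updated(x, eval(e, m))
}
```

For instance, `Javar3.eval(Javar3.BinOp('+', Javar3.Num(1), Javar3.Num(1)), Map.empty)` evaluates the `1+1` program to 2. Because `Expr` is sealed, adding a constructor makes the compiler flag every `match` that does not handle it yet.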
Yoyak : Scala Experience
• Native support for mutable and immutable values
[Figure: Memory with variables x, y, z; an Object with fields f = 1, g = “A”]
In some cases, mu...
Yoyak : Scala Experience
• Native support for mutable and immutable values
[Figure: Memory with variables x, y, z; Object with fields f = 1, g = “A”; NewObject with fields f = 2, g = “...]
Yoyak : Scala Experience
• Native support for mutable and immutable values
[Figure: Memory with variables x, y, z; NewObject with fields f = 2, g = “A”]
object.update(...
Yoyak : Scala Experience
• Native support for mutable and immutable values
[Figure: Memory with variables x, y, z; Object with fields f = 1, g = “A”; NewObject with fields f = 2, g = “...]
Yoyak : Scala Experience
• Excellent support for parallelization
• Static analysis does not sufficiently utilize today’s
ad...
Yoyak : Scala Experience
• Excellent support for parallelization
Worklist Parallelization
can be naturally
implemented by ...
Yoyak : Roadmap
• Add more built-in abstract domains
• Optimize analysis performance
• Visualize analysis details
• Build ...
Yoyak : Roadmap
• Add more built-in abstract domains
Interval domain cannot represent
the relation between two variables
x...
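A two-statement example shows what gets lost. After `y := x` with `x ∈ [0, 10]`, the expression `x - y` is always 0, yet interval arithmetic only sees two unrelated intervals. A minimal sketch, assuming a hypothetical `Itv` class with finite integer bounds (not Yoyak's `Interval`):

```scala
object RelationLoss {
  // Hypothetical interval [lo, hi] with finite bounds; enough for this example
  final case class Itv(lo: Int, hi: Int) {
    // smallest minus largest, largest minus smallest
    def -(o: Itv): Itv = Itv(lo - o.hi, hi - o.lo)
  }

  val x: Itv = Itv(0, 10) // x in [0, 10]
  val y: Itv = x          // y := x copies the interval, but not the fact that y == x
  val z: Itv = x - y      // z := x - y is always 0 concretely
}
```

Here `RelationLoss.z` is `Itv(-10, 10)`. A relational domain such as the octagon domain can keep x − y ≤ 0 and y − x ≤ 0 and therefore bound `z` to exactly 0.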
Yoyak : Roadmap
• Add more built-in abstract domains
Octagon domain can represent the
relation between two variables
100 1...
Yoyak : Roadmap
• Add more built-in abstract domains
2-interval domain is more precise
than interval domain
100 1 2 3 4 5 ...
Yoyak : Roadmap
• Optimize analysis performance
• {Worklist, Method, Class}-level parallelization
• Reduce abstract memory...
Yoyak : Roadmap
• Visualize analysis details
It is hard to know what a static analyzer is doing at a
specific moment becaus...
Yoyak : Roadmap
• Visualize analysis details
Example from SAT solvers
Visualization of the search tree
generated by a basi...
Yoyak : Roadmap
• Build Scala compiler plug-in
• Programming language researchers foresee that the semantic
program analyz...
Yoyak : Roadmap
• Build Scala compiler plug-in
• Scala compiler is well modularized, cleanly coded (as
compared to other c...
Thank you!
Further Questions,
ScalaDays 2015
twitter @heejongl
gmail heejong@gmail.com

  1. Yoyak: static analysis framework. Heejong Lee, ScalaDays 2015
  2. Speaker Introduction • Has been working in the static analysis industry since 2008 • Studied programming language theory in graduate school • Has developed several static analyzers, mostly commercial ones • Began using Scala six years ago and still uses it actively in everyday development
  3. Agenda • Static analysis • Theory of abstract interpretation • Yoyak framework: implementation highlights • Yoyak framework: Scala experience • Yoyak framework: Roadmap
  4. Static Analysis
  5. What is Static Analysis? • Analyzes source code without actually running it • Some prefer to call it white-box testing • Used for finding bugs, optimizing compiled binaries, calculating software metrics, proving safety properties, etc.
  6. Examples of Static Analysis • Finding bugs: symbolic execution • Optimizing a compiled binary: data flow analysis • Calculating a software metric: syntactic analysis • Proving safety properties: model checking, abstract interpretation, type systems
  7. Two important terms in Static Analysis • Soundness • The analysis result should contain every possibility that can happen at runtime • If the analysis uses an over-approximation, it is sound • Completeness • The analysis result should not contain any possibility that cannot happen at runtime • If the analysis uses an under-approximation, it is complete
  8. Two important terms in Static Analysis [Diagram: regions labeled Over-approximation of Semantics, Program Semantics, and Under-approximation of Semantics]
  9. Abstract Interpretation: The beauty of abstraction http://cargocollective.com/carlyfox/Design
  10. What is the result of this expression? $19224 \times 7483919 \times (11952 - 20392)$
  11. What is the result of this expression? $19224 \times 7483919 \times (11952 - 20392) = -1214270048744640$ How long does it take without a calculator?
  12. What is the result of this expression? $19224 \times 7483919 \times (11952 - 20392) = -1214270048744640$ What if we are not interested in the exact number and just want to know whether it is positive or negative?
  13. What is the result of this expression? $19224 \times 7483919 \times (11952 - 20392)$ Abstraction $\alpha$ maps each operand to its sign: $\hat{+} \times \hat{+} \times \hat{-} = \hat{-}$; concretization $\gamma$ maps the result back: $\gamma(\hat{-}) = n \; (n \in \mathbb{Z} \wedge n < 0)$
  14. What is the result of this expression? $19224 \times 7483919 \times (11952 - 20392) = -1214270048744640$ (takes 30 seconds) $= n \; (n \in \mathbb{Z} \wedge n < 0)$ (takes 3 seconds) • inaccurate but not incorrect • accurate enough for a specific purpose • much faster than a real calculation. This is abstract interpretation
  15. Is this program safe from buffer overruns? void foo(int x) { String[] strs = new String[10]; int index = 0; while(x > 0) { index = index + 1; x = x - 1; } strs[index] = "hello!"; }
  16. No, ArrayIndexOutOfBoundsException may occur at the last line. void foo(int x) { String[] strs = new String[10]; int index = 0; while(x > 0) { index = index + 1; x = x - 1; } strs[index] = "hello!"; } index = [0,0] index = [1,∞] index = [0,∞]
  17. Abstract interpretation for dummies • Roughly but soundly execute the program
  18. Abstract interpretation for brains: ?
  19. First, we need to define precisely, in a mathematical way, what “domain” and “semantics” mean
  20. Let me introduce the Javar language
  21. 1
  22. 1 What does this program mean?
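  23. Javar-1 $C \to n \quad (n \in \mathbb{Z})$
  24. Javar-1 semantic domain $n \in \mathit{Value} = \mathbb{Z}$; $\llbracket C \rrbracket \in \mathit{Value}$
  25. Javar-1 semantics $\llbracket n \rrbracket = n$
  26. 1+1
  27. Javar-2 $C \to n \ \mathit{op} \ n \quad (n \in \mathbb{Z},\ \mathit{op} \in \{+, -, *, /\})$
  28. Javar-{1,2} semantic domain $n \in \mathit{Value} = \mathbb{Z}$; $\llbracket C \rrbracket \in \mathit{Value}$
  29. Javar-2 semantics $\llbracket n \rrbracket = n$; $\llbracket n_1 + n_2 \rrbracket = \llbracket n_1 \rrbracket + \llbracket n_2 \rrbracket$; $\llbracket n_1 - n_2 \rrbracket = \llbracket n_1 \rrbracket - \llbracket n_2 \rrbracket$; $\llbracket n_1 * n_2 \rrbracket = \llbracket n_1 \rrbracket \times \llbracket n_2 \rrbracket$; $\llbracket n_1 / n_2 \rrbracket = \llbracket n_1 \rrbracket \div \llbracket n_2 \rrbracket$
  30. x := x + 1
  31. Javar-3 $C \to x := E$; $E \to n \ (n \in \mathbb{Z}) \mid x \mid E \ \mathit{op} \ E \ (\mathit{op} \in \{+, -, *, /\})$
  32. Javar-3 semantic domain $M \in \mathit{Memory} = \mathit{Var} \to \mathit{Value}$; $n \in \mathit{Value} = \mathbb{Z}$; $x \in \mathit{Var} = \mathit{Variables}$; $\llbracket C \rrbracket \in \mathit{Memory} \to \mathit{Memory}$; $\llbracket E \rrbracket \in \mathit{Memory} \to \mathbb{Z}$
  33. Javar-3 semantics $\llbracket x := E \rrbracket M = M\{x \mapsto \llbracket E \rrbracket M\}$; $\llbracket n \rrbracket M = n$; $\llbracket x \rrbracket M = M(x)$; $\llbracket E_1 \{+, -, *, /\} E_2 \rrbracket M = \llbracket E_1 \rrbracket M \{+, -, \times, \div\} \llbracket E_2 \rrbracket M$
  34. x := 100 + 2; if(x) x := x * 10 else x := x / 2; while(x) x := x - 1
  35. Javar-4 $C \to x := E \mid \texttt{if}\ (E)\ C\ \texttt{else}\ C \mid \texttt{while}\ (E)\ C \mid C; C$; $E \to n \ (n \in \mathbb{Z}) \mid x \mid E \ \mathit{op} \ E \ (\mathit{op} \in \{+, -, *, /\})$
  36. Javar-{3,4} semantic domain $M \in \mathit{Memory} = \mathit{Var} \to \mathit{Value}$; $n \in \mathit{Value} = \mathbb{Z}$; $x \in \mathit{Var} = \mathit{Variables}$; $\llbracket C \rrbracket \in \mathit{Memory} \to \mathit{Memory}$; $\llbracket E \rrbracket \in \mathit{Memory} \to \mathbb{Z}$
  37. Javar-4 semantics $\llbracket x := E \rrbracket M = M\{x \mapsto \llbracket E \rrbracket M\}$; $\llbracket \texttt{if}(E)\ C_1\ \texttt{else}\ C_2 \rrbracket M = \text{if } \llbracket E \rrbracket M \neq 0 \text{ then } \llbracket C_1 \rrbracket M \text{ else } \llbracket C_2 \rrbracket M$; $\llbracket \texttt{while}(E)\ C \rrbracket M = \text{if } \llbracket E \rrbracket M \neq 0 \text{ then } \llbracket \texttt{while}(E)\ C \rrbracket (\llbracket C \rrbracket M) \text{ else } M$; $\llbracket n \rrbracket M = n$; $\llbracket x \rrbracket M = M(x)$; $\llbracket E_1 \{+, -, *, /\} E_2 \rrbracket M = \llbracket E_1 \rrbracket M \{+, -, \times, \div\} \llbracket E_2 \rrbracket M$
  38. This is not a definition: $\llbracket \texttt{while}(E)\ C \rrbracket M = \text{if } \llbracket E \rrbracket M \neq 0 \text{ then } \llbracket \texttt{while}(E)\ C \rrbracket (\llbracket C \rrbracket M) \text{ else } M$ defines the meaning of while in terms of itself, just like GNU = GNU’s Not Unix
  39. The existence and uniqueness of the least fixed point is guaranteed by domain theory. $\llbracket \texttt{while}(E)\ C \rrbracket M = \text{if } \llbracket E \rrbracket M \neq 0 \text{ then } \llbracket \texttt{while}(E)\ C \rrbracket (\llbracket C \rrbracket M) \text{ else } M$; $\llbracket \texttt{while}(E)\ C \rrbracket = \lambda M.\ \text{if } \llbracket E \rrbracket M \neq 0 \text{ then } \llbracket \texttt{while}(E)\ C \rrbracket (\llbracket C \rrbracket M) \text{ else } M$; $F = \lambda M.\ \text{if } \llbracket E \rrbracket M \neq 0 \text{ then } F(\llbracket C \rrbracket M) \text{ else } M$; $F = H(F)$; $\llbracket \texttt{while}(E)\ C \rrbracket = \mathit{fix}(\lambda F.\ \lambda M.\ \text{if } \llbracket E \rrbracket M \neq 0 \text{ then } F(\llbracket C \rrbracket M) \text{ else } M)$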
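  40. Abstract interpretation revisited • Safely estimates program semantics in finite time • Abstraction is not omission: it guarantees soundness • Most static analysis techniques can be defined in the form of an abstract interpretation
  41. Key Elements of Abstract Interpretation • Domain: concrete domain, abstract domain • Semantics: concrete semantics, abstract semantics • Galois connection: a pair of abstraction and concretization functions • CPO: complete partial order • Continuous function: preserves least upper bounds
  42. Galois Connection $\forall x \in D,\ \hat{x} \in \hat{D} : \alpha(x) \sqsubseteq \hat{x} \iff x \sqsubseteq \gamma(\hat{x})$ [Diagram: $\alpha$ maps $x \in D$ into $\hat{D}$; $\gamma$ maps $\hat{x} \in \hat{D}$ back into $D$]
  43. CPO • There exists a partial order $\sqsubseteq$ • There exists a least element $\bot$ such that $\bot \sqsubseteq y$ for all $y \in D$ • Every ordered subset (chain) of $D$ has a least upper bound in $D$
  44. Lattices: a partially ordered set in which every two elements have a unique LUB ($\sqcup$) and a unique GLB ($\sqcap$)
  45. Continuous Function: $\forall$ ordered subset $S \subseteq D$, $F(\bigsqcup_{x \in S} x) = \bigsqcup_{x \in S} F(x)$ [Diagram: elements $x, y, z$ of $D$ mapped to $F(x), F(y), F(z)$ in $D$]
  46. Abstract Interpretation in a Nutshell (Concrete vs. Abstract) • Domain: $D$ should be a CPO; $\hat{D}$ should be a CPO • Galois connection: $\alpha : D \to \hat{D}$, $\gamma : \hat{D} \to D$ • Semantic function: $F : D \to D$ should be continuous; $\hat{F} : \hat{D} \to \hat{D}$ should be monotonic • Program execution: $\mathrm{lfp}\,F = \bigsqcup_{i \in \mathbb{N}} F^i(\bot)$; $\bigsqcup_{i \in \mathbb{N}} \hat{F}^i(\hat{\bot}) \sqsubseteq \hat{X}$ • Performing an analysis using abstract interpretation = calculating $\hat{X}$ in finite time • The following formula is always satisfied (soundness guarantee): $\mathrm{lfp}\,F \sqsubseteq \gamma(\hat{X})$
  47. Abstract Interpretation in a Nutshell: $\mathrm{lfp}\,F \sqsubseteq \gamma(\hat{X})$ [Diagram: $\mathrm{lfp}\,F$ and $\gamma(\hat{X})$ in $D$, $\hat{X}$ and $\mathrm{lfp}\,\hat{F}$ in $\hat{D}$; the part of $\gamma(\hat{X})$ outside $\mathrm{lfp}\,F$ is labeled false positives] Soundness condition: $\alpha \circ F \sqsubseteq \hat{F} \circ \alpha$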
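  48. Is this program safe from buffer overruns? void foo(int x) { String[] strs = new String[10]; int index = 0; if(x > 0) { index = 1; } else { index = 10; } strs[index] = "hello!"; }
  49. void foo(int x) { String[] strs = new String[10]; int index = 0; if(x > 0) { index = 1; } else { index = 10; } strs[index] = "hello!"; } index = [0,0] index = [1,1] index = [10,10] index = [1,10]
  50. Interval analysis based on abstract interpretation • Concrete domain: the domain in the real world $\mathit{Memory} = \mathit{Var} \to \mathit{Value}$; $\mathit{Value} = 2^{\mathbb{Z}}$; $\mathcal{C} \in C \to \mathit{Memory} \to \mathit{Memory}$; $\mathcal{V} \in E \to \mathit{Memory} \to \mathit{Value}$
  51. Interval analysis based on abstract interpretation • Concrete semantics: the semantics in the real world $\mathcal{C}\llbracket x := E \rrbracket m = m\{x \mapsto \mathcal{V}\llbracket E \rrbracket m\}$; $\mathcal{C}\llbracket \texttt{if}(E)\ C_1\ C_2 \rrbracket m = \mathcal{V}\llbracket E \rrbracket m\ ?\ \mathcal{C}\llbracket C_1 \rrbracket m : \mathcal{C}\llbracket C_2 \rrbracket m$; $\mathcal{C}\llbracket \texttt{while}(E)\ C \rrbracket m = \mathcal{V}\llbracket E \rrbracket m\ ?\ \mathcal{C}\llbracket \texttt{while}(E)\ C \rrbracket (\mathcal{C}\llbracket C \rrbracket m) : m$; $\mathcal{C}\llbracket C_1; C_2 \rrbracket m = \mathcal{C}\llbracket C_2 \rrbracket (\mathcal{C}\llbracket C_1 \rrbracket m)$; $\mathcal{V}\llbracket x \rrbracket m = m\,x$; $\mathcal{V}\llbracket n \rrbracket m = \{n\}$; $\mathcal{V}\llbracket E_1 + E_2 \rrbracket m = (\mathcal{V}\llbracket E_1 \rrbracket m) + (\mathcal{V}\llbracket E_2 \rrbracket m)$
  52. Interval analysis based on abstract interpretation • Concrete execution of a program $\bot \sqsubset F(\bot) \sqsubset F(F(\bot)) \sqsubset F(F(F(\bot))) \sqsubset \cdots \sqsubset F^i(\bot) = F^{i+1}(\bot)$ is the execution result of a program; $F^i(\bot) \in \mathit{Memory}$; $F = \lambda m.\ \mathcal{C}\llbracket C \rrbracket m$; $\mathrm{lfp}\,F = \bigsqcup_{i \in \mathbb{N}} F^i(\{\})$
  53. Interval analysis based on abstract interpretation • Abstract domain: the domain we will use in an analysis $\widehat{\mathit{Memory}} = \mathit{Var} \to \widehat{\mathit{Value}}$; $\widehat{\mathit{Value}} = \hat{\mathbb{Z}} \cup \{\bot\}$; $\hat{\mathbb{Z}} = \{[a, b] \mid a \in \mathbb{Z} \cup \{-\infty\},\ b \in \mathbb{Z} \cup \{+\infty\},\ a \le b\}$; $\hat{\mathcal{C}} \in C \to \widehat{\mathit{Memory}} \to \widehat{\mathit{Memory}}$; $\hat{\mathcal{V}} \in E \to \widehat{\mathit{Memory}} \to \widehat{\mathit{Value}}$
  54. Lattice of Interval Domain [Diagram: $\bot$ at the bottom; above it the singleton intervals [0,0], [1,1], [2,2], …, [-1,-1], [-2,-2], [-3,-3], …; then wider intervals such as [-1,0], [0,1], [0,2], [-2,-1], [-3,-2], [-3,-1], [-2,0], [-1,1], [-2,1], [-3,0], [-1,2], …; half-infinite intervals such as [0,∞], [-1,∞], [-2,∞], [-∞,0], [-∞,1], [-∞,2], …; and [-∞,∞] at the top]
  55. Interval analysis based on abstract interpretation • Abstract semantics: the semantics we will use in an analysis $\hat{\mathcal{C}}\llbracket x := E \rrbracket \hat{m} = \hat{m}\{x \mapsto \hat{\mathcal{V}}\llbracket E \rrbracket \hat{m}\}$; $\hat{\mathcal{C}}\llbracket \texttt{if}(E)\ C_1\ C_2 \rrbracket \hat{m} = \hat{\mathcal{C}}\llbracket C_1 \rrbracket \hat{m} \sqcup \hat{\mathcal{C}}\llbracket C_2 \rrbracket \hat{m}$; $\hat{\mathcal{C}}\llbracket \texttt{while}(E)\ C \rrbracket \hat{m} = \hat{m} \sqcup \hat{\mathcal{C}}\llbracket \texttt{while}(E)\ C \rrbracket (\hat{\mathcal{C}}\llbracket C \rrbracket \hat{m})$; $\hat{\mathcal{C}}\llbracket C_1; C_2 \rrbracket \hat{m} = \hat{\mathcal{C}}\llbracket C_2 \rrbracket (\hat{\mathcal{C}}\llbracket C_1 \rrbracket \hat{m})$; $\hat{\mathcal{V}}\llbracket x \rrbracket \hat{m} = \hat{m}\,x$; $\hat{\mathcal{V}}\llbracket n \rrbracket \hat{m} = \alpha\{n\}$; $\hat{\mathcal{V}}\llbracket E_1 + E_2 \rrbracket \hat{m} = (\hat{\mathcal{V}}\llbracket E_1 \rrbracket \hat{m}) \mathbin{\hat{+}} (\hat{\mathcal{V}}\llbracket E_2 \rrbracket \hat{m})$
  56. Interval analysis based on abstract interpretation • Abstract execution of a program $\hat{F} = \lambda \hat{m}.\ \hat{\mathcal{C}}\llbracket C \rrbracket \hat{m}$; $\hat{\bot} \sqsubset \hat{F}(\hat{\bot}) \sqsubset \hat{F}(\hat{F}(\hat{\bot})) \sqsubset \hat{F}(\hat{F}(\hat{F}(\hat{\bot}))) \sqsubset \cdots \sqsubset \hat{F}^i(\hat{\bot}) \sqsubseteq \hat{X}$; $\bigsqcup_{i \in \mathbb{N}} \hat{F}^i(\{\}) \sqsubseteq \hat{X}$; $\hat{X}$ is the analysis result of a program
  57. Interval analysis based on abstract interpretation • Widening: What if this chain has infinite length? $\hat{\bot} \sqsubset \hat{F}(\hat{\bot}) \sqsubset \hat{F}(\hat{F}(\hat{\bot})) \sqsubset \hat{F}(\hat{F}(\hat{F}(\hat{\bot}))) \sqsubset \cdots \sqsubset \hat{F}^i(\hat{\bot}) \sqsubseteq \hat{X}$ We need a widening operator $\nabla$: $\hat{\bot} \sqsubset \hat{F}(\hat{\bot}) \sqsubset \hat{F}(\hat{F}(\hat{\bot})) \sqsubset \hat{F}(\hat{F}(\hat{F}(\hat{\bot}))) \sqsubset \cdots \sqsubset \hat{F}^{i-1}(\hat{\bot}) \mathbin{\nabla} \hat{F}^i(\hat{\bot}) \sqsubseteq \hat{X}$
  58. Interval analysis based on abstract interpretation • Widening $\hat{\bot} \sqsubset [0, 0] \sqsubset [0, 1] \sqsubset [0, 2] \sqsubset \cdots \sqsubset [0, i-1] \mathbin{\nabla} [0, i] \sqsubseteq [0, \infty]$ void foo(int x) { String[] strs = new String[10]; int index = 0; while(x > 0) { index = index + 1; x = x - 1; } strs[index] = "hello!"; } index = [0,0] index = [1,∞] index = [0,∞]
  59. Is this program safe from buffer overruns? void foo(int x) { String[] strs = new String[10]; int index = 0; if(x > 0) { index = 1; } else { index = 10; } strs[index] = "hello!"; }
  60. Interval analysis based on abstract interpretation [Diagram: the program's syntax tree with nodes numbered 0 to 6] index = 0; if(x > 0) index = 1 else index = 10; result = index; $\hat{\mathcal{C}}\llbracket C_0 \rrbracket \hat{m} = \hat{\mathcal{C}}\llbracket C_2 \rrbracket (\hat{\mathcal{C}}\llbracket C_1 \rrbracket \hat{m})$; $\hat{\mathcal{C}}\llbracket C_1 \rrbracket \hat{m} = \hat{m}\{\mathit{index} \mapsto \alpha\{0\}\}$; $\hat{\mathcal{C}}\llbracket C_2 \rrbracket \hat{m} = \hat{\mathcal{C}}\llbracket C_4 \rrbracket (\hat{\mathcal{C}}\llbracket C_3 \rrbracket \hat{m})$; $\hat{\mathcal{C}}\llbracket C_3 \rrbracket \hat{m} = \hat{\mathcal{C}}\llbracket C_5 \rrbracket \hat{m} \sqcup \hat{\mathcal{C}}\llbracket C_6 \rrbracket \hat{m}$; $\hat{\mathcal{C}}\llbracket C_4 \rrbracket \hat{m} = \hat{m}\{\mathit{result} \mapsto \hat{m}\ \mathit{index}\}$; $\hat{\mathcal{C}}\llbracket C_5 \rrbracket \hat{m} = \hat{m}\{\mathit{index} \mapsto \alpha\{1\}\}$; $\hat{\mathcal{C}}\llbracket C_6 \rrbracket \hat{m} = \hat{m}\{\mathit{index} \mapsto \alpha\{10\}\}$
  61. Interval analysis based on abstract interpretation $\hat{\mathcal{C}}\llbracket C_0 \rrbracket \{\} = \hat{\mathcal{C}}\llbracket C_2 \rrbracket (\hat{\mathcal{C}}\llbracket C_1 \rrbracket \{\})$; $\hat{\mathcal{C}}\llbracket C_1 \rrbracket \{\} = \{\mathit{index} \mapsto [0, 0]\}$; $\hat{\mathcal{C}}\llbracket C_2 \rrbracket \{\mathit{index} \mapsto [0, 0]\} = \hat{\mathcal{C}}\llbracket C_4 \rrbracket (\hat{\mathcal{C}}\llbracket C_3 \rrbracket \{\mathit{index} \mapsto [0, 0]\})$; $\hat{\mathcal{C}}\llbracket C_3 \rrbracket \{\mathit{index} \mapsto [0, 0]\} = \hat{\mathcal{C}}\llbracket C_5 \rrbracket \{\mathit{index} \mapsto [0, 0]\} \sqcup \hat{\mathcal{C}}\llbracket C_6 \rrbracket \{\mathit{index} \mapsto [0, 0]\}$; $\hat{\mathcal{C}}\llbracket C_4 \rrbracket \{\mathit{index} \mapsto [1, 10]\} = \{\mathit{index} \mapsto [1, 10], \mathit{result} \mapsto [1, 10]\}$; $\hat{\mathcal{C}}\llbracket C_5 \rrbracket \{\mathit{index} \mapsto [0, 0]\} = \{\mathit{index} \mapsto [1, 1]\}$; $\hat{\mathcal{C}}\llbracket C_6 \rrbracket \{\mathit{index} \mapsto [0, 0]\} = \{\mathit{index} \mapsto [10, 10]\}$; $\hat{\mathcal{C}}\llbracket C_0 \rrbracket \{\} = \{\mathit{index} \mapsto [1, 10], \mathit{result} \mapsto [1, 10]\}$
  62. Is this program safe from buffer overruns? void foo(int x) { String[] strs = new String[10]; int index = 0; if(x > 0) { index = 1; } else { index = 10; } strs[index] = "hello!"; } • index may hold any integer between 1 and 10 • Since the size of the buffer strs is 10, ArrayIndexOutOfBoundsException may occur at strs[index]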
  63. Yoyak: Do not reinvent the wheel https://trimaps.com/assets/website/dontreinventthemap-6ba62b8ba05d4957d2ed772584d7e4cd.png
  64. Motivation • Do not reinvent the wheel: many components that static analyzers often use are reusable • CFG data types: construction, optimization, visualization • Graph algorithms: unrolling loops, finding loop heads, finding topological order • Intermediate language data types: construction, optimization, pretty printing • Common abstract domains: integer interval, abstract object, abstract memory • Common abstract semantics: assignment, invoking methods, evaluating binary expressions
  65. Motivation • A natural fit for a framework: the theory of abstract interpretation guarantees soundness and termination of the analysis if the user supplies a valid abstract domain and abstract semantics [Diagram: abstract domain D and abstract semantics F are fed into a generic fixed point computation engine, which produces a fixed point x = F(x) (x ∈ D)]
  66. Overview [Diagram: overview of Yoyak with the parts Abstract Domain, Fixed Point Computation, and Abstract Semantics; components shown include MapDom, MemDom, Interval, ArithmeticOps, LatticeOps, StdSemantics, ForwardAnalysis, AbstractTransferable, Widening, Galois, IL, FlowSensitive FixedPoint Computation, Worklist, WideningAtLoopHeads, Interprocedural Iteration, DoWidening, CommonIL, Attachable, Typable]
  67. Fixed-point Computation in Yoyak: Built-in work-list algorithm [Figure: example CFG from ENTRY to EXIT with nodes x := 10, Assume (y == 0), Assume (y != 0), Assume (y == 1), Assume (y != 1), println(“0”), println(“2”), Assume (z), Assume (!z), throw new Ex(), println(“done”), return]
     def computeFixedPoint(startNodes: List[BasicBlock])(implicit widening: Option[Widening[D]] = None) : MapDom[BasicBlock,D] = {
       worklist.add(startNodes:_*)
       var map = MapDom.empty[BasicBlock,D]
       while(worklist.size() > 0) {
         val bb = worklist.pop().get
         val prevInputs = memoryFetcher(map,bb)
         val prev = getInput(map,prevInputs)
         val (mapOut,next) = work(map,prev,bb)
         val orig = map.get(bb)
         val isStableOpt = ops.<=(next,orig)
         if(isStableOpt.isEmpty) { println("error: abs. transfer func. is not distributive") }
         if(!isStableOpt.get) {
           val widened = if(widening.nonEmpty) { doWidening(widening.get)(orig,next,bb) } else next
           map = mapOut.update(bb->widened)
           val nextWork = getNextBlocks(bb)
           worklist.add(nextWork:_*)
         }
       }
       map
     }
  68. Fixed-point Computation in Yoyak: Built-in work-list algorithm
     trait FlowSensitiveFixedPointComputation[D<:Galois] extends FlowSensitiveIteration[D]
         with CfgNavigator[D] with DoWidening[D] {
       def computeFixedPoint(startNodes: List[BasicBlock])(implicit widening: Option[Widening[D]] = None) : MapDom[BasicBlock,D] = {
     class FlowSensitiveForwardAnalysis[D<:Galois](val cfg: CFG)(
         implicit val ops: LatticeOps[D],
         val absTransfer: AbstractTransferable[D],
         val widening: Option[Widening[D]] = None)
       extends FlowSensitiveFixedPointComputation[D] with WideningAtLoopHeads[D] {
  69. Abstract Semantics in Yoyak: Built-in work-list algorithm
     trait AbstractTransferable[D<:Galois] {
       protected def transferIdentity(stmt: Identity, input: D#Abst)(implicit context: Context) : D#Abst = input
       protected def transferAssign(stmt: Assign, input: D#Abst)(implicit context: Context) : D#Abst = input
       protected def transferInvoke(stmt: Invoke, input: D#Abst)(implicit context: Context) : D#Abst = input
       protected def transferIf(stmt: If, input: D#Abst)(implicit context: Context) : D#Abst = input
       protected def transferAssume(stmt: Assume, input: D#Abst)(implicit context: Context) : D#Abst = input
       // so on
  70. Abstract Semantics in Yoyak: Built-in standard semantics
     trait StdSemantics[A<:Galois,D,Mem<:MemDomLike[A,D,Mem]] extends AbstractTransferable[GaloisIdentity[Mem]] {
       val arithOps : ArithmeticOps[A]
       override protected def transferAssign(stmt: Assign, input: Mem)(implicit context: Context) : Mem = {
         val (rv,output) = eval(stmt.rv,input)
         output.update(stmt.lv,rv)
       }
  71. Abstract Domain in Yoyak: Composable abstract domains
     class MapDom[K,V <: Galois : LatticeOps] {
     trait LatticeOps[D <: Galois] extends ParOrdOps[D] {
       def /(lhs: D#Abst, rhs: D#Abst) : D#Abst
       def bottom : D#Abst
     trait ParOrdOps[D <: Galois] {
       def <=(lhs: D#Abst, rhs: D#Abst) : Option[Boolean]
     trait Galois {
       type Conc
       type Abst
  72. Abstract Domain in Yoyak: Built-in Interval Domain
     scala> import com.simplytyped.yoyak.framework.domain.arith._
     import com.simplytyped.yoyak.framework.domain.arith._
     scala> import com.simplytyped.yoyak.framework.domain.arith.Interval._
     import com.simplytyped.yoyak.framework.domain.arith.Interval._
     scala> val intv1 = Interv.of(10)
     intv1: com.simplytyped.yoyak.framework.domain.arith.Interval = Interv(IInt(10),IInt(10))
     scala> val intv2 = Interv.in(IInt(-10),IInt(10))
     intv2: com.simplytyped.yoyak.framework.domain.arith.Interval = Interv(IInt(-10),IInt(10))
     scala> val intv3 = Interv.in(IInfMinus,IInf)
     intv3: com.simplytyped.yoyak.framework.domain.arith.Interval = IntervTop
     scala> val intv4 = Interv.in(IInt(-10),IInf)
     intv4: com.simplytyped.yoyak.framework.domain.arith.Interval = Interv(IInt(-10),IInf)
  73. Abstract Domain in Yoyak: Built-in Interval Domain
     scala> import IntervalInt.arithOps
     import IntervalInt.arithOps
     scala> arithOps.+(intv1,intv2) // [10,10] + [-10,10]
     res1: com.simplytyped.yoyak.framework.domain.arith.IntervalInt#Abst = Interv(IInt(0),IInt(20))
     scala> arithOps.-(intv1,intv2) // [10,10] - [-10,10]
     res2: com.simplytyped.yoyak.framework.domain.arith.IntervalInt#Abst = Interv(IInt(0),IInt(20))
     scala> arithOps.+(intv2,intv3) // [-10,10] + [-∞,∞]
     res3: com.simplytyped.yoyak.framework.domain.arith.IntervalInt#Abst = IntervTop
     scala> arithOps.*(intv2,intv4) // [-10,10] * [-10,∞]
     res4: com.simplytyped.yoyak.framework.domain.arith.IntervalInt#Abst = IntervTop
     scala> arithOps.*(intv1,intv4) // [10,10] * [-10,∞]
     res5: com.simplytyped.yoyak.framework.domain.arith.IntervalInt#Abst = Interv(IInt(-100),IInf)
  74. Abstract Domain in Yoyak: Built-in Standard Object Model
     trait StdObjectModel[A<:Galois,D<:Galois,This<:StdObjectModel[A,D,This]]
         extends MemDomLike[A,D,This] with ArrayJoinModel[A,D,This] {
       implicit val arithOps : ArithmeticOps[A]
       implicit val boxedOps : LatticeWithTopOps[D]
       def update(kv: (Loc,AbsValue[A,D])) : This
       def remove(loc: Local) : This
       def alloc(from: Stmt) : (AbsRef,This)
       def get(k: Loc) : AbsValue[A,D]
       def isStaticAddr(addr: AbsAddr) : Boolean
       def isDynamicAddr(addr: AbsAddr) : Boolean
     class MemDom[A <: Galois : ArithmeticOps, D <: Galois : LatticeWithTopOps]
         extends StdObjectModel[A,D,MemDom[A,D]] {
  75. Abstract Domain in Yoyak: Built-in Memory Domain
     scala> import com.simplytyped.yoyak.framework.domain.mem.MemDom
     scala> import com.simplytyped.yoyak.framework.domain.mem.MemElems._
     scala> import com.simplytyped.yoyak.framework.domain.Galois._
     scala> import com.simplytyped.yoyak.framework.domain.arith.Interv
     scala> import com.simplytyped.yoyak.framework.domain.arith.IntervalInt
     scala> import com.simplytyped.yoyak.il.CommonIL.Value._
     scala> val memory = new MemDom[IntervalInt,SetAbstraction[String]]
     memory: com.simplytyped.yoyak.framework.domain.mem.MemDom[com.simplytyped.yoyak.framework.domain.arith.IntervalInt,com.simplytyped.yoyak.framework.domain.Galois.SetAbstraction[String]] = com.simplytyped.yoyak.framework.domain.mem.MemDom@8443a1
  76. Abstract Domain in Yoyak: Built-in Memory Domain
     scala> val memory2 = memory.update(Local("x") -> AbsArith[IntervalInt](Interv.of(1)))
     scala> val memory3 = memory.update(Local("x") -> AbsArith[IntervalInt](Interv.of(10)))
     scala> val memory4 = MemDom.ops[IntervalInt,SetAbstraction[String]]./(memory2,memory3)
     scala> memory4.get(Local("x"))
     res1: com.simplytyped.yoyak.framework.domain.mem.MemElems.AbsValue[com.simplytyped.yoyak.framework.domain.arith.IntervalInt,com.simplytyped.yoyak.framework.domain.Galois.SetAbstraction[String]] = AbsArith(Interv(IInt(1),IInt(10)))
  77. IL in Yoyak: CommonIL
     abstract class Stmt extends Attachable {
       override def equals(that: Any): Boolean = this eq that.asInstanceOf[AnyRef]
       override def hashCode() : Int = System.identityHashCode(this)
       private[Stmt] def copyAttr(stmt: Stmt) : this.type = {sourcePos = stmt.pos; this}
     }
  78. IL in Yoyak: CommonIL
     case class Block(stmts: StatementContainer) extends Stmt
     case class Switch(v: Value.Loc, keys: List[Value.t], targets: List[Target]) extends Stmt
     case class Placeholder(x: AnyRef) extends Stmt
     sealed trait CoreStmt extends Stmt
     case class If(cond: Value.CondBinExp, target: Target) extends CoreStmt
     case class Goto(target: Target) extends CoreStmt
     sealed trait CfgStmt extends CoreStmt
     case class Identity(lv: Value.Local, rv: Value.Param) extends CfgStmt
     case class Assign(lv: Value.Loc, rv: Value.t) extends CfgStmt
     case class Invoke(ret: Option[Value.Local], callee: Type.InvokeType) extends CfgStmt
     case class Assume(cond: Value.CondBinExp) extends CfgStmt
     case class Return(v: Option[Value.Loc]) extends CfgStmt
     case class Nop() extends CfgStmt
     case class EnterMonitor(v: Value.Loc) extends CfgStmt
     case class ExitMonitor(v: Value.Loc) extends CfgStmt
     case class Throw(v: Value.Loc) extends CfgStmt
  79. IL in Yoyak
     Stmt: x := 10; switch (y) { case 0: println(“0”); break; case 1: println(“1”); default: println(“2”); } if(z) { throw new Exception(); } else { println(“done”); } return 0;
     CoreStmt: x := 10; if(y == 0) { println(“0”); goto D; } if(y == 1) { println(“1”); } D: println(“2”); if(z) { throw new Exception(); } else { println(“done”); } return 0;
     CfgStmt: [Figure: CFG from ENTRY to EXIT with nodes x := 10, Assume (y == 0), Assume (y != 0), Assume (y == 1), Assume (y != 1), println(“0”), println(“2”), Assume (z), Assume (!z), throw new Ex(), println(“done”), return]
  80. 80. Simple Interval Analysis in Yoyak class IntervalAnalysis(cfg: CFG) { def run() = { import IntervalAnalysis.{memDomOps,absTransfer,widening} val analysis = new FlowSensitiveForwardAnalysis[GMemory](cfg) val output = analysis.compute output } } object IntervalAnalysis { type Memory = MemDom[IntervalInt,SetAbstraction[Any]] type GMemory = GaloisIdentity[Memory] implicit val absTransfer : AbstractTransferable[GMemory] = new StdSemantics[IntervalInt,SetAbstraction[Any],Memory] { val arithOps: ArithmeticOps[IntervalInt] = IntervalInt.arithOps } implicit val memDomOps : LatticeOps[GMemory] = MemDom.ops[IntervalInt,SetAbstraction[Any]] implicit val widening : Option[Widening[GMemory]] = { implicit val NoWideningForSetAbstraction = Widening.NoWidening[SetAbstraction[Any]] Some(MemDom.widening[IntervalInt,SetAbstraction[Any]]) } }
81. Simple Interval Analysis in Yoyak
[Figure: component diagram of the interval analysis. Components shown: IntervalAnalysis, FlowSensitiveForwardAnalysis, FlowSensitiveFixedPointComputation, FlowSensitiveIteration, Worklist, CfgNavigator, CFG, BasicBlock, LatticeOps, AbstractTransferable, Widening, WideningAtLoopHeads, MemDom, MemDom.ops, StdObjectModel, MapDom, AbsValue, AbsRef, AbsArith, AbsBox, AbsBottom, AbsTop, AbsObject, AbsAddr, IntervalInt, IntervalInt.widening, IntervalInt.arithOps, SetAbstraction[Any], StdSemantics, ArithmeticOps, IntervalAnalysisTransferFunction; output: the fixed-point result]
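The FlowSensitiveFixedPointComputation, Worklist and LatticeOps components named above point to the classic worklist algorithm. A generic sketch of that algorithm, parameterised by a hypothetical Lattice type class (this is not Yoyak's API):

```scala
import scala.collection.mutable

object WorklistSketch {
  // Hypothetical type class with the operations a fixed-point engine needs.
  trait Lattice[A] {
    def bottom: A
    def join(x: A, y: A): A
    def leq(x: A, y: A): Boolean
  }

  // Forward worklist algorithm. succ is the CFG's successor map, and
  // transfer(n, s) computes the state after node n from the state s before it.
  // Returns the abstract state at the entry of every node.
  def solve[A](nodes: List[Int], succ: Map[Int, List[Int]], entry: Int, init: A,
               transfer: (Int, A) => A)(implicit lat: Lattice[A]): Map[Int, A] = {
    val inState = mutable.Map[Int, A]() ++= nodes.map(n => n -> lat.bottom)
    inState(entry) = init
    val worklist = mutable.Queue(entry)
    while (worklist.nonEmpty) {
      val n = worklist.dequeue()
      val out = transfer(n, inState(n))
      for (s <- succ.getOrElse(n, Nil)) {
        val joined = lat.join(inState(s), out)
        // Revisit a successor only if its input state actually grew.
        if (!lat.leq(joined, inState(s))) {
          inState(s) = joined
          worklist.enqueue(s)
        }
      }
    }
    inState.toMap
  }
}
```

As written, termination relies on the lattice having finite height; for domains such as intervals, a widening is applied at loop heads, which is what Yoyak's WideningAtLoopHeads component suggests.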
82. Yoyak: Scala Experience
• Scala is a very good language for implementing a static analyzer
• Functions are first-class citizens
• Type class support
• Algebraic data type support
• Native support for mutable and immutable values
• Excellent support for parallelization
83. Yoyak: Scala Experience
• Functions are first-class citizens
A natural way to express mathematical logic:
// optimize Cfg
(insertAssume _ andThen removeIfandGoto) apply rawCfg
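The one-liner above composes two CFG passes with andThen. A toy sketch of the same idea, where insertAssume and removeIfAndGoto are hypothetical stand-ins that work on a list of statement strings instead of Yoyak's CFG:

```scala
object PassComposition {
  // Hypothetical CFG representation: just a list of statement strings.
  type Cfg = List[String]

  // Hypothetical pass: after each conditional jump, record its condition as an assume.
  val insertAssume: Cfg => Cfg = cfg => cfg.flatMap { s =>
    if (s.startsWith("if ")) List(s, "assume " + s.stripPrefix("if ")) else List(s)
  }

  // Hypothetical pass: drop the conditional and unconditional jumps.
  val removeIfAndGoto: Cfg => Cfg =
    cfg => cfg.filterNot(s => s.startsWith("if ") || s.startsWith("goto "))

  // (f andThen g)(x) == g(f(x)): insertAssume runs first, then removeIfAndGoto.
  val optimize: Cfg => Cfg = insertAssume andThen removeIfAndGoto
}
```

`optimize(List("x := 10", "if y == 0", "goto D"))` yields `List("x := 10", "assume y == 0")`; `f andThen g` reads left to right but means g ∘ f in mathematical notation.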
84. Yoyak: Scala Experience
• Type class support
Type classes let us avoid F-bounded polymorphism, which is the fast lane to overwork
• F-bounded polymorphism
• Commonly arises when inheritance meets immutability
• Seriously degrades code readability
85. Yoyak: Scala Experience
• F-bounded polymorphism
trait Queue[T, This <: Queue[T, This]] {
  def push(elem: T) : This
}
trait GoodQueue[T, This <: GoodQueue[T, This]] extends Queue[T, This] {
  def pop : (T, This)
}
trait BetterQueue[T, R, This <: BetterQueue[T, R, This]] extends GoodQueue[T, This] {
  def giveMeSomethingNew : R
}
trait QueueUnited[T, R, Q <: Queue[T, Q], G <: GoodQueue[T, G], B <: BetterQueue[T, R, B], This <: QueueUnited[T, R, Q, G, B, This]] extends BetterQueue[T, R, This] {
  def giveUp : Unit
}
• Every trait needs the type of its concrete subclass
• Every subclass reference must restate all the type variables
• Type classes liberate methods from inheritance
86. Yoyak: Scala Experience
• Type class
trait QueueLike[T,This] {
  def push(q: This, elem: T) : This
}
trait GoodQueueLike[T,This] {
  implicit val queueLike : QueueLike[T,This]
  def push(q: This, elem: T) : This = queueLike.push(q, elem)
  def pop(q: This) : (T,This)
}
trait BetterQueueLike[T,R,This] {
  implicit val goodQueueLike : GoodQueueLike[T,This]
  def push(q: This, elem: T) : This = goodQueueLike.push(q, elem)
  def pop(q: This) : (T,This) = goodQueueLike.pop(q)
  def giveMeSomethingNew : R
}
class QueueUnited[T,R,This](implicit val q : QueueLike[T,This], g : GoodQueueLike[T,This], b : BetterQueueLike[T,R,This]) {
  def push(queue: This, elem: T) : This = b.push(queue, elem)
  def pop(queue: This) : (T,This) = b.pop(queue)
  def giveMeSomethingNew : R = b.giveMeSomethingNew
  def giveUp : Unit = {}
}
87. Yoyak: Scala Experience
• Type class in Yoyak
trait StdObjectModel[A<:Galois,D<:Galois,This<:StdObjectModel[A,D,This]] extends MemDomLike[A,D,This] with ArrayJoinModel[A,D,This] {
  implicit val arithOps : ArithmeticOps[A]
  implicit val boxedOps : LatticeWithTopOps[D]
  // ... (rest of the trait not shown on the slide)
}
Use each technique where it fits: this trait combines an F-bounded This parameter with type-class instances (arithOps, boxedOps)
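A minimal sketch of the type-class side, loosely modelled on the LatticeOps type class used by the interval analysis earlier (the trait below is hypothetical and much smaller than Yoyak's). A generic function works for any domain that has an instance, without the domain type inheriting anything:

```scala
object TypeClassSketch {
  // Hypothetical lattice type class; domain values need not extend any trait.
  trait LatticeOps[A] {
    def bottom: A
    def join(x: A, y: A): A
  }

  // Instances live in the companion object, so implicit search finds them by type.
  object LatticeOps {
    // Powerset domain: join is set union.
    implicit def powersetOps[T]: LatticeOps[Set[T]] = new LatticeOps[Set[T]] {
      def bottom: Set[T] = Set.empty[T]
      def join(x: Set[T], y: Set[T]): Set[T] = x union y
    }

    // Integers ordered by <=: join is max.
    implicit val maxIntOps: LatticeOps[Int] = new LatticeOps[Int] {
      def bottom: Int = Int.MinValue
      def join(x: Int, y: Int): Int = math.max(x, y)
    }
  }

  // Generic code: join all states flowing into a node, for any domain A.
  def joinAll[A](states: List[A])(implicit ops: LatticeOps[A]): A =
    states.foldLeft(ops.bottom)(ops.join)
}
```

`joinAll(List(Set(1), Set(2, 3)))` returns `Set(1, 2, 3)` and `joinAll(List(3, 7, 5))` returns 7. Because `join` returns `A` directly, no `This` type parameter is needed to keep the precise result type.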
88. Yoyak: Scala Experience
• Algebraic data type support
A natural way to express the abstract syntax tree of a program:
if(x) a = 1 else a = 2
println(a)
Seq(
  If("x", Assign("a",1), Assign("a",2)),
  Invoke("println", List("a"))
)
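For the Seq(...) value above to type-check, the AST needs an algebraic data type. A minimal sketch; the constructor names follow the slide, but the field types are assumptions for illustration (the sketch wraps constants in Const and variable reads in Var):

```scala
object AstSketch {
  // Expressions: variable reads and integer constants (assumed for illustration).
  sealed trait Expr
  case class Var(name: String) extends Expr
  case class Const(n: Int) extends Expr

  // Statements, named after the slide's constructors.
  sealed trait Stmt
  case class Assign(lhs: String, rhs: Expr) extends Stmt
  case class If(cond: String, thenBranch: Stmt, elseBranch: Stmt) extends Stmt
  case class Invoke(method: String, args: List[Expr]) extends Stmt

  // if(x) a = 1 else a = 2; println(a)
  val program: Seq[Stmt] = Seq(
    If("x", Assign("a", Const(1)), Assign("a", Const(2))),
    Invoke("println", List(Var("a")))
  )

  // Structural recursion over the tree: the variables a statement may assign.
  def assigned(s: Stmt): Set[String] = s match {
    case Assign(lhs, _) => Set(lhs)
    case If(_, t, e)    => assigned(t) ++ assigned(e)
    case Invoke(_, _)   => Set.empty
  }
}
```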
89. Yoyak: Scala Experience
• Algebraic data type support
Easy navigation of the abstract syntax tree:
def eval(v: Value.t, input: Mem)(implicit context: Context) : (AbsValue[A,D],Mem) = {
  v match {
    case x : Value.Constant => evalConstant(x,input)
    case x : Value.Loc => evalLoc(x,input)
    case x : Value.BinExp => evalBinExp(x,input)
    case Value.This => (AbsRef(Set("$this")),input)
    case Value.CaughtExceptionRef => (AbsRef(Set("$caughtex")),input)
    case Value.CastExp(v, ofTy) => evalLoc(v,input)
    case Value.InstanceOfExp(v, ofTy) => (AbsTop,input)
    case Value.LengthExp(v) => (AbsTop,input)
    case Value.NewExp(ofTy) => input.alloc(context.stmt)
    case Value.NewArrayExp(ofTy, size) => input.alloc(context.stmt)
    // ... (remaining cases not shown on the slide)
  }
}
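The eval above dispatches on the shape of a Value and returns an abstract value together with the memory. The same pattern, reduced to a toy sign domain with no memory, fits in a few lines; Exp, Sign and their cases are hypothetical and not Yoyak's Value or AbsValue:

```scala
object SignEval {
  sealed trait Sign
  case object Neg extends Sign
  case object Zero extends Sign
  case object Pos extends Sign
  case object Top extends Sign // sign unknown

  sealed trait Exp
  case class Num(n: Long) extends Exp
  case class Mul(l: Exp, r: Exp) extends Exp
  case class Sub(l: Exp, r: Exp) extends Exp

  // Abstraction of a concrete number.
  def alpha(n: Long): Sign = if (n > 0) Pos else if (n < 0) Neg else Zero

  // Abstract multiplication: exact on signs unless a factor is unknown.
  def mul(a: Sign, b: Sign): Sign = (a, b) match {
    case (Zero, _) | (_, Zero) => Zero
    case (Top, _) | (_, Top)   => Top
    case (x, y)                => if (x == y) Pos else Neg
  }

  // Abstract subtraction: a - b, falling back to Top when the sign is undetermined.
  def sub(a: Sign, b: Sign): Sign = (a, b) match {
    case (x, Zero)   => x
    case (Zero, Pos) => Neg
    case (Zero, Neg) => Pos
    case (Pos, Neg)  => Pos
    case (Neg, Pos)  => Neg
    case _           => Top
  }

  // Pattern matching on the expression shape, as in eval above.
  def eval(e: Exp): Sign = e match {
    case Num(n)    => alpha(n)
    case Mul(l, r) => mul(eval(l), eval(r))
    case Sub(l, r) => sub(eval(l), eval(r))
  }
}
```

`eval(Sub(Num(5), Num(3)))` is `Top`: from the signs alone, the difference of two positive numbers could be negative, zero or positive, so the only sound answer is "unknown". Multiplying that result by `Num(0)` still yields `Zero`.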
90. Yoyak: Scala Experience
• Native support for mutable and immutable values
[Figure: a memory in which variables x, y and z all point to the same Object, with fields f = 1 and g = "A"]
In some cases, mutability matters more than immutability
91. Yoyak: Scala Experience
• Native support for mutable and immutable values
[Figure: x, y and z must each be rebound from Object (f = 1, g = "A") to NewObject (f = 2, g = "A")]
memory.filter{_._2 == obj}.foldLeft(memory) { case (m,(k,_)) => m + (k -> newObject) }
O(n)
92. Yoyak: Scala Experience
• Native support for mutable and immutable values
[Figure: x, y and z keep pointing to one shared object, whose contents become NewObject (f = 2, g = "A")]
obj.update(newObject)
O(1)
93. Yoyak: Scala Experience
• Native support for mutable and immutable values
[Figure: x, y and z pointing to Object (f = 1, g = "A"), next to NewObject (f = 2, g = "A")]
If we frequently update immutable objects in a large memory, each update must rebind every alias, which can cause severe inefficiency
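A small sketch contrasting the two update strategies above; Obj, MObj and both update functions are hypothetical names, not Yoyak's memory model:

```scala
object AliasUpdate {
  // Immutable value standing for the slide's Object (fields f and g).
  case class Obj(f: Int, g: String)

  // Immutable memory: every variable bound to `old` must be rebound,
  // which scans the whole memory (O(n) in its size).
  def updateImmutable(memory: Map[String, Obj], old: Obj, updated: Obj): Map[String, Obj] =
    memory.filter { case (_, v) => v == old }.foldLeft(memory) { case (m, (k, _)) => m + (k -> updated) }

  // Mutable object shared by all aliases: one in-place update (O(1)).
  final class MObj(var f: Int, var g: String)

  def updateMutable(o: MObj, f: Int, g: String): Unit = {
    o.f = f
    o.g = g
  }
}
```

The immutable version finds aliases by value equality, so an unrelated object that happens to be equal would be rebound too; the mutable version avoids that question because aliasing is physical sharing.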
94. Yoyak: Scala Experience
• Excellent support for parallelization
• Static analysis does not yet make full use of today's advances in scalable computing (multicore machines, big data technologies, cloud computing)
• Scala has an ideal platform for experimenting with parallelization: Akka
• There are many fun things to try with Yoyak powered by Akka
95. Yoyak: Scala Experience
• Excellent support for parallelization
Worklist parallelization can be implemented naturally with Akka's actor model
96. Yoyak: Roadmap
• Add more built-in abstract domains
• Optimize analysis performance
• Visualize analysis details
• Build a Scala compiler plug-in
97. Yoyak: Roadmap
• Add more built-in abstract domains
The interval domain cannot represent relations between two variables: x = [2,8] and y = [1,7] admit all 7 × 7 = 49 combinations of (x, y) pairs
[Figure: the admitted (x, y) pairs plotted on an X/Y grid]
98. Yoyak: Roadmap
• Add more built-in abstract domains
The octagon domain can represent relations between two variables (constraints of the form ±x ± y ≤ c)
[Figure: the (x, y) pairs admitted by an octagon, plotted on an X/Y grid]
http://www.di.ens.fr/~mine/publi/article-mine-HOSC06.pdf
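A brute-force sketch that writes octagonal constraints ±x ± y ≤ c explicitly and enumerates their integer solutions on a small grid. This is only an illustration: the octagon domain in the linked paper uses difference-bound matrices with a closure algorithm, and nothing here is that implementation. The relation y = x - 1 is an assumed example, not taken from the slide:

```scala
object OctagonSketch {
  // One octagonal constraint a*x + b*y <= c with a, b in {-1, 0, 1}.
  case class Oct(a: Int, b: Int, c: Int) {
    require(Set(-1, 0, 1).contains(a) && Set(-1, 0, 1).contains(b))
    def holds(x: Int, y: Int): Boolean = a * x + b * y <= c
  }

  val constraints: List[Oct] = List(
    Oct(1, 0, 8), Oct(-1, 0, -2),  // 2 <= x <= 8
    Oct(0, 1, 7), Oct(0, -1, -1),  // 1 <= y <= 7
    Oct(1, -1, 1), Oct(-1, 1, -1)  // x - y <= 1 and y - x <= -1, i.e. y == x - 1 (assumed)
  )

  // Integer points of the 0..10 x 0..10 grid that satisfy every constraint.
  def points(cs: List[Oct]): Seq[(Int, Int)] =
    for {
      x <- 0 to 10
      y <- 0 to 10
      if cs.forall(_.holds(x, y))
    } yield (x, y)
}
```

With only the first four constraints, which are all an interval domain can express, `points` returns the 49 pairs of the box; adding the two relational constraints leaves the 7 pairs with y = x - 1.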
99. Yoyak: Roadmap
• Add more built-in abstract domains
The 2-interval domain is more precise than the interval domain
[Figure: the (x, y) pairs admitted by a 2-interval abstraction, plotted on an X/Y grid]
100. Yoyak: Roadmap
• Optimize analysis performance
• {Worklist, Method, Class}-level parallelization
• Reduce the abstract memory size by removing unused variables (making joins of abstract memories faster)
• An optional faster but unsound analysis mode
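Of the granularities above, method-level parallelization is the simplest to sketch, because independent methods can be analyzed without sharing state. To stay dependency-free, the sketch swaps in the standard library's Future instead of Akka actors, and analyzeMethod is a hypothetical stand-in for a real intraprocedural analysis:

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object MethodLevelParallel {
  // Hypothetical stand-in for an intraprocedural analysis: counts the
  // statements of a method body that contain a call or field access.
  def analyzeMethod(body: List[String]): Int = body.count(_.contains("."))

  // Methods are analyzed independently, so each one runs in its own Future;
  // results are gathered once all analyses have finished.
  def analyzeAll(methods: Map[String, List[String]]): Map[String, Int] = {
    val futures: List[Future[(String, Int)]] = methods.toList.map { case (name, body) =>
      Future(name -> analyzeMethod(body))
    }
    Await.result(Future.sequence(futures), 30.seconds).toMap
  }
}
```

Worklist-level parallelization is harder: concurrent updates to the same abstract states must be synchronized or merged, a problem this sketch sidesteps by keeping methods independent.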
101. Yoyak: Roadmap
• Visualize analysis details
It is hard to know what a static analyzer is doing at a specific moment because:
• The analyzer's behavior differs greatly from one input program to another
• We often need to inspect and compare maps with thousands of entries
• Ordinary Java debuggers cannot show the big picture
102. Yoyak: Roadmap
• Visualize analysis details
Example from SAT solvers: DPVis, a visualization of the search tree generated by a basic DPLL algorithm
103. Yoyak: Roadmap
• Build a Scala compiler plug-in
• Programming language researchers foresee that semantic program analyzers will be merged into compilers in the near future, as type systems were
[Figure: stages labelled Syntactic Analysis (Grammar Checking), Type System and Semantic Analysis]
104. Yoyak: Roadmap
• Build a Scala compiler plug-in
• The Scala compiler is well modularized and cleanly coded (compared to other compiler systems), so it is an excellent platform for experimenting with new ideas
• Pure Scala code can largely avoid null, but linked Java libraries cannot
• It would be great if the Scala compiler could detect possible null dereferences at compile time and issue a warning
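As a rough illustration of the kind of check such a plug-in could run, here is a may-be-null analysis over straight-line code that fits in a single fold. The statement types are hypothetical, and this is not a Scala compiler plug-in API:

```scala
object NullCheckSketch {
  // Hypothetical statements of a straight-line program.
  sealed trait St
  case class AssignNull(v: String) extends St           // v = null
  case class AssignNew(v: String) extends St            // v = new T()
  case class AssignJava(v: String) extends St           // v = javaCall(), may return null
  case class Copy(dst: String, src: String) extends St  // dst = src
  case class CheckNotNull(v: String) extends St         // code guarded by v != null
  case class Deref(v: String) extends St                // v.f or v.m()

  // Returns the indices of dereferences of possibly-null variables.
  // The abstract state is the set of variables that may be null.
  def warnings(prog: List[St]): List[Int] = {
    val start = (Set.empty[String], List.empty[Int])
    val (_, warns) = prog.zipWithIndex.foldLeft(start) {
      case ((mayBeNull, ws), (st, i)) => st match {
        case AssignNull(v)   => (mayBeNull + v, ws)
        case AssignJava(v)   => (mayBeNull + v, ws)
        case AssignNew(v)    => (mayBeNull - v, ws)
        case Copy(d, s)      => (if (mayBeNull(s)) mayBeNull + d else mayBeNull - d, ws)
        case CheckNotNull(v) => (mayBeNull - v, ws)
        // After a dereference that did not throw, v is known to be non-null.
        case Deref(v)        => (mayBeNull - v, if (mayBeNull(v)) ws :+ i else ws)
      }
    }
    warns
  }
}
```

For `s = javaCall(); t = new T(); t.m(); s.m(); s.m()` the sketch warns once, at index 3; the second dereference of s is not reported again. A real plug-in would work on the compiler's typed trees and join states where control flow merges.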
105. Thank you!
Further questions:
• Twitter: @heejongl
• Gmail: heejong@gmail.com
ScalaDays 2015
