2. Speaker Introduction
• Has worked in the static analysis industry since 2008
• Studied programming language theory in graduate school
• Has developed several static analyzers, most of them commercial
• Began using Scala six years ago and still uses it actively in everyday development
5. What is Static Analysis?
• Analyzes source code without actually running it
• Some people prefer to call it white-box testing
• Used for finding bugs, optimizing a compiled binary,
calculating a software metric, proving safety properties, etc.
6. Examples of Static Analysis
• Finding bugs : symbolic execution
• Optimizing a compiled binary: data flow analysis
• Calculating a software metric: syntactic analysis
• Proving safety properties: model checking, abstract
interpretation, type system
7. Two important terms in Static Analysis
• Soundness
  • The analysis result contains every behavior that can occur at run time
  • An analysis that uses an over-approximation is sound
• Completeness
  • The analysis result contains no behavior that cannot occur at run time
  • An analysis that uses an under-approximation is complete
8. Two important terms in Static Analysis
[Figure: the program semantics shown together with an over-approximation of the semantics and an under-approximation of the semantics]
10. What is the result of this expression?
$19224 \times 7483919 \times (11952 - 20392)$
11. What is the result of this expression?
$19224 \times 7483919 \times (11952 - 20392)$
$= -1214270048744640$
How long does it take without a calculator?
12. What is the result of this expression?
$19224 \times 7483919 \times (11952 - 20392)$
$= -1214270048744640$
What if we do not have an interest in the exact number, rather
we just want to know whether it is positive or negative?
13. What is the result of this expression?
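$19224 \times 7483919 \times (11952 - 20392)$
$\hat{+} \times \hat{+} \times \hat{-}$ (after abstraction $\alpha$)
$= \hat{-}$
$= n \quad (n \in \mathbb{Z} \land n < 0)$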
14. What is the result of this expression?
$19224 \times 7483919 \times (11952 - 20392)$
$= -1214270048744640$ (takes 30 seconds)
$= n \quad (n \in \mathbb{Z} \land n < 0)$ (takes 3 seconds)
• inaccurate but not incorrect
• accurate enough for a specific purpose
• much faster than a real calculation
This is abstract interpretation
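The sign calculation above can be sketched in a few lines of Scala. This is only an illustration: the names `SignSketch`, `Sign`, `alpha`, and `mul` are hypothetical and are not part of Yoyak.

```scala
// Hypothetical sign domain; all names here are illustrative, not Yoyak's API.
object SignSketch {
  sealed trait Sign
  case object Neg extends Sign
  case object Zero extends Sign
  case object Pos extends Sign

  // Abstraction: map a concrete integer to its sign.
  def alpha(n: Long): Sign =
    if (n < 0) Neg else if (n == 0) Zero else Pos

  // Abstract multiplication: the sign of a product depends only on the signs of its factors.
  def mul(a: Sign, b: Sign): Sign = (a, b) match {
    case (Zero, _) | (_, Zero)   => Zero
    case (Pos, Pos) | (Neg, Neg) => Pos
    case _                       => Neg
  }

  // 19224 * 7483919 * (11952 - 20392), evaluated on signs only.
  val abstractResult: Sign =
    mul(mul(alpha(19224L), alpha(7483919L)), alpha(11952L - 20392L))

  // The exact value, computed for comparison.
  val concreteResult: Long = 19224L * 7483919L * (11952L - 20392L)
}
```

The abstract result agrees with the sign of the exact value, but it is obtained without ever multiplying the large numbers.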
15. Is this program safe from buffer overruns?
void foo(int x) {
    String[] strs = new String[10];
    int index = 0;
    while (x > 0) {
        index = index + 1;
        x = x - 1;
    }
    strs[index] = "hello!";
}
16. No, ArrayIndexOutOfBoundsException may occur at the last line
void foo(int x) {
    String[] strs = new String[10];
    int index = 0;              // index = [0,0]
    while (x > 0) {
        index = index + 1;      // index = [1,∞]
        x = x - 1;
    }
    strs[index] = "hello!";     // index = [0,∞]
}
17. Abstract interpretation for dummies
• Roughly but soundly execute the program
34. x := 100 + 2;
if(x)
    x := x * 10
else
    x := x / 2;
while(x)
    x := x - 1
35. Javar-4
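$C \to x := E \;\mid\; \mathtt{if}\ (E)\ C\ \mathtt{else}\ C \;\mid\; \mathtt{while}\ (E)\ C \;\mid\; C;\ C$
$E \to n\ (n \in \mathbb{Z}) \;\mid\; x \;\mid\; E\ \mathit{op}\ E\ (\mathit{op} \in \{+, -, *, /\})$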
36. Javar-{3,4} semantic domain
$M \in Memory = Var \to Value$
$n \in Value = \mathbb{Z}$
$x \in Var = Variables$
$\llbracket C \rrbracket \in Memory \to Memory$
$\llbracket E \rrbracket \in Memory \to \mathbb{Z}$
37. Javar-4 semantics
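$\llbracket x := E \rrbracket M = M\{x \to \llbracket E \rrbracket M\}$
$\llbracket \mathtt{if}(E)\ C_1\ \mathtt{else}\ C_2 \rrbracket M = \text{if } \llbracket E \rrbracket M \neq 0 \text{ then } \llbracket C_1 \rrbracket M \text{ else } \llbracket C_2 \rrbracket M$
$\llbracket \mathtt{while}(E)\ C \rrbracket M = \text{if } \llbracket E \rrbracket M \neq 0 \text{ then } \llbracket \mathtt{while}(E)\ C \rrbracket (\llbracket C \rrbracket M) \text{ else } M$
$\llbracket n \rrbracket M = n$
$\llbracket x \rrbracket M = M(x)$
$\llbracket E_1\ \{+, -, *, /\}\ E_2 \rrbracket M = \llbracket E_1 \rrbracket M\ \{+, -, \times, \div\}\ \llbracket E_2 \rrbracket M$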
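As a rough illustration, the semantic equations above translate almost line by line into a small Scala interpreter. The data types and the names `Javar4Sketch`, `evalE`, `evalC`, and `run` are assumptions made for this sketch; sequencing is read as running the second command on the memory produced by the first.

```scala
// Hypothetical interpreter for the toy language; not taken from Yoyak.
object Javar4Sketch {
  sealed trait Expr
  case class Num(n: Int) extends Expr
  case class Var(x: String) extends Expr
  case class BinOp(op: Char, l: Expr, r: Expr) extends Expr

  sealed trait Cmd
  case class Assign(x: String, e: Expr) extends Cmd
  case class If(c: Expr, thenC: Cmd, elseC: Cmd) extends Cmd
  case class While(c: Expr, body: Cmd) extends Cmd
  case class Seq2(c1: Cmd, c2: Cmd) extends Cmd

  type Memory = Map[String, Int]

  // [[E]] : Memory -> Z
  def evalE(e: Expr, m: Memory): Int = e match {
    case Num(n) => n
    case Var(x) => m(x)
    case BinOp(op, l, r) =>
      val (a, b) = (evalE(l, m), evalE(r, m))
      op match {
        case '+' => a + b
        case '-' => a - b
        case '*' => a * b
        case '/' => a / b
        case other => sys.error("unknown operator: " + other)
      }
  }

  // [[C]] : Memory -> Memory
  def evalC(c: Cmd, m: Memory): Memory = c match {
    case Assign(x, e)   => m + (x -> evalE(e, m))
    case If(e, c1, c2)  => if (evalE(e, m) != 0) evalC(c1, m) else evalC(c2, m)
    case w @ While(e, body) =>
      if (evalE(e, m) != 0) evalC(w, evalC(body, m)) else m
    case Seq2(c1, c2)   => evalC(c2, evalC(c1, m))
  }

  // x := 100 + 2; if(x) x := x * 10 else x := x / 2; while(x) x := x - 1
  val program: Cmd = Seq2(
    Assign("x", BinOp('+', Num(100), Num(2))),
    Seq2(
      If(Var("x"),
        Assign("x", BinOp('*', Var("x"), Num(10))),
        Assign("x", BinOp('/', Var("x"), Num(2)))),
      While(Var("x"), Assign("x", BinOp('-', Var("x"), Num(1))))))

  def run(c: Cmd): Memory = evalC(c, Map.empty)
}
```

Note that the `While` case calls `evalC` on the same loop again, which is exactly the self-reference in the while equation.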
38. This is not a definition
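$\llbracket \mathtt{while}(E)\ C \rrbracket M = \text{if } \llbracket E \rrbracket M \neq 0 \text{ then } \llbracket \mathtt{while}(E)\ C \rrbracket (\llbracket C \rrbracket M) \text{ else } M$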
GNU = GNU’s Not Unix
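39. The existence and uniqueness of the least fixed point is guaranteed by domain theory
$\llbracket \mathtt{while}(E)\ C \rrbracket M = \text{if } \llbracket E \rrbracket M \neq 0 \text{ then } \llbracket \mathtt{while}(E)\ C \rrbracket (\llbracket C \rrbracket M) \text{ else } M$
$\llbracket \mathtt{while}(E)\ C \rrbracket = \lambda M.\ \text{if } \llbracket E \rrbracket M \neq 0 \text{ then } \llbracket \mathtt{while}(E)\ C \rrbracket (\llbracket C \rrbracket M) \text{ else } M$
$F = \lambda M.\ \text{if } \llbracket E \rrbracket M \neq 0 \text{ then } F(\llbracket C \rrbracket M) \text{ else } M$
$F = H(F)$
$\llbracket \mathtt{while}(E)\ C \rrbracket = \mathit{fix}(\lambda F.\ \lambda M.\ \text{if } \llbracket E \rrbracket M \neq 0 \text{ then } F(\llbracket C \rrbracket M) \text{ else } M)$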
40. Abstract interpretation revisited
• Safely estimates program semantics in finite time
• Abstraction is not omission: it guarantees soundness
• Most static analysis techniques can be defined as a form of abstract interpretation
41. Key Elements of Abstract Interpretation
• Domain : concrete domain, abstract domain
• Semantics : concrete semantics, abstract semantics
• Galois connection : pair of abstraction and concretization
functions
• CPO : complete partial order
• Continuous function : preserves least upper bounds of chains
46. Abstract Interpretation in a Nutshell
Program semantics    Concrete                                                        Abstract
Domain               $D$ should be a CPO                                             $\hat{D}$ should be a CPO
Galois connection    $\alpha : D \to \hat{D}$                                        $\gamma : \hat{D} \to D$
Semantic function    $F : D \to D$ should be continuous                              $\hat{F} : \hat{D} \to \hat{D}$ should be monotonic
Program execution    $\mathrm{lfp}\ F = \bigsqcup_{i \in \mathbb{N}} F^i(\bot)$      $\bigsqcup_{i \in \mathbb{N}} \hat{F}^i(\hat{\bot}) \sqsubseteq \hat{X}$
Performing analysis using abstract interpretation = calculating $\hat{X}$ in finite time
The following formula is always satisfied (soundness guarantee):
$\mathrm{lfp}\ F \sqsubseteq \hat{X}$
48. Is this program safe from buffer overruns?
void foo(int x) {
    String[] strs = new String[10];
    int index = 0;
    if (x > 0) {
        index = 1;
    } else {
        index = 10;
    }
    strs[index] = "hello!";
}
49. void foo(int x) {
    String[] strs = new String[10];
    int index = 0;              // index = [0,0]
    if (x > 0) {
        index = 1;              // index = [1,1]
    } else {
        index = 10;             // index = [10,10]
    }
    strs[index] = "hello!";     // index = [1,10]
}
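The annotation at the array access is the join of the two branch intervals. A minimal sketch of that step, assuming a simplified `Interval` type with finite bounds only (the names are hypothetical, not Yoyak's):

```scala
// Hypothetical finite interval; infinite bounds are left out of this sketch.
object IntervalJoinSketch {
  case class Interval(lo: Int, hi: Int) {
    require(lo <= hi, "empty interval")
    // Join: the smallest interval containing both operands.
    def join(that: Interval): Interval =
      Interval(math.min(lo, that.lo), math.max(hi, that.hi))
    // True when every value of this interval lies inside `that`.
    def within(that: Interval): Boolean = that.lo <= lo && hi <= that.hi
  }

  val thenBranch = Interval(1, 1)     // index = 1
  val elseBranch = Interval(10, 10)   // index = 10
  val atAccess: Interval = thenBranch.join(elseBranch)

  // An array of length 10 accepts indices 0 to 9.
  val validIndices = Interval(0, 9)
  val provenSafe: Boolean = atAccess.within(validIndices)
}
```

Because the joined interval includes 10, the access cannot be proven safe, which matches the conclusion the deck reaches for this program.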
50. Interval analysis based on abstract interpretation
• Concrete domain: the domain in the real world
$Memory = Var \to Value$
$Value = 2^{\mathbb{Z}}$
$\mathcal{C} \in C \to Memory \to Memory$
$\mathcal{V} \in E \to Memory \to Value$
51. Interval analysis based on abstract interpretation
• Concrete semantics: the semantics in the real world
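$\mathcal{C} \llbracket x := E \rrbracket\ m = m\{x \mapsto \mathcal{V} \llbracket E \rrbracket\ m\}$
$\mathcal{C} \llbracket \mathtt{if}(E)\ C_1\ C_2 \rrbracket\ m = \mathcal{V} \llbracket E \rrbracket\ m\ ?\ \mathcal{C} \llbracket C_1 \rrbracket\ m : \mathcal{C} \llbracket C_2 \rrbracket\ m$
$\mathcal{C} \llbracket \mathtt{while}(E)\ C \rrbracket\ m = \mathcal{V} \llbracket E \rrbracket\ m\ ?\ \mathcal{C} \llbracket \mathtt{while}(E)\ C \rrbracket\ (\mathcal{C} \llbracket C \rrbracket\ m) : m$
$\mathcal{C} \llbracket C_1; C_2 \rrbracket\ m = \mathcal{C} \llbracket C_2 \rrbracket\ (\mathcal{C} \llbracket C_1 \rrbracket\ m)$
$\mathcal{V} \llbracket x \rrbracket\ m = m\ x$
$\mathcal{V} \llbracket n \rrbracket\ m = \{n\}$
$\mathcal{V} \llbracket E_1 + E_2 \rrbracket\ m = (\mathcal{V} \llbracket E_1 \rrbracket\ m) + (\mathcal{V} \llbracket E_2 \rrbracket\ m)$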
52. Interval analysis based on abstract interpretation
• Concrete execution of a program
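$\bot \sqsubset F(\bot) \sqsubset F(F(\bot)) \sqsubset F(F(F(\bot))) \sqsubset \dots \sqsubset F^i(\bot) = F^{i+1}(\bot)$
$F^i(\bot) \in Memory$ is the execution result of a program
$F = \lambda m.\ \mathcal{C} \llbracket C \rrbracket\ m$
$\mathrm{lfp}\ F = \bigsqcup_{i \in \mathbb{N}} F^i(\{\})$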
53. Interval analysis based on abstract interpretation
• Abstract domain: the domain we will use in an analysis
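$\widehat{Memory} = Var \to \widehat{Value}$
$\widehat{Value} = \hat{\mathbb{Z}} \cup \{\bot\}$
$\hat{\mathbb{Z}} = \{[a, b] \mid a \in \mathbb{Z} \cup \{-\infty\},\ b \in \mathbb{Z} \cup \{\infty\},\ a \le b\}$
$\hat{\mathcal{C}} \in C \to \widehat{Memory} \to \widehat{Memory}$
$\hat{\mathcal{V}} \in E \to \widehat{Memory} \to \widehat{Value}$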
55. Interval analysis based on abstract interpretation
• Abstract semantics: the semantics we will use in an analysis
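$\hat{\mathcal{C}} \llbracket x := E \rrbracket\ \hat{m} = \hat{m}\{x \mapsto \hat{\mathcal{V}} \llbracket E \rrbracket\ \hat{m}\}$
$\hat{\mathcal{C}} \llbracket \mathtt{if}(E)\ C_1\ C_2 \rrbracket\ \hat{m} = \hat{\mathcal{C}} \llbracket C_1 \rrbracket\ \hat{m} \sqcup \hat{\mathcal{C}} \llbracket C_2 \rrbracket\ \hat{m}$
$\hat{\mathcal{C}} \llbracket \mathtt{while}(E)\ C \rrbracket\ \hat{m} = \hat{m} \sqcup \hat{\mathcal{C}} \llbracket \mathtt{while}(E)\ C \rrbracket\ (\hat{\mathcal{C}} \llbracket C \rrbracket\ \hat{m})$
$\hat{\mathcal{C}} \llbracket C_1; C_2 \rrbracket\ \hat{m} = \hat{\mathcal{C}} \llbracket C_2 \rrbracket\ (\hat{\mathcal{C}} \llbracket C_1 \rrbracket\ \hat{m})$
$\hat{\mathcal{V}} \llbracket x \rrbracket\ \hat{m} = \hat{m}\ x$
$\hat{\mathcal{V}} \llbracket n \rrbracket\ \hat{m} = \alpha\{n\}$
$\hat{\mathcal{V}} \llbracket E_1 + E_2 \rrbracket\ \hat{m} = (\hat{\mathcal{V}} \llbracket E_1 \rrbracket\ \hat{m})\ \hat{+}\ (\hat{\mathcal{V}} \llbracket E_2 \rrbracket\ \hat{m})$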
56. Interval analysis based on abstract interpretation
• Abstract execution of a program
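$\hat{F} = \lambda \hat{m}.\ \hat{\mathcal{C}} \llbracket C \rrbracket\ \hat{m}$
$\hat{\bot} \sqsubset \hat{F}(\hat{\bot}) \sqsubset \hat{F}(\hat{F}(\hat{\bot})) \sqsubset \hat{F}(\hat{F}(\hat{F}(\hat{\bot}))) \sqsubset \dots \sqsubset \hat{F}^i(\hat{\bot}) \sqsubseteq \hat{X}$
$\bigsqcup_{i \in \mathbb{N}} \hat{F}^i(\{\}) \sqsubseteq \hat{X}$
$\hat{X}$ is the analysis result of a program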
57. Interval analysis based on abstract interpretation
• Widening
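What if this chain has infinite length?
$\hat{\bot} \sqsubset \hat{F}(\hat{\bot}) \sqsubset \hat{F}(\hat{F}(\hat{\bot})) \sqsubset \hat{F}(\hat{F}(\hat{F}(\hat{\bot}))) \sqsubset \dots \sqsubset \hat{F}^i(\hat{\bot}) \sqsubseteq \hat{X}$
We need a widening operator $\nabla$
$\hat{\bot} \sqsubset \hat{F}(\hat{\bot}) \sqsubset \hat{F}(\hat{F}(\hat{\bot})) \sqsubset \hat{F}(\hat{F}(\hat{F}(\hat{\bot}))) \sqsubset \dots \sqsubset \hat{F}^{i-1}(\hat{\bot})\ \nabla\ \hat{F}^i(\hat{\bot}) \sqsubseteq \hat{X}$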
58. Interval analysis based on abstract interpretation
• Widening
$\hat{\bot} \sqsubset [0, 0] \sqsubset [0, 1] \sqsubset [0, 2] \sqsubset \dots \sqsubset [0, i-1]\ \nabla\ [0, i] \sqsubseteq [0, \infty]$
void foo(int x) {
    String[] strs = new String[10];
    int index = 0;              // index = [0,0]
    while (x > 0) {
        index = index + 1;      // index = [1,∞]
        x = x - 1;
    }
    strs[index] = "hello!";     // index = [0,∞]
}
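The effect of widening on this loop can be sketched as follows. This is a simplified illustration, not Yoyak's implementation: the `Interval` type, the `widen` rule (an unstable bound jumps straight to infinity), and the `step` function are assumptions, and the loop condition on `x` is deliberately ignored, as in the annotations above.

```scala
// Hypothetical interval widening; bounds are Doubles so that infinity is representable.
object WideningSketch {
  case class Interval(lo: Double, hi: Double) {
    def join(that: Interval): Interval =
      Interval(math.min(lo, that.lo), math.max(hi, that.hi))
    // Widening: any bound that is still growing is pushed to infinity.
    def widen(next: Interval): Interval = Interval(
      if (next.lo < lo) Double.NegativeInfinity else lo,
      if (next.hi > hi) Double.PositiveInfinity else hi)
    def plus(n: Int): Interval = Interval(lo + n, hi + n)
  }

  // Value of `index` at the loop head: the initial [0,0] joined with the body's effect.
  def step(head: Interval): Interval = Interval(0, 0).join(head.plus(1))

  // Iterate with widening until the loop-head value stops changing.
  def analyze(): (Interval, Int) = {
    var current = Interval(0, 0)
    var iterations = 0
    var stable = false
    while (!stable) {
      val next = current.widen(step(current))
      stable = next == current
      current = next
      iterations += 1
    }
    (current, iterations)
  }
}
```

Without `widen`, the iteration would climb through [0,1], [0,2], and so on without ever stabilizing; with it, the loop head stabilizes after two rounds in this sketch.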
62. Is this program safe from buffer overruns?
void foo(int x) {
    String[] strs = new String[10];
    int index = 0;
    if (x > 0) {
        index = 1;
    } else {
        index = 10;
    }
    strs[index] = "hello!";
}
index may hold any integer between 1 and 10
Since strs has 10 elements (valid indices 0 to 9),
an ArrayIndexOutOfBoundsException may occur at the last line
63. Yoyak
Do not reinvent the wheel
https://trimaps.com/assets/website/dontreinventthemap-6ba62b8ba05d4957d2ed772584d7e4cd.png
64. Motivation
• Do not reinvent the wheel : many components that static analyzers use are reusable
  • CFG data types : construction, optimization, visualization
  • Graph algorithms : unrolling loops, finding loop heads, finding a topological order
  • Intermediate language data types : construction, optimization, pretty printing
  • Common abstract domains : integer interval, abstract object, abstract memory
  • Common abstract semantics : assignment, method invocation, evaluation of binary expressions
65. Motivation
• A natural fit for a framework : the theory of abstract interpretation guarantees soundness and termination of the analysis if the user supplies a valid abstract domain and valid abstract semantics
[Figure: an abstract domain D and abstract semantics F are fed into a generic fixed point computation engine, which produces a fixed point x = F(x) (x ∈ D)]
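A generic engine of this kind might look like the sketch below. It is an assumption-laden illustration rather than Yoyak's engine: `Lattice`, `lfp`, and the reachability example are hypothetical names, and the loop only terminates for domains of finite height (an infinite-height domain would additionally need widening).

```scala
// Hypothetical generic fixed-point engine; names are illustrative, not Yoyak's API.
object FixpointSketch {
  // Type class describing what the engine needs from an abstract domain D.
  trait Lattice[D] {
    def bottom: D
    def join(a: D, b: D): D
    def leq(a: D, b: D): Boolean
  }

  // Iterate from bottom, accumulating F's results, until nothing new is added.
  def lfp[D](f: D => D)(implicit lat: Lattice[D]): D = {
    @annotation.tailrec
    def loop(x: D): D = {
      val next = lat.join(x, f(x))
      if (lat.leq(next, x)) x else loop(next)
    }
    loop(lat.bottom)
  }

  // Example domain: sets of node ids ordered by inclusion.
  implicit val intSetLattice: Lattice[Set[Int]] = new Lattice[Set[Int]] {
    def bottom: Set[Int] = Set.empty
    def join(a: Set[Int], b: Set[Int]): Set[Int] = a.union(b)
    def leq(a: Set[Int], b: Set[Int]): Boolean = a.subsetOf(b)
  }

  // Example semantics: F(X) = {0} plus all successors of X in a small graph.
  val edges: Map[Int, Set[Int]] =
    Map(0 -> Set(1), 1 -> Set(2), 2 -> Set(1), 3 -> Set(0))

  val reachable: Set[Int] =
    lfp[Set[Int]](xs => Set(0) ++ xs.flatMap(n => edges.getOrElse(n, Set.empty[Int])))
}
```

The engine itself never mentions sets or graphs: swapping in another `Lattice` instance and another `f` reuses `lfp` unchanged, which is the reuse argument the slide makes.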
76. Abstract Domain in Yoyak
Built-in Memory Domain
scala> val memory2 = memory.update(Local("x") -> AbsArith[IntervalInt](Interv.of(1)))
scala> val memory3 = memory.update(Local("x") -> AbsArith[IntervalInt](Interv.of(10)))
scala> val memory4 = MemDom.ops[IntervalInt,SetAbstraction[String]].\/(memory2,memory3)
scala> memory4.get(Local("x"))
res1: com.simplytyped.yoyak.framework.domain.mem.MemElems.AbsValue[com.simplytyped.yoyak.framework.domain.arith.IntervalInt,com.simplytyped.yoyak.framework.domain.Galois.SetAbstraction[String]] = AbsArith(Interv(IInt(1),IInt(10)))
77. IL in Yoyak
CommonIL
abstract class Stmt extends Attachable {
  override def equals(that: Any): Boolean = this eq that.asInstanceOf[AnyRef]
  override def hashCode() : Int = System.identityHashCode(this)
  private[Stmt] def copyAttr(stmt: Stmt) : this.type = {sourcePos = stmt.pos; this}
}
78. IL in Yoyak
CommonIL
case class Block(stmts: StatementContainer) extends Stmt
case class Switch(v: Value.Loc, keys: List[Value.t], targets: List[Target]) extends Stmt
case class Placeholder(x: AnyRef) extends Stmt
sealed trait CoreStmt extends Stmt
case class If(cond: Value.CondBinExp, target: Target) extends CoreStmt
case class Goto(target: Target) extends CoreStmt
sealed trait CfgStmt extends CoreStmt
case class Identity(lv: Value.Local, rv: Value.Param) extends CfgStmt
case class Assign(lv: Value.Loc, rv: Value.t) extends CfgStmt
case class Invoke(ret: Option[Value.Local], callee: Type.InvokeType) extends CfgStmt
case class Assume(cond: Value.CondBinExp) extends CfgStmt
case class Return(v: Option[Value.Loc]) extends CfgStmt
case class Nop() extends CfgStmt
case class EnterMonitor(v: Value.Loc) extends CfgStmt
case class ExitMonitor(v: Value.Loc) extends CfgStmt
case class Throw(v: Value.Loc) extends CfgStmt
79. IL in Yoyak
Stmt
x := 10;
switch (y) {
    case 0:
        println("0");
        break;
    case 1:
        println("1");
    default:
        println("2");
}
if(z) {
    throw new Exception();
} else {
    println("done");
}
return 0;
CoreStmt
x := 10;
if(y == 0) {
    println("0");
    goto D;
}
if(y == 1) {
    println("1");
}
D:
println("2");
if(z) {
    throw new Exception();
} else {
    println("done");
}
return 0;
CfgStmt
[Figure: control-flow graph with the nodes ENTRY, x := 10, Assume (y == 0), println("0"), println("2"), Assume (y != 0), Assume (y == 1), println("0"), Assume (y != 1), Assume (z), throw new Ex();, Assume (!z), println("done"), return;, EXIT]
80. Simple Interval Analysis in Yoyak
class IntervalAnalysis(cfg: CFG) {
  def run() = {
    import IntervalAnalysis.{memDomOps,absTransfer,widening}
    val analysis = new FlowSensitiveForwardAnalysis[GMemory](cfg)
    val output = analysis.compute
    output
  }
}
object IntervalAnalysis {
  type Memory = MemDom[IntervalInt,SetAbstraction[Any]]
  type GMemory = GaloisIdentity[Memory]
  implicit val absTransfer : AbstractTransferable[GMemory] =
    new StdSemantics[IntervalInt,SetAbstraction[Any],Memory] {
      val arithOps: ArithmeticOps[IntervalInt] = IntervalInt.arithOps
    }
  implicit val memDomOps : LatticeOps[GMemory] = MemDom.ops[IntervalInt,SetAbstraction[Any]]
  implicit val widening : Option[Widening[GMemory]] = {
    implicit val NoWideningForSetAbstraction = Widening.NoWidening[SetAbstraction[Any]]
    Some(MemDom.widening[IntervalInt,SetAbstraction[Any]])
  }
}
82. Yoyak : Scala Experience
• Scala is a very good language for implementing a static analyzer
• Functions are first-class citizens
• Type class support
• Algebraic data type support
• Native support for mutable and immutable values
• Excellent support for parallelization
83. Yoyak : Scala Experience
• Functions are first-class citizens
A natural way to express mathematical logic
// optimize Cfg
(insertAssume _ andThen removeIfandGoto) apply rawCfg
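To show the same composition style in isolation, here is a toy sketch in which a CFG is just a list of statement strings. The `Cfg` alias and both pass bodies are invented for illustration and do not reflect how Yoyak's real passes work; only `andThen` is standard Scala.

```scala
// Toy stand-ins for CFG passes; the pass bodies are hypothetical.
object ComposeSketch {
  type Cfg = List[String]

  // Pass 1: put an assume statement in front of every conditional.
  val insertAssume: Cfg => Cfg = cfg => cfg.flatMap {
    case s if s.startsWith("if ") => List("assume " + s.stripPrefix("if "), s)
    case s                        => List(s)
  }

  // Pass 2: drop the conditionals and jumps that are no longer needed.
  val removeIfAndGoto: Cfg => Cfg =
    cfg => cfg.filterNot(s => s.startsWith("if ") || s.startsWith("goto "))

  // Passes compose like mathematical functions: first insertAssume, then removeIfAndGoto.
  val optimize: Cfg => Cfg = insertAssume.andThen(removeIfAndGoto)

  val result: Cfg = optimize(List("x := 10", "if x > 0", "goto L1", "return"))
}
```

Adding a third pass is a matter of appending another `andThen`, with no change to the existing passes.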
84. Yoyak : Scala Experience
• Type class support
Type classes let us avoid F-bounded polymorphism, which is the fast lane to overworking
• F-bounded polymorphism
• Commonly arises when inheritance meets immutability
• Seriously hurts code readability
85. Yoyak : Scala Experience
• F-bounded polymorphism
trait Queue[T, This <: Queue[T, This]] {
  def push(elem: T) : This
}
trait GoodQueue[T, This <: GoodQueue[T, This]] extends Queue[T, This] {
  def pop : (T, This)
}
trait BetterQueue[T, R, This <: BetterQueue[T, R, This]] extends GoodQueue[T, This] {
  def giveMeSomethingNew : R
}
trait QueueUnited[T, R, Q <: Queue[T, Q], G <: GoodQueue[T, G], B <: BetterQueue[T, R, B],
                  This <: QueueUnited[T, R, Q, G, B, This]] extends BetterQueue[T, R, This] {
  def giveUp : Unit
}
• The type of the concrete subclass is always needed
• All type variables must be repeated in every subclass reference
• Type classes liberate methods from inheritance
86. Yoyak : Scala Experience
• Type class
trait QueueLike[T,This] {
  def push(elem: T) : This
}
trait GoodQueueLike[T,This] {
  implicit val queueLike : QueueLike[T,This]
  def push(elem: T) : This = queueLike.push(elem)
  def pop(q: This) : (T,This)
}
trait BetterQueueLike[T,R,This] {
  implicit val goodQueueLike : GoodQueueLike[T,This]
  def push(elem: T) : This = goodQueueLike.push(elem)
  def pop(q: This) : (T,This) = goodQueueLike.pop(q)
  def giveMeSomethingNew : R
}
class QueueUnited[T,R,This](implicit val q : QueueLike[T,This], g : GoodQueueLike[T,This],
                            b : BetterQueueLike[T,R,This]) {
  def push(elem: T) : This = b.push(elem)
  def pop(q: This) : (T,This) = b.pop(q)
  def giveMeSomethingNew : R = b.giveMeSomethingNew
  def giveUp : Unit = {}
}
87. Yoyak : Scala Experience
• Type class in Yoyak
trait StdObjectModel[A<:Galois,D<:Galois,This<:StdObjectModel[A,D,This]] extends
MemDomLike[A,D,This] with ArrayJoinModel[A,D,This] {
implicit val arithOps : ArithmeticOps[A]
implicit val boxedOps : LatticeWithTopOps[D]
Use both methods in an appropriate place
88. Yoyak : Scala Experience
• Algebraic data type support
Natural way to express an abstract syntax tree of a program
[Figure: abstract syntax tree whose root is ";", with children if(x) (branches a = 1 and a = 2) and println(a)]
Seq(
  If("x", Assign("a", 1),
          Assign("a", 2)),
  Invoke("println", List("a"))
)
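For the expression above to type-check, the node types have to be declared somewhere. A minimal sketch of such declarations follows; these simplified case classes are assumptions for illustration and are much smaller than Yoyak's CommonIL types.

```scala
// Hypothetical, simplified AST node types; not Yoyak's CommonIL.
object AstSketch {
  sealed trait Stmt
  case class Assign(name: String, value: Int) extends Stmt
  case class If(cond: String, thenS: Stmt, elseS: Stmt) extends Stmt
  case class Invoke(callee: String, args: List[String]) extends Stmt

  // The tree from the slide, built directly from constructors.
  val program: Seq[Stmt] = Seq(
    If("x", Assign("a", 1),
            Assign("a", 2)),
    Invoke("println", List("a"))
  )

  // Pattern matching walks the tree; the sealed trait lets the compiler check coverage.
  def assignedVars(s: Stmt): Set[String] = s match {
    case Assign(name, _) => Set(name)
    case If(_, t, e)     => assignedVars(t) ++ assignedVars(e)
    case Invoke(_, _)    => Set.empty
  }

  val assigned: Set[String] = program.flatMap(assignedVars).toSet
}
```

If a new statement kind is added to the sealed hierarchy, the compiler warns about every `match` that does not handle it yet.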
89. Yoyak : Scala Experience
• Algebraic data type support
Easy to navigate the abstract syntax tree
def eval(v: Value.t, input: Mem)(implicit context: Context) : (AbsValue[A,D],Mem) = {
  v match {
    case x : Value.Constant => evalConstant(x,input)
    case x : Value.Loc => evalLoc(x,input)
    case x : Value.BinExp => evalBinExp(x,input)
    case Value.This => (AbsRef(Set("$this")),input)
    case Value.CaughtExceptionRef => (AbsRef(Set("$caughtex")),input)
    case Value.CastExp(v, ofTy) => evalLoc(v,input)
    case Value.InstanceOfExp(v, ofTy) => (AbsTop,input)
    case Value.LengthExp(v) => (AbsTop,input)
    case Value.NewExp(ofTy) => input.alloc(context.stmt)
    case Value.NewArrayExp(ofTy, size) => input.alloc(context.stmt)
90. Yoyak : Scala Experience
• Native support for mutable and immutable values
[Figure: Memory (x, y, z) referring to an Object with fields f = 1 and g = "A"]
In some cases, mutability is more important than immutability
91. Yoyak : Scala Experience
• Native support for mutable and immutable values
[Figure: Memory (x, y, z), the old Object (f = 1, g = "A"), and a NewObject (f = 2, g = "A")]
memory.filter{_._2 == object}.foldLeft(memory) {
  case (m,(k,_)) => m + (k -> newObject)
}
Cost: O(n)
92. Yoyak : Scala Experience
• Native support for mutable and immutable values
[Figure: Memory (x, y, z) referring to the NewObject (f = 2, g = "A")]
object.update(newObject)
Cost: O(1)
93. Yoyak : Scala Experience
• Native support for mutable and immutable values
[Figure: Memory (x, y, z), the old Object (f = 1, g = "A"), and a NewObject (f = 2, g = "A")]
If we frequently update immutable objects in a big memory,
it may result in severe inefficiency
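The two update strategies can be contrasted in a small sketch. Everything here is hypothetical (`Obj`, `Cell`, and the three-variable memory are made up for illustration); it only aims to show why the immutable version touches every referring entry while the mutable version touches one cell.

```scala
// Hypothetical comparison of immutable and mutable object updates.
object SharingSketch {
  // Immutable style: the memory maps each variable to an object value.
  case class Obj(f: Int, g: String)
  val oldObj = Obj(1, "A")
  val memory: Map[String, Obj] = Map("x" -> oldObj, "y" -> oldObj, "z" -> oldObj)
  val newObj: Obj = oldObj.copy(f = 2)

  // Every entry that refers to the old object must be rewritten: O(n) in the memory size.
  val updated: Map[String, Obj] =
    memory.filter(_._2 == oldObj).foldLeft(memory) {
      case (m, (k, _)) => m + (k -> newObj)
    }

  // Mutable style: all variables share one cell, so a single write is enough: O(1).
  final class Cell(var f: Int, var g: String)
  val cell = new Cell(1, "A")
  val sharedMemory: Map[String, Cell] = Map("x" -> cell, "y" -> cell, "z" -> cell)
  cell.f = 2
}
```

The immutable version keeps the original `memory` intact, which is often what an analyzer wants; the cost only becomes a concern when such updates are frequent and the memory is large.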
94. Yoyak : Scala Experience
• Excellent support for parallelization
• Static analysis does not yet take full advantage of today's scalable computing (multicore machines, big data technologies, cloud computing)
• Scala has an excellent platform for experimenting with parallelization, called Akka
• There are many fun things to try with Yoyak powered by Akka
95. Yoyak : Scala Experience
• Excellent support for parallelization
Worklist parallelization can be implemented naturally with Akka's Actor model
101. Yoyak : Roadmap
• Visualize analysis details
It is hard to know what a static analyzer is doing at a specific moment because:
• A static analyzer behaves very differently for each input program
• We often need to inspect and compare maps with thousands of entries
• Ordinary Java debuggers cannot show the big picture
102. Yoyak : Roadmap
• Visualize analysis details
Example from SAT solvers
[Figure: DPVis, a visualization of the search tree generated by a basic DPLL algorithm]
103. Yoyak : Roadmap
• Build Scala compiler plug-in
• Programming language researchers foresee that semantic program analyzers will be merged into compiler systems in the near future, as type systems were
[Figure: compiler stages: Syntactic Analysis (Grammar Checking), Type System, Semantic Analysis]
104. Yoyak : Roadmap
• Build Scala compiler plug-in
• The Scala compiler is well modularized and cleanly coded (compared to other compiler systems), so it is an excellent platform for experimenting with new ideas
• Pure Scala code is safe from null, but linked Java libraries are not
• It would be great if the Scala compiler could detect possible null dereferences at compile time and issue a warning