Judge: Identifying, Understanding, and Evaluating Sources of Unsoundness in Call Graphs

Michael Reif, Florian Kübler, Michael Eichberg, Dominik Helm, and Mira Mezini

Software Technology Group

TU Darmstadt
@Reifmi

Why We Shouldn’t Take Call Graphs for Granted
• Call graphs are a central data structure for numerous static analyses

• Call graphs directly impact a client analysis’ result

• The chosen algorithm predetermines an analysis’ precision and recall

• Programming languages evolve (APIs and features are added), and frameworks might not keep up

State-of-the-art Call-graph Generators for Java
• Many different static analysis frameworks are available

• Each can compute a different set of call graphs

• All frameworks use different approaches and make unknown trade-offs or implementation choices

• Are they actually comparable?

Judge’s Overview
[Figure: Judge's pipeline, built up over four slides. Recoverable steps:]

• Test fixtures: each ⟨Test Fixtures Category⟩.md file holds the test cases of one category, Test Case 1 (TC1) … (TCN)

• compile test cases: all test cases are compiled to JARs (TC1.jar, TC2.jar, ⟨Test Case⟩.jar, ⟨Advanced Test Case⟩.jar)

• compute CG: produces ⟨CG⟩.json; done for each CG per supported static analysis framework

• compute profile using CG and expected call targets: produces ⟨CG Algorithm Profile⟩.tsv

• run Hermes on ⟨Project⟩.jar: produces ⟨Features & Locations⟩.json; this is the infrastructure used for computing the prevalence of features in real projects

• compute CG for ⟨Project⟩.jar: produces ⟨CG⟩.json

• compute suitability of CG algo. using the respective CG profile: produces ⟨Potential Sources of Unsoundness⟩.tsv

Test Suite
• Each category has:

    • a description

    • multiple test cases

• Each test case has:

    • a scenario description

    • a unique id

    • the test code

    • the expected calls

• Available annotations:

    • CallSite

    • IndirectCall
Test Suite
Language Features

• Static Initializer

• Polymorphic Calls

• Java 8 Polymorphic Calls

• Lambdas/Method References

• Signature Polymorphic Methods

• Non-Java bytecode

• …

APIs

• Reflection

• Unsafe

• Serialization

• Method Handles

• Dynamic Proxies

• Classloading

• …
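
To make a few of these categories concrete, the following sketch reaches the same method in three ways: through a method reference, through the Reflection API, and through a method handle. The class and method names are invented for illustration; only the JDK calls are real. In each case the callee is not visible as an ordinary call instruction, which is why a call-graph algorithm needs dedicated support for it.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

class FeatureDemo {
    public static String target() {
        return "called";
    }

    // Lambdas/Method References: javac compiles this to an invokedynamic instruction.
    static String viaMethodReference() {
        Supplier<String> s = FeatureDemo::target;
        return s.get();
    }

    // Reflection: the callee is identified only by a string.
    static String viaReflection() throws ReflectiveOperationException {
        Method m = FeatureDemo.class.getMethod("target");
        return (String) m.invoke(null);
    }

    // Method Handles: invokeExact is a signature polymorphic method.
    static String viaMethodHandle() throws Throwable {
        MethodHandle h = MethodHandles.lookup()
                .findStatic(FeatureDemo.class, "target", MethodType.methodType(String.class));
        return (String) h.invokeExact();
    }

    // Runs all three variants and collects their results.
    static List<String> runAll() {
        try {
            List<String> results = new ArrayList<>();
            results.add(viaMethodReference());
            results.add(viaReflection());
            results.add(viaMethodHandle());
            return results;
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(runAll());
    }
}
```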

Computing the Algorithms’ Profile
[Figure: Judge's overview diagram, repeated from the overview slides]

Finding Features in Real Code
• We used Hermes [1], a static analysis code query infrastructure

• Each query is an analysis that checks if a specific feature is found in a given code base

• We developed 15 Hermes queries to derive 107 Hermes features and map the derived features to the test case ids

• All queries perform a most-conservative intra-procedural analysis

[1] Reif, Michael et al. Hermes: assessment and creation of effective test corpora. SOAP ’17. ACM, 43–48.
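
The idea of a feature query can be sketched as follows. This is not Hermes' API: `Call`, `MethodBody`, `FeatureQuery`, and `QueryRunner` are hypothetical types over a deliberately simplified method representation, and the "conservative" behavior shown (flag every call to `Method.invoke` without trying to resolve it) is one plausible reading of the last bullet.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified stand-in for a call instruction; not Hermes' or OPAL's representation.
final class Call {
    final String declaringClass;
    final String name;

    Call(String declaringClass, String name) {
        this.declaringClass = declaringClass;
        this.name = name;
    }
}

// A method body reduced to the calls it contains.
final class MethodBody {
    final String id;
    final List<Call> calls;

    MethodBody(String id, Call... calls) {
        this.id = id;
        this.calls = Arrays.asList(calls);
    }
}

// A query looks at one method at a time (intra-procedural).
interface FeatureQuery {
    boolean occursIn(MethodBody method);
}

// Conservative check: every call to Method.invoke counts as a use of reflection,
// with no attempt to work out which method is invoked.
final class ReflectiveInvokeQuery implements FeatureQuery {
    public boolean occursIn(MethodBody method) {
        for (Call c : method.calls) {
            if (c.declaringClass.equals("java.lang.reflect.Method") && c.name.equals("invoke")) {
                return true;
            }
        }
        return false;
    }
}

final class QueryRunner {
    // Returns the ids of all methods in which the feature was found.
    static List<String> locations(FeatureQuery query, List<MethodBody> project) {
        List<String> found = new ArrayList<>();
        for (MethodBody m : project) {
            if (query.occursIn(m)) {
                found.add(m.id);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        List<MethodBody> project = Arrays.asList(
                new MethodBody("m1", new Call("java.util.List", "add")),
                new MethodBody("m2", new Call("java.lang.reflect.Method", "invoke")));
        System.out.println(locations(new ReflectiveInvokeQuery(), project));
    }
}
```

The list of locations returned here plays the role of the ⟨Features & Locations⟩ output in the overview, in a much reduced form.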

Potential Sources of Unsoundness
[Figure: for project (p), features computed using feature queries / Hermes are mapped via their extensions to methods in application code and library code; the mapping marks sources and conditional sources of unsoundness]

Features (based on test cases)
Name                                  Supported by CG(a)   Extensions count
BPC2 (Polymorphic Call)               ✓                    3
TR1 (Reflection)                      ✘                    2
…                                     …                    …
Lambda3 (Invokedynamic - Java ≤ 10)   ✓                    1
Lambda8 (Invokedynamic - Scala)       ✘                    0

Methods
Name   Reached by CG(a)
m1     ✓
m2     ✘
m3     ✓
m4     ✓
…      …
mu     ✓
mx     ✘
my     ✓
mz     ✘

• Sources of unsoundness definitely make the call graph unsound

• Conditional sources of unsoundness might introduce unsoundness
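
The distinction can be written down as a small decision rule. The sketch assumes one reading of the figure that the slide does not spell out: an unsupported feature located in a method the call graph reaches is a definite source, while the same feature in an unreached method is only a conditional source, because it matters only if that method is reachable after all. `Verdict` and `UnsoundnessClassifier` are hypothetical names.

```java
enum Verdict { NOT_A_SOURCE, SOURCE, CONDITIONAL_SOURCE }

final class UnsoundnessClassifier {
    // supportedByCg: does the call-graph algorithm's profile cover the feature?
    // reachedByCg:   does the computed call graph reach the method containing the feature?
    static Verdict classify(boolean supportedByCg, boolean reachedByCg) {
        if (supportedByCg) {
            return Verdict.NOT_A_SOURCE;
        }
        return reachedByCg ? Verdict.SOURCE : Verdict.CONDITIONAL_SOURCE;
    }

    public static void main(String[] args) {
        // Assumed scenario: an unsupported feature found in a reached and in an unreached method.
        System.out.println(classify(false, true));
        System.out.println(classify(false, false));
        // A supported feature is not reported, wherever it occurs.
        System.out.println(classify(true, true));
    }
}
```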

Research Questions
• RQ1: How prevalent are the language and API features?

• RQ2: How do the frameworks compare to each other?

• RQ3: Which framework is best suited for which kind of code base?

• RQ4: How much effort is necessary to get a sound call graph?

Prevalent Language Features and APIs (RQ1)
• All the API and language features supported by Java up to version 7 are used widely across all code bases

• Support for Java 8 is a must, unless analyzing Android or Clojure code

• Supporting classical Reflection and Serialization is strongly recommended, independent of the source code’s age

• Support for many features is only required in specific scenarios

The Call Graphs’ Feature Support (RQ2)
[Figure: feature-support results, annotated over several slide builds with the following observations]

• Standard Java features are well-supported

• Java 8 features are partially supported

• The JVM is not fully covered

• The Reflection API is partially supported

• Some APIs and language features are unsupported

Performance Results (RQ2)
[Figure: performance results, annotated with the following observations]

• Average runtimes differ widely

• Reachable methods vary by more than 20x, even for implementations of the same algorithm

RTA-Example
    import java.util.*;

    class RtaExample {
        void program(boolean condition) {
            Collection<Object> c1 = new LinkedList<>(); // instantiated, but never the receiver of add
            Collection<Object> c2;
            if (condition) {
                c2 = new ArrayList<>();
            } else {
                c2 = new Vector<>();
            }
            c2.add(null);                               // the call to resolve
            Collection<Object> c3 = new HashSet<>();    // instantiated only after the call
        }
    }

• RTA [2] depends on the program’s instantiated types

• Soot, WALA, and OPAL behave completely differently

• Instantiated-type sets shown across the slide builds: { LinkedList, ArrayList, Vector, HashSet }, { LinkedList, ArrayList, Vector }, and { ArrayList, Vector }

[2] D. Bacon and P. Sweeney. Fast static analysis of C++ virtual function calls. OOPSLA '96. ACM, 324-341.
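
Why the instantiated-type set matters can be seen in a small sketch of RTA-style call resolution: a virtual call is resolved to one target per instantiated subtype of the receiver's declared type. The helper below (`RtaSketch.resolve`, a hypothetical name) uses the Reflection API on the real `java.util` classes purely as a convenient way to look up which class declares `add(Object)`; an actual framework works on its own class-hierarchy representation. The slides do not say which framework produces which set, so the sets are passed in without attribution.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.LinkedList;
import java.util.Set;
import java.util.TreeSet;
import java.util.Vector;

final class RtaSketch {
    // Resolves a one-argument virtual call on a receiver of static type `declared`
    // to one target per instantiated type that is a subtype of `declared`.
    static Set<String> resolve(Class<?> declared, String name, Class<?> param,
                               Class<?>... instantiated) {
        Set<String> targets = new TreeSet<>();
        for (Class<?> type : instantiated) {
            if (!declared.isAssignableFrom(type)) {
                continue; // not a possible receiver type
            }
            try {
                Method m = type.getMethod(name, param);
                targets.add(m.getDeclaringClass().getSimpleName() + "." + name);
            } catch (NoSuchMethodException e) {
                throw new IllegalStateException(e);
            }
        }
        return targets;
    }

    public static void main(String[] args) {
        // The three instantiated-type sets from the slides, applied to c2.add(null).
        System.out.println(resolve(Collection.class, "add", Object.class,
                LinkedList.class, ArrayList.class, Vector.class, HashSet.class));
        System.out.println(resolve(Collection.class, "add", Object.class,
                LinkedList.class, ArrayList.class, Vector.class));
        System.out.println(resolve(Collection.class, "add", Object.class,
                ArrayList.class, Vector.class));
    }
}
```

The larger the instantiated-type set, the more targets the single call `c2.add(null)` receives, which is one way implementations of the same algorithm can end up with very different numbers of reachable methods.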

Project-specific Evaluation (RQ3)
[Figure: project-specific results, annotated with the following observations]

• Soot supports CSR, but it is expensive

• OPAL supports most features but has the smallest call graph

• OPAL covers only 47 methods from Xalan (~0.3%)

• Very few call sites have a huge impact

Is it worth it to do the work manually? (RQ4)
• GOAL: Get a reasonably sound call graph

• JVM profiling and TamiFlex [3] as ground truth

• Workflow: Apply Judge → Inspect Results → Add Entry Points

• Analyzed 10 reflective call sites

• Added 50 entry points

• Manual analysis took roughly 90 minutes

• The call graph then covered 91% of all methods contained in the profile and 121 of the 198 reported by TamiFlex

[3] Bodden, Eric, et al. Taming Reflection--Static Analysis in the Presence of Reflection and Custom Class Loaders. (2010).
