Demand-Driven Interprocedural Data Flow Analysis

Demand-Driven Computation of
Inter-procedural Data Flow
Evelyn Duesterwald et al. POPL 1995
Presenter: Min-Yih Hsu

Outline
• Overview
• Illustrating Exhaustive Data-Flow Analysis with Copy Constant
Propagation (CCP)
• Demand-Driven Data-Flow Analysis
• Related Works
• Comments

The over-analysis problem
• Users of a data-flow analysis might only need part of the results.
• Generating superfluous analysis results is called over-analysis.

• Example: In constant propagation, the user only asks for the constant
value (or non-constant status) of variable x at line n.

• Traditional data-flow analysis will give you results for every
variable at every line.
• A real-world application: Interactive code editor

Traditional Approach (a.k.a. Exhaustive Approach)
[Figure: CFG with nodes marked Untouched or Analyzed]
At Line n:
Variable  Constant
X         C1
Y         C2
Z         C3

Demand-Driven Approach
[Figure: CFG with nodes marked Untouched or Analyzed]
Query:
q=<Fy, n>
At Line n:
Variable  Constant
X         -
Y         C2
Z         -

Illustrating Exhaustive Data-Flow Analysis
w/ Copy Constant Propagation (CCP)

Copy Constant Propagation (CCP)
A lattice L
Data flow facts at a given program point:
x is a k-tuple of type L (denoted L^k).
(x)v means the data flow fact for variable v
If variable v has constant c :

The local (intra-procedural) flow function

Inter-procedural related formulas

Inter-procedural data-flow functions
Inter-procedural data-flow results
On program point n
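
To make the exhaustive approach concrete, here is a minimal sketch of copy constant propagation over a straight-line program. It is only an illustration, not the formulation from the paper: the statement encoding, the names `exhaustive_ccp` and `NONCONST`, and the choice to treat every variable as non-constant at entry are all assumptions made for this example, and branches, loops, and procedure calls are left out.

```python
# Hypothetical statement encoding (an assumption for this sketch):
#   ("const", v, c)  means  v = c
#   ("copy",  v, u)  means  v = u
#   ("other", v)     means  v = <any other expression>
NONCONST = "non-constant"


def exhaustive_ccp(stmts, variables):
    """Compute facts for EVERY variable at EVERY program point.

    facts[i] holds the facts just before statement i;
    facts[len(stmts)] holds the facts after the last statement.
    """
    state = {v: NONCONST for v in variables}  # assume nothing is known at entry
    facts = []
    for stmt in stmts:
        facts.append(dict(state))
        kind, target = stmt[0], stmt[1]
        if kind == "const":
            state[target] = stmt[2]
        elif kind == "copy":
            state[target] = state[stmt[2]]
        else:
            # CCP does not evaluate expressions, so the result is non-constant.
            state[target] = NONCONST
    facts.append(dict(state))
    return facts


program = [
    ("const", "x", 3),   # x = 3
    ("const", "y", 4),   # y = 4
    ("copy", "a", "x"),  # a = x
    ("other", "r"),      # r = a + 1
]
facts = exhaustive_ccp(program, ["x", "y", "a", "r"])
print(facts[3])  # facts just before "r = a + 1"
```

Even if the user only cares about `a` before the last statement, the sketch computes a fact for every variable at every point, which is the over-analysis the talk describes.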

Demand-Driven Data-Flow Analysis

Query (i.e. the “Demand”)
q := <y, n>
• y is a set of data flow facts
• n is a program point
• q evaluates to a boolean result
q tells whether y is a safe approximation of the exhaustive data flow facts
at program point n.

An Example Query in CCP*
q := <[a=c], 10>
“Tell me if variable a is equal to constant c at line 10”

Resolving Queries
[Figure: CFG with nodes marked Untouched or Analyzed, and the query q=<y, n>]
Find the answer along this path
Key Idea: Reversed Flow Function!

Reverse Flow Function
If f is meet-distributive…
(It’s easy to show that) f^r will be join-distributive!
=> We can use MFP with reversed functions
along the reversed path!

What does the reverse flow function answer?
Node n
What input should I feed in…
In order to see w=c here?
• Any Integer!
• Impossible
• Asking u=c instead
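
The three answers above can be read as the cases of a reverse flow function for a boolean CCP query. The sketch below is one possible reading, not the paper's definition: it assumes "Any Integer!" is the case where node n is `w = c`, "Impossible" the case where node n assigns `w` a different constant or a computed expression, and "Asking u=c instead" the case where node n is the copy `w = u`. The statement encoding and the names `reverse_flow` and `resolve` are hypothetical, and only straight-line code is handled.

```python
# Hypothetical statement encoding (an assumption for this sketch):
#   ("const", v, c)  means  v = c
#   ("copy",  v, u)  means  v = u
#   ("other", v)     means  v = <any other expression>

def reverse_flow(stmt, query):
    """Translate the query (var, c), asked AFTER stmt, into what must hold BEFORE it.

    Returns True (resolved: holds for any input), False (resolved: impossible),
    or a new (var, c) query to keep propagating backwards.
    """
    var, c = query
    kind, target = stmt[0], stmt[1]
    if target != var:
        return query             # stmt does not define var: ask the same thing upstream
    if kind == "const":
        return stmt[2] == c      # "Any Integer!" if the constants match, else "Impossible"
    if kind == "copy":
        return (stmt[2], c)      # "Asking u=c instead"
    return False                 # computed expression: "Impossible" in CCP


def resolve(stmts, n, query):
    """Answer the query just before statement index n by walking backwards."""
    for stmt in reversed(stmts[:n]):
        query = reverse_flow(stmt, query)
        if isinstance(query, bool):
            return query
    return False  # reached entry unresolved; assume nothing is known there


program = [
    ("const", "x", 3),   # x = 3
    ("const", "y", 4),   # y = 4
    ("copy", "a", "x"),  # a = x
    ("other", "r"),      # r = a + 1
]
print(resolve(program, 3, ("a", 3)))
print(resolve(program, 3, ("a", 4)))
```

Only the statements on the backward path from the query point are visited, and only the variable the query currently mentions is tracked.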

Inter-procedural Reverse Flow Function
Additional Function Composition Properties

Reverse Flow Function - Combining Everything
Inter and Intra-procedural data-flow function

Generalizing the CCP Query
q := <[a=c], 10>
What c should we ask?
A more useful query:
“Tell me the constant value (if any) of variable a at line 10.”
q’ := <[a], 10>

Generalized CCP Query - New Reverse Flow Function
Also need to collect the constant value!

Generalized CCP Query - A Simple Example
x = 3
y = 4
a = x
r = a + 1 <[a], n>
Final answer for this query: a is the constant 3 (from x = 3)
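
The example above can be traced with a small backward walk. This is a sketch under assumptions, not the paper's algorithm: the statement encoding and the name `constant_of` are hypothetical, only straight-line code is handled, and `None` stands for "not a constant". The walk follows copies to decide which variable to ask about next, and collects the constant when it reaches a constant assignment.

```python
# Hypothetical statement encoding (an assumption for this sketch):
#   ("const", v, c)  means  v = c
#   ("copy",  v, u)  means  v = u
#   ("other", v)     means  v = <any other expression>

def constant_of(stmts, n, var):
    """Return the constant value of var just before statement index n, or None."""
    for stmt in reversed(stmts[:n]):
        kind, target = stmt[0], stmt[1]
        if target != var:
            continue          # statement is irrelevant to the current query
        if kind == "const":
            return stmt[2]    # collect the constant value: query resolved
        if kind == "copy":
            var = stmt[2]     # the copy redirects the query to its source
        else:
            return None       # computed expression: not a constant in CCP
    return None               # reached entry; assume nothing is known there


program = [
    ("const", "x", 3),   # x = 3
    ("const", "y", 4),   # y = 4
    ("copy", "a", "x"),  # a = x
    ("other", "r"),      # r = a + 1
]
print(constant_of(program, 3, "a"))
```

Note that `y = 4` is skipped without any fact being computed for it, in contrast with an exhaustive analysis.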

Takeaways
• The reverse flow function is the core of this algorithm.
• Its join-distributive property allows us to use MFP with it.
• In the generalized query (for CCP):
• Instead of taking the value returned by the reverse functions as
the query result, we use it to guide the query propagation
process.

Time and Space Complexity
• The same worst-case complexities as Sharir and Pnueli’s
exhaustive (iterative worklist-based) data-flow analysis*.
• Running Time: O(C x height(L) x |L| x |N|)
• C is the maximum number of call sites, and N is the number of
nodes.
• Space: O(|L| x |N|)
*Sharir, Micha, and Amir Pnueli. "Two approaches to interprocedural data flow analysis." New York
University, Courant Institute of Mathematical Sciences, Computer Science Department, 1978.

Related Works
• Ryder, Barbara G., and Marvin C. Paull. "Incremental data-flow
analysis algorithms." ACM Transactions on Programming Languages
and Systems (TOPLAS) 10.1 (1988): 1-50.

• Incremental data-flow analysis needs to perform a full data-flow
analysis on the first run, where the over-analysis problem might still
happen.

• IFDS / IDE
• The authors of this paper argued that their work has “less
restrictions on structure of lattice and flow functions”.
• e.g. This work can give an approximation even with
non-distributive flow functions.
• IFDS and IDE cannot (fully) avoid the over-analysis problem.
• e.g. For the CCP problem, IFDS / IDE would tell the constant value
(if any) for every variable.

• Strom, Robert E., and Daniel M. Yellin. "Extending typestate
checking using conditional liveness analysis." IEEE Transactions on
Software Engineering 19.5 (1993): 478-485.
• Also used the idea of backward and demand-driven program
analysis.

How I Think the Paper Should Be Organized
(Larger Box == More Important)
• The “Normal” Data-Flow Analysis w/ Boolean Query Result
• Generalize To Non-Boolean Query Result (for CCP)
• Supporting Global Variables / Memory Aliasing

The Paper’s Organization
(Larger Box == More Paragraphs)
• Generalize To Non-Boolean Query Result (for CCP)
• Memory Aliasing

Other Comments
• The querying mechanism fits really well with the Language Server
Protocol (LSP), which is getting more attention now.
• This paper did a pretty nice survey of related works.
• Some notations are inconsistent across paragraphs.
• Amortized Time Complexity Analysis Please!!!

Summary
• This work used queries to drive the data-flow analysis process to avoid
generating redundant results that would never be used.
• The algorithm performed a reversed data-flow analysis from the point
where users are inquiring.
• The reverse data-flow function played an important role.
• The query algorithm was augmented to support arbitrary result types,
in addition to the basic boolean type.
• Regarding the time complexity, this algorithm is no worse than the
exhaustive data-flow analysis.
• Just as with incremental data-flow analysis, we hope to see the
amortized time complexity.
