This paper presents a demand-driven approach to inter-procedural data flow analysis that avoids redundant analysis results by computing only the data flow facts needed to answer specific queries. The algorithm performs a reverse data flow analysis, starting from the query point and using reverse flow functions. This approach can answer queries more efficiently than traditional exhaustive data flow analysis.
8. The over-analysis problem
• Users of a data-flow analysis might only need part of the results.
• Generating superfluous analysis results is called over-analysis.
• Example: In constant propagation, the user only asks for the constant
value (or non-constant status) of variable x at line n.
• Traditional data-flow analysis gives results for every
variable at every line.
• A real-world application: Interactive code editor
24. Copy Constant Propagation (CCP)
A lattice L
Data flow facts at a given program point:
x is a k-tuple over L (an element of L^k)
(x)_v denotes the data flow fact for variable v
If variable v has constant c :
32. Copy Constant Propagation (CCP)
Inter-procedural related formulas
Inter-procedural data-flow functions
Inter-procedural data-flow results
On program point n
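To make the CCP lattice from the slides above more concrete, here is a minimal sketch of a flat constant lattice and the component-wise meet on k-tuples. The names `TOP`, `BOTTOM`, `meet`, and `meet_facts` are hypothetical, and the orientation (TOP meaning "no information yet", BOTTOM meaning "not a constant") is the conventional one, assumed here rather than taken from the slides.

```python
# Flat lattice for constant propagation (assumed orientation):
#   TOP    = no information yet
#   c      = the variable holds constant c
#   BOTTOM = the variable is not a single constant
TOP = object()
BOTTOM = object()


def meet(a, b):
    """Meet of two elements of the flat lattice L."""
    if a is TOP:
        return b
    if b is TOP:
        return a
    if a is BOTTOM or b is BOTTOM:
        return BOTTOM
    # Both are concrete constants: they agree or collapse to BOTTOM.
    return a if a == b else BOTTOM


def meet_facts(x, y):
    """Component-wise meet of two k-tuples in L^k (one slot per variable)."""
    return tuple(meet(a, b) for a, b in zip(x, y))


# Facts for variables (a, b, c) arriving along two control-flow paths.
path1 = (TOP, 1, 2)
path2 = (5, 1, 3)
merged = meet_facts(path1, path2)
```

In this example `merged` keeps 5 for the first variable, 1 for the second, and drops to `BOTTOM` for the third, because the two paths disagree on its value.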
40. Query (i.e. the “Demand”)
q := <y, n>
• y is a set of data flow facts
• n is a program point
• q evaluates to a Boolean result
q tells whether y is a safe approximation of the exhaustive data flow facts
at program point n.
41. An Example Query in CCP*
q := <[a=c], 10>
“Tell me if variable a is equal to constant c at line 10”
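A query such as <[a=c], 10> can be answered by walking backward from the query point. The following is a simplified, intra-procedural sketch only, not the paper's algorithm: the statement encoding (`"const"`, `"copy"`, `"other"`, `"nop"`), the function name `query_constant`, and the convention that a query refers to the state on entry to a node are all assumptions made for illustration.

```python
from collections import deque

# Hypothetical CFG: a diamond where both branches assign x := 5.
#   0: entry   1: x := 5   2: x := 5   3: y := x   4: query point
PREDS = {0: [], 1: [0], 2: [0], 3: [1, 2], 4: [3]}
STMTS = {
    0: ("nop",),
    1: ("const", "x", 5),
    2: ("const", "x", 5),
    3: ("copy", "y", "x"),
    4: ("nop",),
}


def query_constant(preds, stmts, entry, var, const, node):
    """Answer 'is var == const on entry to node?' by backward propagation."""
    worklist = deque([(node, var)])
    seen = {(node, var)}
    while worklist:
        n, v = worklist.popleft()
        if n == entry:
            return False  # reached program entry without finding a definition
        for m in preds[n]:
            stmt = stmts[m]
            if stmt[0] != "nop" and stmt[1] == v:
                if stmt[0] == "const":
                    if stmt[2] != const:
                        return False  # a path assigns a different constant
                    continue          # this path resolves to true
                if stmt[0] != "copy":
                    return False      # non-copy assignment: not a copy constant
                nxt = (m, stmt[2])    # v := w, so ask about w instead
            else:
                nxt = (m, v)          # statement does not define v
            if nxt not in seen:
                seen.add(nxt)
                worklist.append(nxt)
    return True  # every backward path resolved to true
```

For example, `query_constant(PREDS, STMTS, 0, "y", 5, 4)` follows the copy `y := x` back to both branches and succeeds, while asking for the constant 6 fails at the first definition it meets.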
51. Reverse Flow Function
If f is meet-distributive…
(It’s easy to show that) f^r will be join-distributive!
=> We can use MFP with reversed functions
along the reversed path!
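The distributivity claim can be checked by brute force on a tiny lattice. The sketch below assumes one common definition of the reverse function, f^r(y) = the meet of all x with f(x) ⊒ y (the slides do not spell the definition out), and uses a powerset lattice with a gen/kill flow function purely as an illustrative stand-in; all names are hypothetical.

```python
from itertools import combinations

UNIVERSE = frozenset({"a", "b", "c"})


def powerset(s):
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


# Powerset lattice: order is subset, meet is intersection, join is union.
LATTICE = powerset(UNIVERSE)


def meet_all(xs):
    """Meet of a collection; the empty meet is the top element."""
    result = UNIVERSE
    for x in xs:
        result = result & x
    return result


def make_gen_kill(gen, kill):
    """A classic gen/kill flow function, which distributes over meet."""
    def f(x):
        return (x - kill) | gen
    return f


def reverse(f):
    """Assumed definition: f_r(y) = meet of all x such that y <= f(x)."""
    def f_r(y):
        return meet_all(x for x in LATTICE if y <= f(x))
    return f_r


def is_meet_distributive(f):
    return all(f(x & y) == f(x) & f(y) for x in LATTICE for y in LATTICE)


def is_join_distributive(g):
    return all(g(x | y) == g(x) | g(y) for x in LATTICE for y in LATTICE)


f = make_gen_kill(gen=frozenset({"a"}), kill=frozenset({"b"}))
f_r = reverse(f)
```

Here `is_meet_distributive(f)` and `is_join_distributive(f_r)` both hold. A demand that no input can satisfy, such as `f_r(frozenset({"b"}))`, maps to the top element, because `f` always kills `b`.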
64. Reverse Flow Function - Combining Everything
Inter- and intra-procedural data-flow functions
68. Generalizing the CCP Query
q := <[a=c], 10>
“Tell me if variable a is equal to constant c at line 10”
Which c should we ask about?
A more useful query:
“Tell me the constant value (if it has one) of variable a at line 10.”
q’ := <[a], 10>
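One way to read the generalized query <[a], 10> is as a backward search that collects the constants reaching the variable, rather than checking a single candidate c. The sketch below is a simplified intra-procedural illustration under that reading, not the paper's algorithm; the statement encoding, the `NON_CONSTANT` marker, and the name `query_value` are hypothetical.

```python
from collections import deque

NON_CONSTANT = "non-constant"

# Hypothetical CFG: 0: entry, 1 and 2: branches, 3: y := x, 4: query point.
PREDS = {0: [], 1: [0], 2: [0], 3: [1, 2], 4: [3]}
SAME = {0: ("nop",), 1: ("const", "x", 5), 2: ("const", "x", 5),
        3: ("copy", "y", "x"), 4: ("nop",)}
DIFFERENT = {0: ("nop",), 1: ("const", "x", 5), 2: ("const", "x", 6),
             3: ("copy", "y", "x"), 4: ("nop",)}


def query_value(preds, stmts, entry, var, node):
    """Return the constant held by var on entry to node, or NON_CONSTANT."""
    found = set()  # constants discovered along backward paths
    worklist = deque([(node, var)])
    seen = {(node, var)}
    while worklist:
        n, v = worklist.popleft()
        if n == entry:
            return NON_CONSTANT  # some path never defines the variable
        for m in preds[n]:
            stmt = stmts[m]
            if stmt[0] != "nop" and stmt[1] == v:
                if stmt[0] == "const":
                    found.add(stmt[2])
                    if len(found) > 1:
                        return NON_CONSTANT  # paths disagree
                    continue
                if stmt[0] != "copy":
                    return NON_CONSTANT      # non-copy assignment
                nxt = (m, stmt[2])           # follow the copy source
            else:
                nxt = (m, v)
            if nxt not in seen:
                seen.add(nxt)
                worklist.append(nxt)
    # Conservatively report NON_CONSTANT if no definition was found at all.
    return found.pop() if found else NON_CONSTANT
```

With the `SAME` statements, `query_value(PREDS, SAME, 0, "y", 4)` finds the single constant on both branches. With `DIFFERENT`, the two branches disagree and the query reports `NON_CONSTANT`.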
80. Takeaways
• The reverse flow function is the core of this algorithm.
• Its join-distributive property allows us to use MFP with it.
• In the generalized query (for CCP):
• Instead of taking the value returned by the reverse functions as
the query result, we used it to guide the query propagation
process.
81. Time and Space Complexity
• The same worst-case complexities as Sharir and Pnueli’s
exhaustive (iterative worklist-based) data-flow analysis*.
• Running time: O(C × height(L) × |L| × |N|)
• C is the maximum number of call sites, and |N| is the number of
nodes.
• Space: O(|L| × |N|)
*Sharir, Micha, and Amir Pnueli. Two approaches to interprocedural data flow analysis. New York
University, Courant Institute of Mathematical Sciences, Computer Science Department, 1978.
88. • Ryder, Barbara G., and Marvin C. Paull. "Incremental data-flow
analysis algorithms." ACM Transactions on Programming Languages
and Systems (TOPLAS) 10.1 (1988): 1-50.
• Incremental data-flow analysis needs to perform a full data-flow
analysis on the first run, where the over-analysis problem might still
occur.
• IFDS / IDE
• The authors of this paper argued that their work has “less
restrictions on structure of lattice and flow functions”.
• e.g. this work can give an approximation even with a
non-distributive flow function.
94. • (IFDS / IDE cont’d)
• IFDS and IDE cannot (fully) avoid the over-analysis problem.
• e.g. for the CCP problem, IFDS / IDE would report the constant value
(if any) of every variable.
• Strom, Robert E., and Daniel M. Yellin. "Extending typestate
checking using conditional liveness analysis." IEEE Transactions on
Software Engineering 19.5 (1993): 478-485.
• Also used the idea of backward and demand-driven program
analysis.
96. How I Think the Paper Should Be Organized
(Larger Box == More Important)
The “Normal” Data-Flow Analysis
Demand-Driven Data-Flow Analysis
w/ Boolean Query Result
Generalize To Non-Boolean Query Result
(for CCP)
Supporting Global Variables /
Memory Aliasing
101. The Paper’s Organization
(Larger Box == More Paragraphs)
The “Normal” Data-Flow Analysis
Generalize To Non-Boolean Query Result (for CCP)
Supporting Global Variables /
Memory Aliasing
Demand-Driven Data-Flow Analysis
w/ Boolean Query Result
106. Other Comments
• The querying mechanism fits really well with the Language Server
Protocol (LSP), which is getting more attention now.
• This paper did a pretty nice survey of related work.
• Some notations are inconsistent across paragraphs.
• Amortized Time Complexity Analysis Please!!!
111. Summary
• This work used queries to drive the data-flow analysis process to avoid
generating redundant results that would never be used.
• The algorithm performed a reverse data-flow analysis from the point
the user is asking about.
• The reverse data-flow function played an important role.
• The query algorithm was extended to support arbitrary result types,
in addition to the basic Boolean type.
• In terms of worst-case time complexity, this algorithm is no worse than
exhaustive data-flow analysis.
• As with incremental data-flow analysis, we would like to see an
amortized time complexity analysis.