2. Value Numbering
- Assign value numbers to expressions
- Expressions that produce the same value should have the same value number
- Usually achieved by hashing simplified and canonicalized expressions whose operands are replaced by their value numbers
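The hashing scheme above can be sketched for straight-line code as follows. This is a minimal Python illustration, not GCC code: the statement format, the class ValueNumbering and its simplification rules (constant folding, dropping x + 0, x - 0 and x * 1) are assumptions chosen for brevity.

```python
# Minimal sketch of hash-based value numbering for straight-line code.
# Not GCC code: statement format and simplification rules are assumptions.

FOLD = {"+": lambda x, y: x + y, "-": lambda x, y: x - y, "*": lambda x, y: x * y}
RIGHT_IDENTITY = {"+": 0, "-": 0, "*": 1}   # x op identity == x
COMMUTATIVE = {"+", "*"}


class ValueNumbering:
    def __init__(self):
        self.table = {}     # canonical expression key -> value number
        self.var_vn = {}    # variable name -> value number
        self.const_of = {}  # value number -> constant it stands for
        self.next_vn = 0

    def _fresh(self):
        vn = self.next_vn
        self.next_vn += 1
        return vn

    def const_vn(self, c):
        key = ("const", c)
        if key not in self.table:
            self.table[key] = self._fresh()
            self.const_of[self.table[key]] = c
        return self.table[key]

    def operand_vn(self, operand):
        if isinstance(operand, int):
            return self.const_vn(operand)
        if operand not in self.var_vn:      # unknown input: a fresh value
            self.var_vn[operand] = self._fresh()
        return self.var_vn[operand]

    def number(self, op, a, b):
        # Replace operands by their value numbers first.
        va, vb = self.operand_vn(a), self.operand_vn(b)
        ca, cb = self.const_of.get(va), self.const_of.get(vb)
        # Simplify: fold constants and drop identity operands.
        if ca is not None and cb is not None:
            return self.const_vn(FOLD[op](ca, cb))
        if cb is not None and cb == RIGHT_IDENTITY[op]:
            return va
        if op in COMMUTATIVE and ca is not None and ca == RIGHT_IDENTITY[op]:
            return vb
        # Canonicalize: order the operands of commutative operations.
        if op in COMMUTATIVE and va > vb:
            va, vb = vb, va
        key = (op, va, vb)
        if key not in self.table:
            self.table[key] = self._fresh()
        return self.table[key]

    def assign(self, dest, op, a, b):
        self.var_vn[dest] = self.number(op, a, b)
        return self.var_vn[dest]
```

With this sketch, a + b and b + a receive the same value number, and a + 0 receives the value number of a.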
3. Value Numbering in GCC
Multiple value numbering implementations and their main users
- RTL CSE (cselib)
- RTL PRE
- GIMPLE SSA DOM (scoped tables)
- GIMPLE SSA FRE/PRE (RPO VN)
- simpler forms of VN in CCP and copy propagation
4. Common Subexpression Elimination
For each statement:
- try to simplify the computed expression using the value numbers of its operands
- look up the value number of the simplified expression
- if found and a register holding that value is available, replace the expression with that register or constant
- if not found, record a new value number for it and make it available in the destination that receives the value of the expression
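A minimal sketch of these steps for straight-line code, assuming each destination is assigned exactly once (as in SSA) so every recorded leader stays available. The function local_cse and its statement tuples are hypothetical, and simplification is reduced to folding two constant operands.

```python
import itertools


def local_cse(stmts):
    """Local CSE over straight-line SSA-like code (each dest assigned once).
    stmts: list of (dest, op, a, b) with op in {"+", "*"}; operands are
    names or int constants. Returns the rewritten statement list."""
    counter = itertools.count()
    vn_of = {}    # name -> value number
    table = {}    # ("const", c) or (op, vn_a, vn_b) -> value number
    leader = {}   # value number -> register name or constant holding it
    out = []

    def value_of(x):
        if isinstance(x, int):
            key = ("const", x)
            if key not in table:
                table[key] = next(counter)
                leader[table[key]] = x      # a constant is its own leader
            return table[key]
        if x not in vn_of:                  # function input: fresh value
            vn_of[x] = next(counter)
            leader[vn_of[x]] = x
        return vn_of[x]

    for dest, op, a, b in stmts:
        va, vb = value_of(a), value_of(b)
        ca, cb = leader[va], leader[vb]
        if isinstance(ca, int) and isinstance(cb, int):
            # Simplification: both operands are constants, so fold.
            c = ca + cb if op == "+" else ca * cb
            vn_of[dest] = value_of(c)
            out.append((dest, "copy", c, None))
            continue
        if va > vb:                         # both ops are commutative
            va, vb = vb, va
        key = (op, va, vb)
        if key in table:
            # Found, and in straight-line code its leader is available.
            vn_of[dest] = table[key]
            out.append((dest, "copy", leader[table[key]], None))
        else:
            # New value: dest becomes the register that makes it available.
            vn_of[dest] = table[key] = next(counter)
            leader[vn_of[dest]] = dest
            out.append((dest, op, a, b))
    return out
```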
5. Availability
Different ways to track, update and query the availability of a so-called leader for a value number:
- with a DOM (dominator tree) walk, a value-to-leader map can be kept up to date using an unwind stack
- the RPO VN walk records a list of leaders for each value; the list can be unwound when iterating and is otherwise queried with dominance checks
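Both availability schemes can be sketched in a few lines. The classes ScopedLeaders and LeaderLists below are hypothetical illustrations, not GCC data structures: the first undoes map updates with a stack when the DOM walk leaves a block, the second keeps every leader of a value and answers queries by checking dominance with DFS numbers on the dominator tree.

```python
class ScopedLeaders:
    """Value -> leader map kept current during a DOM walk via an unwind stack."""

    def __init__(self):
        self.leader = {}   # value number -> leader name
        self.undo = []     # (value, previous leader or None), newest last

    def enter_block(self):
        return len(self.undo)              # marker for leave_block

    def record(self, value, name):
        self.undo.append((value, self.leader.get(value)))
        self.leader[value] = name

    def leave_block(self, marker):
        # Undo everything recorded since enter_block returned marker.
        while len(self.undo) > marker:
            value, prev = self.undo.pop()
            if prev is None:
                del self.leader[value]
            else:
                self.leader[value] = prev


def dom_numbering(dom_children, root):
    """DFS numbers on the dominator tree: a dominates b iff
    pre[a] <= pre[b] and post[b] <= post[a]."""
    pre, post, clock = {}, {}, [0]

    def visit(n):
        pre[n] = clock[0]
        clock[0] += 1
        for c in dom_children.get(n, []):
            visit(c)
        post[n] = clock[0]
        clock[0] += 1

    visit(root)
    return pre, post


class LeaderLists:
    """Per-value leader lists queried with dominance checks (RPO VN style)."""

    def __init__(self, pre, post):
        self.pre, self.post = pre, post
        self.leaders = {}  # value number -> [(block, name), ...]
        self.log = []      # values in recording order, for unwinding

    def dominates(self, a, b):
        return self.pre[a] <= self.pre[b] and self.post[b] <= self.post[a]

    def record(self, value, block, name):
        self.leaders.setdefault(value, []).append((block, name))
        self.log.append(value)

    def mark(self):
        return len(self.log)

    def unwind_to(self, marker):
        # Cost is linear in the number of entries recorded since marker.
        while len(self.log) > marker:
            self.leaders[self.log.pop()].pop()

    def available(self, value, block):
        for def_block, name in reversed(self.leaders.get(value, [])):
            if self.dominates(def_block, block):
                return name
        return None
```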
6. Availability and expression simplification
- use match.pd-based simplification
- operands of value expressions get substituted with their leaders
- this allows keeping flow-sensitive info such as ranges
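A toy sketch of why substituting leaders helps: once an operand is replaced by a concrete leader name, range info recorded on that name can drive simplification. The simplify function, the Range class and the two rules below are assumptions for illustration; they are not match.pd patterns.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    lo: int
    hi: int


def _range(x, range_of):
    if isinstance(x, int):
        return Range(x, x)          # a constant is its own singleton range
    return range_of.get(x)          # None means nothing is known


def simplify(op, a, b, leader_of, range_of):
    """Simplify (a op b) after substituting operands with their leaders.
    leader_of: operand name -> leader name available at this point
    range_of:  leader name -> Range recorded on that name"""
    a, b = leader_of.get(a, a), leader_of.get(b, b)
    ra, rb = _range(a, range_of), _range(b, range_of)
    if op == ">" and ra and rb:
        if ra.lo > rb.hi:
            return 1                # always true
        if ra.hi <= rb.lo:
            return 0                # always false
    if op == "&" and isinstance(b, int) and ra:
        # x & (2**k - 1) == x when 0 <= x <= 2**k - 1
        if b >= 0 and (b & (b + 1)) == 0 and 0 <= ra.lo and ra.hi <= b:
            return a
    return (op, a, b)               # not simplified, but leaders substituted
```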
8. Memory Expressions
- the memory state is part of the hash: the current .MEM_n virtual definition is used
- at lookup time, walk the virtual SSA use->def chains, skip clobbers that do not alias, and perform lookups with the previous memory state
- fancy tricks during walking:
  - memory-to-memory copies
  - pieces from larger entities
  - larger objects formed from smaller entities
- memory handling consumes the majority of compile time
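The virtual use->def walk can be sketched with a simplified memory model. Everything here is an assumption for illustration: accesses are (base, offset, size) triples, distinct bases never alias, and the .MEM_n names only mimic GCC's virtual SSA names in dumps. None of the fancy tricks are modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Access:
    base: str      # assumption: distinct bases never alias
    offset: int
    size: int


def may_alias(x, y):
    return (x.base == y.base and x.offset < y.offset + y.size
            and y.offset < x.offset + x.size)


class MemoryVN:
    def __init__(self):
        self.current = ".MEM_0"              # memory state at function entry
        self.defs = {".MEM_0": None}         # vdef -> (previous vdef, Access stored)
        self.table = {}                      # (Access, memory state) -> value
        self.count = 0

    def store(self, access, value):
        self.count += 1
        vdef = f".MEM_{self.count}"
        self.defs[vdef] = (self.current, access)
        self.current = vdef
        # A later load of exactly this location in this state sees value.
        self.table[(access, vdef)] = value

    def lookup_load(self, access):
        vuse = self.current
        while True:
            if (access, vuse) in self.table:
                return self.table[(access, vuse)]
            clobber = self.defs[vuse]
            if clobber is None:              # reached the entry state
                return None
            prev, stored = clobber
            if may_alias(stored, access):    # may clobber: stop walking
                return None
            vuse = prev                      # skip non-aliasing store

    def load(self, access, fresh_value):
        found = self.lookup_load(access)
        if found is None:
            # Record the new value under the current memory state.
            self.table[(access, self.current)] = fresh_value
            found = fresh_value
        return found
```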
9. Why RPO VN
- SSA SCC VN
  - reduces what has to be iterated
  - difficult to mate with the CFG: non-executable parts, predication, equivalences, regions
- RPO VN
  - iteration is more costly
  - maps to the CFG, which makes flow-sensitive optimizations easy
  - allows region-based operation
10. RPO VN Operation Modes
- can operate with different levels of effort for memory handling
- can do optimistic, iterating VN with elimination done after the fact
- can do non-iterating VN with immediate elimination
- can operate on the whole function or on a single-entry, multiple-exit (SEME) region
12. Iteration scheme
- SSA-SCC-based VN iterates SSA SCCs until nothing changes
- RPO VN iterates CFG cycles
- rev_post_order_and_mark_dfs_back_seme can compute an RPO in which the blocks of each CFG cycle are adjacent, recording the extent of each cycle in the RPO array
  - handles irreducible regions, which loop info would not
  - yields optimal regions for iteration
- avoid iteration when possible; do not iterate until nothing changes
- the cost of unwinding to the iteration point is linear in the amount of things to undo (expression hashes, availability)
- iteration itself is O(n * loop-depth); inner cycles are iterated fully before outer cycles are iterated
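The idea of an RPO where cycles are adjacent and their extent is recorded can be illustrated with a different, well-known algorithm: Bourdoncle's weak topological ordering. This sketch is not how rev_post_order_and_mark_dfs_back_seme works; it only shows an ordering with the same two properties, where each cycle (including irreducible ones) is a contiguous range and inner cycles are nested inside the ranges of outer ones. The function name weak_topological_order and the graph format (string node names, successor dict) are hypothetical.

```python
import math


def weak_topological_order(succ, root):
    """Bourdoncle's weak topological ordering (recursive version).
    succ: dict node -> list of successors (node names must not be tuples).
    Returns (order, cycles): order is a flat list of nodes; cycles maps each
    cycle head to the (first, last) index range of its cycle in order."""
    dfn = {}                  # missing/0 = unvisited, math.inf = finished
    stack = []
    num = 0

    def visit(v, partition):
        nonlocal num
        stack.append(v)
        num += 1
        dfn[v] = num
        head, loop = num, False
        for w in succ.get(v, []):
            low = visit(w, partition) if dfn.get(w, 0) == 0 else dfn[w]
            if low <= head:
                head, loop = low, True
        if head == dfn[v]:
            dfn[v] = math.inf
            element = stack.pop()
            if loop:
                # v heads a cycle: reset its members and order them inside it.
                while element != v:
                    dfn[element] = 0
                    element = stack.pop()
                partition.insert(0, component(v))
            else:
                partition.insert(0, v)
        return head

    def component(v):
        inner = []
        for w in succ.get(v, []):
            if dfn.get(w, 0) == 0:
                visit(w, inner)
        return (v, inner)

    top = []
    visit(root, top)

    order, cycles = [], {}

    def flatten(items):
        for item in items:
            if isinstance(item, tuple):       # (head, members) of a cycle
                head, members = item
                start = len(order)
                order.append(head)
                flatten(members)
                cycles[head] = (start, len(order) - 1)
            else:
                order.append(item)

    flatten(top)
    return order, cycles
```

In the nested-loop case below, the inner cycle occupies indices 2 to 3 and the outer one 1 to 4, so an iterator could re-run the inner range to completion before re-running the outer one. Every edge either goes forward in the order or back to the head of a cycle containing its source.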
13. Non-iterative mode
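- Greedy walk along edges discovered to be executable, while enforcing RPO order when visiting reachable blocks.
- Predecessors that have not been visited yet and are reachable from blocks later in RPO have to be conservatively assumed reachable.
- Handles PHIs with unreachable incoming non-back edges optimally: all non-back-edge predecessors precede the PHI's block in RPO, so whether those edges are executable is already known when the PHI is visited.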
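A sketch of the non-iterative walk, assuming a single PHI per block and branch outcomes given up front in a hypothetical taken map (standing in for conditions VN would fold). Blocks are visited in RPO, only edges found executable make successors reachable, back edges from not-yet-visited blocks are treated as executable with an unknown argument, and PHI arguments on unexecutable forward edges are ignored.

```python
def non_iterative_walk(rpo, succ, taken, phis):
    """Sketch of a non-iterating walk in RPO (not GCC's implementation).
    rpo:   blocks in reverse post-order, entry first
    succ:  block -> list of successors
    taken: block -> the single successor a constant condition selects
    phis:  block -> {predecessor: incoming value} for one PHI per block
    Returns (reachable blocks, block -> PHI value or "VARYING")."""
    pos = {b: i for i, b in enumerate(rpo)}
    preds = {}
    for s, dests in succ.items():
        for d in dests:
            preds.setdefault(d, []).append(s)
    executable = set()              # edges (src, dst) discovered executable
    reachable = {rpo[0]}
    phi_value = {}
    for b in rpo:                   # visit in RPO, but only reachable blocks
        if b not in reachable:
            continue
        if b in phis:
            vals = set()
            for p in preds.get(b, []):
                if p in pos and pos[p] >= pos[b]:
                    # Back edge from a block not visited yet: assume it is
                    # executable and that its argument can be anything.
                    vals.add("VARYING")
                elif (p, b) in executable:
                    vals.add(phis[b][p])
                # Unexecutable non-back edge: its argument is ignored.
            phi_value[b] = vals.pop() if len(vals) == 1 else "VARYING"
        for d in ([taken[b]] if b in taken else succ.get(b, [])):
            executable.add((b, d))
            reachable.add(d)
    return reachable, phi_value
```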
14. RPO VN as Utility
RPO VN was designed to be usable on small regions of a function with little overhead, even when invoked very often, and to be much cheaper than a pass over the whole function.
- loop unrolling applies CSE to unrolled bodies before trying to unroll the containing loop
- loop if-conversion applies CSE to optimize predicates
- unroll-and-jam applies CSE to leverage cross-loop redundancies
- uninit analysis uses RPO VN to compute basic-block reachability without performing actual CSE
16. RPO VN Utility Efficiency
Non-iterating, region-based VN, with or without elimination, was designed to be efficient:
- startup cost is linear in the size of the region
- performing RPO VN with VN_NOWALK, without iteration and without elimination, on each basic block individually is only around 15% slower than performing a single RPO VN on the whole function for cc1files, with insn-attrtab.i being the outlier at 280%
- more elaborate memory handling or doing elimination does not allow for an apples-to-apples comparison
- while doing CSE on the whole function might perform more optimizations, it should never be faster than doing CSE only on the regions a pass has transformed
17. TODO
- experiment with using ranger instead of the ad-hoc predication we have
- review equivalence tracking changes
- think of a cheaper way to do “iteration”
- we have simple DCE with an SSA worklist; region DCE/DSE is needed