Instruction Combine in LLVM

Kai

Instruction Combines on Different IR
• instcombine optimization pass
  • operates on LLVM IR
  • class InstructionCombiningPass : public FunctionPass
• DAGCombiner
  • operates on SelectionDAG
  • class DAGCombiner
• MachineCombiner
  • operates on MachineInstr
  • class MachineCombiner : public MachineFunctionPass

instcombine optimization pass
• Remove dead basic blocks.
• Remove dead instructions.
• Constant fold.
[Figure: control-flow graph of basic blocks BB #0 to BB #4; the instruction R = add Y, 1 is placed on the worklist.]

instcombine optimization pass
• visit##OPCODE performs the instruction combine for each opcode.
• The implementation lives in lib/Transforms/InstCombine/.
• Example rule: add (sext i1 X), 1 -> zext (not X)
[Figure: worklist containing R = add Y, 1]
Before instcombine:
%ext = sext i1 %x to i32 (put into worklist)
%add = add i32 %ext, 1
After instcombine:
%not = xor i1 %x, true (put into worklist)
%add = zext i1 %not to i32 (replace)
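Because X is a single bit, the rule add (sext i1 X), 1 -> zext (not X) can be checked for both possible inputs. The following is a minimal sketch in plain C++ that models the IR operations with ordinary integers; the helper names (sextI1, zextI1, beforeCombine, afterCombine) are made up for this illustration and are not LLVM APIs.

```cpp
#include <cassert>
#include <cstdint>

// Model of `sext i1 %x to i32`: true becomes all ones (-1), false becomes 0.
int32_t sextI1(bool x) { return x ? -1 : 0; }

// Model of `zext i1 %x to i32`: true becomes 1, false becomes 0.
int32_t zextI1(bool x) { return x ? 1 : 0; }

// Original sequence: add (sext i1 X), 1
int32_t beforeCombine(bool x) { return sextI1(x) + 1; }

// Combined sequence: zext (not X), where `not` on i1 is `xor X, true`
int32_t afterCombine(bool x) { return zextI1(x != true); }
```

Both functions return 1 for X = false and 0 for X = true, so replacing the add with the zext keeps the result unchanged.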

Target Independent Code Generator
[Figure: SelectionDAG code generation pipeline]
LLVM IR -> SelectionDAG Builder -> SelectionDAG nodes -> DAG combine -> Legalize types -> DAG combine -> Legalize vectors -> Legalize types -> DAG combine -> DAG legalize -> DAG combine -> Instruction selection -> Machine DAG -> Scheduler -> MachineInstr

DAGCombiner
• Combine using target-independent rules
• Combine using target-dependent rules
  • XXXTargetLowering::PerformDAGCombine
• Promote

DAGCombiner: Target Independent Rules
• DAGCombiner::visit(SDNode *N)
  • ISD::ADD -> visitADD(N)
    • (add c1, c2) -> c1 + c2
    • (add x, 0) -> x
    • (add (sub c1, A), c2) -> (sub (add c1, c2), A)
    • (add (sext i1 X), 1) -> (zext (not i1 X))
    • ((0-A) + B) -> (B - A)
    • (A + (0-B)) -> (A - B)
    • (A + (B-A)) -> B
    • …
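To make the shape of such rules concrete, here is a small sketch of a visitADD-style function over a toy expression tree. It assumes a hypothetical Node type invented for this example (it is not LLVM's SDNode or SelectionDAG API) and implements only a few of the rules listed above.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>

// Hypothetical miniature expression node for illustration; not LLVM's SDNode.
enum class Op { Const, Var, Add, Sub };

struct Node;
using NodePtr = std::shared_ptr<Node>;

struct Node {
  Op op;
  uint64_t value;   // meaningful for Op::Const only
  std::string name; // meaningful for Op::Var only
  NodePtr lhs, rhs; // meaningful for Op::Add and Op::Sub
};

NodePtr constant(uint64_t v) {
  return std::make_shared<Node>(Node{Op::Const, v, "", nullptr, nullptr});
}
NodePtr variable(const std::string &name) {
  return std::make_shared<Node>(Node{Op::Var, 0, name, nullptr, nullptr});
}
NodePtr binary(Op op, NodePtr l, NodePtr r) {
  return std::make_shared<Node>(Node{op, 0, "", l, r});
}

bool isConst(const NodePtr &n, uint64_t v) {
  return n->op == Op::Const && n->value == v;
}

// Two operands count as the same value if they are the same node
// or refer to the same named variable.
bool sameValue(const NodePtr &a, const NodePtr &b) {
  return a == b ||
         (a->op == Op::Var && b->op == Op::Var && a->name == b->name);
}

// Try each rule in turn; return the simplified node, or n itself if none applies.
NodePtr visitAdd(const NodePtr &n) {
  const NodePtr &a = n->lhs;
  const NodePtr &b = n->rhs;
  // (add c1, c2) -> c1 + c2 (unsigned arithmetic wraps like two's complement)
  if (a->op == Op::Const && b->op == Op::Const)
    return constant(a->value + b->value);
  // (add x, 0) -> x
  if (isConst(b, 0))
    return a;
  // ((0-A) + B) -> (B - A)
  if (a->op == Op::Sub && isConst(a->lhs, 0))
    return binary(Op::Sub, b, a->rhs);
  // (A + (0-B)) -> (A - B)
  if (b->op == Op::Sub && isConst(b->lhs, 0))
    return binary(Op::Sub, a, b->rhs);
  // (A + (B-A)) -> B
  if (b->op == Op::Sub && sameValue(a, b->rhs))
    return b->lhs;
  return n;
}

// Render a node in the prefix notation used on the slide.
std::string toString(const NodePtr &n) {
  switch (n->op) {
  case Op::Const: return std::to_string(n->value);
  case Op::Var:   return n->name;
  case Op::Add:   return "(add " + toString(n->lhs) + ", " + toString(n->rhs) + ")";
  case Op::Sub:   return "(sub " + toString(n->lhs) + ", " + toString(n->rhs) + ")";
  }
  return "";
}
```

For example, visitAdd applied to (add (sub 0, A), B) yields (sub B, A), and a node that matches no rule is returned unchanged. In this sketch the rules are tried in a fixed order and the first match wins.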

DAGCombiner: Target Dependent Rules
• virtual SDValue TargetLowering::PerformDAGCombine(SDNode *N, DAGCombinerInfo &DCI) const;
• ARMTargetLowering::PerformDAGCombine(N, DCI)
  • ISD::ADD -> PerformADDCombine(N, DCI, Subtarget)
    • (add (select cc, 0, c), x) -> (select cc, x, (add x, c))
    • These transformations eventually create predicated instructions.
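The ARM rule (add (select cc, 0, c), x) -> (select cc, x, (add x, c)) moves the add into one arm of the select, so the add only needs to execute when cc is false. A minimal sketch of the two forms, modeled with plain 32-bit integers (the function names are made up for this example):

```cpp
#include <cassert>
#include <cstdint>

// Original form: the select picks 0 or c, then an unconditional add follows.
uint32_t beforeSelectCombine(bool cc, uint32_t c, uint32_t x) {
  return (cc ? 0u : c) + x;
}

// Combined form: the add appears only on the path where cc is false,
// which is the shape a predicated add instruction can implement.
uint32_t afterSelectCombine(bool cc, uint32_t c, uint32_t x) {
  return cc ? x : x + c;
}
```

Both forms compute x when cc is true and x + c when cc is false.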

DAGCombiner: Promote
• PromoteIntBinOp
  • ADD/SUB/MUL/AND/OR/XOR
  • X86TargetLowering::isTypeDesirableForOp(unsigned Opc, EVT VT)
    • On x86, i16 is legal but undesirable, since i16 instruction encodings are longer and some i16 instructions are slow.
  • X86TargetLowering::IsDesirableToPromoteOp(SDValue Op, EVT &PVT)
    • Returns the desirable promoted type in PVT.
  • DAG.getNode(ISD::TRUNCATE, DL, VT, DAG.getNode(Opc, DL, PVT, NN0, NN1))
    • NN0 = promoted operand 0
    • NN1 = promoted operand 1
• PromoteIntShiftOp
  • SHL/SRA/SRL
  • DAG.getNode(ISD::TRUNCATE, DL, VT, DAG.getNode(Opc, DL, PVT, N0, N1))
    • N0 = promoted operand 0
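The promotion pattern TRUNCATE(Opc(PVT, NN0, NN1)) works for ADD, MUL and the bitwise operations because the low 16 bits of the wide result depend only on the low 16 bits of the operands. The sketch below models an i16 operation promoted to i32 in plain C++. The helper names are made up for this example, and the junk parameter stands in for the unspecified upper bits of an any-extended value. For a logical right shift the upper bits do matter, so the sketch zero-extends operand 0 in that case.

```cpp
#include <cassert>
#include <cstdint>

// Any-extend i16 to i32: the upper 16 bits are unspecified, so this model
// fills them with a caller-chosen pattern to show they do not affect the result.
uint32_t anyExtend(uint16_t v, uint16_t junk) {
  return (static_cast<uint32_t>(junk) << 16) | v;
}

// Zero-extend i16 to i32: the upper 16 bits are 0.
uint32_t zeroExtend(uint16_t v) { return v; }

// TRUNCATE from i32 back to i16: keep the low 16 bits.
uint16_t truncate(uint32_t v) { return static_cast<uint16_t>(v); }

// TRUNCATE(ADD(i32, NN0, NN1))
uint16_t promotedAdd(uint16_t a, uint16_t b, uint16_t junk) {
  return truncate(anyExtend(a, junk) + anyExtend(b, junk));
}

// TRUNCATE(MUL(i32, NN0, NN1))
uint16_t promotedMul(uint16_t a, uint16_t b, uint16_t junk) {
  return truncate(anyExtend(a, junk) * anyExtend(b, junk));
}

// TRUNCATE(SRL(i32, N0, amount)) with N0 zero-extended, so that the bits
// shifted in from above are zeros, as they would be in the i16 shift.
// The shift amount is assumed to be less than 16.
uint16_t promotedSrl(uint16_t a, unsigned amount) {
  return truncate(zeroExtend(a) >> amount);
}
```

With a = 0xFFFF and b = 2, promotedAdd returns 0x0001 and promotedMul returns 0xFFFE regardless of junk, matching 16-bit wraparound arithmetic. An arithmetic right shift (SRA) would need a sign-extended operand 0 for the same reason that SRL needs a zero-extended one.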

DAGCombiner: Promote
• PromoteExtend
  • SIGN_EXTEND/ZERO_EXTEND/ANY_EXTEND
  • (aext (aext x)) -> (aext x)
  • (aext (zext x)) -> (zext x)
  • (aext (sext x)) -> (sext x)
• PromoteLoad
  • LOAD
  • TRUNCATE + SEXTLOAD/ZEXTLOAD

MachineCombiner
2014-08-03 Gerolf Hoflehner <ghoflehner@apple.com>
MachineCombiner Pass for selecting faster instruction sequence - target independent framework
When the DAGCombiner selects instruction sequences it could increase the critical path or resource len.
For example, on arm64 there are multiply-accumulate instructions (madd, msub). If e.g. the equivalent multiply-add sequence is not on the critical path it makes sense to select it instead of the combined, single accumulate instruction (madd/msub). The reason is that the conversion from add+mul to the madd could lengthen the critical path by the latency of the multiply.
But the DAGCombiner would always combine and select the madd/msub instruction.
This patch uses machine trace metrics to estimate critical path length and resource length of an original instruction sequence vs a combined instruction sequence and picks the faster code based on its estimates.
https://reviews.llvm.org/rL214666
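The trade-off described in the commit message can be illustrated with a toy latency model. This is only a sketch: the latencies below are hypothetical round numbers rather than values from any real scheduling model, the function names are invented for this example, and the model ignores resource pressure as well as any operand forwarding a real core may provide.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical latencies in cycles; real values come from the target's
// scheduling model.
constexpr unsigned MulLatency = 4;
constexpr unsigned AddLatency = 1;
constexpr unsigned MaddLatency = 4;

// Cycle at which rt = (rn * rm) + rp is available when emitted as mul + add.
// The mul can start before rp is ready; only the add waits for rp.
unsigned separateMulAddDepth(unsigned rnReady, unsigned rmReady,
                             unsigned rpReady) {
  unsigned mulDone = std::max(rnReady, rmReady) + MulLatency;
  return std::max(mulDone, rpReady) + AddLatency;
}

// Cycle at which the same value is available from a single madd, which in
// this simple model waits for all three operands before it starts.
unsigned maddDepth(unsigned rnReady, unsigned rmReady, unsigned rpReady) {
  return std::max({rnReady, rmReady, rpReady}) + MaddLatency;
}

// Combine only when the madd does not lengthen the critical path.
bool shouldCombineToMadd(unsigned rnReady, unsigned rmReady,
                         unsigned rpReady) {
  return maddDepth(rnReady, rmReady, rpReady) <=
         separateMulAddDepth(rnReady, rmReady, rpReady);
}
```

When all operands are ready at cycle 0, the madd finishes at cycle 4 versus 5 for mul + add, so combining helps. When the accumulator rp arrives late (for example at cycle 10), the separate sequence finishes at cycle 11 while the madd finishes at cycle 14, so under these assumptions the combine should be skipped.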

MachineCombiner
2014-08-07 Gerolf Hoflehner
e4fa341 MachineCombiner Pass for selecting faster instruction sequence on AArch64
2015-06-10 Sanjay Patel
c826b54 [x86] Add a reassociation optimization to increase ILP via the MachineCombiner pass
2015-07-15 Hal Finkel
8913d18 [PowerPC] Use the MachineCombiner to reassociate fadd/fmul
2015-09-21 15:09 Chad Rosier
c5d4530 [Machine Combiner] Refactor machine reassociation code to be target-independent.

MachineCombiner
• Only combine a sequence of instructions when this neither lengthens the critical path nor increases resource pressure.
• When optimizing for code size, always combine when the new sequence is shorter.
• bool TargetInstrInfo::getMachineCombinerPatterns(MI, Patterns)
  • Patterns should be sorted in priority order, since the pattern evaluator stops checking as soon as it finds a faster sequence.
• void TargetInstrInfo::genAlternativeCodeSequence(MI, Pattern, InsInstrs, DelInstrs, InstrIdxForVirtReg)
  • When getMachineCombinerPatterns() finds patterns, this function generates the instructions that could replace the original code sequence.

MachineCombiner
[Flowchart: for each MBB in MF and each MI in MBB, call getMachineCombinerPatterns(); for each P in Patterns, call genAlternativeCodeSequence(); if the alternative improves throughput in a loop, improves code size, or improves the critical path, replace the code sequence; otherwise delete InsInstrs.]

Machine Combiner Patterns: Default Patterns
Before:
A = ? op ?
B = A op X
C = B op Y
After:
A = ? op ?
B' = X op Y
C = A op B'
Reassociation breaks the dependency between A and B: B' no longer depends on A, so A and B' can execute in parallel.
Y = ? op2 ? (MI2)
. . .
A = ? opx ?
B = A op1 X (MI1) (B has only one use)
C = B op Y (ROOT) (op is associable)
if op1 != op and op2 == op:
    C = Y op B
    patterns = {REASSOC_AX_YB, REASSOC_XA_YB}
else:
    C = B op Y
    patterns = {REASSOC_AX_BY, REASSOC_XA_BY}
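The benefit of reassociation can be seen by tracking when each value becomes ready. The sketch below uses a made-up Timed struct and assumes every op has a latency of one cycle and is an associative, commutative integer add; none of these names come from LLVM.

```cpp
#include <algorithm>
#include <cassert>

// A value together with the cycle at which it becomes available.
// Hypothetical type for illustration only.
struct Timed {
  unsigned value;
  unsigned readyAt;
};

// Apply the associative op (here: wrapping unsigned add) with a latency of
// one cycle. The result is ready one cycle after the later operand.
Timed apply(Timed l, Timed r) {
  return {l.value + r.value, std::max(l.readyAt, r.readyAt) + 1};
}

// Original chain: B = A op X; C = B op Y
Timed chained(Timed a, Timed x, Timed y) { return apply(apply(a, x), y); }

// Reassociated: B' = X op Y; C = A op B'
Timed reassociated(Timed a, Timed x, Timed y) { return apply(a, apply(x, y)); }
```

If A becomes ready at cycle 3 while X and Y are ready at cycle 0, the chained form produces C at cycle 5 and the reassociated form at cycle 4, with the same value in both cases. B' can be computed while A is still in flight.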

Machine Combiner Patterns: AArch64
Find instructions that can be turned into madd.
Before:
MADDW rs, rn, rm, WZR (rs has only one use)
ADDW rt, rs, rp
After:
MADDW rt, rn, rm, rp
