Instruction Combine in LLVM
Kai
Instruction Combines on Different IR
• instcombine optimization pass
  • operates on LLVM IR
  • class InstructionCombiningPass : public FunctionPass
• DAGCombiner
  • operates on SelectionDAG
  • class DAGCombiner
• MachineCombiner
  • operates on MachineInstr
  • class MachineCombiner : public MachineFunctionPass
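instcombine optimization pass
• Remove the instructions of dead (unreachable) basic blocks.
• Remove dead instructions.
• Constant fold.
[Figure: a CFG with basic blocks BB #0 to BB #4; the instructions of the reachable blocks, e.g. R = add Y, 1, are collected into a worklist.]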
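The worklist set-up can be pictured with a small sketch. It assumes a toy CFG type (`Block`, `Prepass` and `buildWorklist` are hypothetical, not LLVM classes) and models only reachability: blocks reached from the entry block contribute their instructions to the initial worklist, and blocks never reached are reported as dead. Constant folding during the walk is left out.

```cpp
#include <cstddef>
#include <deque>
#include <string>
#include <vector>

// Hypothetical toy CFG node: instruction text plus successor block indices.
// Block 0 is the entry block.
struct Block {
  std::vector<std::string> insts;
  std::vector<std::size_t> succs;
};

// Result of the prepass: the initial worklist and the unreachable blocks.
struct Prepass {
  std::vector<std::string> worklist;
  std::vector<std::size_t> deadBlocks;
};

// Breadth-first walk from the entry; every reached block queues its
// instructions, every block never reached is reported as dead.
Prepass buildWorklist(const std::vector<Block>& cfg) {
  Prepass result;
  if (cfg.empty()) return result;
  std::vector<bool> visited(cfg.size(), false);
  std::deque<std::size_t> queue{0};
  visited[0] = true;
  while (!queue.empty()) {
    std::size_t bb = queue.front();
    queue.pop_front();
    for (const std::string& inst : cfg[bb].insts) result.worklist.push_back(inst);
    for (std::size_t s : cfg[bb].succs) {
      if (!visited[s]) {
        visited[s] = true;
        queue.push_back(s);
      }
    }
  }
  for (std::size_t bb = 0; bb < cfg.size(); ++bb)
    if (!visited[bb]) result.deadBlocks.push_back(bb);
  return result;
}
```

In this sketch a block with no path from the entry (for example one whose only edge points into the CFG) never reaches the worklist, so its instructions can simply be dropped.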
instcombine optimization pass
%ext = sext i1 %x to i32 (put into worklist)
%add = add i32 %ext, 1
is rewritten to
%not = xor i1 %x, true (new instruction, put into worklist)
%add = zext i1 %not to i32 (replaces the original %add)
instcombine
• Each instruction is dispatched to a visit##OPCODE method (e.g. visitAdd), which performs the combine.
• Source: lib/Transforms/InstCombine/
add (sext i1 X), 1 -> zext (not X)
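A quick way to check that the add (sext i1 X), 1 -> zext (not X) rewrite is sound is to evaluate both sides for the two possible i1 inputs. The sketch models i1 as `bool` and i32 as `uint32_t`, so the add wraps modulo 2^32 like LLVM's `add`; the function names are illustrative only.

```cpp
#include <cstdint>

// sext i1 %x to i32: true becomes all ones (-1), false becomes 0.
uint32_t sextI1ToI32(bool x) { return x ? UINT32_MAX : 0u; }

// zext i1 %x to i32: true becomes 1, false becomes 0.
uint32_t zextI1ToI32(bool x) { return x ? 1u : 0u; }

// Original: %ext = sext i1 %x to i32 ; %add = add i32 %ext, 1
// For x == true this is 0xFFFFFFFF + 1, which wraps to 0.
uint32_t addSextOne(bool x) { return sextI1ToI32(x) + 1u; }

// Combined: %not = xor i1 %x, true ; %add = zext i1 %not to i32
uint32_t zextNot(bool x) {
  bool notX = !x;  // xor with true flips the single bit
  return zextI1ToI32(notX);
}
```

Both forms give 1 for false and 0 for true, but the combined form needs no 32-bit add.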
Target Independent Code Generator
[Figure: SelectionDAG-based code generation pipeline]
LLVM IR -> SelectionDAG Builder -> SelectionDAG nodes -> DAG combine -> Legalize types -> DAG combine -> Legalize vectors -> Legalize types -> DAG combine -> DAG legalize -> DAG combine -> Instruction selection -> Machine DAG -> DAG Scheduler -> MachineInstr
DAGCombiner
• Combine using target-independent rules
• Combine using target-dependent rules
  • XXXTargetLowering::PerformDAGCombine
• Promote
DAGCombiner
Target Independent Rules
• DAGCombiner::visit(SDNode *N)
  • ISD::ADD -> visitADD(N)
    • (add c1, c2) -> c1 + c2
    • (add x, 0) -> x
    • (add (sub c1, A), c2) -> (sub (add c1, c2), A)
    • (add (sext i1 X), 1) -> (zext (not i1 X))
    • ((0-A) + B) -> (B - A)
    • (A + (0-B)) -> (A - B)
    • (A + (B-A)) -> B
    • …
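To make the rule list concrete, here is a sketch of a visitADD-like function over a hypothetical toy node type (`Node`, `bin` and `visitAdd` are illustrative, not LLVM's SDNode API). It tries several of the listed rules in order and returns the replacement node, or `nullptr` when nothing matches. Because nodes are shared as in a DAG, "the same A" is tested by pointer equality.

```cpp
#include <cstdint>
#include <memory>
#include <utility>

// Hypothetical DAG node: a constant, a leaf value, or a binary ADD/SUB.
// Constants are uint64_t so folding wraps like fixed-width integers.
struct Node {
  enum Kind { Const, Leaf, Add, Sub };
  Kind kind = Const;
  uint64_t value = 0;         // Const: the constant
  std::shared_ptr<Node> lhs;  // Add/Sub: operand 0
  std::shared_ptr<Node> rhs;  // Add/Sub: operand 1
};
using NodeP = std::shared_ptr<Node>;

NodeP cst(uint64_t v) { auto n = std::make_shared<Node>(); n->value = v; return n; }
NodeP leaf() { auto n = std::make_shared<Node>(); n->kind = Node::Leaf; return n; }
NodeP bin(Node::Kind k, NodeP a, NodeP b) {
  auto n = std::make_shared<Node>();
  n->kind = k;
  n->lhs = std::move(a);
  n->rhs = std::move(b);
  return n;
}

bool isConst(const NodeP& n) { return n->kind == Node::Const; }
bool isConstValue(const NodeP& n, uint64_t v) { return isConst(n) && n->value == v; }

// Try the rules on an ADD node in order; nullptr means "leave it alone".
NodeP visitAdd(const NodeP& n) {
  const NodeP& a = n->lhs;
  const NodeP& b = n->rhs;
  // (add c1, c2) -> c1 + c2
  if (isConst(a) && isConst(b)) return cst(a->value + b->value);
  // (add x, 0) -> x
  if (isConstValue(b, 0)) return a;
  // (add (sub c1, A), c2) -> (sub (add c1, c2), A), folding c1 + c2 at once
  if (a->kind == Node::Sub && isConst(a->lhs) && isConst(b))
    return bin(Node::Sub, cst(a->lhs->value + b->value), a->rhs);
  // ((0-A) + B) -> (B - A)
  if (a->kind == Node::Sub && isConstValue(a->lhs, 0)) return bin(Node::Sub, b, a->rhs);
  // (A + (0-B)) -> (A - B)
  if (b->kind == Node::Sub && isConstValue(b->lhs, 0)) return bin(Node::Sub, a, b->rhs);
  // (A + (B-A)) -> B
  if (b->kind == Node::Sub && b->rhs == a) return b->lhs;
  return nullptr;
}
```

Returning `nullptr` for "no change" loosely mirrors how a combine hook signals that it did nothing; the order of the checks matters only for efficiency here, since every rule preserves the value.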
DAGCombiner
Target Dependent Rules
• virtual SDValue TargetLowering::PerformDAGCombine(SDNode *N, DAGCombinerInfo &DCI) const;
• ARMTargetLowering::PerformDAGCombine(N, DCI)
  • ISD::ADD -> PerformADDCombine(N, DCI, Subtarget)
    • (add (select cc, 0, c), x) -> (select cc, x, (add x, c))
• These transformations eventually create predicated instructions.
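The ARM rule is an algebraic identity: when `cc` holds, adding 0 is just `x`; otherwise the sum is `x + c`. The sketch below checks it on sample values, modelling i32 values as wrapping `uint32_t` and `select` as the conditional operator; `selectOp`, `addOfSelect` and `selectOfAdd` are hypothetical names.

```cpp
#include <cstdint>

// (select cc, t, f) evaluates to t when cc holds, otherwise to f.
uint32_t selectOp(bool cc, uint32_t t, uint32_t f) { return cc ? t : f; }

// Before: (add (select cc, 0, c), x)
uint32_t addOfSelect(bool cc, uint32_t c, uint32_t x) { return selectOp(cc, 0u, c) + x; }

// After: (select cc, x, (add x, c)). The add is only needed on the
// "cc false" side, which is what makes a conditionally executed add possible.
uint32_t selectOfAdd(bool cc, uint32_t c, uint32_t x) { return selectOp(cc, x, x + c); }
```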
DAGCombiner
Promote
• PromoteIntBinOp
  • ADD/SUB/MUL/AND/OR/XOR
  • X86TargetLowering::isTypeDesirableForOp(unsigned Opc, EVT VT)
    • On x86, i16 is legal but undesirable, since i16 instruction encodings are longer and some i16 instructions are slow.
  • X86TargetLowering::IsDesirableToPromoteOp(SDValue Op, EVT &PVT)
    • Returns the desirable promoted type in PVT.
  • DAG.getNode(ISD::TRUNCATE, DL, VT, DAG.getNode(Opc, DL, PVT, NN0, NN1))
    • NN0 = promoted operand 0
    • NN1 = promoted operand 1
• PromoteIntShiftOp
  • SHL/SRA/SRL
  • DAG.getNode(ISD::TRUNCATE, DL, VT, DAG.getNode(Opc, DL, PVT, N0, N1))
    • N0 = promoted operand 0 (sign-extended for SRA, zero-extended for SRL, so the bits shifted in are correct)
    • N1 = shift amount, used unchanged
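Why promotion is safe, and why shifts need care with operand 0, can be shown with ordinary integers. This sketch assumes an i16 operation promoted to i32 and models the nodes with `uint16_t`/`uint32_t` arithmetic; the function names are illustrative. For ADD the low 16 bits of the wide result do not depend on the extension used, but for SRL the bits shifted in from above must be zero, so sign-extending operand 0 gives a wrong answer.

```cpp
#include <cstdint>

// Reference semantics of the original i16 nodes.
uint16_t add16(uint16_t a, uint16_t b) { return static_cast<uint16_t>(a + b); }
uint16_t srl16(uint16_t x, unsigned amt) { return static_cast<uint16_t>(x >> amt); }

// PromoteIntBinOp-style: extend both operands, add in 32 bits, TRUNCATE.
uint16_t add16Promoted(uint16_t a, uint16_t b) {
  uint32_t wide = static_cast<uint32_t>(a) + static_cast<uint32_t>(b);
  return static_cast<uint16_t>(wide);  // truncate back to i16
}

// PromoteIntShiftOp-style SRL: zero-extend operand 0, keep the shift amount.
uint16_t srl16PromotedZext(uint16_t x, unsigned amt) {
  uint32_t wide = static_cast<uint32_t>(x);  // zero-extend
  return static_cast<uint16_t>(wide >> amt);
}

// Counter-example: sign-extending operand 0 of SRL pulls set bits into the
// low half whenever the i16 value has its top bit set.
uint16_t srl16PromotedSext(uint16_t x, unsigned amt) {
  int16_t narrow = static_cast<int16_t>(x);  // reinterpret as signed i16
  uint32_t wide = static_cast<uint32_t>(static_cast<int32_t>(narrow));  // sign-extend
  return static_cast<uint16_t>(wide >> amt);
}
```

For x = 0x8000 shifted right by 1, the i16 result is 0x4000; the zero-extended version matches, while the sign-extended one yields 0xC000.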
DAGCombiner
Promote
• PromoteExtend
  • SIGN_EXTEND/ZERO_EXTEND/ANY_EXTEND
  • (aext (aext x)) -> (aext x)
  • (aext (zext x)) -> (zext x)
  • (aext (sext x)) -> (sext x)
• PromoteLoad
  • LOAD -> TRUNCATE + SEXTLOAD/ZEXTLOAD
MachineCombiner
2014-08-03 Gerolf Hoflehner <ghoflehner@apple.com>
MachineCombiner Pass for selecting faster instruction sequence - target independent framework
When the DAGCombiner selects instruction sequences it could increase the critical path or resource length.
For example, on arm64 there are multiply-accumulate instructions (madd, msub). If e.g. the equivalent multiply-add sequence is not on the critical path it makes sense to select it instead of the combined, single accumulate instruction (madd/msub). The reason is that the conversion from add+mul to the madd could lengthen the critical path by the latency of the multiply.
But the DAGCombiner would always combine and select the madd/msub instruction.
This patch uses machine trace metrics to estimate critical path length and resource length of an original instruction sequence vs a combined instruction sequence and picks the faster code based on its estimates.
https://reviews.llvm.org/rL214666
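The commit's argument can be reproduced with a simple ready-time model for rt = rn * rm + rp. The latencies below are made up for illustration (not taken from any AArch64 core), and the model assumes an instruction starts only when all of its operands are ready; cores that forward the accumulator late are not modelled.

```cpp
#include <algorithm>

// Hypothetical latencies in cycles, for illustration only.
constexpr int kMulLatency = 3;
constexpr int kAddLatency = 1;
constexpr int kMaddLatency = 3;  // assume the fused op costs as much as the multiply

// MUL then ADD: the multiply waits only for rn/rm; the add waits for rs and rp.
int finishMulThenAdd(int readyRnRm, int readyRp) {
  int readyRs = readyRnRm + kMulLatency;
  return std::max(readyRs, readyRp) + kAddLatency;
}

// Single MADD: in this model it cannot start until rn, rm and rp are all ready.
int finishMadd(int readyRnRm, int readyRp) {
  return std::max(readyRnRm, readyRp) + kMaddLatency;
}
```

If rp sits on the critical path and is ready at cycle 10 while rn/rm are ready at cycle 0, the separate sequence finishes at cycle 11 but the madd at cycle 13, since the multiply latency now lies on the critical path. With all inputs ready at cycle 0, the madd wins (3 cycles vs 4), which is why the choice needs an estimate rather than a fixed rule.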
• 2014-08-07 Gerolf Hoflehner, e4fa341: MachineCombiner Pass for selecting faster instruction sequence on AArch64
• 2015-06-10 Sanjay Patel, c826b54: [x86] Add a reassociation optimization to increase ILP via the MachineCombiner pass
• 2015-07-15 Hal Finkel, 8913d18: [PowerPC] Use the MachineCombiner to reassociate fadd/fmul
• 2015-09-21 15:09 Chad Rosier, c5d4530: [Machine Combiner] Refactor machine reassociation code to be target-independent.
MachineCombiner
• Only combine a sequence of instructions when this neither lengthens the critical path nor increases resource pressure.
• When optimizing for code size, always combine when the new sequence is shorter.
• bool TargetInstrInfo::getMachineCombinerPatterns(MI, Patterns)
  • Patterns should be sorted in priority order, since the pattern evaluator stops checking as soon as it finds a faster sequence.
• void TargetInstrInfo::genAlternativeCodeSequence(MI, Pattern, InsInstrs, DelInstrs, InstrIdxForVirtReg)
  • When getMachineCombinerPatterns() finds patterns, this function generates the instructions that could replace the original code sequence.
MachineCombiner
[Figure: flowchart of the MachineCombiner pass]
start
• for each MBB in MF
  • for each MI in MBB
    • getMachineCombinerPatterns(): FALSE -> next MI; TRUE -> for each P in Patterns:
      • genAlternativeCodeSequence()
      • improve throughput in loop? TRUE -> replace code sequence
      • FALSE -> improve code size? TRUE -> replace code sequence
      • FALSE -> improve critical path? TRUE -> replace code sequence
      • FALSE -> delete InsInstrs
end
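The decision order of the flowchart can be summarized as a small function. This is a simplified sketch: `Candidate` and its fields are hypothetical stand-ins for the estimates the pass derives from machine trace metrics, and the real pass must also insert and delete the instructions themselves.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical per-pattern summary; field names are illustrative, not LLVM's.
struct Candidate {
  bool inLoop;                          // root instruction is inside a loop
  bool throughputPattern;               // pattern is known to improve throughput
  bool optForSize;                      // function is optimized for code size
  unsigned oldInstCount, newInstCount;  // instructions deleted / inserted
  unsigned oldDepth, newDepth;          // estimated critical-path length
  unsigned oldResLen, newResLen;        // estimated resource length
};

// Throughput in a loop first, then code size, then critical path together
// with the resource-length check.
bool shouldReplace(const Candidate& c) {
  if (c.inLoop && c.throughputPattern) return true;
  if (c.optForSize && c.newInstCount < c.oldInstCount) return true;
  return c.newDepth <= c.oldDepth && c.newResLen <= c.oldResLen;
}

// Patterns arrive in priority order; the first accepted one is applied and
// later ones are not evaluated. Returns its index, or -1 if every alternative
// sequence was rejected (its InsInstrs deleted).
int firstAcceptedPattern(const std::vector<Candidate>& patterns) {
  for (std::size_t i = 0; i < patterns.size(); ++i)
    if (shouldReplace(patterns[i])) return static_cast<int>(i);
  return -1;
}
```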
Machine Combiner Patterns
Default Patterns
A = ? op ?
B = A op X
C = B op Y
->
A = ? op ?
B' = X op Y
C = A op B'
Breaking the dependency of B on A lets B' = X op Y execute in parallel with A instead of waiting for it.
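A ready-time calculation shows what the reassociation buys. The sketch assumes every `op` has latency 1, that A is ready late because it ends a long dependency chain, and that X and Y are ready early; the function names are illustrative.

```cpp
#include <algorithm>

// Hypothetical uniform latency for "op".
constexpr int kOpLatency = 1;

// B = A op X ; C = B op Y : C needs two dependent ops after A is ready.
int readyOriginal(int readyA, int readyX, int readyY) {
  int readyB = std::max(readyA, readyX) + kOpLatency;
  return std::max(readyB, readyY) + kOpLatency;
}

// B' = X op Y ; C = A op B' : B' runs in parallel with the chain producing A.
int readyReassociated(int readyA, int readyX, int readyY) {
  int readyBp = std::max(readyX, readyY) + kOpLatency;
  return std::max(readyA, readyBp) + kOpLatency;
}
```

With A ready at cycle 5 and X, Y at cycle 0, C is ready at cycle 7 originally and at cycle 6 after reassociation. For integer add/mul the rewrite never changes the value; for floating-point operations reassociation can change rounding, so it is only valid when the relevant fast-math relaxations are allowed.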
Y = ? op2 ? (MI2)
. . .
A = ? opx ?
B = A op1 X (MI1) (B has only one use)
C = B op Y (ROOT) (op is associative and commutative)
if op1 != op and op2 == op:
  commute the root's operands: C = Y op B
  patterns = {REASSOC_AX_YB, REASSOC_XA_YB}
else:
  C = B op Y
  patterns = {REASSOC_AX_BY, REASSOC_XA_BY}
Machine Combiner Patterns
AArch64
MADDW rs, rn, rm, WZR (rs has only one use)
ADDW rt, rs, rp
->
MADDW rt, rn, rm, rp
Find instructions that can be turned into madd.
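The AArch64 match can be sketched over a hypothetical list-of-instructions representation (`MInst`, `countUses` and `matchMadd` are illustrative, not LLVM's MachineInstr API). A `MADDW` whose accumulator is `WZR` is just a multiply; if its result has a single use in an `ADDW`, the two fold into one `MADDW`. Trying `rs` in either `ADDW` operand slot is an assumption added here, not shown on the slide.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical machine instruction: opcode, defined register, used registers.
// MADDW d, n, m, a computes d = n * m + a; ADDW d, s, t computes d = s + t.
struct MInst {
  std::string opc;
  std::string def;
  std::vector<std::string> uses;
};

// Number of operands, across all instructions, that read register reg.
std::size_t countUses(const std::vector<MInst>& code, const std::string& reg) {
  std::size_t n = 0;
  for (const MInst& mi : code)
    for (const std::string& u : mi.uses)
      if (u == reg) ++n;
  return n;
}

// If code[root] is ADDW rt, rs, rp and rs comes from MADDW rs, rn, rm, WZR with
// no other use, return the fused MADDW rt, rn, rm, rp.
std::optional<MInst> matchMadd(const std::vector<MInst>& code, std::size_t root) {
  const MInst& add = code[root];
  if (add.opc != "ADDW" || add.uses.size() != 2) return std::nullopt;
  for (std::size_t k = 0; k < 2; ++k) {  // ADDW is commutative
    const std::string& rs = add.uses[k];
    const std::string& rp = add.uses[1 - k];
    for (std::size_t i = 0; i < root; ++i) {
      const MInst& mul = code[i];
      if (mul.opc == "MADDW" && mul.def == rs && mul.uses.size() == 3 &&
          mul.uses[2] == "WZR" && countUses(code, rs) == 1)
        return MInst{"MADDW", add.def, {mul.uses[0], mul.uses[1], rp}};
    }
  }
  return std::nullopt;
}
```

The single-use check matters: if the product is also read elsewhere, the multiply must stay, and fusing would only add work.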

Instruction Combine in LLVM

  • 2. Instruction Combines on Different IR • instcombine optimization pass • operate on LLVM IR • class InstructionCombiningPass : public FunctionPass • DAGCombiner • operate on SelectionDAG • class DAGCombiner • MachineCombiner • operate on MachineInstr • class MachineCombiner : public MachineFunctionPass
  • 3. instcombine optimization pass • Remove dead basic block. • Remove dead instructions. • Constant fold. BB #0 BB #1 BB #2 BB #3 BB #4 R = add Y, 1 worklist
  • 4. instcombine optimization pass R = add Y, 1 worklist %ext = sext i1 %x to i32 (put into wordlist) %add = add i32 %ext, 1 %not = xor i1 %x, true (put into worklist) %add = zext i1 %not to i32 (replace) instcombine • visit##OPCODE to do instruction combine. • lib/Transforms/InstCombine/ add (sext i1 X), 1 —> zext (not X)
  • 5. Target Independent Code Generator SelectionDAG nodes DAG combine Legalize typesDAG combine Legalize vectors Legalize types DAG combine DAG legalize DAG combine Instruction selection LLVM IR SelectionDAG Builder Machine DAGSchedulerMachineInstr
  • 6. DAGCombiner • Combine in target independent rules • Combine in target dependent rules • XXXTargetLowering::PerformDAGCombine • Promote
  • 7. DAGCombiner Target Independent Rules • DAGCombiner::visit(SDNode *N) • ISD::ADD -> visitADD(N) • (add c1, c2) -> c1 + c2 • (add x, 0) -> x • (add (sub c1, A), c2) -> (sub (add c1, c2), A) • (add (sext i1 X), 1) -> (zext (not i1 X)) • ((0-A) + B) -> (B - A) • (A + (0-B)) -> (A - B) • (A + (B-A)) -> B • …
  • 8. DAGCombiner Target Dependent Rules • virtual SDValue TargetLowering::PerformDAGCombine(SDNode *N, DAGCombinerInfo &DCI) const; • ARMTargetLowering::PerformDAGCombine(N, DCI) • ISD::ADD -> PerformADDCombine(N, DCI, Subtarget) • (add (select cc, 0, c), x) -> (select cc, x, (add x, c)) • These transformations eventually create predicated instructions.
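The hook design can be sketched under some stated assumptions. A base class has a virtual method that declines by default, and an ARM-like subclass implements the select rule from the slide. `TargetLoweringModel`, `ARMLikeLowering` and `performAddCombine` are hypothetical, and operands are plain strings rather than `SDValue`s. The numeric check shows why the rewrite is legal: `select cc, x, (add x, c)` is a conditional add, which is the form that can later become a predicated instruction.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical miniature of the hook design: the generic combiner offers a
// node to a virtual target hook, and an empty result means "no target rule".
struct TargetLoweringModel {
  virtual ~TargetLoweringModel() = default;
  // Called for (add (select cc, tv, fv), x).
  virtual std::optional<std::string> performAddCombine(
      const std::string &cc, const std::string &tv, const std::string &fv,
      const std::string &x) const {
    return std::nullopt;  // the generic target has no special rules
  }
};

struct ARMLikeLowering : TargetLoweringModel {
  // (add (select cc, 0, c), x) -> (select cc, x, (add x, c))
  std::optional<std::string> performAddCombine(
      const std::string &cc, const std::string &tv, const std::string &fv,
      const std::string &x) const override {
    if (tv != "0") return std::nullopt;
    return "(select " + cc + ", " + x + ", (add " + x + ", " + fv + "))";
  }
};

// Numeric check of the rewrite with wrapping 32-bit arithmetic.
bool rewriteHolds(bool cc, uint32_t c, uint32_t x) {
  uint32_t before = (cc ? 0u : c) + x;
  uint32_t after = cc ? x : x + c;
  return before == after;
}
```

Calling through a base-class reference picks up the target's override. This mirrors how the generic combiner reaches a target's PerformDAGCombine.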
  • 9. DAGCombiner Promote • PromoteIntBinOp • ADD/SUB/MUL/AND/OR/XOR • X86TargetLowering::isTypeDesirableForOp(unsigned Opc, EVT VT) • On x86 i16 is legal, but undesirable since i16 instruction encodings are longer and some i16 instructions are slow. • X86TargetLowering::IsDesirableToPromoteOp(SDValue Op, EVT &PVT) • Return desirable promoted type in PVT. • DAG.getNode(ISD::TRUNCATE, DL, VT, DAG.getNode(Opc, DL, PVT, NN0, NN1)) • NN0 = promoted operand 0 • NN1 = promoted operand 1 • PromoteIntShiftOp • SHL/SRA/SRL • DAG.getNode(ISD::TRUNCATE, DL, VT, DAG.getNode(Opc, DL, PVT, N0, N1)) • N0 = promoted operand 0
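The promotion can be sanity-checked on concrete values. This sketch, with hypothetical helper names, promotes i16 operations to i32, performs them and truncates the result. It then compares against direct 16-bit arithmetic. For ADD and MUL the high bits of the promoted operands never reach the low 16 bits, so any extension works. For right shifts they do reach the low bits. In this model SRL therefore needs zero-extension and SRA needs sign-extension of the shifted operand, as the wrong-extension case shows.

```cpp
#include <cassert>
#include <cstdint>

// i16 bit patterns are stored in uint16_t; the promoted type is i32.
uint16_t trunc16(uint32_t v) { return static_cast<uint16_t>(v); }
uint32_t zext32(uint16_t v) { return v; }
uint32_t sext32(uint16_t v) {
  return static_cast<uint32_t>(static_cast<int32_t>(static_cast<int16_t>(v)));
}

// Reference results computed directly in 16 bits (arithmetic right shift
// of negative values is guaranteed from C++20, universal in practice).
uint16_t add16(uint16_t a, uint16_t b) { return trunc16(uint32_t{a} + b); }
uint16_t mul16(uint16_t a, uint16_t b) { return trunc16(uint32_t{a} * b); }
uint16_t srl16(uint16_t a, unsigned s) { return static_cast<uint16_t>(a >> s); }
uint16_t sra16(uint16_t a, unsigned s) {
  return static_cast<uint16_t>(static_cast<int16_t>(a) >> s);
}

// Binary ops: high bits of the operands never affect the low 16 result
// bits, so any extension is fine (sign-extension used here).
uint16_t addPromoted(uint16_t a, uint16_t b) { return trunc16(sext32(a) + sext32(b)); }
uint16_t mulPromoted(uint16_t a, uint16_t b) { return trunc16(sext32(a) * sext32(b)); }

// Right shifts pull high bits down, so the extension kind matters.
uint16_t srlPromoted(uint16_t a, unsigned s) { return trunc16(zext32(a) >> s); }
uint16_t sraPromoted(uint16_t a, unsigned s) {
  return trunc16(static_cast<uint32_t>(static_cast<int32_t>(sext32(a)) >> s));
}
// Deliberately wrong: sign-extending before a logical right shift.
uint16_t srlWrongExt(uint16_t a, unsigned s) { return trunc16(sext32(a) >> s); }

bool promotionMatches() {
  const uint16_t bs[] = {0, 1, 2, 0x7FFF, 0x8000, 0xFFFF, 12345};
  for (uint32_t i = 0; i <= 0xFFFF; ++i) {
    uint16_t a = static_cast<uint16_t>(i);
    for (uint16_t b : bs)
      if (addPromoted(a, b) != add16(a, b) || mulPromoted(a, b) != mul16(a, b))
        return false;
    for (unsigned s = 0; s < 16; ++s)
      if (srlPromoted(a, s) != srl16(a, s) || sraPromoted(a, s) != sra16(a, s))
        return false;
  }
  return true;
}
```

For example, a logical shift of 0x8000 right by one should give 0x4000. Sign-extending first gives 0xC000 instead.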
  • 10. DAGCombiner Promote • PromoteExtend • SIGN_EXTEND/ZERO_EXTEND/ANY_EXTEND • (aext (aext x)) -> (aext x) • (aext (zext x)) -> (zext x) • (aext (sext x)) -> (sext x) • PromoteLoad • LOAD • TRUNCATE + SEXTLOAD/ZEXTLOAD
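Below is a small model of the PromoteExtend folds. It assumes hypothetical types `Ext` and `ExtOp` that record only the extension kind and the bit widths. ANY_EXTEND leaves the new high bits unspecified. Folding it into the inner extension therefore only has to preserve the bits the inner extension produced, and that is what the test checks. PromoteLoad is not modeled.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Extension kinds on the slide: ANY_EXTEND, ZERO_EXTEND, SIGN_EXTEND.
enum class Ext { Any, Zero, Sign };

// An extend from `from` bits to `to` bits (hypothetical; 1 <= from < to <= 64).
struct ExtOp { Ext kind; unsigned from, to; };

uint64_t mask(unsigned bits) { return bits >= 64 ? ~0ULL : (1ULL << bits) - 1; }

// Concrete semantics. ANY_EXTEND leaves the high bits unspecified; this
// model picks zeros, which is one legal choice.
uint64_t apply(const ExtOp &e, uint64_t v) {
  v &= mask(e.from);
  bool negative = e.kind == Ext::Sign && ((v >> (e.from - 1)) & 1);
  if (negative) v |= mask(e.to) & ~mask(e.from);
  return v;
}

// (aext (aext x)) -> (aext x), (aext (zext x)) -> (zext x),
// (aext (sext x)) -> (sext x): the outer any-extend places no constraint on
// the new high bits, so the inner extension can simply be widened.
std::optional<ExtOp> foldAnyExt(const ExtOp &outer, const ExtOp &inner) {
  if (outer.kind != Ext::Any || outer.from != inner.to) return std::nullopt;
  return ExtOp{inner.kind, inner.from, outer.to};
}
```

Folding `(aext (sext x))` from i8 through i16 to i32 yields a single i8-to-i32 sign-extension. Its low 16 bits agree with the inner sext for every i8 input.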
  • 11. MachineCombiner 2014-08-03 Gerolf Hoflehner <ghoflehner@apple.com> MachineCombiner Pass for selecting faster instruction sequence - target independent framework. When the DAGCombiner selects instruction sequences, it could increase the critical path or resource length. For example, on arm64 there are multiply-accumulate instructions (madd, msub). If, e.g., the equivalent multiply-add sequence is not on the critical path, it makes sense to select it instead of the combined, single accumulate instruction (madd/msub). The reason is that the conversion from add+mul to the madd could lengthen the critical path by the latency of the multiply. But the DAGCombiner would always combine and select the madd/msub instruction. This patch uses machine trace metrics to estimate critical path length and resource length of an original instruction sequence vs a combined instruction sequence, and picks the faster code based on its estimates. https://reviews.llvm.org/rL214666
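The commit message's argument can be made concrete with a simple ready-time model. The latencies used below (3 cycles for mul, 1 for add, 4 for madd) are assumptions for illustration, not the numbers for any particular core. The model also assumes madd needs all three inputs when it issues. When the addend `c` arrives late, the separate multiply can run in the meantime. The fused madd instead puts the whole multiply latency after `c`.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical latencies in cycles; a real pass reads them from the
// target's scheduling model through machine trace metrics.
struct Latencies { int mul, add, madd; };

// t = mul a, b ; r = add t, c  -> cycle at which r is ready
int readyMulThenAdd(int abReady, int cReady, const Latencies &L) {
  int t = abReady + L.mul;
  return std::max(t, cReady) + L.add;
}

// r = madd a, b, c (this model assumes madd needs all inputs at issue)
int readyMadd(int abReady, int cReady, const Latencies &L) {
  return std::max(abReady, cReady) + L.madd;
}

// Prefer the single instruction unless it makes the result later.
bool preferMadd(int abReady, int cReady, const Latencies &L) {
  return readyMadd(abReady, cReady, L) <= readyMulThenAdd(abReady, cReady, L);
}
```

With `Latencies L{3, 1, 4}` and all inputs ready at cycle 0, both forms finish at cycle 4. When `c` is ready only at cycle 10, the mul+add pair finishes at 11 and the madd at 14, exactly one multiply latency later.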
  • 12. MachineCombiner • 2014-08-07 Gerolf Hoflehner e4fa341 MachineCombiner Pass for selecting faster instruction sequence on AArch64 • 2015-06-10 Sanjay Patel c826b54 [x86] Add a reassociation optimization to increase ILP via the MachineCombiner pass • 2015-07-15 Hal Finkel 8913d18 [PowerPC] Use the MachineCombiner to reassociate fadd/fmul • 2015-09-21 15:09 Chad Rosier c5d4530 [Machine Combiner] Refactor machine reassociation code to be target-independent.
  • 13. MachineCombiner • Only combine a sequence of instructions when this neither lengthens the critical path nor increases resource pressure. • When optimizing for code size, always combine when the new sequence is shorter. • bool TargetInstrInfo::getMachineCombinerPatterns(MI, Patterns) • Patterns should be sorted in priority order, since the pattern evaluator stops checking as soon as it finds a faster sequence. • void TargetInstrInfo::genAlternativeCodeSequence(MI, Pattern, InsInstrs, DelInstrs, InstrIdxForVirtReg) • When getMachineCombinerPatterns() finds patterns, this function generates the instructions that could replace the original code sequence.
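Here is a sketch of the acceptance rule and the priority-ordered search described above. `SeqEstimate`, `shouldCombine` and `pickPattern` are hypothetical names, and this is a simplified reading of the slide. The real pass derives depths and resource lengths from machine trace metrics and has additional cases.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical summary of what a trace-metrics analysis estimates.
struct SeqEstimate {
  unsigned criticalPath;    // cycles until the root's result is ready
  unsigned resourceLength;  // cycles demanded by the busiest resource
  unsigned numInstrs;
};

// Combine only if the new sequence neither lengthens the critical path nor
// increases resource pressure; when optimizing for size, combine whenever
// the new sequence is shorter.
bool shouldCombine(const SeqEstimate &oldSeq, const SeqEstimate &newSeq,
                   bool optForSize) {
  if (optForSize && newSeq.numInstrs < oldSeq.numInstrs) return true;
  return newSeq.criticalPath <= oldSeq.criticalPath &&
         newSeq.resourceLength <= oldSeq.resourceLength;
}

// Candidates arrive in priority order and the search stops at the first one
// that passes, which is why the pattern list has to be sorted.
int pickPattern(const SeqEstimate &oldSeq,
                const std::vector<SeqEstimate> &candidates, bool optForSize) {
  for (std::size_t i = 0; i < candidates.size(); ++i)
    if (shouldCombine(oldSeq, candidates[i], optForSize))
      return static_cast<int>(i);
  return -1;  // no candidate is acceptable; keep the original code
}
```

In the test, the third candidate would be faster still. It is never examined because the second one already passes.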
  • 14. MachineCombiner • [Flowchart: start; for each MBB in MF and each MI in MBB, call getMachineCombinerPatterns(); for each P in Patterns, call genAlternativeCodeSequence(); if the new sequence improves throughput in a loop, improves code size, or improves the critical path, replace the code sequence, otherwise delete InsInstrs; end.]
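The control flow of the flowchart can be sketched over a toy block of instruction strings. `CombinerHooks`, `getPatterns`, `genAlternative` and `improves` are hypothetical stand-ins for `getMachineCombinerPatterns()`, `genAlternativeCodeSequence()` and the throughput, code-size and critical-path checks. In the real pass, InsInstrs are newly created MachineInstrs that must be deleted when rejected. Here they are simply discarded.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

using Block = std::vector<std::string>;  // toy: one string per MachineInstr

// Hypothetical stand-ins for the TargetInstrInfo hooks and the cost checks.
struct CombinerHooks {
  // Patterns rooted at instruction `root`, sorted by priority.
  std::function<std::vector<int>(const Block &, std::size_t root)> getPatterns;
  // Builds InsInstrs for pattern P and lists the block indices it replaces.
  std::function<void(const Block &, std::size_t root, int P, Block &insInstrs,
                     std::vector<std::size_t> &delInstrs)> genAlternative;
  // Throughput-in-loop, code-size or critical-path test.
  std::function<bool(const Block &insInstrs,
                     const std::vector<std::size_t> &delInstrs)> improves;
};

// One basic block of the flowchart; returns true if any sequence was replaced.
bool combineBlock(Block &mbb, const CombinerHooks &h) {
  bool changed = false;
  std::size_t root = 0;
  while (root < mbb.size()) {
    bool replaced = false;
    for (int P : h.getPatterns(mbb, root)) {
      Block ins;
      std::vector<std::size_t> del;
      h.genAlternative(mbb, root, P, ins, del);
      if (del.empty() || !h.improves(ins, del))
        continue;  // not better: drop InsInstrs and try the next pattern
      std::sort(del.begin(), del.end());
      std::size_t pos = del.front();
      for (auto it = del.rbegin(); it != del.rend(); ++it)  // erase back to front
        mbb.erase(mbb.begin() + static_cast<std::ptrdiff_t>(*it));
      mbb.insert(mbb.begin() + static_cast<std::ptrdiff_t>(pos), ins.begin(), ins.end());
      root = pos + ins.size();  // resume after the new sequence
      changed = replaced = true;
      break;
    }
    if (!replaced) ++root;
  }
  return changed;
}
```

The test plugs in a mul+add to madd pattern. The block is rewritten only when `improves` accepts the new sequence.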
  • 15. Machine Combiner Patterns • Default patterns • A = ? op ? ; B = A op X ; C = B op Y -> A = ? op ? ; B' = X op Y ; C = A op B' • This breaks the dependency between A and B, so A and B' can be computed in parallel instead of one waiting for the other. • Y = ? op2 ? (MI2) . . . A = ? opx ? ; B = A op1 X (MI1, B has only one use) ; C = B op Y (ROOT, op is associative) • if op1 != op and op2 == op: C = Y op B, patterns = {REASSOC_AX_YB, REASSOC_XA_YB} • else: C = B op Y, patterns = {REASSOC_AX_BY, REASSOC_XA_BY}
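A ready-time model shows why reassociation helps. It assumes every `op` has the same latency `lat` and that the leaves become available at the hypothetical cycles in `ReadyAt`. When A is the late value, the reassociated form computes B' = X op Y while A is still in flight, so it finishes one `lat` earlier. The value check holds for wrapping integer addition. Floating-point fadd is not associative, so reassociating it can change the result.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Cycle at which each leaf value is available (hypothetical inputs).
struct ReadyAt { int a, x, y; };

// Original chain: B = A op X ; C = B op Y. Returns when C is ready.
int depthBefore(ReadyAt r, int lat) {
  int b = std::max(r.a, r.x) + lat;
  return std::max(b, r.y) + lat;
}

// Reassociated: B' = X op Y ; C = A op B'. B' no longer waits for A.
int depthAfter(ReadyAt r, int lat) {
  int bPrime = std::max(r.x, r.y) + lat;
  return std::max(r.a, bPrime) + lat;
}

// Legality: for an associative, commutative op such as wrapping integer
// add, both forms compute the same value.
bool sameValue(uint32_t a, uint32_t x, uint32_t y) {
  return (a + x) + y == a + (x + y);
}
```

With A ready at cycle 5 and X, Y at cycle 0, C is ready at cycle 7 before and cycle 6 after. If X and Y are the late values instead, both forms take 7 cycles, so a cost check is still needed.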
  • 16. Machine Combiner Patterns AArch64 • Find instructions that can be turned into madd. • MADDW rs, rn, rm, WZR (rs has only one use) ; ADDW rt, rs, rp -> MADDW rt, rn, rm, rp
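The AArch64 pattern match can be sketched on a toy SSA instruction list. `MI`, `countUses` and `foldMulIntoAdd` are hypothetical. Treating the product as either ADDW source is an assumption of this sketch. The sketch only performs the match and rewrite, whereas the MachineCombiner would also consult its cost model before keeping the madd.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy AArch64-like instruction in SSA form (hypothetical model).
struct MI {
  std::string opc;                // e.g. "MADDW", "ADDW"
  std::string def;                // destination register
  std::vector<std::string> uses;  // source registers
};

static int countUses(const std::vector<MI> &bb, const std::string &reg) {
  int n = 0;
  for (const MI &mi : bb)
    for (const std::string &u : mi.uses)
      if (u == reg) ++n;
  return n;
}

// MADDW rs, rn, rm, WZR   (a plain multiply; rs has only one use)
// ADDW  rt, rs, rp
//   ==> MADDW rt, rn, rm, rp
bool foldMulIntoAdd(std::vector<MI> &bb) {
  for (std::size_t i = 0; i < bb.size(); ++i) {
    const MI &mul = bb[i];
    if (mul.opc != "MADDW" || mul.uses.size() != 3 || mul.uses[2] != "WZR") continue;
    if (countUses(bb, mul.def) != 1) continue;
    for (std::size_t j = i + 1; j < bb.size(); ++j) {
      const MI &add = bb[j];
      if (add.opc != "ADDW" || add.uses.size() != 2) continue;
      // Assumption: ADDW is commutative, so the product may be either source.
      int k = add.uses[0] == mul.def ? 0 : (add.uses[1] == mul.def ? 1 : -1);
      if (k < 0) continue;
      MI madd{"MADDW", add.def,
              {mul.uses[0], mul.uses[1], add.uses[static_cast<std::size_t>(1 - k)]}};
      bb[j] = madd;  // the fused instruction takes the ADDW's place
      bb.erase(bb.begin() + static_cast<std::ptrdiff_t>(i));  // drop the multiply
      return true;
    }
  }
  return false;
}
```

If the product register has a second use, the multiply must stay. In that case the fold is refused.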