MAKSIM PANCHENKO
AMIR AYUPOV
Adding a BOLT pass
2023 EuroLLVM Developer’s Meeting
Agenda 1. Intro
2. BOLT pipeline and IR
3. Simple peephole rule
• Triaging crashes
4. Adding a BinaryPass
• CFG visualization
01 INTRO
What’s BOLT?
• Binary Optimization and
Layout Tool
• Supports X86 and ARM
ELF programs and
shared libraries
• Part of LLVM monorepo
Cumulative speedup over bootstrapped build,
Building Clang
Ivy Bridge
Broadwell
Skylake
Icelake
Golden Cove
(Alderlake-P)
Gracemont
(Alderlake-E)
Zen1
Zen2
Zen3
0% 45%
13.1%
16.2%
19.2%
19.7%
10.6%
10.6%
11.7%
14.1%
23.5%
16.2%
17.8%
23.4%
16.5%
12.1%
10.5%
12.1%
16.6%
22.5%
PGO +BOLT
30.6%
23.9%
21.1%
22.7%
36.2%
42.6%
34.1%
29.4%
46.0%
What does BOLT do?
01 INTRO
• Profile-guided optimizations at whole program level
﹘Code layout optimizations
﹘Sampling (LBR) or instrumentation profile driven
﹘Supporting third-party libraries with no source, and hand-written assembly
• Other uses
﹘Whole program transformations or instrumentation (e.g. spectre mitigation)
﹘Reverse engineering
﹘Profile analysis
01 INTRO
Why would you want to add a BOLT pass?
• Exploring optimization opportunities leveraging accurate profiling information:
﹘HW mechanisms: alignment for macro-fusion (Intel/AMD), atomic execution
(Intel TSX/ARM TME)
﹘OS/HW feedbacks: memory profile (Linux perf), branch mispredictions and
latency information (Intel LBR/ARM BRBE)
• Looking for binary patterns leveraging rich analysis framework:
﹘Metadata: CFI/EH information, DWARF parsing and updates
﹘Functions and instructions: call graph, register/stack slot liveness
02 BOLT pipeline and IR
Discover
Disassemble
Build CFG
Read Profile
Optimize
Emit
Rewrite
1. Find functions and data in code
2. Identify instructions inside functions
3. Analyze instructions and build CFG
4. Read profile and attribute to CFG edges
5. Execute local and global optimization passes
6. Generate code and process relocations
7. Write code to file and update ELF
02 BOLT PIPELINE AND IR
BOLT processing pipeline
02 BOLT PIPELINE AND IR
Module
Function Function Function
Function
Basic
Block
Basic
Block
Basic
Block
BasicBlock
Instruction Instruction Instruction
BinaryContext
Binary
Function
Binary
Function
Binary
Function
BinaryFunction
Binary
Basic
Block
Binary
Basic
Block
Binary
Basic
Block
BinaryBasicBlock
MCInst MCInst MCInst
LLVM IR BOLT IR
Module
Function
BasicBlock
Instruction
LLVM IR Machine IR
Machine
ModuleInfo
Machine
Function
Machine
BasicBlock
Machine
Instruction
Binary
Context
Binary
Function
Binary
BasicBlock
MCInst via
MCPlus
BOLT IR
Machine
ModuleInfo*
Machine
Function*
Selection
DAG
SDNode
SelectionDAG MC
Object
File
Symbols*
Labels*
MCInst
02 BOLT PIPELINE AND IR
MCPlus(Plus):
extensible
abstraction layer
MCPlus is BOLT’s abstraction layer for
• target-specific info beyond
MCRegInfo/MCInstrAnalysis
• target-independent info beyond
opcode/operands
• analyses and manipulations
MCPlusBuilder class (BC->MIB)
Simple checks: isNoop,
isIndirectBranch
Annotations: addAnnotation
Analysis: analyzePLTEntry,
evaluateX86MemoryOperand
Instruction creation: createTailCall
Complex transformations:
indirectCallPromotion
02 BOLT PIPELINE AND IR
MCPlus
examples
• Raising semantical level:
﹘tail call, invoke, jump table
• Attaching analysis
information to instructions:
﹘Liveness, ReachingDefs
# MC Jump -> MCPlus Tail Call
00000002: ja func # TAILCALL # CTCTakenCount: 4
# MC Call -> MCPlus Invoke using EH annotations
00000043: callq _Z11filteri # handler: .LLP2;
action: 1;
GNU_args_size = 0
# MC Indirect jump -> MCPlus Jump Table
00000014: jmpq *%rax # JUMPTABLE @0x290
# JTIndexReg: 0
# Analysis information
0040117a: !PHI %r8 # ID: 3
# DF: %r8[23.], %r8[0.] -> %r8[23.]
02 BOLT PIPELINE AND IR
03 Simple peephole rule
The best way to zero a register on X86?
• movl $0x0, %eax
• andl $0x0, %eax
• subl %eax, %eax
• xorl %eax, %eax
# [0xb8,0x00,0x00,0x00,0x00]
# [0x83,0xe0,0x00]
# [0x29,0xc0]
# [0x31,0xc0]
bool shortenInstruction(MCInst &Inst,
const MCSubtargetInfo &STI) const override {
...
Inst.setOpcode(NewOpcode);
// Replace `mov[lq] $0x0, %[er]ax` with `xor[lq] %[er]ax, %[er]ax`
switch (NewOpcode) {
default:
break;
case X86::MOV64ri:
case X86::MOV64ri32:
case X86::MOV32ri:
auto OpNum = MCPlus::getNumPrimeOperands(Inst) - 1;
if (Inst.getOperand(OpNum).isImm() && !Inst.getOperand(OpNum).getImm()) {
if (NewOpcode == X86::MOV32ri)
NewOpcode = X86::XOR32rr;
else
NewOpcode = X86::XOR64rr;
MCOperand Op = Inst.getOperand(0);
Inst.setOpcode(NewOpcode);
Inst.clear();
Inst.addOperand(Op);
Inst.addOperand(Op);
Inst.addOperand(Op);
}
break;
}
if (NewOpcode == OldOpcode)
return false;
Inst.setOpcode(NewOpcode);
return true;
}
03 SIMPLE PEEPHOLE RULE
Leveraging
existing
passes
shortenInstructions
pass calls
MCPlusBuilder::
shortenInstruction –
fits the bill.
Running tests
Open-source BOLT tests:
• Source-based tests, in-repo
• Binary tests, out-of-repo
Both leverage LLVM Lit infra
# https://github.com/rafaelauler/bolt-tests
$ llvm-lit -a tools/bolttests/X86/clang.test
-- Testing: 1 tests, 1 workers --
FAIL: bolt-tests :: X86/clang.test (1 of 1)
**** TEST 'bolt-tests :: X86/clang.test' FAILED ****
...
Stack dump:
0. Program arguments: clang.test.tmp <...>
1. <eof> parser at end of file
clang.test.tmp: error: unable to execute command:
Segmentation fault (core dumped)
--
********************
Failed Tests (1):
bolt-tests :: X86/clang.test
03 SIMPLE PEEPHOLE RULE
Bughunter script
Bisecting to a function which causes a
crash.
Pass the resulting function as
--funcs=funcname
to reproduce the issue.
03 TRIAGING A CRASH
bolt/utils/bughunter.sh
Invocation:
BOLT=/build/llvm-bolt 
BOLT_OPTIONS="-v=1" 
INPUT_BINARY=/path/to/binary 
# COMMAND_LINE="--version" or
# OFFLINE=1 
bolt/utils/bughunter.sh
Output:
Text file containing the culprit function.
Bughunter
example
$ BOLT=llvm-bolt INPUT_BINARY=clang 
BOLT_OPTIONS="-b clang.fdata <...>" 
COMMAND_LINE="bf.cpp -c" bolt/utils/bughunter.sh
Verify input binary passes
INPUT_BINARY: clang bf.cpp -c
Input binary passes.
Verify optimized binary fails
OPTIMIZED_BINARY: clang.bolt bf.cpp -c
Optimized binary fails as expected.
Iteration 0, trying func-names.abc / 45312 functions
...
The function(s) that failed are in func-names.xyz
To reproduce, run:
llvm-bolt <...> -funcs-file-no-regex=func-names.xyz
03 TRIAGING A CRASH
$ llvm-bolt clang --funcs=_ZN4ll… --print-all <...>
Binary Function "_ZN4…" after shorten-instructions {
State : CFG constructed
Address : 0x1d78640
Section : .text
BB Layout : .LBB00, ...
}
.LBB00 (30 instructions, align : 1)
Entry Point
00000000: pushq %rbp
--print-all
03 TRIAGING A CRASH
Producing text dumps for all
processed functions after
each pass.
--print-disasm
--print-cfg
--print-{pass}
--print-only=funcname
Before:
cmpb $0x0, 0x8(%rax)
movl $0x0, %edx
movq -0x8(%rbp), %r13
movq (%rdi), %rax
cmovel %edx, %ebx
After:
cmpb $0x0, 0x8(%rax)
xorl %edx, %edx
movq -0x8(%rbp), %r13
movq (%rdi), %rax
cmovel %edx, %ebx
$ llvm-bolt clang --funcs=_ZN4ll… --asm-dump <...>
BOLT-INFO: Dumping function assembly to _ZN4llvm….s
AsmDump
03 TRIAGING A CRASH
Producing an annotated
assembly which can be
turned into a BOLT test.
--asm-dump[=dir]
# _ZN4llvm….s
.globl _ZN4…
.type _ZN4…, %function
_ZN4…:
.cfi_startproc
.LBB00:
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset %rbp, -16
movq %rsp, %rbp
...
# bolt/test/X86/test.s
.globl _start
.type _start, %function
_start:
.cfi_startproc
cmpb $0x0, 0x8(%rax)
movl $0x0, %edx
movq -0x8(%rbp), %r13
movq (%rdi), %rax
cmovel %edx, %ebx
.cfi_endproc
.size _start, .-_start
04 Adding a pass
Logistics
Inherit from
BinaryFunctionPass
Add to bolt/lib/Passes/
BinaryPasses.cpp or
ZeroIdiom.cpp
/* bolt/include/bolt/Passes/ZeroIdiom.h */
class ZeroIdiom : public BinaryFunctionPass {
public:
explicit ZeroIdiom(const cl::opt<bool> &PrintPass)
: BinaryFunctionPass(PrintPass) {}
const char *getName() const override {
return "zero-idiom";
}
void runOnFunctions(BinaryContext &) override;
};
/* bolt/lib/Passes/ZeroIdiom.cpp */
ZeroIdiom::runOnFunctions(BinaryContext &BC) {
04 ADDING A PASS
Logistics
Append invocation to
BinaryPassManager.cpp
/* bolt/lib/Rewrite/BinaryPassManager.cpp */
static cl::opt<bool> PrintZeroIdiom(
"print-zero-idiom",
cl::desc("print functions after zero idiom pass"),
cl::cat(BoltOptCategory));
void BinaryFunctionPassManager::runAllPasses(
BinaryContext &BC) {
Manager.registerPass(
std::make_unique<ZeroIdiom>(PrintZeroIdiom));
}
04 ADDING A PASS
Extra
analyses
Make use of
DataflowManager
providing
LivenessAnalysis
/* bolt/lib/Passes/ZeroIdiom.cpp */
void ZeroIdiom::runOnFunction(BinaryFunction &BF,
DataflowInfoManager &Info)
{
BinaryContext &BC = BF.getBinaryContext();
LivenessAnalysis &LA = Info.getLivenessAnalysis();
for (BinaryBasicBlock &BB : BF) {
for (MCInst &Inst : BB) {
if (LA.isAlive(&Inst, BC.MIB->getFlagsReg()))
continue;
BC.MIB->replaceZeroIdiom(Inst);
}
}
}
04 ADDING A PASS
Running in
parallel
ParallelUtilities
provides means to run a
pass in parallel on
BinaryFunctions
/* bolt/lib/Passes/ZeroIdiom.cpp */
#include "bolt/Core/ParallelUtilities.h"
ZeroIdiom::runOnFunctions(BinaryContext &BC) {
auto RA = std::make_unique<RegAnalysis>(
BC, &BC.getBinaryFunctions(), nullptr);
ParallelUtilities::WorkFuncWithAllocTy WorkFun =
[&](BinaryFunction &BF, AllocatorIdTy AllocId) {
DataflowInfoManager Info(BF, RA.get(), nullptr,
AllocId);
runOnFunction(BF, Info);
};
ParallelUtilities::runOnEachFunctionWithUniqueAllocId(
BC,
ParallelUtilities::SchedulingPolicy::SP_INST_LINEAR,
WorkFun, nullptr, "ZeroIdiom");
}
04 ADDING A PASS
Success!
$ llvm-lit -a tools/bolttests/X86/clang.test
-- Testing: 1 tests, 1 workers --
PASS: bolt-tests :: X86/clang.test (1 of 1)
Command Output (stdout):
--
...
BOLT-INFO: zero-idiom replaced 465 instructions
(exec count 509) in 367 functions saving 1395 bytes
********************
Testing Time: 106.50s
Passed: 1
04 ADDING A PASS
04 CFG VISUALIZATION
dot format
llvm-bolt
--dump-dot-all
Outputs
funcname-00_build-cfg.dot
04 CFG VISUALIZATION
Interactive HTML
llvm-bolt
--dump-dot-all
--print-loops --dot-tooltip-code
bolt/utils/dot2html/dot2html.py
main-25_zero-idiom.dot{,.html}
Adding a BOLT pass

Adding a BOLT pass

  • 1.
    MAKSIM PANCHENKO AMIR AYUPOV Addinga BOLT pass 2023 EuroLLVM Developer’s Meeting
  • 2.
    Agenda 1. Intro 2.BOLT pipeline and IR 3. Simple peephole rule • Triaging crashes 4. Adding a BinaryPass • CFG visualization
  • 3.
    01 INTRO What’s BOLT? •Binary Optimization and Layout Tool • Supports X86 and ARM ELF programs and shared libraries • Part of LLVM monorepo Cumulative speedup over bootstrapped build, Building Clang Ivy Bridge Broadwell Skylake Icelake Golden Cove (Alderlake-P) Gracemont (Alderlake-E) Zen1 Zen2 Zen3 0% 45% 13.1% 16.2% 19.2% 19.7% 10.6% 10.6% 11.7% 14.1% 23.5% 16.2% 17.8% 23.4% 16.5% 12.1% 10.5% 12.1% 16.6% 22.5% PGO +BOLT 30.6% 23.9% 21.1% 22.7% 36.2% 42.6% 34.1% 29.4% 46.0%
  • 4.
    What does BOLTdo? 01 INTRO • Profile-guided optimizations at whole program level ﹘Code layout optimizations ﹘Sampling (LBR) or instrumentation profile driven ﹘Supporting third-party libraries with no source, and hand-written assembly • Other uses ﹘Whole program transformations or instrumentation (e.g. spectre mitigation) ﹘Reverse engineering ﹘Profile analysis
  • 5.
    01 INTRO Why wouldyou want to add a BOLT pass? • Exploring optimization opportunities leveraging accurate profiling information: ﹘HW mechanisms: alignment for macro-fusion (Intel/AMD), atomic execution (Intel TSX/ARM TME) ﹘OS/HW feedbacks: memory profile (Linux perf), branch mispredictions and latency information (Intel LBR/ARM BRBE) • Looking for binary patterns leveraging rich analysis framework: ﹘Metadata: CFI/EH information, DWARF parsing and updates ﹘Functions and instructions: call graph, register/stack slot liveness
  • 6.
  • 7.
    Discover Disassemble Build CFG Read Profile Optimize Emit Rewrite 1.Find functions and data in code 2. Identify instructions inside functions 3. Analyze instructions and build CFG 4. Read profile and attribute to CFG edges 5. Execute local and global optimization passes 6. Generate code and process relocations 7. Write code to file and update ELF 02 BOLT PIPELINE AND IR BOLT processing pipeline
  • 8.
    02 BOLT PIPELINEAND IR Module Function Function Function Function Basic Block Basic Block Basic Block BasicBlock Instruction Instruction Instruction BinaryContext Binary Function Binary Function Binary Function BinaryFunction Binary Basic Block Binary Basic Block Binary Basic Block BinaryBasicBlock MCInst MCInst MCInst LLVM IR BOLT IR
  • 9.
    Module Function BasicBlock Instruction LLVM IR MachineIR Machine ModuleInfo Machine Function Machine BasicBlock Machine Instruction Binary Context Binary Function Binary BasicBlock MCInst via MCPlus BOLT IR Machine ModuleInfo* Machine Function* Selection DAG SDNode SelectionDAG MC Object File Symbols* Labels* MCInst 02 BOLT PIPELINE AND IR
  • 10.
    MCPlus(Plus): extensible abstraction layer MCPlus isBOLT’s abstraction layer for • target-specific info beyond MCRegInfo/MCInstrAnalysis • target-independent info beyond opcode/operands • analyses and manipulations MCPlusBuilder class (BC->MIB) Simple checks: isNoop, isIndirectBranch Annotations: addAnnotation Analysis: analyzePLTEntry, evaluateX86MemoryOperand Instruction creation: createTailCall Complex transformations: indirectCallPromotion 02 BOLT PIPELINE AND IR
  • 11.
    MCPlus examples • Raising semanticallevel: ﹘tail call, invoke, jump table • Attaching analysis information to instructions: ﹘Liveness, ReachingDefs # MC Jump -> MCPlus Tail Call 00000002: ja func # TAILCALL # CTCTakenCount: 4 # MC Call -> MCPlus Invoke using EH annotations 00000043: callq _Z11filteri # handler: .LLP2; action: 1; GNU_args_size = 0 # MC Indirect jump -> MCPlus Jump Table 00000014: jmpq *%rax # JUMPTABLE @0x290 # JTIndexReg: 0 # Analysis information 0040117a: !PHI %r8 # ID: 3 # DF: %r8[23.], %r8[0.] -> %r8[23.] 02 BOLT PIPELINE AND IR
  • 12.
  • 13.
    The best wayto zero a register on X86? • movl $0x0, %eax • andl $0x0, %eax • subl %eax, %eax • xorl %eax, %eax # [0xb8,0x00,0x00,0x00,0x00] # [0x83,0xe0,0x00] # [0x29,0xc0] # [0x31,0xc0]
  • 14.
    bool shortenInstruction(MCInst &Inst, constMCSubtargetInfo &STI) const override { ... Inst.setOpcode(NewOpcode); // Replace `mov[lq] $0x0, %[er]ax` with `xor[lq] %[er]ax, %[er]ax` switch (NewOpcode) { default: break; case X86::MOV64ri: case X86::MOV64ri32: case X86::MOV32ri: auto OpNum = MCPlus::getNumPrimeOperands(Inst) - 1; if (Inst.getOperand(OpNum).isImm() && !Inst.getOperand(OpNum).getImm()) { if (NewOpcode == X86::MOV32ri) NewOpcode = X86::XOR32rr; else NewOpcode = X86::XOR64rr; MCOperand Op = Inst.getOperand(0); Inst.setOpcode(NewOpcode); Inst.clear(); Inst.addOperand(Op); Inst.addOperand(Op); Inst.addOperand(Op); } break; } if (NewOpcode == OldOpcode) return false; Inst.setOpcode(NewOpcode); return true; } 03 SIMPLE PEEPHOLE RULE Leveraging existing passes shortenInstructions pass calls MCPlusBuilder:: shortenInstruction – fits the bill.
  • 15.
    Running tests Open-source BOLTtests: • Source-based tests, in-repo • Binary tests, out-of-repo Both leverage LLVM Lit infra # https://github.com/rafaelauler/bolt-tests $ llvm-lit -a tools/bolttests/X86/clang.test -- Testing: 1 tests, 1 workers -- FAIL: bolt-tests :: X86/clang.test (1 of 1) **** TEST 'bolt-tests :: X86/clang.test' FAILED **** ... Stack dump: 0. Program arguments: clang.test.tmp <...> 1. <eof> parser at end of file clang.test.tmp: error: unable to execute command: Segmentation fault (core dumped) -- ******************** Failed Tests (1): bolt-tests :: X86/clang.test 03 SIMPLE PEEPHOLE RULE
  • 16.
    Bughunter script Bisecting toa function which causes a crash. Pass the resulting function as --funcs=funcname to reproduce the issue. 03 TRIAGING A CRASH bolt/utils/bughunter.sh Invocation: BOLT=/build/llvm-bolt BOLT_OPTIONS="-v=1" INPUT_BINARY=/path/to/binary # COMMAND_LINE="--version" or # OFFLINE=1 bolt/utils/bughunter.sh Output: Text file containing the culprit function.
  • 17.
    Bughunter example $ BOLT=llvm-bolt INPUT_BINARY=clang BOLT_OPTIONS="-b clang.fdata <...>" COMMAND_LINE="bf.cpp -c" bolt/utils/bughunter.sh Verify input binary passes INPUT_BINARY: clang bf.cpp -c Input binary passes. Verify optimized binary fails OPTIMIZED_BINARY: clang.bolt bf.cpp -c Optimized binary fails as expected. Iteration 0, trying func-names.abc / 45312 functions ... The function(s) that failed are in func-names.xyz To reproduce, run: llvm-bolt <...> -funcs-file-no-regex=func-names.xyz 03 TRIAGING A CRASH
  • 18.
    $ llvm-bolt clang--funcs=_ZN4ll… --print-all <...> Binary Function "_ZN4…" after shorten-instructions { State : CFG constructed Address : 0x1d78640 Section : .text BB Layout : .LBB00, ... } .LBB00 (30 instructions, align : 1) Entry Point 00000000: pushq %rbp --print-all 03 TRIAGING A CRASH Producing text dumps for all processed functions after each pass. --print-disasm --print-cfg --print-{pass} --print-only=funcname Before: cmpb $0x0, 0x8(%rax) movl $0x0, %edx movq -0x8(%rbp), %r13 movq (%rdi), %rax cmovel %edx, %ebx After: cmpb $0x0, 0x8(%rax) xorl %edx, %edx movq -0x8(%rbp), %r13 movq (%rdi), %rax cmovel %edx, %ebx
  • 19.
    $ llvm-bolt clang--funcs=_ZN4ll… --asm-dump <...> BOLT-INFO: Dumping function assembly to _ZN4llvm….s AsmDump 03 TRIAGING A CRASH Producing an annotated assembly which can be turned into a BOLT test. --asm-dump[=dir] # _ZN4llvm….s .globl _ZN4… .type _ZN4…, %function _ZN4…: .cfi_startproc .LBB00: pushq %rbp .cfi_def_cfa_offset 16 .cfi_offset %rbp, -16 movq %rsp, %rbp ... # bolt/test/X86/test.s .globl _start .type _start, %function _start: .cfi_startproc cmpb $0x0, 0x8(%rax) movl $0x0, %edx movq -0x8(%rbp), %r13 movq (%rdi), %rax cmovel %edx, %ebx .cfi_endproc .size _start, .-_start
  • 20.
  • 21.
    Logistics Inherit from BinaryFunctionPass Add tobolt/lib/Passes/ BinaryPasses.cpp or ZeroIdiom.cpp /* bolt/include/bolt/Passes/ZeroIdiom.h */ class ZeroIdiom : public BinaryFunctionPass { public: explicit ZeroIdiom(const cl::opt<bool> &PrintPass) : BinaryFunctionPass(PrintPass) {} const char *getName() const override { return "zero-idiom"; } void runOnFunctions(BinaryContext &) override; }; /* bolt/lib/Passes/ZeroIdiom.cpp */ ZeroIdiom::runOnFunctions(BinaryContext &BC) { 04 ADDING A PASS
  • 22.
    Logistics Append invocation to BinaryPassManager.cpp /*bolt/lib/Rewrite/BinaryPassManager.cpp */ static cl::opt<bool> PrintZeroIdiom( "print-zero-idiom", cl::desc("print functions after zero idiom pass"), cl::cat(BoltOptCategory)); void BinaryFunctionPassManager::runAllPasses( BinaryContext &BC) { Manager.registerPass( std::make_unique<ZeroIdiom>(PrintZeroIdiom)); } 04 ADDING A PASS
  • 23.
    Extra analyses Make use of DataflowManager providing LivenessAnalysis /*bolt/lib/Passes/ZeroIdiom.cpp */ void ZeroIdiom::runOnFunction(BinaryFunction &BF, DataflowInfoManager &Info) { BinaryContext &BC = BF.getBinaryContext(); LivenessAnalysis &LA = Info.getLivenessAnalysis(); for (BinaryBasicBlock &BB : BF) { for (MCInst &Inst : BB) { if (LA.isAlive(&Inst, BC.MIB->getFlagsReg())) continue; BC.MIB->replaceZeroIdiom(Inst); } } } 04 ADDING A PASS
  • 24.
    Running in parallel ParallelUtilities provides meansto run a pass in parallel on BinaryFunctions /* bolt/lib/Passes/ZeroIdiom.cpp */ #include "bolt/Core/ParallelUtilities.h" ZeroIdiom::runOnFunctions(BinaryContext &BC) { auto RA = std::make_unique<RegAnalysis>( BC, &BC.getBinaryFunctions(), nullptr); ParallelUtilities::WorkFuncWithAllocTy WorkFun = [&](BinaryFunction &BF, AllocatorIdTy AllocId) { DataflowInfoManager Info(BF, RA.get(), nullptr, AllocId); runOnFunction(BF, Info); }; ParallelUtilities::runOnEachFunctionWithUniqueAllocId( BC, ParallelUtilities::SchedulingPolicy::SP_INST_LINEAR, WorkFun, nullptr, "ZeroIdiom"); } 04 ADDING A PASS
  • 25.
    Success! $ llvm-lit -atools/bolttests/X86/clang.test -- Testing: 1 tests, 1 workers -- PASS: bolt-tests :: X86/clang.test (1 of 1) Command Output (stdout): -- ... BOLT-INFO: zero-idiom replaced 465 instructions (exec count 509) in 367 functions saving 1395 bytes ******************** Testing Time: 106.50s Passed: 1 04 ADDING A PASS
  • 26.
    04 CFG VISUALIZATION dotformat llvm-bolt --dump-dot-all Outputs funcname-00_build-cfg.dot
  • 27.
    04 CFG VISUALIZATION InteractiveHTML llvm-bolt --dump-dot-all --print-loops --dot-tooltip-code bolt/utils/dot2html/dot2html.py main-25_zero-idiom.dot{,.html}