A Brief Look at Compiler Optimization Techniques
Hsinchu Tech Chat Group 
Date : Dec 7th, 2014 
Kito Cheng 
kito.cheng@gmail.com
2 
About Me
Andes (安第斯山脈)
Compiler Team
Professional jack-of-all-trades
3 
Compiler?
5 
Compilation Flow 
[1] Compilers: Principles, Techniques, and Tools (2nd Edition) p.5
6 
Compilation Flow 
Undergraduate compiler courses usually have time to cover
only the parser
and a bare-bones code generator
8 
Compilation Flow 
But the truly fun, almost magical part of a compiler
is actually the optimization stage.
Through optimization,
a program can become both smaller and faster!
10 
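• This talk mostly stays away from deep theory;
concepts are introduced and then explained through examples
• LLVM serves as the tool for explanation and demonstration
Crash Course in the Basics
11
Crash Course in the Basics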
12 
• Basic Block 
• Control Flow Graph 
• Static Single Assignment Form
Basic Block 
13 
• A stretch of code with a single entry point and a single exit point
• http://en.wikipedia.org/wiki/Basic_block
Control Flow Graph 
14 
• CFG for short; put simply, it is the flowchart of the program
• http://en.wikipedia.org/wiki/Control_flow_graph
Basic Block 
17 
int foo (int n)
{
  int ret;
  if (n > 10)
    ret = n * 2;
  else
    ret = n + 2;
  return ret;
}
Basic blocks:
[int ret; if (n > 10)]
[ret = n * 2;]   [ret = n + 2;]
[return ret;]
CFG 
18 
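int foo (int n)
{
  int ret;
  if (n > 10)
    ret = n * 2;
  else
    ret = n + 2;
  return ret;
}
Control flow graph:
[int ret; if (n > 10)] -> [ret = n * 2;]   (condition true)
[int ret; if (n > 10)] -> [ret = n + 2;]   (condition false)
[ret = n * 2;] -> [return ret;]
[ret = n + 2;] -> [return ret;]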
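One way to make the blocks and edges of foo concrete is to rewrite it with explicit labels and gotos, so that every label starts a basic block and every goto is a CFG edge. This is only a sketch: the names foo_blocks and check_foo_blocks and the labels B2 to B4 are made up for illustration and do not come from the slides.

```c
#include <assert.h>

/* Structured version, as on the slide. */
int foo(int n)
{
    int ret;
    if (n > 10)
        ret = n * 2;
    else
        ret = n + 2;
    return ret;
}

/* Same function with one labeled region per basic block.
   The label names are hypothetical, chosen for this sketch. */
int foo_blocks(int n)
{
    int ret;
    /* Entry block: ends with the conditional branch (two outgoing edges). */
    if (n > 10) goto B2; else goto B3;
B2: /* then-block: entered only from the entry block, leaves only to B4 */
    ret = n * 2;
    goto B4;
B3: /* else-block */
    ret = n + 2;
    goto B4;
B4: /* join block: both paths meet here */
    return ret;
}

/* Checks that the restructuring did not change the result for one input. */
void check_foo_blocks(int n)
{
    assert(foo(n) == foo_blocks(n));
}
```

Calling check_foo_blocks with values on both sides of 10 exercises both edges out of the entry block.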
Basic Block 
21 
int sum (int n)
{
  int ret = 0;
  int i;
  for (i = 0; i < n; ++i)
    ret += i;
  return ret;
}
Basic blocks:
[int ret = 0; int i; i = 0;]
[i < n;]
[ret += i;]
[++i]
[return ret]
CFG 
22 
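int sum (int n)
{
  int ret = 0;
  int i;
  for (i = 0; i < n; ++i)
    ret += i;
  return ret;
}
Control flow graph:
[int ret = 0; int i; i = 0;] -> [i < n;]
[i < n;] -> [ret += i;]    (condition true)
[i < n;] -> [return ret]   (condition false)
[ret += i;] -> [++i]
[++i] -> [i < n;]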
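The loop version can be sketched the same way: each label below starts one basic block of sum, and the back edge from the increment block to the condition block is what makes it a loop. The names sum_blocks and check_sum_blocks and all label names are assumptions made for this illustration.

```c
#include <assert.h>

/* Structured version, as on the slide. */
int sum(int n)
{
    int ret = 0;
    int i;
    for (i = 0; i < n; ++i)
        ret += i;
    return ret;
}

/* One labeled region per basic block; label names are hypothetical. */
int sum_blocks(int n)
{
    int ret;
    int i;
    /* Entry block: the initializations. */
    ret = 0;
    i = 0;
    goto cond;
cond: /* loop condition: two outgoing edges */
    if (i < n) goto body; else goto done;
body: /* loop body */
    ret += i;
    goto inc;
inc:  /* increment, followed by the back edge to the condition */
    ++i;
    goto cond;
done: /* exit block */
    return ret;
}

/* Checks that both versions agree for one input. */
void check_sum_blocks(int n)
{
    assert(sum(n) == sum_blocks(n));
}
```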
Static Single Assignment 
23 
• Tag every variable with a version number
• Each value is assigned (written) exactly once
• http://en.wikipedia.org/wiki/Static_single_assignment_form
SSA 
26 
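int foo ()
{
  int ret;
  ret = 10;
  ret = 20;
  return ret;
}
With version numbers:
int foo ()
{
  int ret;
  ret1 = 10;
  ret2 = 20;
  return ret2;
}
Every assignment gets its own version number.
Once the versions are in place, you can tell at a glance
which expression's result is being used.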
SSA 
28 
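int foo (int n)
{
  int ret;
  if (n > 10)
    ret = n * 2;
  else
    ret = n + 2;
  return ret;
}
With version numbers:
int foo (int n)
{
  int ret;
  if (n > 10)
    ret1 = n * 2;
  else
    ret2 = n + 2;
  return ret?;
}
Where diverging paths of the program merge again,
we cannot tell which definition the value came from.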
SSA 
29 
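int foo (int n)
{
  int ret;
  if (n > 10)
    ret = n * 2;
  else
    ret = n + 2;
  return ret;
}
With version numbers:
int foo (int n)
{
  int ret;
  if (n > 10)
    ret1 = n * 2;
  else
    ret2 = n + 2;
  ret3 = Φ (ret1, ret2);
  return ret3;
}
This is where Φ comes in:
it states that the definition of the value
is decided by the control flow,
and the merged value gets a new version number.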
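C has no Φ, but its effect can be approximated by placing a copy on each incoming path, which is also how a phi node is commonly lowered when leaving SSA form. The sketch below assumes that reading; foo_ssa and check_foo_ssa are hypothetical names, and the const locals stand in for the single-assignment versions ret1 and ret2.

```c
#include <assert.h>

/* Original version from the slide. */
int foo(int n)
{
    int ret;
    if (n > 10)
        ret = n * 2;
    else
        ret = n + 2;
    return ret;
}

/* SSA-flavored version: every versioned name is written exactly once. */
int foo_ssa(int n)
{
    int ret3;                     /* plays the role of ret3 = phi(ret1, ret2) */
    if (n > 10) {
        const int ret1 = n * 2;   /* version 1 */
        ret3 = ret1;              /* copy on the edge coming from the then-branch */
    } else {
        const int ret2 = n + 2;   /* version 2 */
        ret3 = ret2;              /* copy on the edge coming from the else-branch */
    }
    return ret3;
}

/* Checks that the renaming did not change the result for one input. */
void check_foo_ssa(int n)
{
    assert(foo(n) == foo_ssa(n));
}
```

On any single execution only one of the two copies runs, so ret3 is still written once per path, which is the property the Φ expresses.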
30
LLVM
LLVM 
33 
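• A handy, fun, and currently very popular compiler; to install it:
– sudo apt-get install llvm clang xdot
– sudo yum install llvm clang python-xdot python-setuptools
xdot is for viewing the graphs
python-setuptools: well... Fedora's packaging does not declare its dependencies properly; this is a package that xdot depends on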
LLVM 
36 
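• A handy, fun, and currently very popular compiler; to install it:
– sudo apt-get install llvm clang xdot
– sudo yum install llvm clang python-xdot python-setuptools
– Not on apt-get or yum? Then we assume you are an expert who will figure it out XD
– Windows!? Rumor has it the official site offers an installer?
– Building it yourself is recommended; otherwise some debug features will be missing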
LLVM IR 
37 
• v = operation type op1, op2, opn...
– %sum = add i32 %op1, %op2
%sum: the result
add: the operation
i32: the type
%op1, %op2: the operands
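An empty LLVM function
38
define void @empty() {
  ret void
}
define: the keyword that opens a function definition
void: the return type
@empty: @ followed by the function name
(): the parameter list
ret void: return, followed by the type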
An LLVM function with one parameter
39
define void @arg1(i32 %a) {
  ret void
}
(i32 %a): the parameter list, with a single parameter named %a
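An LLVM function that takes one parameter and returns it directly
40
define i32 @arg1(i32 %a) {
  ret i32 %a
}
i32 (after define): the return value is an i32
ret i32 %a: return, followed by the type and the value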
An LLVM function that takes one parameter and returns it plus ten
41
define i32 @arg1(i32 %a) {
  %t = add i32 %a, 10
  ret i32 %t
}
%a plus 10 is placed in %t
LLVM IR 
42 
• SSA-Based IR 
– %sum = add i32 %op1, %op2 
– %sum = mul i32 %op1, %op2 
– error: multiple definition of local value named 'sum'
SSA!? 
45 
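• SSA form is very friendly to compilers, but for ordinary people
writing SSA form by hand is not very intuitive...
– Those used to functional programming are the exception... XD
• Inserting PHI nodes by hand is even more of a chore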
alloca 
46 
• Creates a local variable
– The allocated space lives on the stack
• Usage is somewhat like malloc in C, but the concept is not quite the same
alloca 
47 
define void @foo() {
  %var = alloca i32
  ret void
}
%var: the address that was produced; its type
can be treated as i32*
alloca 
48 
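• Every access has to go through load/store
– During optimization, however, the variable is turned into a
register wherever memory is not really needed (by the mem2reg pass)
• An array, or a variable whose address must be taken, may not
be convertible to a register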
alloca/store 
49 
define void @foo() {
  %var = alloca i32
  store i32 10, i32* %var
  ret void
}
i32 10: the type and value to store
i32* %var: the type and address of the destination
alloca/load 
50 
define void @foo() {
  %var = alloca i32
  store i32 10, i32* %var
  %t0 = load i32* %var
  ret void
}
%t0: the value that was read back
i32* %var: the type and address to read from
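As a rough C analogy for the alloca/store/load sequence, the sketch below goes through a pointer to a stack slot and then shows what the function amounts to once the slot is promoted to a register. It is an illustration only: through_memory, promoted, and check_promotion are made-up names, and returning the loaded value (the slide's function returns void) is an assumption added so that the result can be observed.

```c
#include <assert.h>

/* Mirrors the IR on the slide: a stack slot accessed only through a pointer. */
int through_memory(void)
{
    int var;            /* %var = alloca i32 */
    int *slot = &var;   /* the alloca result behaves like an i32* */
    *slot = 10;         /* store i32 10, i32* %var */
    int t0 = *slot;     /* %t0 = load i32* %var */
    return t0;          /* assumption: returned here so the value is visible */
}

/* What promotion to a register amounts to: the slot and the
   load/store pair disappear and the value is used directly. */
int promoted(void)
{
    int t0 = 10;
    return t0;
}

/* Checks that both versions produce the same value. */
void check_promotion(void)
{
    assert(through_memory() == promoted());
}
```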
LLVM/Clang 
51 
• Only the following two tools are used in this talk:
– clang : turns C into LLVM IR
– opt : the tool for running optimizations and inspecting the result
View CFG by LLVM 
52 
• clang foo.c -S -emit-llvm
• opt foo.ll -view-cfg
int foo(int a, int b)
{
  if (a > b)
    return a;
  else
    return b;
}
View CFG by LLVM 
54 
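There are quite a few junk instructions,
but turning on optimization while we are still observing
would get in the way of learning
opt foo.ll -O1 -view-cfg
After optimization only three instructions in a single BB remain...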
Notes on using opt (1/3)
55
• The position of each option matters!!
opt foo.ll -view-cfg -O1
Shows the CFG first, then optimizes
opt foo.ll -O1 -view-cfg
Optimizes first, then shows the CFG
Notes on using opt (2/3)
56
• Options can be given more than once
opt foo.ll -view-cfg -O1 -view-cfg
Shows the CFG first,
then optimizes,
and finally shows the CFG once more
Notes on using opt (3/3)
57
• Options can be repeated, and so can optimizations
opt foo.ll -O1 -view-cfg -O1 -view-cfg
First -O1: optimize
Second -O1: optimize again
mem2reg 
58 
• mem2reg: removes unnecessary alloca and
load/store instructions
• It also makes the program look more like SSA form
mem2reg 
59
opt foo.ll -mem2reg -view-cfg
A phi node has appeared!
The alloca
and the load/store instructions are all gone as well
60 
Compiler Optimization
Propagation 
61 
• Propagation: passing values along
– Constant Propagation 
– Copy Propagation
Constant Propagation 
63 
opt foo.ll -mem2reg -view-cfg
int foo(int a)
{
  int magic_num = 10;
  return a + magic_num;
}
This optimization is so basic
that mem2reg takes care of it along the way
int foo(int a)
{
  int magic_num = 10;
  return a + 10;
}
Constant Propagation 
64 
int foo(int a)
{
  int magic_num = 10;
  return a + magic_num;
}
int foo(int a)
{
  int magic_num = 10;
  return a + 10;
}
Never start sprinkling damned magic numbers through your code
just because you think writing it like the second version
would be faster!!!!
Copy Propagation 
65 
b = a
c = b
=>
b = a
c = a
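The before/after pair can be written out as two small C functions to show that replacing b with its source a leaves the result unchanged. This is a sketch under assumptions: the function names and the b + c return value are invented here so that there is something to compare.

```c
#include <assert.h>

/* Before: c is defined through the copy b. */
int before_copy_prop(int a)
{
    int b = a;
    int c = b;
    return b + c;   /* assumption: some use of b and c, added for the sketch */
}

/* After copy propagation: the use of b is replaced by its source a. */
int after_copy_prop(int a)
{
    int b = a;
    int c = a;
    return b + c;
}

/* Checks that both versions agree for one input. */
void check_copy_prop(int a)
{
    assert(before_copy_prop(a) == after_copy_prop(a));
}
```

After the rewrite c no longer depends on b, which is what lets a later pass treat b as removable if nothing else uses it.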
Constant Folding 
69 
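• Constant Folding
– If every operand is a constant, compute the result ahead of time!
• a = 123 + 456
– a = 579
• A program does not necessarily contain many such constant expressions to begin with,
but more of them gradually appear after Constant Propagation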
Constant Folding 
72 
a = 10
b = 100 + a
=> Constant Propagation
a = 10
b = 100 + 10
=> Constant Folding
a = 10
b = 110
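The three stages can be written as three C functions that must all return the same value. This is a hand-made sketch: the names stage0 to stage2 and check_stages are hypothetical, and each function was rewritten by hand to mimic what the pass would do rather than produced by a compiler.

```c
#include <assert.h>

/* Original code. */
int stage0(void)
{
    int a = 10;
    int b = 100 + a;
    return b;
}

/* After constant propagation: the use of a is replaced by 10. */
int stage1(void)
{
    int a = 10;
    int b = 100 + 10;
    (void)a;            /* a is now unused; kept only to mirror the slide */
    return b;
}

/* After constant folding: 100 + 10 is computed ahead of time. */
int stage2(void)
{
    int a = 10;
    int b = 110;
    (void)a;
    return b;
}

/* Checks that no stage changed the result. */
void check_stages(void)
{
    assert(stage0() == stage1());
    assert(stage1() == stage2());
}
```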
73 
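• Where would a program get so many constants to play with!?
• Propagation and Folding are small basic tricks;
they are most effective when combined with other optimizations!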
74 
• LLVM performs these basic optimizations along the way, so they are
hard to observe in isolation...
• Copy/Constant Propagation is essentially
handled as a side effect of mem2reg
Observing Constant Folding
75
• Constant Folding, on the other hand, can be handled by LLVM's
Constant Propagation pass
define i32 @folding() {
  %t = add i32 10, 20
  ret i32 %t
}
opt -S cfolding.ll -constprop
define i32 @folding() {
  ret i32 30
}
Function Inline 
76 
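• Inline: "in-line function"? "embedded function"?
• The idea is to copy the body of the function into the call site
• This saves the function call and opens up more optimization
opportunities!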
Inline + Propagation 
77 
• After inlining, what used to be argument passing becomes
plain copying
– Copy Propagation 
– Constant Propagation
Inline + Propagation 
79 
int add(int a, int b)
{
  return a + b;
}
int foo(int n)
{
  int sum = 0;
  int i, t;
  for (i = 0; i < n; ++i) {
    t = add(10, 20);
    sum = add(sum, i);
    sum = add(sum, t);
  }
  return sum;
}
clang -emit-llvm -S inline.c
opt inline.ll -mem2reg -S
define i32 @add(i32 %a, i32 %b) {
  %1 = add i32 %a, %b
  ret i32 %1
}
define i32 @foo(i32 %n) {
  br label %1
; <label>:1
  %sum.0 = phi i32 [ 0, %0 ], [ %6, %7 ]
  %i.0 = phi i32 [ 0, %0 ], [ %8, %7 ]
  %2 = icmp slt i32 %i.0, %n
  br i1 %2, label %3, label %9
; <label>:3
  %4 = call i32 @add(i32 10, i32 20)
  %5 = call i32 @add(i32 %sum.0, i32 %i.0)
  %6 = call i32 @add(i32 %5, i32 %4)
  br label %7
; <label>:7
  %8 = add i32 %i.0, 1
  br label %1
; <label>:9
  ret i32 %sum.0
}
Inline + Propagation 
80 
opt inline.ll -mem2reg -inline -S
define i32 @foo(i32 %n) {
  br label %1
; <label>:1
  %sum.0 = phi i32 [ 0, %0 ], [ %5, %6 ]
  %i.0 = phi i32 [ 0, %0 ], [ %7, %6 ]
  %2 = icmp slt i32 %i.0, %n
  br i1 %2, label %3, label %8
; <label>:3
  %4 = add i32 %sum.0, %i.0
  %5 = add i32 %4, 30
  br label %6
; <label>:6
  %7 = add i32 %i.0, 1
  br label %1
; <label>:8
  ret i32 %sum.0
}
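Read back into C, the IR after inlining corresponds roughly to the second function below: the calls to add are gone and add(10, 20) has turned into the constant 30. This is a hand-written approximation, not compiler output, and foo_inlined and check_inline are names invented for the sketch.

```c
#include <assert.h>

static int add(int a, int b)
{
    return a + b;
}

/* Original version from the slide. */
int foo(int n)
{
    int sum = 0;
    int i, t;
    for (i = 0; i < n; ++i) {
        t = add(10, 20);
        sum = add(sum, i);
        sum = add(sum, t);
    }
    return sum;
}

/* Hand-written equivalent of the IR after inlining. */
int foo_inlined(int n)
{
    int sum = 0;
    int i;
    for (i = 0; i < n; ++i) {
        sum = sum + i;    /* add(sum, i) with the body copied in */
        sum = sum + 30;   /* add(sum, add(10, 20)), inlined and folded */
    }
    return sum;
}

/* Checks that inlining by hand did not change the result for one input. */
void check_inline(int n)
{
    assert(foo(n) == foo_inlined(n));
}
```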
DCE 
81 
• DCE: Dead Code Elimination
• After the optimizations introduced so far, redundant code
gradually starts to appear, along with branch conditions
that obviously can never be true
DCE 
86 
int foo()
{
  a = 5;
  if (a > 10)
    b = 10;
  else
    b = 20;
  return b;
}
=> Constant Propagation
int foo()
{
  a = 5;
  if (5 > 10)
    b = 10;
  else
    b = 20;
  return b;
}
=> Constant Folding
int foo()
{
  a = 5;
  if (false)
    b = 10;
  else
    b = 20;
  return b;
}
=> DCE
int foo()
{
  b = 20;
  return b;
}
=> Constant Propagation
int foo()
{
  return 20;
}
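The chain of rewrites can be checked by writing the first, middle, and last versions as compilable C and confirming they agree. The sketch below adds the variable declarations that the slide leaves out, uses 0 in place of false, and uses made-up function names; each version was rewritten by hand.

```c
#include <assert.h>

/* Original version, with declarations added so that it compiles. */
int foo_original(void)
{
    int a, b;
    a = 5;
    if (a > 10)
        b = 10;
    else
        b = 20;
    return b;
}

/* After constant propagation and folding: the condition is known to be false. */
int foo_folded(void)
{
    int b;
    if (0)          /* 5 > 10 folded to false */
        b = 10;     /* dead: can never execute */
    else
        b = 20;
    return b;
}

/* After DCE and one more round of constant propagation. */
int foo_final(void)
{
    return 20;
}

/* Checks that every stage returns the same value. */
void check_dce(void)
{
    assert(foo_original() == foo_folded());
    assert(foo_folded() == foo_final());
}
```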
Observing DCE with LLVM (1/5)
87
int foo()
{
  int a;
  int b;
  a = 5;
  if (a > 10)
    b = a + 10;
  else
    b = a + 20;
  return b;
}
clang -S -emit-llvm dce.c
define i32 @foo() {
entry:
  %a = alloca i32
  %b = alloca i32
  store i32 5, i32* %a
  %0 = load i32* %a
  %cmp = icmp sgt i32 %0, 10
  br i1 %cmp, label %if.then, label %if.else
if.then:
  %1 = load i32* %a
  %add = add i32 %1, 10
  store i32 %add, i32* %b
  br label %if.end
if.else:
  %2 = load i32* %a
  %add1 = add i32 %2, 20
  store i32 %add1, i32* %b
  br label %if.end
if.end:
  %3 = load i32* %b
  ret i32 %3
}
Observing DCE with LLVM (2/5)
88
opt dce.ll -mem2reg -S
define i32 @foo() {
entry:
  %cmp = icmp sgt i32 5, 10
  br i1 %cmp, label %if.then, label %if.else
if.then:
  %add = add i32 5, 10
  br label %if.end
if.else:
  %add1 = add i32 5, 20
  br label %if.end
if.end:
  %b.0 = phi i32 [ %add, %if.then ], [ %add1, %if.else ]
  ret i32 %b.0
}
Observing DCE with LLVM (3/5)
89
-constprop
opt dce.ll -mem2reg -constprop -S
define i32 @foo() {
entry:
  br i1 false, label %if.then, label %if.else
if.then:
  br label %if.end
if.else:
  br label %if.end
if.end:
  %b.0 = phi i32 [ 15, %if.then ], [ 25, %if.else ]
  ret i32 %b.0
}
Observing DCE with LLVM (4/5)
91
-dce
opt dce.ll -mem2reg -constprop -dce -S
define i32 @foo() {
entry:
  br i1 false, label %if.then, label %if.else
if.then:
  br label %if.end
if.else:
  br label %if.end
if.end:
  %b.0 = phi i32 [ 15, %if.then ], [ 25, %if.else ]
  ret i32 %b.0
}
It looks like nothing has changed??
Observing DCE with LLVM (4/5)
92
-dce
It looks like nothing has changed??
LLVM leaves CFG simplification to the -simplifycfg pass
Observing DCE with LLVM (5/5)
93
-simplifycfg
opt dce.ll -mem2reg -constprop -simplifycfg -S
define i32 @foo() {
entry:
  ret i32 25
}
Observing DCE with LLVM - 2 (1/2)
94
-simplifycfg
opt dce.ll -mem2reg -simplifycfg -S
define i32 @foo() {
entry:
  %cmp = icmp sgt i32 5, 10
  %add = add i32 5, 10
  %add1 = add i32 5, 20
  %b.0 = select i1 %cmp, i32 %add, i32 %add1
  ret i32 %b.0
}
Observing DCE with LLVM - 2 (2/2)
95
-constprop
opt dce.ll -mem2reg -simplifycfg -constprop -S
define i32 @foo() {
entry:
  ret i32 25
}
CSE 
96 
• CSE: Common Subexpression Elimination
– Share whatever can be shared!
CSE 
97 
a = b * c + g;
d = b * c * e;
=>
t = b * c;
a = t + g;
d = t * e;
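Wrapped into functions, the two forms can be compared directly. The sketch below is an illustration under assumptions: the function names are invented, and returning a + d is just a convenient way to observe both results at once.

```c
#include <assert.h>

/* Before: b * c is computed twice. */
int foo(int b, int c, int g, int e)
{
    int a = b * c + g;
    int d = b * c * e;
    return a + d;
}

/* After CSE: the shared subexpression is computed once and reused. */
int foo_cse(int b, int c, int g, int e)
{
    int t = b * c;
    int a = t + g;
    int d = t * e;
    return a + d;
}

/* Checks that sharing the subexpression did not change the result. */
void check_cse(int b, int c, int g, int e)
{
    assert(foo(b, c, g, e) == foo_cse(b, c, g, e));
}
```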
Observing CSE with LLVM (1/2)
98
int foo(int b, int c, int g, int e)
{
  int a = b * c + g;
  int d = b * c * e;
  return a + d;
}
clang -emit-llvm -S cse.c
opt cse.ll -mem2reg -S
define i32 @foo(i32 %b, i32 %c, i32 %g, i32 %e) {
entry:
  %mul = mul i32 %b, %c
  %add = add i32 %mul, %g
  %mul1 = mul i32 %b, %c
  %mul2 = mul i32 %mul1, %e
  %add3 = add i32 %add, %mul2
  ret i32 %add3
}
Observing CSE with LLVM (2/2)
99
-early-cse
opt cse.ll -mem2reg -early-cse -S
define i32 @foo(i32 %b, i32 %c, i32 %g, i32 %e) {
entry:
  %mul = mul i32 %b, %c
  %add = add i32 %mul, %g
  %mul2 = mul i32 %mul, %e
  %add3 = add i32 %add, %mul2
  ret i32 %add3
}
Loop Unroll 
100 
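• Loop Unroll
– On most architectures a jump instruction costs more than an ordinary arithmetic instruction
– After unrolling, the loop index may turn from a variable into a constant
sum = 0;
for (i = 0; i < 3; ++i)
  sum = sum + i;
=>
sum = 0;
sum = sum + 0;
sum = sum + 1;
sum = sum + 2;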
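As compilable C, the loop and its fully unrolled form look like this. It is a hand-unrolled sketch with made-up function names; note how the index i no longer exists in the second version and each former use of it has become a constant.

```c
#include <assert.h>

/* Original loop with a constant trip count of 3. */
int sum_loop(void)
{
    int sum = 0;
    int i;
    for (i = 0; i < 3; ++i)
        sum = sum + i;
    return sum;
}

/* Fully unrolled by hand: no branch, and i has become 0, 1, 2. */
int sum_unrolled(void)
{
    int sum = 0;
    sum = sum + 0;
    sum = sum + 1;
    sum = sum + 2;
    return sum;
}

/* Checks that unrolling did not change the result. */
void check_unroll(void)
{
    assert(sum_loop() == sum_unrolled());
}
```

Once the operands are constants, the propagation and folding steps shown earlier can reduce the unrolled body further.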
Observing Loop Unroll with LLVM (1/8)
101
int add(int a, int b)
{
  return a + b;
}
int foo()
{
  int sum = 0;
  int i;
  for (i = 0; i < 3; ++i)
    sum = add(sum, i);
  return sum;
}
clang -emit-llvm -S for.c
opt for.ll -mem2reg -S
define i32 @add(i32 %a, i32 %b) {
entry:
  %add = add i32 %a, %b
  ret i32 %add
}
define i32 @foo() {
entry:
  br label %for.cond
for.cond:
  %i.0 = phi i32 [ 0, %entry ], [ %inc, %for.inc ]
  %sum.0 = phi i32 [ 0, %entry ], [ %call, %for.inc ]
  %cmp = icmp slt i32 %i.0, 3
  br i1 %cmp, label %for.body, label %for.end
for.body:
  %call = call i32 @add(i32 %sum.0, i32 %i.0)
  br label %for.inc
for.inc:
  %inc = add i32 %i.0, 1
  br label %for.cond
for.end:
  ret i32 %sum.0
}
Observing Loop Unroll with LLVM (2/8)
103
-loop-unroll
opt for.ll -mem2reg -loop-unroll -S
define i32 @foo() {
entry:
  br label %for.cond
for.cond:
  %i.0 = phi i32 [ 0, %entry ], [ %inc, %for.inc ]
  %sum.0 = phi i32 [ 0, %entry ], [ %call, %for.inc ]
  %cmp = icmp slt i32 %i.0, 3
  br i1 %cmp, label %for.body, label %for.end
for.body:
  %call = call i32 @add(i32 %sum.0, i32 %i.0)
  br label %for.inc
for.inc:
  %inc = add i32 %i.0, 1
  br label %for.cond
for.end:
  %sum.0.lcssa = phi i32 [ %sum.0, %for.cond ]
  ret i32 %sum.0.lcssa
}
It seems the loop does not get unrolled????
Observing Loop Unroll with LLVM (3/8)
104
-loop-unroll
-debug
$ opt -mem2reg -S for.ll -loop-unroll -debug
Args: opt -mem2reg -S for.ll -loop-unroll -debug
Loop Unroll: F[foo] Loop %for.cond
  Loop Size = 8
  Can't unroll; loop not terminated by a conditional branch.
It is complaining that the Loop Unroll pass
does not recognize this loop!?
Observing Loop Unroll with LLVM (4/8)
105
Rotate, loop!
-loop-rotate
opt for.ll -mem2reg -loop-rotate -S
define i32 @foo() {
entry:
  br label %for.body
for.body:
  %sum.02 = phi i32 [ 0, %entry ], [ %call, %for.inc ]
  %i.01 = phi i32 [ 0, %entry ], [ %inc, %for.inc ]
  %call = call i32 @add(i32 %sum.02, i32 %i.01)
  br label %for.inc
for.inc:
  %inc = add i32 %i.01, 1
  %cmp = icmp slt i32 %inc, 3
  br i1 %cmp, label %for.body, label %for.end
for.end:
  %sum.0.lcssa = phi i32 [ %call, %for.inc ]
  ret i32 %sum.0.lcssa
}
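In C terms, rotating a loop roughly means turning a top-tested loop into a guarded bottom-tested one, so that the conditional branch sits at the end of the loop body. The sketch below assumes that reading and generalizes the bound to a parameter n, which the slide does not do; the function names are hypothetical. With the slide's constant bound of 3 the guard is always true, which is consistent with the rotated IR jumping straight into for.body.

```c
#include <assert.h>

/* Top-tested form: the exit test sits in the loop header. */
int sum_top_tested(int n)
{
    int sum = 0;
    int i = 0;
    while (i < n) {
        sum += i;
        ++i;
    }
    return sum;
}

/* Rotated form: one guard test up front, then the exit test
   at the bottom of the loop, where the back edge is. */
int sum_rotated(int n)
{
    int sum = 0;
    int i = 0;
    if (i < n) {            /* guard: skip the loop entirely if it runs zero times */
        do {
            sum += i;
            ++i;
        } while (i < n);    /* the loop now ends in a conditional branch */
    }
    return sum;
}

/* Checks that rotation did not change the result for one bound. */
void check_rotate(int n)
{
    assert(sum_top_tested(n) == sum_rotated(n));
}
```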
Observing Loop Unroll with LLVM (5/8)
106
opt for.ll -mem2reg -view-cfg -loop-rotate -view-cfg -S
Rotate, loop!
-loop-rotate
Observing Loop Unroll with LLVM (6/8)
107 
-loop-unroll 
opt for.ll -mem2reg -loop-rotate -loop-unroll -view-cfg -S
Observing Loop Unroll with LLVM (7/8)
108
define i32 @foo() {
entry:
  br label %for.body
for.body:
  %call = call i32 @add(i32 0, i32 0)
  br label %for.inc
for.inc:
  %call.1 = call i32 @add(i32 %call, i32 1)
  br label %for.inc.1
for.inc.1:
  %call.2 = call i32 @add(i32 %call.1, i32 2)
  br label %for.inc.2
for.inc.2:
  ret i32 %call.2
}
-simplifycfg
opt for.ll -mem2reg -loop-rotate -loop-unroll -simplifycfg -view-cfg -S
define i32 @foo() {
entry:
  %call = call i32 @add(i32 0, i32 0)
  %call.1 = call i32 @add(i32 %call, i32 1)
  %call.2 = call i32 @add(i32 %call.1, i32 2)
  ret i32 %call.2
}
Observing Loop Unroll with LLVM (8/8)
110
define i32 @add(i32 %a, i32 %b) {
entry:
  %add = add i32 %a, %b
  ret i32 %add
}
define i32 @foo() {
entry:
  %call = call i32 @add(i32 0, i32 0)
  %call.1 = call i32 @add(i32 %call, i32 1)
  %call.2 = call i32 @add(i32 %call.1, i32 2)
  ret i32 %call.2
}
opt for.ll -mem2reg -loop-rotate -loop-unroll -simplifycfg -inline -constprop -S
After -inline:
define i32 @foo() {
entry:
  %add.i = add i32 1, 2
  ret i32 %add.i
}
After -constprop:
define i32 @foo() {
entry:
  ret i32 3
}
Compiler Optimization 
111 
• Different compiler optimizations interact with one another
• Their order also affects the optimization result
LLVM 
112 
• Can be seen with opt -help
113
Overview of GCC Optimization Pass 
114 
$ gcc a.c -fdump-tree-all -fdump-rtl-all -O3 
$ ls a.c.* 
a.c.001t.tu a.c.059t.forwprop2 a.c.093t.sink a.c.138t.copyrename4 a.c.204r.outof_cfglayout 
a.c.003t.original a.c.060t.objsz1 a.c.096t.loop a.c.139t.crited2 a.c.205r.split1 
a.c.004t.gimple a.c.061t.alias a.c.097t.loopinit a.c.141t.uncprop1 a.c.206r.subreg2 
a.c.006t.omplower a.c.062t.retslot a.c.098t.lim1 a.c.142t.local-pure-const2 a.c.208r.mode_sw 
a.c.007t.lower a.c.063t.fre2 a.c.099t.copyprop5 a.c.168t.nrv a.c.209r.asmcons 
a.c.010t.eh a.c.064t.copyprop3 a.c.100t.dceloop1 a.c.169t.optimized a.c.213r.ira 
a.c.011t.cfg a.c.065t.mergephi2 a.c.101t.unswitch a.c.170r.expand a.c.214r.reload 
a.c.015t.ssa a.c.066t.vrp1 a.c.102t.sccp a.c.171r.vregs a.c.215r.postreload 
a.c.017t.inline_param1 a.c.067t.dce1 a.c.104t.ldist a.c.172r.into_cfglayout a.c.216r.gcse2 
a.c.018t.einline a.c.068t.cdce a.c.105t.copyprop6 a.c.173r.jump a.c.217r.split2 
a.c.019t.early_optimizations a.c.069t.cselim a.c.111t.ivcanon a.c.174r.subreg1 a.c.218r.ree 
a.c.020t.copyrename1 a.c.070t.ifcombine a.c.113t.ifcvt a.c.175r.dfinit a.c.221r.pro_and_epilogue 
a.c.021t.ccp1 a.c.071t.phiopt1 a.c.114t.vect a.c.176r.cse1 a.c.222r.dse2 
a.c.022t.forwprop1 a.c.072t.tailr2 a.c.115t.dceloop3 a.c.177r.fwprop1 a.c.223r.csa 
a.c.023t.ealias a.c.073t.ch a.c.116t.pcom a.c.178r.cprop1 a.c.224r.jump2 
a.c.024t.esra a.c.075t.cplxlower1 a.c.117t.cunroll a.c.179r.pre a.c.225r.peephole2 
a.c.025t.fre1 a.c.076t.sra a.c.118t.slp a.c.181r.cprop2 a.c.226r.ce3 
a.c.026t.copyprop1 a.c.077t.copyrename3 a.c.120t.ivopts a.c.184r.ce1 a.c.228r.cprop_hardreg 
a.c.027t.mergephi1 a.c.078t.dom1 a.c.121t.lim3 a.c.185r.reginfo a.c.229r.rtl_dce 
a.c.028t.cddce1 a.c.079t.isolate-paths a.c.122t.loopdone a.c.186r.loop2 a.c.230r.compgotos 
a.c.029t.eipa_sra a.c.080t.phicprop1 a.c.123t.veclower21 a.c.187r.loop2_init a.c.231r.bbro 
a.c.030t.tailr1 a.c.081t.dse1 a.c.125t.reassoc2 a.c.188r.loop2_invariant a.c.233r.split4 
a.c.031t.switchconv a.c.082t.reassoc1 a.c.126t.slsr a.c.189r.loop2_unswitch a.c.234r.sched2 
a.c.033t.profile_estimate a.c.083t.dce2 a.c.127t.dom2 a.c.192r.loop2_done a.c.236r.stack 
a.c.034t.local-pure-const1 a.c.084t.forwprop3 a.c.128t.phicprop2 a.c.194r.cprop3 a.c.237r.alignments 
a.c.035t.fnsplit a.c.085t.phiopt2 a.c.129t.vrp2 a.c.195r.cse2 a.c.239r.mach 
a.c.036t.release_ssa a.c.086t.strlen a.c.130t.cddce2 a.c.196r.dse1 a.c.240r.barriers 
a.c.037t.inline_param2 a.c.087t.ccp3 a.c.132t.dse2 a.c.197r.fwprop2 a.c.244r.shorten 
a.c.054t.copyrename2 a.c.088t.copyprop4 a.c.133t.forwprop4 a.c.199r.init-regs a.c.245r.nothrow 
a.c.055t.ccp2 a.c.089t.sincos a.c.134t.phiopt3 a.c.200r.ud_dce a.c.246r.dwarf2 
a.c.056t.copyprop2 a.c.090t.bswap a.c.135t.fab1 a.c.201r.combine a.c.247r.final 
a.c.057t.cunrolli a.c.091t.crited1 a.c.136t.widening_mul a.c.202r.ce2 a.c.248r.dfinish 
a.c.058t.phiprop a.c.092t.pre a.c.137t.tailc a.c.203r.bbpart a.c.249t.statistics 
Dump files from 165 passes in total!
Propagation 
115 
$ gcc a.c -fdump-tree-all -fdump-rtl-all -O3 
$ ls a.c.* 
28 of the 165 passes deal with Propagation!
Inline 
116 
$ gcc a.c -fdump-tree-all -fdump-rtl-all -O3 
$ ls a.c.* 
3 of the 165 passes are Inline!
DCE 
117 
$ gcc a.c -fdump-tree-all -fdump-rtl-all -O3 
$ ls a.c.* 
13 of the 165 passes are DCE! 
a.c.001t.tu a.c.059t.forwprop2 a.c.093t.sink a.c.138t.copyrename4 a.c.204r.outof_cfglayout 
a.c.003t.original a.c.060t.objsz1 a.c.096t.loop a.c.139t.crited2 a.c.205r.split1 
a.c.004t.gimple a.c.061t.alias a.c.097t.loopinit a.c.141t.uncprop1 a.c.206r.subreg2 
a.c.006t.omplower a.c.062t.retslot a.c.098t.lim1 a.c.142t.local-pure-const2 a.c.208r.mode_sw 
a.c.007t.lower a.c.063t.fre2 a.c.099t.copyprop5 a.c.168t.nrv a.c.209r.asmcons 
a.c.010t.eh a.c.064t.copyprop3 a.c.100t.dceloop1 a.c.169t.optimized a.c.213r.ira 
a.c.011t.cfg a.c.065t.mergephi2 a.c.101t.unswitch a.c.170r.expand a.c.214r.reload 
a.c.015t.ssa a.c.066t.vrp1 a.c.102t.sccp a.c.171r.vregs a.c.215r.postreload 
a.c.017t.inline_param1 a.c.067t.dce1 a.c.104t.ldist a.c.172r.into_cfglayout a.c.216r.gcse2 
a.c.018t.einline a.c.068t.cdce a.c.105t.copyprop6 a.c.173r.jump a.c.217r.split2 
a.c.019t.early_optimizations a.c.069t.cselim a.c.111t.ivcanon a.c.174r.subreg1 a.c.218r.ree 
a.c.020t.copyrename1 a.c.070t.ifcombine a.c.113t.ifcvt a.c.175r.dfinit a.c.221r.pro_and_epilogue 
a.c.021t.ccp1 a.c.071t.phiopt1 a.c.114t.vect a.c.176r.cse1 a.c.222r.dse2 
a.c.022t.forwprop1 a.c.072t.tailr2 a.c.115t.dceloop3 a.c.177r.fwprop1 a.c.223r.csa 
a.c.023t.ealias a.c.073t.ch a.c.116t.pcom a.c.178r.cprop1 a.c.224r.jump2 
a.c.024t.esra a.c.075t.cplxlower1 a.c.117t.cunroll a.c.179r.pre a.c.225r.peephole2 
a.c.025t.fre1 a.c.076t.sra a.c.118t.slp a.c.181r.cprop2 a.c.226r.ce3 
a.c.026t.copyprop1 a.c.077t.copyrename3 a.c.120t.ivopts a.c.184r.ce1 a.c.228r.cprop_hardreg 
a.c.027t.mergephi1 a.c.078t.dom1 a.c.121t.lim3 a.c.185r.reginfo a.c.229r.rtl_dce 
a.c.028t.cddce1 a.c.079t.isolate-paths a.c.122t.loopdone a.c.186r.loop2 a.c.230r.compgotos 
a.c.029t.eipa_sra a.c.080t.phicprop1 a.c.123t.veclower21 a.c.187r.loop2_init a.c.231r.bbro 
a.c.030t.tailr1 a.c.081t.dse1 a.c.125t.reassoc2 a.c.188r.loop2_invariant a.c.233r.split4 
a.c.031t.switchconv a.c.082t.reassoc1 a.c.126t.slsr a.c.189r.loop2_unswitch a.c.234r.sched2 
a.c.033t.profile_estimate a.c.083t.dce2 a.c.127t.dom2 a.c.192r.loop2_done a.c.236r.stack 
a.c.034t.local-pure-const1 a.c.084t.forwprop3 a.c.128t.phicprop2 a.c.194r.cprop3 a.c.237r.alignments 
a.c.035t.fnsplit a.c.085t.phiopt2 a.c.129t.vrp2 a.c.195r.cse2 a.c.239r.mach 
a.c.036t.release_ssa a.c.086t.strlen a.c.130t.cddce2 a.c.196r.dse1 a.c.240r.barriers 
a.c.037t.inline_param2 a.c.087t.ccp3 a.c.132t.dse2 a.c.197r.fwprop2 a.c.244r.shorten 
a.c.054t.copyrename2 a.c.088t.copyprop4 a.c.133t.forwprop4 a.c.199r.init-regs a.c.245r.nothrow 
a.c.055t.ccp2 a.c.089t.sincos a.c.134t.phiopt3 a.c.200r.ud_dce a.c.246r.dwarf2 
a.c.056t.copyprop2 a.c.090t.bswap a.c.135t.fab1 a.c.201r.combine a.c.247r.final 
a.c.057t.cunrolli a.c.091t.crited1 a.c.136t.widening_mul a.c.202r.ce2 a.c.248r.dfinish 
a.c.058t.phiprop a.c.092t.pre a.c.137t.tailc a.c.203r.bbpart a.c.249t.statistics
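For readers who want to reproduce a listing like the one above, here is a minimal sketch of what an input file could look like. The slides do not show the contents of a.c, so the function `scaled_sum` and everything in it are assumptions made for illustration. It deliberately contains a dead store and a dead computation, the kind of code the DCE and DSE passes are meant to clean up.

```c
/* a.c: hypothetical input; the slides do not show the original file. */

/* Returns 2 * (0 + 1 + ... + (n - 1)), or 0 when n <= 0. */
int scaled_sum(int n)
{
    int ret = 100;      /* dead store: overwritten below before any read */
    int tmp = n * 42;   /* dead computation: tmp never reaches the result */
    int i;

    tmp = tmp + 1;      /* still dead: nothing reads tmp afterwards */
    ret = 0;
    for (i = 0; i < n; ++i)
        ret += i * 2;
    return ret;
}
```

Because this sketch has no `main`, add `-c` to the command from the slide, for example `gcc -c a.c -fdump-tree-all -fdump-rtl-all -O3`. The exact set and numbering of the dump files depends on the GCC version, so the listing you get will probably not match the slide line for line.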
CSE 
118 
$ gcc a.c -fdump-tree-all -fdump-rtl-all -O3 
$ ls a.c.* 
4 of the 165 passes are CSE! 
Unroll 
119 
$ gcc a.c -fdump-tree-all -fdump-rtl-all -O3 
$ ls a.c.* 
2 of the 165 passes are Unroll!
Propagation + DCE + CSE + Inline + Unroll 
120 
50 / 165 ! 
Propagation + DCE + CSE + Inline + Unroll 
121 
50 / 165 ! 
Having sat through this talk, you now roughly understand about one third of GCC!!!
Machine Dependent Compiler Optimization 
122 
Machine Dependent Compiler Optimization 
• Register Allocation 
• Instruction Scheduling 
• Peephole Optimization 
123
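Of the three items above, peephole optimization is the easiest to sketch in a few lines: slide a small window over the instruction stream and replace known wasteful patterns with cheaper ones. The toy instruction set, the `Insn` struct, and the three rewrite rules below are assumptions made for illustration; they are not taken from GCC or LLVM, and the sketch ignores branches and labels entirely.

```c
#include <stddef.h>

/* Toy instruction set, invented for this sketch (not a real ISA). */
typedef enum { OP_MOV, OP_ADD_IMM, OP_MUL_IMM, OP_SHL_IMM } Opcode;

typedef struct {
    Opcode op;
    int dst;   /* destination register number */
    int src;   /* source register number */
    int imm;   /* immediate operand; ignored by OP_MOV */
} Insn;

/* True when x is a power of two greater than 1. */
static int is_pow2(int x)
{
    return x > 1 && (x & (x - 1)) == 0;
}

/*
 * One peephole pass over code[0..n). Rewrites the array in place and
 * returns the new instruction count.
 */
size_t peephole(Insn *code, size_t n)
{
    size_t out = 0;   /* number of instructions kept so far */

    for (size_t i = 0; i < n; ++i) {
        Insn in = code[i];

        /* Rule 1: "mov rX, rX" and "add rX, rX, 0" change nothing. */
        if (in.dst == in.src &&
            (in.op == OP_MOV || (in.op == OP_ADD_IMM && in.imm == 0)))
            continue;

        /* Rule 2: after "mov rA, rB", a following "mov rB, rA" is redundant. */
        if (in.op == OP_MOV && out > 0 &&
            code[out - 1].op == OP_MOV &&
            code[out - 1].dst == in.src &&
            code[out - 1].src == in.dst)
            continue;

        /* Rule 3: multiply by 2^k becomes a left shift by k. */
        if (in.op == OP_MUL_IMM && is_pow2(in.imm)) {
            int shift = 0;
            while ((1 << shift) != in.imm)
                ++shift;
            in.op = OP_SHL_IMM;
            in.imm = shift;
        }

        code[out++] = in;
    }
    return out;
}
```

A real compiler expresses such rules per target, since which pattern is cheaper depends on the machine; the fixed rule list here is only meant to show the shape of the idea.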
124 
Advanced Compiler Optimization 
Advanced Compiler Optimization 
125 
• Loop Optimization 
• Interprocedural Optimization 
• Auto Vectorization 
• Auto Parallelization
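As one concrete instance of the first bullet, loop-invariant code motion hoists a computation whose operands do not change inside the loop. The two functions below are a hand-written before/after sketch, not compiler output; the names `scale_before` and `scale_after` are made up for this illustration, and unsigned arithmetic is used so that both versions are well defined for any input.

```c
/* Before: a * b is recomputed on every iteration although neither changes. */
void scale_before(unsigned *dst, const unsigned *src, int n,
                  unsigned a, unsigned b)
{
    for (int i = 0; i < n; ++i)
        dst[i] = src[i] * (a * b);
}

/* After: the invariant product is computed once, outside the loop. */
void scale_after(unsigned *dst, const unsigned *src, int n,
                 unsigned a, unsigned b)
{
    const unsigned k = a * b;   /* hoisted loop-invariant expression */

    for (int i = 0; i < n; ++i)
        dst[i] = src[i] * k;
}
```

Both functions compute the same result; the second simply does less work per iteration, which is the transformation an optimizer would try to perform on the first.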
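Summary 
126 
• Compiler optimization is a lot of fun, but be sure to read up on some basic theory before you start playing with it 
• LLVM is a very good bridge between theory and implementation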
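Andes Mountains commercial break 
127
Andes Mountains commercial break 
128 
Nice mountains, nice water, nice and boring 
Off work on time, good atmosphere
Andes Mountains commercial break 
129 
Nice mountains, nice water, nice and boring 
Off work on time, good atmosphere 
Open Source++ 
Andes Mountains commercial break 
130 
Nice mountains, nice water, nice and boring 
Off work on time, good atmosphere 
Open Source++ 
The toolchain team is always hiring~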
131

淺談編譯器最佳化技術

  • 1. 淺談Compiler最佳化技術 Hsinchu Tech Chat Group Date : Dec 7th, 2014 Kito Cheng kito.cheng@gmail.com
  • 2. 2 自我介紹 安第斯山脈 Compiler Team 專業打雜工
  • 5. 5 Compilation Flow [1] Compilers: Principles, Techniques, and Tools (2nd Edition) p.5
  • 6. 6 Compilation Flow 通常大學部編譯器課程僅能 涵蓋 Parser 部份 以及陽春的 Code Generation [1] Compilers: Principles, Techniques, and Tools (2nd Edition) p.5
  • 7. 7 Compilation Flow 但 Compiler 超好玩超神奇的部份 其實都在最佳化的地方 [1] Compilers: Principles, Techniques, and Tools (2nd Edition) p.5
  • 8. 8 Compilation Flow 但 Compiler 超好玩超神奇的部份 其實都在最佳化的地方 透過最佳化, 程式可以變得又小又快! [1] Compilers: Principles, Techniques, and Tools (2nd Edition) p.5
  • 9. 9 Compilation Flow [1] Compilers: Principles, Techniques, and Tools (2nd Edition) p.5
  • 10. 10 • 這次分享基本上不會涉及太多高深理論, 僅會透過介紹概念並透過範例來講解 • 使用 LLVM 來作為說明以及展示的輔助工 具
  • 12. 基礎知識惡補 12 • Basic Block • Control Flow Graph • Static Single Assignment Form
  • 13. Basic Block 13 • 單一進入點, 單一出口點的程式區段 • http://en.wikipedia.org/wiki/Basic_bl ock
  • 14. Control Flow Graph 14 • 簡稱CFG, 簡單來說就是程式的流程圖 • http://en.wikipedia.org/wiki/Control_ flow_graph
  • 15. Basic Block 15 int foo (ini n) { int ret; if (n > 10) ret = n * 2; else ret = n + 2; return ret; }
  • 16. Basic Block 16 int foo (ini n) { int ret; if (n > 10) ret = n * 2; else ret = n + 2; return ret; }
  • 17. Basic Block 17 int foo (ini n) { int ret; if (n > 10) ret = n * 2; else ret = n + 2; return ret; } int ret; if (n > 10) ret = n * 2; ret = n + 2; return ret;
  • 18. CFG 18 int foo (ini n) { int ret; if (n > 10) ret = n * 2; else ret = n + 2; return ret; } int ret; if (n > 10) ret = n * 2; ret = n + 2; return ret;
  • 19. Basic Block 19 int sum (int n) { int ret = 0; int i; for (i = 0; i < n; ++i) ret += i; return ret; }
  • 20. Basic Block 20 int sum (int n) { int ret = 0; int i; for (i = 0; i < n; ++i) ret += i; return ret; }
  • 21. Basic Block 21 int sum (int n) { int ret = 0; int i; for (i = 0; i < n; ++i) ret += i; return ret; } int ret = 0; int i; i = 0; i < n; ret += i; ++i return ret
  • 22. CFG 22 int sum (int n) { int ret = 0; int i; for (i = 0; i < n; ++i) ret += i; return ret; } int ret = 0; int i; i = 0; i < n; ret += i; ++i return ret
  • 23. Static Single Assignment 23 • 將變數標上版本 • 每個值只會賦值/寫入一次 • http://en.wikipedia.org/wiki/Static_s ingle_assignment_form
  • 24. SSA 24 int foo () { int ret; ret = 10; ret = 20; return ret; }
  • 25. SSA 25 int foo () { int ret; ret = 10; ret = 20; return ret; } int foo () { int ret; ret1 = 10; ret2 = 20; return ret2; } 每次賦值都會一個版本號
  • 26. SSA 26 int foo () { int ret; ret = 10; ret = 20; return ret; } int foo () { int ret; ret1 = 10; ret2 = 20; return ret2; } 每次賦值都會一個版本號 標完後可以馬上知道 是使用哪個運算式的結果
  • 27. SSA 27 int foo (ini n) { int ret; if (n > 10) ret = n * 2; else ret = n + 2; return ret; }
  • 28. SSA 28 int foo (ini n) { int ret; if (n > 10) ret = n * 2; else ret = n + 2; return ret; } int foo (ini n) { int ret; if (n > 10) ret1 = n * 2; else ret2 = n + 2; return ret?; } 程式中有分歧點會合時 無法判定是從何而來
  • 29. SSA 29 int foo (ini n) { int ret; if (n > 10) ret = n * 2; else ret = n + 2; return ret; } int foo (ini n) { int ret; if (n > 10) ret1 = n * 2; else ret2 = n + 2; ret3 = Φ (ret1, ret2) return ret3; } 此時需要使用Φ來 處理這種情況, 表示值的定義 需由程式流程決定 並給予新的版本號
  • 30. 30 L L V M
  • 31. LLVM 31 • 好用好玩而且最近很夯的 Compiler, 安 裝方法如下: – sudo apt-get install llvm clang xdot – sudo yum install llvm clang python-xdot python-setuptools
  • 32. LLVM 32 • 好用好玩而且最近很夯的 Compiler, 安 裝方法如下: – sudo apt-get install llvm clang xdot – sudo yum install llvm clang python-xdot python-setuptools xdot 是要看圖用的
  • 33. LLVM 33 • 好用好玩而且最近很夯的 Compiler, 安 裝方法如下: – sudo apt-get install llvm clang xdot – sudo yum install llvm clang python-xdot python-setuptools xdot 是要看圖用的 這個嘛...Fedora 套件系統 相依性沒設定好, xdot 的相依套件
  • 34. LLVM 34 • 好用好玩而且最近很夯的 Compiler, 安 裝方法如下: – sudo apt-get install llvm clang xdot – sudo yum install llvm clang python-xdot python-setuptools – 不是 apt-get 或 yum ? 那就假設你是高手 會自己想辦法XD
  • 35. LLVM 35 • 好用好玩而且最近很夯的 Compiler, 安 裝方法如下: – sudo apt-get install llvm clang xdot – sudo yum install llvm clang python-xdot python-setuptools – 不是 apt-get 或 yum ? 那就假設你是高手 會自己想辦法XD – Windows !? 聽說官網有安裝檔?
  • 36. LLVM 36 • 好用好玩而且最近很夯的 Compiler, 安 裝方法如下: – sudo apt-get install llvm clang xdot – sudo yum install llvm clang python-xdot python-setuptools – 不是 apt-get 或 yum ? 那就假設你是高手 會自己想辦法XD – Windows !? 聽說官網有安裝檔? – 建議自己 build, 不然會沒有部份debug功能
  • 37. LLVM IR 37 • v = operation type op1, op2, opn... – %sum = add i32 %op1, %op2 運算元 型態 運算子們 運算結果
  • 38. 空空的LLVM函數 38 define void @empty() { ret void } 宣告函數的起手式 回傳型態 參數列 @函數名稱 回傳 + 型態
  • 39. 有一個參數的LLVM函數 39 參數列, 有一個參數叫 %a define void @arg1(i32 %a) { ret void }
  • 40. An LLVM function with one parameter that returns it directly: define i32 @arg1(i32 %a) { ret i32 %a } The return type is i32; ret is followed by the type and the returned value.
  • 41. An LLVM function with one parameter that returns the parameter plus ten: define i32 @arg1(i32 %a) { %t = add i32 %a, 10 ret i32 %t } %a plus 10 is stored in %t.
  • 42. LLVM IR: • SSA-based IR, so each value may be defined only once – %sum = add i32 %op1, %op2 – %sum = mul i32 %op1, %op2 – error: multiple definition of local value named 'sum'
  • 43-45. SSA!?: • SSA form is friendly to compilers, but writing it by hand is not very intuitive for ordinary people... – people used to functional programming excepted... XD • Inserting PHI nodes by hand is even more of a chore
  • 46. alloca: • Creates a local variable – the allocated space lives on the stack • Using it feels a bit like C's malloc, but the concept is not quite the same
  • 47. alloca: define void @foo() { %var = alloca i32 ret void } The result is an address; its type can be viewed as i32*.
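  • 48. alloca: • Every access has to go through load/store – but during optimization the variable is turned into a register whenever memory is not really needed (via the mem2reg pass) • If it is an array, or its address must be taken, it may not be possible to turn it into a register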
  • 49. alloca/store: define void @foo() { %var = alloca i32 store i32 10, i32* %var ret void } store takes the value to store with its type, followed by the destination address with its type.
  • 50. alloca/load: define void @foo() { %var = alloca i32 store i32 10, i32* %var %t0 = load i32* %var ret void } %t0 is the value read back; load takes the type and the address to read from.
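For readers more at home in C, the alloca/store/load sequence above maps roughly onto a local variable that is only ever touched through a pointer. The sketch below is an analogy, not LLVM output; the function names are invented, and unlike the slide's @foo the functions return the loaded value so the effect can be observed.

```c
#include <assert.h>

/* Unoptimized view: the local lives in a stack slot and every access
 * goes through an explicit store or load. */
int through_memory(void)
{
    int slot;          /* %var = alloca i32 : reserve a stack slot */
    int *var = &slot;  /* %var is the address of that slot, an i32* */
    *var = 10;         /* store i32 10, i32* %var */
    int t0 = *var;     /* %t0 = load i32* %var */
    return t0;
}

/* What mem2reg aims for: the value stays in a register-like name and
 * the stack slot, store, and load disappear. */
int through_register(void)
{
    int t0 = 10;
    return t0;
}

/* The two views are observably the same. */
void check_mem2reg(void)
{
    assert(through_memory() == through_register());
}
```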
  • 51. LLVM/Clang: • Today's talk uses only these two tools: – clang: turns C into LLVM IR – opt: the tool for running optimizations and inspecting the result
  • 52. View CFG by LLVM: • clang foo.c -S -emit-llvm • opt foo.ll -view-cfg int foo(int a, int b) { if (a > b) return a; else return b; }
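  • 53-54. View CFG by LLVM: There are quite a few junk instructions, but turning on optimization while you are still observing gets in the way of learning. opt foo.ll -O1 -view-cfg: with optimization on, only three instructions in a single BB remain...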
  • 55. Notes on using opt (1/3): • The position of the options matters!! opt foo.ll -view-cfg -O1 shows the CFG first and then optimizes; opt foo.ll -O1 -view-cfg optimizes first and then shows the CFG
  • 56. Notes on using opt (2/3): • Options can be repeated: opt foo.ll -view-cfg -O1 -view-cfg shows the CFG, optimizes, and then shows the CFG once more
  • 57. Notes on using opt (3/3): • Options can be repeated, and so can optimizations: opt foo.ll -O1 -view-cfg -O1 -view-cfg optimizes, shows the CFG, optimizes again, and shows the CFG again
  • 58. mem2reg: • mem2reg removes unnecessary alloca and load/store instructions • and makes the program look more like proper SSA form
  • 59. mem2reg: opt foo.ll -mem2reg -view-cfg A phi node appears! The alloca and load/store instructions are all gone.
  • 60. Compiler Optimization
  • 61. Propagation: – Constant Propagation – Copy Propagation
  • 62-64. Constant Propagation: int foo(int a) { int magic_num = 10; return a + magic_num; } becomes int foo(int a) { int magic_num = 10; return a + 10; } (opt foo.ll -mem2reg -view-cfg) This optimization is so basic that mem2reg does it along the way. Never assume the second version runs faster and litter your code with damned magic numbers!!!!
  • 65. Copy Propagation: b = a; c = b; becomes b = a; c = a;
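The copy propagation step above can be written out as two equivalent C functions. This is a hand-made before/after sketch; the out-parameters `b_out` and `c_out` are only there so that both values can be compared, and all function names are hypothetical.

```c
#include <assert.h>

/* Before: c is a copy of a copy. */
void copy_before(int a, int *b_out, int *c_out)
{
    int b = a;
    int c = b;
    *b_out = b;
    *c_out = c;
}

/* After copy propagation: the use of b is replaced by its source a,
 * so c no longer depends on b. */
void copy_after(int a, int *b_out, int *c_out)
{
    int b = a;
    int c = a;
    *b_out = b;
    *c_out = c;
}

/* Both versions produce the same b and c. */
void check_copy_propagation(void)
{
    for (int a = -5; a <= 5; ++a) {
        int b0, c0, b1, c1;
        copy_before(a, &b0, &c0);
        copy_after(a, &b1, &c1);
        assert(b0 == b1 && c0 == c1);
    }
}
```

Once c reads a directly, b may turn out to have no remaining uses in a larger program, which is the kind of leftover that dead code elimination cleans up.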
  • 66-69. Constant Folding: • If every operand is a constant, compute the result ahead of time! • a = 123 + 456 – a = 579 • A program does not necessarily contain many constant expressions like this, but they gradually appear after Constant Propagation
  • 70-72. Constant Folding: a = 10; b = 100 + a; → (Constant Propagation) a = 10; b = 100 + 10; → (Constant Folding) a = 10; b = 110;
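The propagation-then-folding chain above can be replayed by hand as three C functions, one per stage. It is a sketch of what the passes do conceptually; the `stage0` to `stage2` names are made up, and the `(void)a` lines only silence unused-variable warnings.

```c
#include <assert.h>

/* Stage 0: the original code. */
int stage0(void)
{
    int a = 10;
    int b = 100 + a;
    return b;
}

/* Stage 1: constant propagation replaces the use of a with 10. */
int stage1(void)
{
    int a = 10;
    int b = 100 + 10;
    (void)a;  /* a is no longer read */
    return b;
}

/* Stage 2: constant folding evaluates 100 + 10 ahead of time. */
int stage2(void)
{
    int a = 10;
    int b = 110;
    (void)a;
    return b;
}

/* Every stage computes the same value. */
void check_folding(void)
{
    assert(stage0() == stage1());
    assert(stage1() == stage2());
}
```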
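  • 73. • Where would a program get so many constants to play with!? • Propagation and folding are basic little tricks; they pay off most when combined with other optimizations!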
  • 74. • LLVM performs these basic optimizations along the way, so they are hard to observe in isolation... • Copy/Constant Propagation is basically handled as a side effect of mem2reg
  • 75. Observing Constant Folding: • Constant Folding can be handled by LLVM's Constant Propagation pass: define i32 @folding() { %t = add i32 10, 20 ret i32 %t } becomes define i32 @folding() { ret i32 30 } with opt -S cfolding.ll -constprop
  • 76. Function Inline: • The idea is to copy the body of the function into the call site • This saves the function call and opens up more optimization opportunities!
  • 77. Inline + Propagation: • After inlining, passing arguments turns into plain copies, which exposes: – Copy Propagation – Constant Propagation
  • 78. Inline + Propagation 78 int add(int a, int b) { return a + b; } int foo(int n){ int sum = 0; int i, t; for (i = 0; i < n ;++i) { t = add(10, 20); sum = add(sum, i); sum = add(sum, t); } return sum; }
  • 79. Inline + Propagation 79 int add(int a, int b) { return a + b; } int foo(int n){ int sum = 0; int i, t; for (i = 0; i < n ;++i) { t = add(10, 20); sum = add(sum, i); sum = add(sum, t); } return sum; } define i32 @add(i32 %a, i32 %b) { %1 = add i32 %a, %b ret i32 %1 } define i32 @foo(i32 %n) { br label %1 ; <label>:1 %sum.0 = phi i32 [ 0, %0 ], [ %6, %7 ] %i.0 = phi i32 [ 0, %0 ], [ %8, %7 ] %2 = icmp slt i32 %i.0, %n br i1 %2, label %3, label %9 ; <label>:3 %4 = call i32 @add(i32 10, i32 20) %5 = call i32 @add(i32 %sum.0, i32 %i.0) %6 = call i32 @add(i32 %5, i32 %4) br label %7 ; <label>:7 %8 = add i32 %i.0, 1 br label %1 ; <label>:9 ret i32 %sum.0 } clang -emit-llvm -S inline.c opt inline.ll -mem2reg -S
  • 80. Inline + Propagation 80 define i32 @add(i32 %a, i32 %b) { %1 = add i32 %a, %b ret i32 %1 } define i32 @foo(i32 %n) { br label %1 ; <label>:1 %sum.0 = phi i32 [ 0, %0 ], [ %6, %7 ] %i.0 = phi i32 [ 0, %0 ], [ %8, %7 ] %2 = icmp slt i32 %i.0, %n br i1 %2, label %3, label %9 ; <label>:3 %4 = call i32 @add(i32 10, i32 20) %5 = call i32 @add(i32 %sum.0, i32 %i.0) %6 = call i32 @add(i32 %5, i32 %4) br label %7 ; <label>:7 %8 = add i32 %i.0, 1 br label %1 ; <label>:9 ret i32 %sum.0 } define i32 @foo(i32 %n) { br label %1 ; <label>:1 %sum.0 = phi i32 [ 0, %0 ], [ %5, %6 ] %i.0 = phi i32 [ 0, %0 ], [ %7, %6 ] %2 = icmp slt i32 %i.0, %n br i1 %2, label %3, label %8 ; <label>:3 %4 = add i32 %sum.0, %i.0 %5 = add i32 %4, 30 br label %6 ; <label>:6 %7 = add i32 %i.0, 1 br label %1 ; <label>:8 ret i32 %sum.0 } opt inline.ll -mem2reg -inline -S
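Translated back into C, the effect of -inline on the IR above looks roughly like the sketch below. `foo_inlined` is a hand-written approximation of the optimized IR, not compiler output, and `check_inline` is a hypothetical helper.

```c
#include <assert.h>

int add(int a, int b)
{
    return a + b;
}

/* Original: three calls per iteration. */
int foo(int n)
{
    int sum = 0;
    int i, t;
    for (i = 0; i < n; ++i) {
        t = add(10, 20);
        sum = add(sum, i);
        sum = add(sum, t);
    }
    return sum;
}

/* After inlining: the calls are gone, the arguments were propagated
 * into the body, and 10 + 20 was folded to 30. */
int foo_inlined(int n)
{
    int sum = 0;
    int i;
    for (i = 0; i < n; ++i) {
        sum = sum + i;   /* was add(sum, i) */
        sum = sum + 30;  /* was add(sum, add(10, 20)) */
    }
    return sum;
}

/* Both versions agree for a range of small n. */
void check_inline(void)
{
    for (int n = -3; n <= 100; ++n)
        assert(foo(n) == foo_inlined(n));
}
```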
  • 81. DCE: • DCE stands for Dead Code Elimination • After the optimizations introduced so far, redundant code gradually shows up, along with branch conditions that obviously can never be true
  • 82-86. DCE: int foo() { a = 5; if (a > 10) b = 10; else b = 20; return b; } → (Constant Propagation) int foo() { a = 5; if (5 > 10) b = 10; else b = 20; return b; } → (Constant Folding) int foo() { a = 5; if (false) b = 10; else b = 20; return b; } → (DCE) int foo() { b = 20; return b; } → (Constant Propagation) int foo() { return 20; }
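The five stages of the DCE walkthrough above can be written as compilable C, one function per stage. The slides leave a and b undeclared, so the declarations here are an assumption, and the `foo_v0` to `foo_v4` names are invented for the sketch.

```c
#include <assert.h>

/* Stage 0: original code. */
int foo_v0(void)
{
    int a = 5, b;
    if (a > 10) b = 10; else b = 20;
    return b;
}

/* Stage 1: constant propagation puts 5 into the condition. */
int foo_v1(void)
{
    int a = 5, b;
    (void)a;  /* a is no longer read */
    if (5 > 10) b = 10; else b = 20;
    return b;
}

/* Stage 2: constant folding evaluates 5 > 10 to false (0 in C). */
int foo_v2(void)
{
    int a = 5, b;
    (void)a;
    if (0) b = 10; else b = 20;
    return b;
}

/* Stage 3: DCE removes the dead branch and the unused a. */
int foo_v3(void)
{
    int b;
    b = 20;
    return b;
}

/* Stage 4: one more round of constant propagation. */
int foo_v4(void)
{
    return 20;
}

/* Every stage returns the same value. */
void check_dce(void)
{
    assert(foo_v0() == foo_v1());
    assert(foo_v1() == foo_v2());
    assert(foo_v2() == foo_v3());
    assert(foo_v3() == foo_v4());
}
```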
  • 87. Observing DCE with LLVM (1/5): int foo() { int a; int b; a = 5; if (a > 10) b = a + 10; else b = a + 20; return b; } clang -S -emit-llvm dce.c define i32 @foo() { entry: %a = alloca i32 %b = alloca i32 store i32 5, i32* %a %0 = load i32* %a %cmp = icmp sgt i32 %0, 10 br i1 %cmp, label %if.then, label %if.else if.then: %1 = load i32* %a %add = add i32 %1, 10 store i32 %add, i32* %b br label %if.end if.else: %2 = load i32* %a %add1 = add i32 %2, 20 store i32 %add1, i32* %b br label %if.end if.end: %3 = load i32* %b ret i32 %3 }
  • 88. Observing DCE with LLVM (2/5): define i32 @foo() { entry: %a = alloca i32 %b = alloca i32 store i32 5, i32* %a %0 = load i32* %a %cmp = icmp sgt i32 %0, 10 br i1 %cmp, label %if.then, label %if.else if.then: %1 = load i32* %a %add = add i32 %1, 10 store i32 %add, i32* %b br label %if.end if.else: %2 = load i32* %a %add1 = add i32 %2, 20 store i32 %add1, i32* %b br label %if.end if.end: %3 = load i32* %b ret i32 %3 } → (opt dce.ll -mem2reg -S) define i32 @foo() { entry: %cmp = icmp sgt i32 5, 10 br i1 %cmp, label %if.then, label %if.else if.then: %add = add i32 5, 10 br label %if.end if.else: %add1 = add i32 5, 20 br label %if.end if.end: %b.0 = phi i32 [ %add, %if.then ], [ %add1, %if.else ] ret i32 %b.0 }
  • 89. Observing DCE with LLVM (3/5): define i32 @foo() { entry: %cmp = icmp sgt i32 5, 10 br i1 %cmp, label %if.then, label %if.else if.then: %add = add i32 5, 10 br label %if.end if.else: %add1 = add i32 5, 20 br label %if.end if.end: %b.0 = phi i32 [ %add, %if.then ], [ %add1, %if.else ] ret i32 %b.0 } → (opt dce.ll -mem2reg -constprop -S) define i32 @foo() { entry: br i1 false, label %if.then, label %if.else if.then: br label %if.end if.else: br label %if.end if.end: %b.0 = phi i32 [ 15, %if.then ], [ 25, %if.else ] ret i32 %b.0 }
  • 90-92. Observing DCE with LLVM (4/5): opt dce.ll -mem2reg -constprop -dce -S: the output of -dce is identical to its input: define i32 @foo() { entry: br i1 false, label %if.then, label %if.else if.then: br label %if.end if.else: br label %if.end if.end: %b.0 = phi i32 [ 15, %if.then ], [ 25, %if.else ] ret i32 %b.0 } Looks like nothing changed?? LLVM leaves CFG simplification to the -simplifycfg pass.
  • 93. Observing DCE with LLVM (5/5): define i32 @foo() { entry: br i1 false, label %if.then, label %if.else if.then: br label %if.end if.else: br label %if.end if.end: %b.0 = phi i32 [ 15, %if.then ], [ 25, %if.else ] ret i32 %b.0 } → (opt dce.ll -mem2reg -constprop -simplifycfg -S) define i32 @foo() { entry: ret i32 25 }
  • 94. Observing DCE with LLVM - 2 (1/2): define i32 @foo() { entry: %cmp = icmp sgt i32 5, 10 br i1 %cmp, label %if.then, label %if.else if.then: %add = add i32 5, 10 br label %if.end if.else: %add1 = add i32 5, 20 br label %if.end if.end: %b.0 = phi i32 [ %add, %if.then ], [ %add1, %if.else ] ret i32 %b.0 } → (opt dce.ll -mem2reg -simplifycfg -S) define i32 @foo() { entry: %cmp = icmp sgt i32 5, 10 %add = add i32 5, 10 %add1 = add i32 5, 20 %b.0 = select i1 %cmp, i32 %add, i32 %add1 ret i32 %b.0 }
  • 95. Observing DCE with LLVM - 2 (2/2): define i32 @foo() { entry: %cmp = icmp sgt i32 5, 10 %add = add i32 5, 10 %add1 = add i32 5, 20 %b.0 = select i1 %cmp, i32 %add, i32 %add1 ret i32 %b.0 } → (opt dce.ll -mem2reg -simplifycfg -constprop -S) define i32 @foo() { entry: ret i32 25 }
  • 96. CSE: • CSE stands for Common Subexpression Elimination – share whatever can be shared!
  • 97. CSE: a = b * c + g; d = b * c * e; becomes t = b * c; a = t + g; d = t * e;
  • 98. Observing CSE with LLVM (1/2): int foo(int b, int c, int g, int e) { int a = b * c + g; int d = b * c * e; return a + d; } clang -emit-llvm -S cse.c opt cse.ll -mem2reg -S define i32 @foo(i32 %b, i32 %c, i32 %g, i32 %e) { entry: %mul = mul i32 %b, %c %add = add i32 %mul, %g %mul1 = mul i32 %b, %c %mul2 = mul i32 %mul1, %e %add3 = add i32 %add, %mul2 ret i32 %add3 }
  • 99. Observing CSE with LLVM (2/2): define i32 @foo(i32 %b, i32 %c, i32 %g, i32 %e) { entry: %mul = mul i32 %b, %c %add = add i32 %mul, %g %mul1 = mul i32 %b, %c %mul2 = mul i32 %mul1, %e %add3 = add i32 %add, %mul2 ret i32 %add3 } → (opt cse.ll -mem2reg -early-cse -S) define i32 @foo(i32 %b, i32 %c, i32 %g, i32 %e) { entry: %mul = mul i32 %b, %c %add = add i32 %mul, %g %mul2 = mul i32 %mul, %e %add3 = add i32 %add, %mul2 ret i32 %add3 }
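In C terms, what -early-cse did to the IR above amounts to the rewrite below. `foo_cse` is a hand-written equivalent of the optimized IR, with the temporary named `t` as on the CSE slide; `check_cse` is a hypothetical helper that only uses small inputs so no product overflows.

```c
#include <assert.h>

/* Original: b * c is computed twice. */
int foo(int b, int c, int g, int e)
{
    int a = b * c + g;
    int d = b * c * e;
    return a + d;
}

/* After CSE: the common subexpression b * c is computed once
 * and the result is shared. */
int foo_cse(int b, int c, int g, int e)
{
    int t = b * c;
    int a = t + g;
    int d = t * e;
    return a + d;
}

/* Both versions agree on a grid of small inputs. */
void check_cse(void)
{
    for (int b = -3; b <= 3; ++b)
        for (int c = -3; c <= 3; ++c)
            for (int e = -3; e <= 3; ++e)
                assert(foo(b, c, 4, e) == foo_cse(b, c, 4, e));
}
```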
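  • 100. Loop Unroll: – On most architectures a jump instruction costs more than an ordinary arithmetic instruction – After unrolling, the loop index may turn from a variable into a constant. sum = 0; for (i = 0; i < 3; ++i) sum = sum + i; becomes sum = 0; sum = sum + 0; sum = sum + 1; sum = sum + 2;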
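The unrolling example above, written as compilable C. The third function goes one step further than the slide and shows what constant propagation and folding could do once the index is a constant; all function names are invented for this sketch.

```c
#include <assert.h>

/* Original loop with a compile-time trip count of 3. */
int sum_loop(void)
{
    int sum = 0;
    for (int i = 0; i < 3; ++i)
        sum = sum + i;
    return sum;
}

/* Fully unrolled: no compare or jump, and i became the constants 0, 1, 2. */
int sum_unrolled(void)
{
    int sum = 0;
    sum = sum + 0;
    sum = sum + 1;
    sum = sum + 2;
    return sum;
}

/* With only constants left, propagation plus folding can finish the job. */
int sum_folded(void)
{
    return 3;
}

/* All three versions return the same value. */
void check_unroll(void)
{
    assert(sum_loop() == sum_unrolled());
    assert(sum_unrolled() == sum_folded());
}
```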
  • 101. Observing Loop Unroll with LLVM (1/8): int add(int a, int b) { return a + b; } int foo() { int sum = 0; int i; for (i = 0; i < 3; ++i) sum = add(sum, i); return sum; } clang -emit-llvm -S for.c opt for.ll -mem2reg -S define i32 @add(i32 %a, i32 %b) { entry: %add = add i32 %a, %b ret i32 %add } define i32 @foo() { entry: br label %for.cond for.cond: %i.0 = phi i32 [ 0, %entry ], [ %inc, %for.inc ] %sum.0 = phi i32 [ 0, %entry ], [ %call, %for.inc ] %cmp = icmp slt i32 %i.0, 3 br i1 %cmp, label %for.body, label %for.end for.body: %call = call i32 @add(i32 %sum.0, i32 %i.0) br label %for.inc for.inc: %inc = add i32 %i.0, 1 br label %for.cond for.end: ret i32 %sum.0 }
  • 102-103. Observing Loop Unroll with LLVM (2/8): define i32 @foo() { entry: br label %for.cond for.cond: %i.0 = phi i32 [ 0, %entry ], [ %inc, %for.inc ] %sum.0 = phi i32 [ 0, %entry ], [ %call, %for.inc ] %cmp = icmp slt i32 %i.0, 3 br i1 %cmp, label %for.body, label %for.end for.body: %call = call i32 @add(i32 %sum.0, i32 %i.0) br label %for.inc for.inc: %inc = add i32 %i.0, 1 br label %for.cond for.end: ret i32 %sum.0 } → (opt for.ll -mem2reg -loop-unroll -S) define i32 @foo() { entry: br label %for.cond for.cond: %i.0 = phi i32 [ 0, %entry ], [ %inc, %for.inc ] %sum.0 = phi i32 [ 0, %entry ], [ %call, %for.inc ] %cmp = icmp slt i32 %i.0, 3 br i1 %cmp, label %for.body, label %for.end for.body: %call = call i32 @add(i32 %sum.0, i32 %i.0) br label %for.inc for.inc: %inc = add i32 %i.0, 1 br label %for.cond for.end: %sum.0.lcssa = phi i32 [ %sum.0, %for.cond ] ret i32 %sum.0.lcssa } The loop does not seem to get unrolled????
  • 104. Observing Loop Unroll with LLVM (3/8): define i32 @foo() { entry: br label %for.cond for.cond: %i.0 = phi i32 [ 0, %entry ], [ %inc, %for.inc ] %sum.0 = phi i32 [ 0, %entry ], [ %call, %for.inc ] %cmp = icmp slt i32 %i.0, 3 br i1 %cmp, label %for.body, label %for.end for.body: %call = call i32 @add(i32 %sum.0, i32 %i.0) br label %for.inc for.inc: %inc = add i32 %i.0, 1 br label %for.cond for.end: ret i32 %sum.0 } $ opt -mem2reg -S for.ll -loop-unroll -debug Args: opt -mem2reg -S for.ll -loop-unroll -debug Loop Unroll: F[foo] Loop %for.cond Loop Size = 8 Can't unroll; loop not terminated by a conditional branch. With -debug, the Loop Unroll pass complains that it does not recognize this loop!?
  • 105. Observing Loop Unroll with LLVM (4/8): define i32 @foo() { entry: br label %for.cond for.cond: %i.0 = phi i32 [ 0, %entry ], [ %inc, %for.inc ] %sum.0 = phi i32 [ 0, %entry ], [ %call, %for.inc ] %cmp = icmp slt i32 %i.0, 3 br i1 %cmp, label %for.body, label %for.end for.body: %call = call i32 @add(i32 %sum.0, i32 %i.0) br label %for.inc for.inc: %inc = add i32 %i.0, 1 br label %for.cond for.end: ret i32 %sum.0 } → (opt for.ll -mem2reg -loop-rotate -S) define i32 @foo() { entry: br label %for.body for.body: %sum.02 = phi i32 [ 0, %entry ], [ %call, %for.inc ] %i.01 = phi i32 [ 0, %entry ], [ %inc, %for.inc ] %call = call i32 @add(i32 %sum.02, i32 %i.01) br label %for.inc for.inc: %inc = add i32 %i.01, 1 %cmp = icmp slt i32 %inc, 3 br i1 %cmp, label %for.body, label %for.end for.end: %sum.0.lcssa = phi i32 [ %call, %for.inc ] ret i32 %sum.0.lcssa } Rotate, loop! (-loop-rotate)
  • 106. Observing Loop Unroll with LLVM (5/8): opt for.ll -mem2reg -view-cfg -loop-rotate -view-cfg -S Rotate, loop! (-loop-rotate)
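Loop rotation, as applied by -loop-rotate above, moves the exit test from the top of the loop to the bottom. A rough C picture is a while loop turned into a guarded do-while. The sketch uses a hypothetical bound parameter `n` so that the guard stays visible; in the slide's loop the bound is the constant 3, so the guard is known to be true and does not show up in the rotated IR.

```c
#include <assert.h>

/* Before rotation: the exit test sits in the loop header, so the
 * back edge from the body is an unconditional jump. */
int sum_below(int n)
{
    int sum = 0;
    int i = 0;
    while (i < n) {
        sum += i;
        ++i;
    }
    return sum;
}

/* After rotation: one guard test before the loop, then the exit
 * test sits at the bottom, so the loop ends in a conditional branch. */
int sum_below_rotated(int n)
{
    int sum = 0;
    int i = 0;
    if (i < n) {
        do {
            sum += i;
            ++i;
        } while (i < n);
    }
    return sum;
}

/* Both shapes compute the same sum, including for n <= 0. */
void check_rotate(void)
{
    for (int n = -2; n <= 20; ++n)
        assert(sum_below(n) == sum_below_rotated(n));
}
```

The bottom-tested shape is what the unroller was asking for in its debug message, since the loop now terminates with a conditional branch.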
  • 107. Observing Loop Unroll with LLVM (6/8): -loop-unroll: opt for.ll -mem2reg -loop-rotate -loop-unroll -view-cfg -S
  • 108. Observing Loop Unroll with LLVM (7/8): define i32 @foo() { entry: br label %for.body for.body: %call = call i32 @add(i32 0, i32 0) br label %for.inc for.inc: %call.1 = call i32 @add(i32 %call, i32 1) br label %for.inc.1 for.inc.1: %call.2 = call i32 @add(i32 %call.1, i32 2) br label %for.inc.2 for.inc.2: ret i32 %call.2 } → (opt for.ll -mem2reg -loop-rotate -loop-unroll -simplifycfg -view-cfg -S) define i32 @foo() { entry: %call = call i32 @add(i32 0, i32 0) %call.1 = call i32 @add(i32 %call, i32 1) %call.2 = call i32 @add(i32 %call.1, i32 2) ret i32 %call.2 }
  • 109-110. Observing Loop Unroll with LLVM (8/8): opt for.ll -mem2reg -loop-rotate -loop-unroll -simplifycfg -inline -constprop -S define i32 @add(i32 %a, i32 %b) { entry: %add = add i32 %a, %b ret i32 %add } define i32 @foo() { entry: %call = call i32 @add(i32 0, i32 0) %call.1 = call i32 @add(i32 %call, i32 1) %call.2 = call i32 @add(i32 %call.1, i32 2) ret i32 %call.2 } → (-inline) define i32 @foo() { entry: %add.i = add i32 1, 2 ret i32 %add.i } → (-constprop) define i32 @foo() { entry: ret i32 3 }
  • 111. Compiler Optimization: • Different compiler optimizations interact with one another • The order in which they run also affects the final result
  • 112. LLVM: • opt -help lists the available optimization passes
  • 114. Overview of GCC Optimization Pass 114 $ gcc a.c -fdump-tree-all -fdump-rtl-all -O3 $ ls a.c.* a.c.001t.tu a.c.059t.forwprop2 a.c.093t.sink a.c.138t.copyrename4 a.c.204r.outof_cfglayout a.c.003t.original a.c.060t.objsz1 a.c.096t.loop a.c.139t.crited2 a.c.205r.split1 a.c.004t.gimple a.c.061t.alias a.c.097t.loopinit a.c.141t.uncprop1 a.c.206r.subreg2 a.c.006t.omplower a.c.062t.retslot a.c.098t.lim1 a.c.142t.local-pure-const2 a.c.208r.mode_sw a.c.007t.lower a.c.063t.fre2 a.c.099t.copyprop5 a.c.168t.nrv a.c.209r.asmcons a.c.010t.eh a.c.064t.copyprop3 a.c.100t.dceloop1 a.c.169t.optimized a.c.213r.ira a.c.011t.cfg a.c.065t.mergephi2 a.c.101t.unswitch a.c.170r.expand a.c.214r.reload a.c.015t.ssa a.c.066t.vrp1 a.c.102t.sccp a.c.171r.vregs a.c.215r.postreload a.c.017t.inline_param1 a.c.067t.dce1 a.c.104t.ldist a.c.172r.into_cfglayout a.c.216r.gcse2 a.c.018t.einline a.c.068t.cdce a.c.105t.copyprop6 a.c.173r.jump a.c.217r.split2 a.c.019t.early_optimizations a.c.069t.cselim a.c.111t.ivcanon a.c.174r.subreg1 a.c.218r.ree a.c.020t.copyrename1 a.c.070t.ifcombine a.c.113t.ifcvt a.c.175r.dfinit a.c.221r.pro_and_epilogue a.c.021t.ccp1 a.c.071t.phiopt1 a.c.114t.vect a.c.176r.cse1 a.c.222r.dse2 a.c.022t.forwprop1 a.c.072t.tailr2 a.c.115t.dceloop3 a.c.177r.fwprop1 a.c.223r.csa a.c.023t.ealias a.c.073t.ch a.c.116t.pcom a.c.178r.cprop1 a.c.224r.jump2 a.c.024t.esra a.c.075t.cplxlower1 a.c.117t.cunroll a.c.179r.pre a.c.225r.peephole2 a.c.025t.fre1 a.c.076t.sra a.c.118t.slp a.c.181r.cprop2 a.c.226r.ce3 a.c.026t.copyprop1 a.c.077t.copyrename3 a.c.120t.ivopts a.c.184r.ce1 a.c.228r.cprop_hardreg a.c.027t.mergephi1 a.c.078t.dom1 a.c.121t.lim3 a.c.185r.reginfo a.c.229r.rtl_dce a.c.028t.cddce1 a.c.079t.isolate-paths a.c.122t.loopdone a.c.186r.loop2 a.c.230r.compgotos a.c.029t.eipa_sra a.c.080t.phicprop1 a.c.123t.veclower21 a.c.187r.loop2_init a.c.231r.bbro a.c.030t.tailr1 a.c.081t.dse1 a.c.125t.reassoc2 a.c.188r.loop2_invariant a.c.233r.split4 a.c.031t.switchconv a.c.082t.reassoc1 
a.c.126t.slsr a.c.189r.loop2_unswitch a.c.234r.sched2 a.c.033t.profile_estimate a.c.083t.dce2 a.c.127t.dom2 a.c.192r.loop2_done a.c.236r.stack a.c.034t.local-pure-const1 a.c.084t.forwprop3 a.c.128t.phicprop2 a.c.194r.cprop3 a.c.237r.alignments a.c.035t.fnsplit a.c.085t.phiopt2 a.c.129t.vrp2 a.c.195r.cse2 a.c.239r.mach a.c.036t.release_ssa a.c.086t.strlen a.c.130t.cddce2 a.c.196r.dse1 a.c.240r.barriers a.c.037t.inline_param2 a.c.087t.ccp3 a.c.132t.dse2 a.c.197r.fwprop2 a.c.244r.shorten a.c.054t.copyrename2 a.c.088t.copyprop4 a.c.133t.forwprop4 a.c.199r.init-regs a.c.245r.nothrow a.c.055t.ccp2 a.c.089t.sincos a.c.134t.phiopt3 a.c.200r.ud_dce a.c.246r.dwarf2 a.c.056t.copyprop2 a.c.090t.bswap a.c.135t.fab1 a.c.201r.combine a.c.247r.final a.c.057t.cunrolli a.c.091t.crited1 a.c.136t.widening_mul a.c.202r.ce2 a.c.248r.dfinish a.c.058t.phiprop a.c.092t.pre a.c.137t.tailc a.c.203r.bbpart a.c.249t.statistics 165 pass dump files in total!
  • 115. Propagation: $ gcc a.c -fdump-tree-all -fdump-rtl-all -O3 $ ls a.c.* (same dump-file listing as on slide 114) 28 of the 165 passes deal with propagation!
  • 116. Inline: $ gcc a.c -fdump-tree-all -fdump-rtl-all -O3 $ ls a.c.* (same dump-file listing as on slide 114) 3 of the 165 passes deal with inlining!
  • 117. DCE: $ gcc a.c -fdump-tree-all -fdump-rtl-all -O3 $ ls a.c.* (same dump-file listing as on slide 114) 13 of the 165 passes deal with DCE!
  • 118. CSE: $ gcc a.c -fdump-tree-all -fdump-rtl-all -O3 $ ls a.c.* (same dump-file listing as on slide 114) 4 of the 165 passes deal with CSE!
  • 119. Unroll $ gcc a.c -fdump-tree-all -fdump-rtl-all -O3 $ ls a.c.* 2 of the 165 passes are Unroll!
  • 120. Propagation + DCE + CSE + Inline + Unroll: 50 / 165!
  • 121. Propagation + DCE + CSE + Inline + Unroll: 50 / 165! After sitting through this talk, you already sort of understand about one third of GCC!!!
  • 122. Machine-Dependent Compiler Optimization
  • 123. Machine-Dependent Compiler Optimization • Register Allocation • Instruction Scheduling • Peephole Optimization
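Of the three items on this slide, peephole optimization is the easiest to show in a few lines. The following is only a minimal sketch, not how GCC or LLVM implement it: the instruction set (`OP_MOV`, `OP_ADDI`, `OP_MULI`, `OP_SHLI`), the `struct insn` layout, and the `peephole` function are all hypothetical names invented for this illustration. It slides a one- or two-instruction window over the code and applies a few local rewrites.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy instruction set, invented for this illustration. */
enum op { OP_MOV, OP_ADDI, OP_MULI, OP_SHLI };

struct insn {
    enum op op;
    int dst;   /* destination register number */
    int src;   /* source register number */
    int imm;   /* immediate operand (ignored by OP_MOV) */
};

/* Rewrite insns[0..n) in place and return the new instruction count. */
size_t peephole(struct insn *insns, size_t n)
{
    size_t out = 0;

    assert(insns != NULL || n == 0);

    for (size_t i = 0; i < n; ++i) {
        struct insn cur = insns[i];

        /* mov rX, rX has no effect: drop it. */
        if (cur.op == OP_MOV && cur.dst == cur.src)
            continue;

        /* addi rX, rX, 0 has no effect: drop it. */
        if (cur.op == OP_ADDI && cur.dst == cur.src && cur.imm == 0)
            continue;

        /* Multiply by 2 becomes shift left by 1 (strength reduction). */
        if (cur.op == OP_MULI && cur.imm == 2) {
            cur.op = OP_SHLI;
            cur.imm = 1;
        }

        /* mov rA, rB right after mov rB, rA copies a value back onto
           itself, so the second move is redundant. */
        if (cur.op == OP_MOV && out > 0 &&
            insns[out - 1].op == OP_MOV &&
            insns[out - 1].dst == cur.src &&
            insns[out - 1].src == cur.dst)
            continue;

        insns[out++] = cur;
    }
    return out;
}
```

Real peephole passes work on the target's actual instructions and must also respect flags, side effects, and register liveness, which this toy ignores.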
  • 124. Advanced Compiler Optimization
  • 125. Advanced Compiler Optimization • Loop Optimization • Interprocedural Optimization • Auto-Vectorization • Auto-Parallelization
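As one concrete instance of the loop optimizations listed here, loop-invariant code motion can be sketched by hand. This is an illustrative before/after pair under assumed names (`scale_naive` and `scale_hoisted` are hypothetical and not taken from the talk); an optimizing compiler would normally perform the hoisting itself.

```c
#include <assert.h>
#include <stddef.h>

/* Before: a * b does not change inside the loop, yet the source
   recomputes it on every iteration. */
void scale_naive(int *dst, const int *src, size_t n, int a, int b)
{
    assert(n == 0 || (dst != NULL && src != NULL));
    for (size_t i = 0; i < n; ++i)
        dst[i] = src[i] * (a * b);
}

/* After: the invariant product is computed once, outside the loop.
   This is the transformation loop-invariant code motion performs. */
void scale_hoisted(int *dst, const int *src, size_t n, int a, int b)
{
    const int k = a * b;   /* hoisted loop-invariant expression */

    assert(n == 0 || (dst != NULL && src != NULL));
    for (size_t i = 0; i < n; ++i)
        dst[i] = src[i] * k;
}
```

Both functions compute the same result; the second simply makes explicit what the optimizer is expected to do with the first.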
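  • 126. Summary • Compiler optimization is a lot of fun, but be sure to read up on some basic theory before you start playing with it • LLVM is a very good bridge between theory and implementation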
  • 128. A Word from Our Sponsor, the Andes Mountains • Great mountains, great water, great boredom • Off work on time, nice atmosphere
  • 129. A Word from Our Sponsor, the Andes Mountains • Great mountains, great water, great boredom • Off work on time, nice atmosphere • Open Source++
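  • 130. A Word from Our Sponsor, the Andes Mountains • Great mountains, great water, great boredom • Off work on time, nice atmosphere • Open Source++ • The Toolchain team is always hiring~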