A Brief Look at Compiler Optimization Techniques

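For most people, compilers mean the parsing theory and implementation taught in class, and that scares many of them away from compiler technology. Parsing is only the start of the compilation process, though. The most interesting part of a compiler is the middle-end and back-end optimization, which can speed programs up considerably. This talk introduces some basic compiler optimizations, such as Propagation, Dead Code Elimination, Inline, Common Subexpression Elimination, and Loop Unrolling, and uses a few small LLVM tools to observe their results, as a stepping stone to understanding how compilers work.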

Published in: Software

  1. 1. A Brief Look at Compiler Optimization Techniques. Hsinchu Tech Chat Group. Date: Dec 7th, 2014. Kito Cheng, kito.cheng@gmail.com
  2. 2. About me: Andes Mountains (安第斯山脈) Compiler Team, professional jack-of-all-trades
  3. 3. Compiler?
  5. 5. Compilation Flow [1] Compilers: Principles, Techniques, and Tools (2nd Edition) p.5
  6. 6. Compilation Flow: An undergraduate compiler course usually covers only the parser and a bare-bones code generator. [1] Compilers: Principles, Techniques, and Tools (2nd Edition) p.5
  8. 8. Compilation Flow: But the really fun, almost magical part of a compiler is in the optimizations. Through optimization, a program can become both smaller and faster! [1] Compilers: Principles, Techniques, and Tools (2nd Edition) p.5
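  10. 10. • This talk mostly stays away from deep theory; it introduces the concepts and explains them through examples • LLVM is used as the tool for explanation and demonstration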
  11. 11. Crash Course in the Basics
  12. 12. Crash Course in the Basics • Basic Block • Control Flow Graph • Static Single Assignment Form
  13. 13. Basic Block • A stretch of code with a single entry point and a single exit point • http://en.wikipedia.org/wiki/Basic_block
  14. 14. Control Flow Graph • CFG for short; put simply, it is the flowchart of the program • http://en.wikipedia.org/wiki/Control_flow_graph
  15. 15. Basic Block: int foo (int n) { int ret; if (n > 10) ret = n * 2; else ret = n + 2; return ret; }
  17. 17. Basic Block: int foo (int n) { int ret; if (n > 10) ret = n * 2; else ret = n + 2; return ret; } splits into four basic blocks: [ int ret; if (n > 10) ] [ ret = n * 2; ] [ ret = n + 2; ] [ return ret; ]
  18. 18. CFG: the same four blocks drawn as a graph: [ int ret; if (n > 10) ] branches to [ ret = n * 2; ] or [ ret = n + 2; ], and both continue to [ return ret; ]
  19. 19. Basic Block: int sum (int n) { int ret = 0; int i; for (i = 0; i < n; ++i) ret += i; return ret; }
  21. 21. Basic Block: int sum (int n) { int ret = 0; int i; for (i = 0; i < n; ++i) ret += i; return ret; } splits into five basic blocks: [ int ret = 0; int i; i = 0; ] [ i < n; ] [ ret += i; ] [ ++i ] [ return ret ]
  22. 22. CFG: the same five blocks drawn as a graph: [ int ret = 0; int i; i = 0; ] goes to [ i < n; ], which branches to [ ret += i; ] or [ return ret ]; [ ret += i; ] goes to [ ++i ], which loops back to [ i < n; ]
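To make the block boundaries concrete, here is a minimal C sketch that rewrites `sum` with one label per basic block and one `goto` per CFG edge. The label names (`for_cond`, `for_body`, `for_inc`, `for_end`) and the function name `sum_blocks` are illustrative choices, not something the slides define.

```c
/* Structured version, as on the slide. */
int sum(int n) {
    int ret = 0;
    int i;
    for (i = 0; i < n; ++i)
        ret += i;
    return ret;
}

/* Same computation with each basic block written out explicitly.
 * Every goto corresponds to one edge of the control flow graph. */
int sum_blocks(int n) {
    int ret, i;
    /* entry block: initialisation, then a single edge to the loop test */
    ret = 0;
    i = 0;
    goto for_cond;
for_cond:               /* loop test: two outgoing edges */
    if (i < n)
        goto for_body;
    else
        goto for_end;
for_body:               /* loop body */
    ret += i;
    goto for_inc;
for_inc:                /* increment, then the back edge */
    ++i;
    goto for_cond;
for_end:                /* exit block */
    return ret;
}
```

Each label starts a block and each block ends in exactly one jump or return, which is the single-entry, single-exit property described earlier.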
  23. 23. Static Single Assignment • Give every variable a version number • Each value is assigned (written) exactly once • http://en.wikipedia.org/wiki/Static_single_assignment_form
  24. 24. SSA: int foo () { int ret; ret = 10; ret = 20; return ret; }
  26. 26. SSA: int foo () { int ret; ret = 10; ret = 20; return ret; } becomes int foo () { int ret; ret1 = 10; ret2 = 20; return ret2; } Every assignment gets its own version number. Once the versions are in place, you can tell immediately which expression produced the value being used.
  28. 28. SSA: int foo (int n) { int ret; if (n > 10) ret = n * 2; else ret = n + 2; return ret; } becomes int foo (int n) { int ret; if (n > 10) ret1 = n * 2; else ret2 = n + 2; return ret?; } Where two branches of the program join, there is no way to tell which version the value came from.
  29. 29. SSA: int foo (int n) { int ret; if (n > 10) ret1 = n * 2; else ret2 = n + 2; ret3 = Φ (ret1, ret2) return ret3; } This case is handled with Φ, which says that the definition of the value is decided by the control flow, and which gets a new version number of its own.
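C has no Φ, but the idea can be approximated. The sketch below writes the slide's `foo` so that every name is assigned exactly once, with a conditional expression standing in for Φ. The names `foo_ssa` and `cond` are made up for illustration, and unlike a real Φ this version computes both candidate values eagerly, so it is only meant for inputs where `n * 2` does not overflow.

```c
/* Original function from the slide (with the parameter type spelled int). */
int foo(int n) {
    int ret;
    if (n > 10)
        ret = n * 2;
    else
        ret = n + 2;
    return ret;
}

/* SSA-flavoured rewrite: every name below is written exactly once. */
int foo_ssa(int n) {
    const int cond = n > 10;
    const int ret1 = n * 2;               /* value defined on the "then" path */
    const int ret2 = n + 2;               /* value defined on the "else" path */
    const int ret3 = cond ? ret1 : ret2;  /* stands in for ret3 = phi(ret1, ret2) */
    return ret3;
}
```

Because each name has one definition, a use such as `return ret3` points at exactly one expression, which is what makes this form convenient for a compiler.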
  30. 30. LLVM
  36. 36. LLVM • A handy, fun, and currently very popular compiler. To install it: – sudo apt-get install llvm clang xdot – sudo yum install llvm clang python-xdot python-setuptools (xdot is for viewing the graphs; python-setuptools is a dependency of xdot that the Fedora packaging does not declare properly) – Not on apt-get or yum? Then we assume you are an expert and will figure it out yourself XD – Windows!? Apparently the official site has an installer? – Building it yourself is recommended; otherwise some debug features will be missing
  37. 37. LLVM IR • v = operation type op1, op2, opn... – %sum = add i32 %op1, %op2 (%sum: the result; add: the operator; i32: the type; %op1, %op2: the operands)
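  38. 38. An empty LLVM function: define void @empty() { ret void } (define: the opening keyword of a function definition; void: the return type; @empty: @ followed by the function name; (): the parameter list; ret void: return plus type)
  39. 39. An LLVM function with one parameter: define void @arg1(i32 %a) { ret void } The parameter list holds one parameter named %a.
  40. 40. An LLVM function that takes one parameter and returns it directly: define i32 @arg1(i32 %a) { ret i32 %a } The return type is i32; ret i32 %a is return plus type plus return value.
  41. 41. An LLVM function that takes one parameter and returns it plus ten: define i32 @arg1(i32 %a) { %t = add i32 %a, 10 ret i32 %t } Here %a plus 10 is stored in %t.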
  42. 42. LLVM IR • SSA-Based IR, so defining the same name twice is rejected: – %sum = add i32 %op1, %op2 – %sum = mul i32 %op1, %op2 – error: multiple definition of local value named 'sum'
  45. 45. SSA!? • SSA form is very friendly to a compiler, but for ordinary people writing SSA form by hand is not very intuitive... – People used to functional programming are the exception... XD • Inserting PHI nodes by hand is even more of a chore
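  46. 46. alloca • Used to create local variables – The allocated space lives on the stack • Using it looks a bit like malloc in C, but the concept is not quite the same
  47. 47. alloca: define void @foo() { %var = alloca i32 ret void } The resulting location %var can be treated as having type i32*.
  48. 48. alloca • Every access has to go through load/store – During optimization, though, the variable is turned into a register whenever memory is not really needed (by the mem2reg pass) • If it is an array, or if its address has to be taken, it may not be possible to turn it into a register
  49. 49. alloca/store: define void @foo() { %var = alloca i32 store i32 10, i32* %var ret void } In the store, i32 10 is the value to store with its type, and i32* %var is the type and address of the destination.
  50. 50. alloca/load: define void @foo() { %var = alloca i32 store i32 10, i32* %var %t0 = load i32* %var ret void } In the load, %t0 receives the value that is read back, and i32* %var is the type and address to read from.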
  51. 51. LLVM/Clang • Only two tools are used in this talk: – clang: turns C into LLVM IR – opt: runs optimizations and lets you observe them
  52. 52. View CFG by LLVM • clang foo.c -S -emit-llvm • opt foo.ll -view-cfg int foo(int a, int b) { if (a > b) return a; else return b; }
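  54. 54. View CFG by LLVM: There are quite a few junk instructions, but turning optimization on while we are still observing would get in the way of learning. opt foo.ll -O1 -view-cfg: with optimization on, only three instructions in a single BB are left...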
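  55. 55. Notes on using opt (1/3) • The position of the arguments matters!! opt foo.ll -view-cfg -O1 shows the CFG first and then optimizes; opt foo.ll -O1 -view-cfg optimizes first and then shows the CFG
  56. 56. Notes on using opt (2/3) • Arguments can be repeated: opt foo.ll -view-cfg -O1 -view-cfg shows the CFG, optimizes, and then shows the CFG once more
  57. 57. Notes on using opt (3/3) • Arguments can be repeated, and so can optimizations: opt foo.ll -O1 -view-cfg -O1 -view-cfg optimizes, shows the CFG, optimizes again, and shows the CFG again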
  58. 58. mem2reg • mem2reg: removes unnecessary alloca and load/store instructions • and makes the program look more like SSA form
  59. 59. mem2reg: opt foo.ll -mem2reg -view-cfg A phi node shows up! The alloca and load/store instructions are all gone as well.
  60. 60. Compiler Optimization
  61. 61. Propagation • Propagation comes in two kinds: – Constant Propagation – Copy Propagation
  62. 62. Constant Propagation: int foo(int a) { int magic_num = 10; return a + magic_num; } becomes int foo(int a) { int magic_num = 10; return a + 10; }
  63. 63. Constant Propagation: opt foo.ll -mem2reg -view-cfg This optimization is so basic that it gets done along the way during mem2reg.
  64. 64. Constant Propagation: Never scatter a pile of damned magic numbers through your code just because you think the second version above will run faster!!!!
  65. 65. Copy Propagation: b = a; c = b; becomes b = a; c = a;
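The two-line fragment above can be wrapped in functions so that the before and after versions can be compared directly. This is only a sketch: the function names and the `return b + c` that makes the result observable are additions for illustration.

```c
/* Before copy propagation: c is defined through the copy b. */
int before_copy_prop(int a) {
    int b = a;
    int c = b;
    return b + c;
}

/* After copy propagation: the use of b in the definition of c
 * is replaced by the value b was copied from. */
int after_copy_prop(int a) {
    int b = a;
    int c = a;
    return b + c;
}
```

After the rewrite `c` no longer depends on `b`, which gives later passes more freedom, for example to drop `b` entirely if nothing else uses it.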
  69. 69. Constant Folding • Constant Folding: – If every operand is a constant, compute the result ahead of time! • a = 123 + 456 – a = 579 • A program does not necessarily contain many constant expressions like this to begin with, but they gradually appear after Constant Propagation
  72. 72. Constant Folding: a = 10; b = 100 + a; becomes, after Constant Propagation, a = 10; b = 100 + 10; and then, after Constant Folding, a = 10; b = 110;
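The same three stages can be written as three small C functions that must all return the same value. The function names are invented for this sketch, and the `(void)a;` lines exist only to keep compilers from warning about the now-unused variable.

```c
/* Stage 0: as written by the programmer. */
int original(void) {
    int a = 10;
    int b = 100 + a;
    return b;
}

/* Stage 1: constant propagation replaces the use of a with 10. */
int after_propagation(void) {
    int a = 10;
    int b = 100 + 10;
    (void)a;            /* a is no longer read */
    return b;
}

/* Stage 2: constant folding evaluates 100 + 10 at compile time. */
int after_folding(void) {
    int a = 10;
    int b = 110;
    (void)a;
    return b;
}
```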
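  73. 73. • Where would a program get this many constants to play with!? • Propagation and Folding are small, basic tricks; they pay off most when combined with other optimizations!
  74. 74. • LLVM performs these basic optimizations along the way, so they are hard to observe in isolation... • Copy/Constant Propagation is basically taken care of during mem2reg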
  75. 75. Observing Constant Folding • Constant Folding can be performed by LLVM's Constant Propagation pass: opt -S cfolding.ll -constprop turns define i32 @folding() { %t = add i32 10, 20 ret i32 %t } into define i32 @folding() { ret i32 30 }
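  76. 76. Function Inline • Inline (rendered in Chinese as 行內函數 or 內嵌函數) • The idea is to copy the body of the function into the call site • This saves the function call and opens up more optimization opportunities!
  77. 77. Inline + Propagation • After inlining, what used to be argument passing becomes plain copying, which is a job for: – Copy Propagation – Constant Propagation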
  78. 78. Inline + Propagation: int add(int a, int b) { return a + b; } int foo(int n){ int sum = 0; int i, t; for (i = 0; i < n ;++i) { t = add(10, 20); sum = add(sum, i); sum = add(sum, t); } return sum; }
  79. 79. Inline + Propagation: clang -emit-llvm -S inline.c followed by opt inline.ll -mem2reg -S turns the C code of slide 78 into: define i32 @add(i32 %a, i32 %b) { %1 = add i32 %a, %b ret i32 %1 } define i32 @foo(i32 %n) { br label %1 ; <label>:1 %sum.0 = phi i32 [ 0, %0 ], [ %6, %7 ] %i.0 = phi i32 [ 0, %0 ], [ %8, %7 ] %2 = icmp slt i32 %i.0, %n br i1 %2, label %3, label %9 ; <label>:3 %4 = call i32 @add(i32 10, i32 20) %5 = call i32 @add(i32 %sum.0, i32 %i.0) %6 = call i32 @add(i32 %5, i32 %4) br label %7 ; <label>:7 %8 = add i32 %i.0, 1 br label %1 ; <label>:9 ret i32 %sum.0 }
  80. 80. Inline + Propagation: opt inline.ll -mem2reg -inline -S replaces the three calls in @foo (slide 79) with plain adds, and add(10, 20) becomes the constant 30: define i32 @foo(i32 %n) { br label %1 ; <label>:1 %sum.0 = phi i32 [ 0, %0 ], [ %5, %6 ] %i.0 = phi i32 [ 0, %0 ], [ %7, %6 ] %2 = icmp slt i32 %i.0, %n br i1 %2, label %3, label %8 ; <label>:3 %4 = add i32 %sum.0, %i.0 %5 = add i32 %4, 30 br label %6 ; <label>:6 %7 = add i32 %i.0, 1 br label %1 ; <label>:8 ret i32 %sum.0 }
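For readers who find the IR hard to follow, the effect of inlining plus propagation can be mirrored by hand in C. `foo` below is the slide's function; `foo_inlined` is a hand-written approximation of what the optimized IR computes, with a name chosen for this sketch (`add` is marked `static` here only to keep the example self-contained).

```c
static int add(int a, int b) {
    return a + b;
}

/* The function from the slide: three calls per iteration. */
int foo(int n) {
    int sum = 0;
    int i, t;
    for (i = 0; i < n; ++i) {
        t = add(10, 20);
        sum = add(sum, i);
        sum = add(sum, t);
    }
    return sum;
}

/* Hand-inlined equivalent: each call is replaced by the body of add,
 * and add(10, 20) has been propagated and folded to the constant 30. */
int foo_inlined(int n) {
    int sum = 0;
    int i;
    for (i = 0; i < n; ++i) {
        sum = sum + i;    /* was add(sum, i) */
        sum = sum + 30;   /* was add(sum, add(10, 20)) */
    }
    return sum;
}
```

The loop body of `foo_inlined` lines up with the two `add` instructions left in the optimized IR.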
  81. 81. DCE • DCE: Dead Code Elimination • After the optimizations introduced so far, redundant code gradually appears, along with branch conditions that obviously can never be true
  86. 86. DCE: int foo() { a = 5; if (a > 10) b = 10; else b = 20; return b; } becomes, after Constant Propagation, int foo() { a = 5; if (5 > 10) b = 10; else b = 20; return b; } then, after Constant Folding, int foo() { a = 5; if (false) b = 10; else b = 20; return b; } then, after DCE, int foo() { b = 20; return b; } and finally, after another round of Constant Propagation, int foo() { return 20; }
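The slide's fragments leave `a` and `b` undeclared. A compilable sketch of the same chain, with declarations added and one function per stage (the function names are made up here), looks like this:

```c
/* Stage 0: original code. */
int foo_original(void) {
    int a, b;
    a = 5;
    if (a > 10) b = 10; else b = 20;
    return b;
}

/* Stage 1: constant propagation replaces a in the condition with 5. */
int foo_after_const_prop(void) {
    int a, b;
    a = 5;
    (void)a;                       /* a is no longer read */
    if (5 > 10) b = 10; else b = 20;
    return b;
}

/* Stage 2: constant folding evaluates 5 > 10 to false (0 in C). */
int foo_after_folding(void) {
    int b;
    if (0) b = 10; else b = 20;
    return b;
}

/* Stage 3: dead code elimination drops the branch that can never run
 * and the assignment to a that nobody reads. */
int foo_after_dce(void) {
    int b;
    b = 20;
    return b;
}

/* Stage 4: one more round of constant propagation. */
int foo_final(void) {
    return 20;
}
```

Every stage returns the same value; only the amount of work left for run time shrinks.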
  87. 87. Observing DCE with LLVM (1/5): int foo() { int a; int b; a = 5; if (a > 10) b = a + 10; else b = a + 20; return b; } clang -S -emit-llvm dce.c produces: define i32 @foo() { entry: %a = alloca i32 %b = alloca i32 store i32 5, i32* %a %0 = load i32* %a %cmp = icmp sgt i32 %0, 10 br i1 %cmp, label %if.then, label %if.else if.then: %1 = load i32* %a %add = add i32 %1, 10 store i32 %add, i32* %b br label %if.end if.else: %2 = load i32* %a %add1 = add i32 %2, 20 store i32 %add1, i32* %b br label %if.end if.end: %3 = load i32* %b ret i32 %3 }
  88. 88. Observing DCE with LLVM (2/5): opt dce.ll -mem2reg -S turns the IR of slide 87 into: define i32 @foo() { entry: %cmp = icmp sgt i32 5, 10 br i1 %cmp, label %if.then, label %if.else if.then: %add = add i32 5, 10 br label %if.end if.else: %add1 = add i32 5, 20 br label %if.end if.end: %b.0 = phi i32 [ %add, %if.then ], [ %add1, %if.else ] ret i32 %b.0 }
  89. 89. Observing DCE with LLVM (3/5): opt dce.ll -mem2reg -constprop -S: adding -constprop turns the IR of slide 88 into: define i32 @foo() { entry: br i1 false, label %if.then, label %if.else if.then: br label %if.end if.else: br label %if.end if.end: %b.0 = phi i32 [ 15, %if.then ], [ 25, %if.else ] ret i32 %b.0 }
  92. 92. Observing DCE with LLVM (4/5): opt dce.ll -mem2reg -constprop -dce -S: adding -dce gives exactly the same IR as on slide 89: define i32 @foo() { entry: br i1 false, label %if.then, label %if.else if.then: br label %if.end if.else: br label %if.end if.end: %b.0 = phi i32 [ 15, %if.then ], [ 25, %if.else ] ret i32 %b.0 } It looks as if nothing changed?? LLVM leaves the simplification of the CFG to the -simplifycfg pass.
  93. 93. Observing DCE with LLVM (5/5): opt dce.ll -mem2reg -constprop -simplifycfg -S: with -simplifycfg in place of -dce, the IR of slide 89 collapses to: define i32 @foo() { entry: ret i32 25 }
  94. 94. Observing DCE with LLVM - 2 (1/2): opt dce.ll -mem2reg -simplifycfg -S: running -simplifycfg directly on the mem2reg output (slide 88) merges the branches into a select: define i32 @foo() { entry: %cmp = icmp sgt i32 5, 10 %add = add i32 5, 10 %add1 = add i32 5, 20 %b.0 = select i1 %cmp, i32 %add, i32 %add1 ret i32 %b.0 }
  95. 95. Observing DCE with LLVM - 2 (2/2): opt dce.ll -mem2reg -simplifycfg -constprop -S: adding -constprop folds the select version of slide 94 down to: define i32 @foo() { entry: ret i32 25 }
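The `select` instruction that shows up in this second route behaves much like C's conditional operator applied to values that have already been computed. A rough C analogue of the branchy and the select-based versions follows; the function names are illustrative, and the local names mirror the IR value names.

```c
/* Branch form: two paths that join again, as in the phi version of the IR. */
int foo_branches(void) {
    int a = 5;
    int b;
    if (a > 10)
        b = a + 10;
    else
        b = a + 20;
    return b;
}

/* Select form: both candidates are computed up front in one block and
 * the condition merely picks one, like select i1 %cmp, %add, %add1. */
int foo_select(void) {
    const int cmp  = 5 > 10;
    const int add  = 5 + 10;
    const int add1 = 5 + 20;
    const int b0   = cmp ? add : add1;
    return b0;
}
```

With everything in one block and every operand constant, folding the whole function down to 25 takes no control flow analysis at all.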
  96. 96. CSE • CSE: Common Subexpression Elimination – Share whatever can be shared!
  97. 97. CSE: a = b * c + g; d = b * c * e; becomes t = b * c; a = t + g; d = t * e;
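Wrapped in functions, the two versions can be checked against each other. The function names and the `return a + d` used to expose the result are choices made for this sketch.

```c
/* Before CSE: b * c is computed twice. */
int before_cse(int b, int c, int g, int e) {
    int a = b * c + g;
    int d = b * c * e;
    return a + d;
}

/* After CSE: the common subexpression b * c is computed once into t
 * and reused for both a and d. */
int after_cse(int b, int c, int g, int e) {
    int t = b * c;
    int a = t + g;
    int d = t * e;
    return a + d;
}
```

The rewrite is safe because neither `b` nor `c` changes between the two uses of `b * c`.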
  98. 98. Observing CSE with LLVM (1/2): int foo(int b, int c, int g, int e) { int a = b * c + g; int d = b * c * e; return a + d; } clang -emit-llvm -S cse.c followed by opt cse.ll -mem2reg -S produces: define i32 @foo(i32 %b, i32 %c, i32 %g, i32 %e) { entry: %mul = mul i32 %b, %c %add = add i32 %mul, %g %mul1 = mul i32 %b, %c %mul2 = mul i32 %mul1, %e %add3 = add i32 %add, %mul2 ret i32 %add3 }
  99. 99. Observing CSE with LLVM (2/2): opt cse.ll -mem2reg -early-cse -S: adding -early-cse removes the repeated multiplication %mul1 from the IR of slide 98 and reuses %mul: define i32 @foo(i32 %b, i32 %c, i32 %g, i32 %e) { entry: %mul = mul i32 %b, %c %add = add i32 %mul, %g %mul2 = mul i32 %mul, %e %add3 = add i32 %add, %mul2 ret i32 %add3 }
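  100. 100. Loop Unroll • Loop Unroll: – On most architectures a jump instruction costs more than an ordinary arithmetic instruction – After unrolling, the loop index may turn from a variable into a constant. sum = 0; for (i = 0; i < 3; ++i) sum = sum + i; becomes sum = 0; sum = sum + 0; sum = sum + 1; sum = sum + 2;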
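A compilable version of the two fragments, assuming we simply wrap each one in a function (the names are invented here), makes the equivalence easy to check:

```c
/* Rolled loop: three iterations, each with a compare and a jump. */
int sum_loop(void) {
    int sum = 0;
    int i;
    for (i = 0; i < 3; ++i)
        sum = sum + i;
    return sum;
}

/* Fully unrolled: no loop control left, and the index i has become
 * the constants 0, 1 and 2, ready for constant folding. */
int sum_unrolled(void) {
    int sum = 0;
    sum = sum + 0;
    sum = sum + 1;
    sum = sum + 2;
    return sum;
}
```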
  101. 101. Observing Loop Unroll with LLVM (1/8): int add(int a, int b) { return a + b; } int foo() { int sum = 0; int i; for (i = 0; i < 3; ++i) sum = add(sum, i); return sum; } clang -emit-llvm -S for.c followed by opt for.ll -mem2reg -S produces: define i32 @add(i32 %a, i32 %b) { entry: %add = add i32 %a, %b ret i32 %add } define i32 @foo() { entry: br label %for.cond for.cond: %i.0 = phi i32 [ 0, %entry ], [ %inc, %for.inc ] %sum.0 = phi i32 [ 0, %entry ], [ %call, %for.inc ] %cmp = icmp slt i32 %i.0, 3 br i1 %cmp, label %for.body, label %for.end for.body: %call = call i32 @add(i32 %sum.0, i32 %i.0) br label %for.inc for.inc: %inc = add i32 %i.0, 1 br label %for.cond for.end: ret i32 %sum.0 }
  103. 103. Observing Loop Unroll with LLVM (2/8): opt for.ll -mem2reg -loop-unroll -S leaves the loop in @foo (slide 101) as it was; the only difference in the output is an extra phi in the exit block: for.end: %sum.0.lcssa = phi i32 [ %sum.0, %for.cond ] ret i32 %sum.0.lcssa It seems the loop refuses to unroll????
  104. 104. Observing Loop Unroll with LLVM (3/8): adding -debug shows why. $ opt -mem2reg -S for.ll -loop-unroll -debug prints: Args: opt -mem2reg -S for.ll -loop-unroll -debug; Loop Unroll: F[foo] Loop %for.cond; Loop Size = 8; Can't unroll; loop not terminated by a conditional branch. In other words, the Loop Unroll pass is complaining that it does not recognize this loop!?
  105. 105. Observing Loop Unroll with LLVM (4/8): Rotate, loop! opt for.ll -mem2reg -loop-rotate -S: -loop-rotate turns @foo of slide 101 into a loop whose conditional branch sits at the bottom: define i32 @foo() { entry: br label %for.body for.body: %sum.02 = phi i32 [ 0, %entry ], [ %call, %for.inc ] %i.01 = phi i32 [ 0, %entry ], [ %inc, %for.inc ] %call = call i32 @add(i32 %sum.02, i32 %i.01) br label %for.inc for.inc: %inc = add i32 %i.01, 1 %cmp = icmp slt i32 %inc, 3 br i1 %cmp, label %for.body, label %for.end for.end: %sum.0.lcssa = phi i32 [ %call, %for.inc ] ret i32 %sum.0.lcssa }
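In C terms, loop rotation is roughly the step from a top-tested loop to a guarded bottom-tested one. The sketch below uses a run-time bound `n` instead of the slide's constant 3 so that the guard stays visible; with the constant bound the guard `0 < 3` is trivially true, which is presumably why no guard appears in the rotated IR. The function names are made up for this example.

```c
static int add(int a, int b) {
    return a + b;
}

/* Top-tested loop: the condition is checked in the loop header,
 * and the loop body ends with an unconditional jump back to it. */
int foo_top_tested(int n) {
    int sum = 0;
    int i = 0;
    while (i < n) {
        sum = add(sum, i);
        ++i;
    }
    return sum;
}

/* Rotated loop: one guard test up front, then a do-while whose
 * conditional branch sits at the bottom of the loop. */
int foo_rotated(int n) {
    int sum = 0;
    int i = 0;
    if (i < n) {
        do {
            sum = add(sum, i);
            ++i;
        } while (i < n);
    }
    return sum;
}
```

The rotated form ends in a conditional branch, which is the shape the unroller's debug message was asking for.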
  106. 106. Observing Loop Unroll with LLVM (5/8): Rotate, loop! opt for.ll -mem2reg -view-cfg -loop-rotate -view-cfg -S shows the CFG before and after -loop-rotate
  107. 107. Observing Loop Unroll with LLVM (6/8): opt for.ll -mem2reg -loop-rotate -loop-unroll -view-cfg -S shows the CFG after -loop-unroll has also run
  108. 108. Observing Loop Unroll with LLVM (7/8): after -loop-rotate -loop-unroll the loop is gone but a chain of blocks remains: define i32 @foo() { entry: br label %for.body for.body: %call = call i32 @add(i32 0, i32 0) br label %for.inc for.inc: %call.1 = call i32 @add(i32 %call, i32 1) br label %for.inc.1 for.inc.1: %call.2 = call i32 @add(i32 %call.1, i32 2) br label %for.inc.2 for.inc.2: ret i32 %call.2 } opt for.ll -mem2reg -loop-rotate -loop-unroll -simplifycfg -view-cfg -S: adding -simplifycfg merges them into one block: define i32 @foo() { entry: %call = call i32 @add(i32 0, i32 0) %call.1 = call i32 @add(i32 %call, i32 1) %call.2 = call i32 @add(i32 %call.1, i32 2) ret i32 %call.2 }
  110. 110. Observing Loop Unroll with LLVM (8/8): opt for.ll -mem2reg -loop-rotate -loop-unroll -simplifycfg -inline -constprop -S. Starting from define i32 @add(i32 %a, i32 %b) { entry: %add = add i32 %a, %b ret i32 %add } define i32 @foo() { entry: %call = call i32 @add(i32 0, i32 0) %call.1 = call i32 @add(i32 %call, i32 1) %call.2 = call i32 @add(i32 %call.1, i32 2) ret i32 %call.2 } the -inline step leaves define i32 @foo() { entry: %add.i = add i32 1, 2 ret i32 %add.i } and the -constprop step finishes with define i32 @foo() { entry: ret i32 3 }
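The last three steps of this pipeline can also be mirrored in plain C, one function per stage. This is a hand-written approximation with invented function names; the local names follow the IR value names where C allows it.

```c
static int add(int a, int b) {
    return a + b;
}

/* After unrolling and CFG simplification: three chained calls. */
int foo_unrolled(void) {
    int call  = add(0, 0);
    int call1 = add(call, 1);
    int call2 = add(call1, 2);
    return call2;
}

/* After inlining: the calls are gone. In the slide's output the first
 * two additions have already been simplified away, leaving only 1 + 2. */
int foo_inlined(void) {
    int add_i = 1 + 2;
    return add_i;
}

/* After constant propagation and folding: nothing left to compute. */
int foo_folded(void) {
    return 3;
}
```

All three functions return the same value, which is the point of the exercise: each pass only rearranges work that the compiler can prove is unchanged.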
  111. 111. Compiler Optimization • Different compiler optimizations can interact with one another • The order in which they run also affects the result
  112. 112. LLVM • Running opt -help shows what is available
  114. 114. Overview of GCC Optimization Pass 114 $ gcc a.c -fdump-tree-all -fdump-rtl-all -O3 $ ls a.c.* a.c.001t.tu a.c.059t.forwprop2 a.c.093t.sink a.c.138t.copyrename4 a.c.204r.outof_cfglayout a.c.003t.original a.c.060t.objsz1 a.c.096t.loop a.c.139t.crited2 a.c.205r.split1 a.c.004t.gimple a.c.061t.alias a.c.097t.loopinit a.c.141t.uncprop1 a.c.206r.subreg2 a.c.006t.omplower a.c.062t.retslot a.c.098t.lim1 a.c.142t.local-pure-const2 a.c.208r.mode_sw a.c.007t.lower a.c.063t.fre2 a.c.099t.copyprop5 a.c.168t.nrv a.c.209r.asmcons a.c.010t.eh a.c.064t.copyprop3 a.c.100t.dceloop1 a.c.169t.optimized a.c.213r.ira a.c.011t.cfg a.c.065t.mergephi2 a.c.101t.unswitch a.c.170r.expand a.c.214r.reload a.c.015t.ssa a.c.066t.vrp1 a.c.102t.sccp a.c.171r.vregs a.c.215r.postreload a.c.017t.inline_param1 a.c.067t.dce1 a.c.104t.ldist a.c.172r.into_cfglayout a.c.216r.gcse2 a.c.018t.einline a.c.068t.cdce a.c.105t.copyprop6 a.c.173r.jump a.c.217r.split2 a.c.019t.early_optimizations a.c.069t.cselim a.c.111t.ivcanon a.c.174r.subreg1 a.c.218r.ree a.c.020t.copyrename1 a.c.070t.ifcombine a.c.113t.ifcvt a.c.175r.dfinit a.c.221r.pro_and_epilogue a.c.021t.ccp1 a.c.071t.phiopt1 a.c.114t.vect a.c.176r.cse1 a.c.222r.dse2 a.c.022t.forwprop1 a.c.072t.tailr2 a.c.115t.dceloop3 a.c.177r.fwprop1 a.c.223r.csa a.c.023t.ealias a.c.073t.ch a.c.116t.pcom a.c.178r.cprop1 a.c.224r.jump2 a.c.024t.esra a.c.075t.cplxlower1 a.c.117t.cunroll a.c.179r.pre a.c.225r.peephole2 a.c.025t.fre1 a.c.076t.sra a.c.118t.slp a.c.181r.cprop2 a.c.226r.ce3 a.c.026t.copyprop1 a.c.077t.copyrename3 a.c.120t.ivopts a.c.184r.ce1 a.c.228r.cprop_hardreg a.c.027t.mergephi1 a.c.078t.dom1 a.c.121t.lim3 a.c.185r.reginfo a.c.229r.rtl_dce a.c.028t.cddce1 a.c.079t.isolate-paths a.c.122t.loopdone a.c.186r.loop2 a.c.230r.compgotos a.c.029t.eipa_sra a.c.080t.phicprop1 a.c.123t.veclower21 a.c.187r.loop2_init a.c.231r.bbro a.c.030t.tailr1 a.c.081t.dse1 a.c.125t.reassoc2 a.c.188r.loop2_invariant a.c.233r.split4 a.c.031t.switchconv a.c.082t.reassoc1 
a.c.126t.slsr a.c.189r.loop2_unswitch a.c.234r.sched2 a.c.033t.profile_estimate a.c.083t.dce2 a.c.127t.dom2 a.c.192r.loop2_done a.c.236r.stack a.c.034t.local-pure-const1 a.c.084t.forwprop3 a.c.128t.phicprop2 a.c.194r.cprop3 a.c.237r.alignments a.c.035t.fnsplit a.c.085t.phiopt2 a.c.129t.vrp2 a.c.195r.cse2 a.c.239r.mach a.c.036t.release_ssa a.c.086t.strlen a.c.130t.cddce2 a.c.196r.dse1 a.c.240r.barriers a.c.037t.inline_param2 a.c.087t.ccp3 a.c.132t.dse2 a.c.197r.fwprop2 a.c.244r.shorten a.c.054t.copyrename2 a.c.088t.copyprop4 a.c.133t.forwprop4 a.c.199r.init-regs a.c.245r.nothrow a.c.055t.ccp2 a.c.089t.sincos a.c.134t.phiopt3 a.c.200r.ud_dce a.c.246r.dwarf2 a.c.056t.copyprop2 a.c.090t.bswap a.c.135t.fab1 a.c.201r.combine a.c.247r.final a.c.057t.cunrolli a.c.091t.crited1 a.c.136t.widening_mul a.c.202r.ce2 a.c.248r.dfinish a.c.058t.phiprop a.c.092t.pre a.c.137t.tailc a.c.203r.bbpart a.c.249t.statistics That makes dump files for 165 passes in total!
  115. 115. Propagation: $ gcc a.c -fdump-tree-all -fdump-rtl-all -O3 $ ls a.c.* (same dump file listing as on slide 114) 28 of the 165 passes are Propagation passes!
  116. 116. Inline: $ gcc a.c -fdump-tree-all -fdump-rtl-all -O3 $ ls a.c.* (same dump file listing as on slide 114) 3 of the 165 passes are Inline passes!
  117. 117. DCE 117 $ gcc a.c -fdump-tree-all -fdump-rtl-all -O3 $ ls a.c.* 13 of the 165 passes are DCE passes! (Same dump-file listing as on slide 116.)
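A minimal sketch of dead code elimination, written by hand in C. The names `scale_before` and `scale_after` are hypothetical, and the "after" version shows what a compiler could conceptually reduce the function to; it is not taken from a real dump.

```c
/* Hypothetical example: all names here are illustrative. */

/* Before: contains a dead store and an unreachable branch. */
int scale_before(int x) {
    const int debug = 0;
    int result = x * 100;   /* dead store: overwritten before it is read */
    result = x * 2;
    if (debug)              /* never true, so this branch is unreachable */
        result = -1;
    return result;
}

/* After: only the computation that affects the return value remains. */
int scale_after(int x) {
    return x * 2;
}
```

Removing the dead store and the unreachable branch does not change the observable result, which is exactly the condition under which such code may be deleted.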
  118. 118. CSE 118 $ gcc a.c -fdump-tree-all -fdump-rtl-all -O3 $ ls a.c.* 4 of the 165 passes are CSE passes! (Same dump-file listing as on slide 116.)
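A minimal sketch of common subexpression elimination, written by hand in C. The names `combine_before` and `combine_after` are hypothetical, and the rewritten version only illustrates the idea.

```c
/* Hypothetical example: all names here are illustrative. */

/* Before: the subexpression a + b is evaluated twice. */
int combine_before(int a, int b, int c) {
    int x = (a + b) * c;
    int y = (a + b) * 2;
    return x + y;
}

/* After: a + b is computed once and the result is reused. */
int combine_after(int a, int b, int c) {
    int t = a + b;          /* the common subexpression */
    int x = t * c;
    int y = t * 2;
    return x + y;
}
```

The rewrite is valid because neither `a` nor `b` changes between the two uses of `a + b`.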
  119. 119. Unroll 119 $ gcc a.c -fdump-tree-all -fdump-rtl-all -O3 $ ls a.c.* 2 of the 165 passes are unrolling passes! (Same dump-file listing as on slide 116.)
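A minimal sketch of loop unrolling, written by hand in C. The names `sum_rolled` and `sum_unrolled` and the unroll factor of 4 are assumptions chosen for illustration; a real compiler picks the factor with its own heuristics.

```c
/* Hypothetical example: all names here are illustrative. */

/* Before: one addition and one loop test per element. */
int sum_rolled(const int *v, int n) {
    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += v[i];
    return sum;
}

/* After: unrolled by 4, so the loop test runs once per four elements.
 * A second loop handles the 0 to 3 leftover elements. */
int sum_unrolled(const int *v, int n) {
    int sum = 0;
    int i = 0;
    for (; i + 3 < n; i += 4)
        sum += v[i] + v[i + 1] + v[i + 2] + v[i + 3];
    for (; i < n; i++)      /* remainder loop */
        sum += v[i];
    return sum;
}
```

The remainder loop is what keeps the unrolled version correct when `n` is not a multiple of the unroll factor.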
  120. 120. Propagation + DCE + CSE + Inline + Unroll 120 50 / 165! (Same dump-file listing as on slide 116.)
  121. 121. Propagation + DCE + CSE + Inline + Unroll 121 50 / 165! Having sat through this talk, you now more or less understand about a third of GCC!!!
  122. 122. Machine Dependent Compiler Optimization 122 (optimizations that depend on the target machine)
  123. 123. Machine Dependent Compiler Optimization • Register Allocation • Instruction Scheduling • Peephole Optimization 123
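To make one of these concrete, here is a minimal sketch of a peephole optimizer in C. The instruction set (`OP_ADDI`, `OP_MULI`, `OP_SHLI`), the `Insn` struct, and the three rewrite patterns are all invented for this example; they are not GCC's or LLVM's actual peephole rules, which are described per target.

```c
#include <stddef.h>

/* Toy instruction set, invented for this sketch:
 *   OP_ADDI reg, imm   reg = reg + imm
 *   OP_MULI reg, imm   reg = reg * imm
 *   OP_SHLI reg, imm   reg = reg << imm */
typedef enum { OP_ADDI, OP_MULI, OP_SHLI } Op;
typedef struct { Op op; int reg; int imm; } Insn;

/* Rewrites `code` in place and returns the new instruction count. */
size_t peephole(Insn *code, size_t n) {
    size_t out = 0;                       /* number of instructions kept */
    for (size_t i = 0; i < n; i++) {
        Insn in = code[i];

        /* Pattern 1 (two-instruction window): merge adjacent addi
         * instructions that target the same register. */
        if (in.op == OP_ADDI && out > 0 &&
            code[out - 1].op == OP_ADDI && code[out - 1].reg == in.reg) {
            in.imm += code[out - 1].imm;
            out--;                        /* merged insn replaces the previous one */
        }

        /* Pattern 2: "addi r, 0" and "muli r, 1" do nothing; drop them. */
        if ((in.op == OP_ADDI && in.imm == 0) ||
            (in.op == OP_MULI && in.imm == 1))
            continue;

        /* Pattern 3: "muli r, 2" becomes the cheaper "shli r, 1". */
        if (in.op == OP_MULI && in.imm == 2) {
            in.op = OP_SHLI;
            in.imm = 1;
        }

        code[out++] = in;
    }
    return out;
}
```

The optimizer only ever looks at a small window of neighbouring instructions, which is what gives the technique its name. Because `out` never exceeds `i`, rewriting the array in place is safe.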
  124. 124. 124 Advanced Compiler Optimization
  125. 125. Advanced Compiler Optimization 125 • Loop Optimization • Interprocedural Optimization • Auto Vectorization • Auto Parallelization
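As an illustration of the auto-vectorization item, here is a minimal C sketch of a loop shape that vectorizers commonly handle. The function name `saxpy` and its signature are assumptions made for this example. Whether a given compiler actually vectorizes it depends on the compiler version, the optimization flags, and the target.

```c
#include <stddef.h>

/* Computes y[i] = a * x[i] + y[i] for every element.
 *
 * Properties that tend to help a vectorizer:
 *  - a simple counted loop with a known trip count n,
 *  - unit-stride array accesses,
 *  - `restrict`, which promises that x and y do not overlap,
 *    so iterations are independent of each other. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y) {
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```

Since each iteration touches only its own `x[i]` and `y[i]`, a compiler is free to process several elements with one SIMD instruction. The same independence is what auto-parallelization relies on.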
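  126. 126. Summary 126 • Compiler optimization is a lot of fun, but be sure to read up on some basic theory before you start playing with it • LLVM is a very good bridge between theory and implementation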
  127. 127. Andes Mountains commercial break 127
  128. 128. Andes Mountains commercial break 128 Lovely mountains, lovely water, lovely and boring • Leave work on time, great atmosphere
  129. 129. Andes Mountains commercial break 129 Lovely mountains, lovely water, lovely and boring • Leave work on time, great atmosphere • Open Source++
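  130. 130. Andes Mountains commercial break 130 Lovely mountains, lovely water, lovely and boring • Leave work on time, great atmosphere • Open Source++ • The Toolchain team is always hiring~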
  131. 131. 131
