SlideShare a Scribd company logo
1 of 212
Download to read offline
Design and Implementation of 
GCC Register Allocation 
HelloGCC'2014 
Date : Sep 13th, 2014 
Kito Cheng 
kito.cheng@gmail.com
2 
Register Allocation 
foo: 
addi $sp, $sp, -8 
slli $r1, $r0, 1 
swi $r1, [$sp] 
movi $r1, 0 
movi $r2, 0 
swi $r0, [$sp + (4)] 
j .L2 
.L3: 
mov $r3, $r2 
lwi $r0, [$sp] 
add $r3, $r0, $r3 
add $r1, $r1, $r3 
addi $r2, $r2, 1 
.L2: 
lwi $r0, [$sp + (4)] 
slts $ta, $r2, $r0 
bnez $ta, .L3 
mov $r0, $r1 
addi $sp, $sp, 8 
ret 
int foo(int n) 
{ 
int i, y, t, z; 
int x = n * 2; 
z = x; 
int sum = 0; 
for (i=0; i<n; ++i) { 
y = i; 
t = z + y; 
sum = sum + t; 
} 
return sum; 
}
3 
Register Allocation 
foo: 
addi $sp, $sp, -8 
slli $r1, $r0, 1 
swi $r1, [$sp] 
movi $r1, 0 
movi $r2, 0 
swi $r0, [$sp + (4)] 
j .L2 
.L3: 
mov $r3, $r2 
lwi $r0, [$sp] 
add $r3, $r0, $r3 
add $r1, $r1, $r3 
addi $r2, $r2, 1 
.L2: 
lwi $r0, [$sp + (4)] 
slts $ta, $r2, $r0 
bnez $ta, .L3 
mov $r0, $r1 
addi $sp, $sp, 8 
ret 
int foo(int n) 
{ 
int i, y, t, z; 
int x = n * 2; 
z = x; 
int sum = 0; 
for (i=0; i<n; ++i) { 
y = i; 
t = z + y; 
sum = sum + t; 
} 
return sum; 
}
4 
Graph Coloring 
基礎理論
5 
Graph-Coloring 
Graph (圖) = Vertex/Node (點) + Edge (邊)
6 
Graph-Coloring 
這是點 
Graph (圖) = Vertex/Node (點) + Edge (邊)
7 
Graph-Coloring 
這是邊 
Graph (圖) = Vertex/Node (點) + Edge (邊)
8 
Graph-Coloring 
上色問題就是把點上顏色, 
並且不能與相鄰的點同顏色
9 
Graph-Coloring 
用最經典的演算法上色: 
假設有 N 個顏色, 
若邊小於 N 個則可隨意上色
Graph-Coloring 
10 
假設有四個顏色,若邊小於四則可隨意上色
Graph-Coloring 
11 
假設有四個顏色,若邊小於四則可隨意上色
Graph-Coloring 
12 
假設有四個顏色,若邊小於四則可隨意上色
Graph-Coloring 
13 
假設有四個顏色,若邊小於四則可隨意上色
Graph-Coloring 
14 
假設有四個顏色,若邊小於四則可隨意上色
Graph-Coloring 
15 
假設有四個顏色,若邊小於四則可隨意上色
Graph-Coloring 
16 
那跟 Register Allocation 的關聯?
Interference Graph 
17 
那跟 Register Allocation 的關聯? 
c 
b 
a 
以 c = a + b 為例 
int a = 10, b = 20; 
int c = a + b;
Interference Graph 
18 
那跟 Register Allocation 的關聯? 
c 
b 
a 
a 跟 b 必須同時存活(Live) 
所以不能使用同一個 Reg 
int a = 10, b = 20; 
int c = a + b; 
Live Range 
a b c 
a=10, b=20 
c = a + b
Interference Graph 
19 
c 
b 
a 
顏色 = Register 
n個顏色可用 = n個暫存器可用 
a 跟 b 必須同時存活(Live) 
所以不能使用同一個 Reg 
int a = 10, b = 20; 
int c = a + b;
Interference Graph 
20 
c 
b 
a 
int a = 10, b = 20; 
int c = a + b; 
任意上色一下
Interference Graph 
21 
c 
b 
a 
int a = 10, b = 20; 
int c = a + b; 
假設藍色代表$r0, 綠色代表 $r1 
亦即此上色結果所產生的程式碼為 
add $r0, $r0, $r1
22 
假設有 N 個顏色, 
若邊小於 N 個則可隨意上色
23 
假設有 N 個顏色, 
若邊小於 N 個則可隨意上色 
假設有 N 個Register, 
若邊小於 N 個則可隨意Allocation
24 
以上就是 
Graph Coloring-based 
Register Allocation 
的概念
25 
經典 
Graph Coloring 
Register Allocation
Chaitin-Briggs 
Algorithm 
26 
• 在深入 GCC 前先複習下最經典的演算法:
Chaitin-Briggs 
Algorithm 
27 
spill 
code 
build coalesce simplify select 
• 在深入 GCC 前先複習下最經典的演算法: 
spill code 
build coalesce simplify select 
簡化成這幾個 
步驟方便講解
Example 
28 
build coalesce simplify select 
int foo(int n) 
{ 
int i, y, t, z; 
int x = n * 2; 
z = x; 
int sum = 0; 
for (i=0; i<n; ++i) { 
y = i; 
t = z + y; 
sum = sum + t; 
} 
return sum; 
} 
spill 
code
CFG 
29 
spill 
code 
int foo(int n) 
{ 
int i, y, t, z; 
int x = n * 2; 
z = x; 
int sum = 0; 
for (i=0; i<n; ++i) { 
y = i; 
t = z + y; 
sum = sum + t; 
} 
return sum; 
} 
n = arg(n) 
x = n * 2 
z = x 
sum = 0 
i = 0 
i < n? 
y = i 
t = z + y 
sum = sum + t 
i = i + 1 
return sum 
build coalesce simplify select
Live Range 
30 
n x z sum i y t 
build coalesce simplify select 
n = arg (n) 
x = n * 2 
z = x 
sum = 0 
i = 0 
i < n? 
y = i 
t = z + y 
sum = sum + t 
i = i + 1 
return sum 
spill 
code 
n = arg(n) 
x = n * 2 
z = x 
sum = 0 
i = 0 
i < n? 
y = i 
t = z + y 
sum = sum + t 
i = i + 1 
return sum
Live Range 
31 
n x z sum i y t 
build coalesce simplify select 
n = arg (n) 
x = n * 2 
z = x 
sum = 0 
i = 0 
i < n? 
y = i 
t = z + y 
sum = sum + t 
i = i + 1 
return sum 
spill 
code 
n = arg(n) 
x = n * 2 
z = x 
sum = 0 
i = 0 
i < n? 
y = i 
t = z + y 
sum = sum + t 
i = i + 1 
return sum 
每道指令有兩個 
Program point, 
Read, Write
Live Range 
32 
n x z sum i y t 
build coalesce simplify select 
n = arg (n) 
x = n * 2 
z = x 
sum = 0 
i = 0 
i < n? 
y = i 
t = z + y 
sum = sum + t 
i = i + 1 
return sum 
spill 
code 
n = arg(n) 
x = n * 2 
z = x 
sum = 0 
i = 0 
i < n? 
y = i 
t = z + y 
sum = sum + t 
i = i + 1 
return sum 
y 於 y = i 
的 Write 時 
Live 
於 t = z + y 
的 Read 時 
最後使用
Live Range 
33 
n x z sum i y t 
build coalesce simplify select 
n = arg (n) 
x = n * 2 
z = x 
sum = 0 
i = 0 
i < n? 
y = i 
t = z + y 
sum = sum + t 
i = i + 1 
return sum 
spill 
code 
n = arg(n) 
x = n * 2 
z = x 
sum = 0 
i = 0 
i < n? 
y = i 
t = z + y 
sum = sum + t 
i = i + 1 
return sum
Interference 
build coalesce simplify select 
Graph n x z sum i y t 
34 
n = arg (n) 
x = n * 2 
z = x 
sum = 0 
i = 0 
i < n? 
y = i 
t = z + y 
sum = sum + t 
i = i + 1 
return sum 
spill 
code 
z 
x 
sum i 
y 
t 
n 
x 跟 n 必須 
同時存活 
又稱 
x 跟 n 與 
有Interference
Interference 
build coalesce simplify select 
Graph n x z sum i y t 
35 
n = arg (n) 
x = n * 2 
z = x 
sum = 0 
i = 0 
i < n? 
y = i 
t = z + y 
sum = sum + t 
i = i + 1 
return sum 
spill 
code 
z 
x 
sum i 
y 
t 
n
Interference 
build coalesce simplify select 
Graph n x z sum i y t 
36 
n = arg (n) 
x = n * 2 
z = x 
sum = 0 
i = 0 
i < n? 
y = i 
t = z + y 
sum = sum + t 
i = i + 1 
return sum 
spill 
code 
z 
x 
sum i 
y 
t 
n
Coalesce 
37 
spill 
code 
build coalesce simplify select 
• Coalesce: 合併 
• 若一道 move 指令的來源與目的無互相干 
擾則可進行 Coalesce, 並將圖上對應的 
點合併 
–保證兩者分配到相同 Register 
– move $r0, $r0 
•明顯可移除的指令
Coalesce 
38 
build coalesce simplify select 
n x z sum i y t 
n = arg (n) 
x = n * 2 
zz == xx 
sum = 0 
i = 0 
i < n? 
y = i 
t = z + y 
sum = sum + t 
i = i + 1 
return sum 
spill 
code 
z 
x 
sum i 
y 
t 
n
Coalesce 
39 
build coalesce simplify select 
n x z sum i y t 
n = arg (n) 
x = n * 2 
zz == xx 
sum = 0 
i = 0 
i < n? 
y = i 
t = z + y 
sum = sum + t 
i = i + 1 
return sum 
spill 
code 
x/z 
sum i 
y 
t 
n
build coalesce simplify select Simplify 
40 
spill 
code 
x/z 
sum i 
y 
t 
n 
假設有4個 Register 可用
build coalesce simplify select Simplify 
41 
spill 
code 
x/z 
sum i 
y 
t 
n 
假設有4個 Register 可用 
=> 邊少於 4 都可隨意著色
build coalesce simplify select Simplify 
42 
spill 
code 
假設有4個 Register 可用 
=> 邊少於 4 都可隨意著色5 
x/z 
sum i 
y 
t 
n 
4 
4 
5 5 
5
build coalesce simplify select Simplify 
43 
spill 
code 
假設有4個 Register 可用 
=> 邊少於 4 都可隨意著色5 
x/z 
sum i 
y 
t 
n 
4 
4 
5 5 
5 
發現沒邊少於4的點!
build coalesce simplify select Simplify 
44 
spill 
code 
假設有4個 Register 可用 
=> 邊少於 4 都可隨意著色5 
x/z 
sum i 
y 
t 
n 
4 
4 
5 5 
5 
發現沒邊少於4的點! 
=>挑一個Cost最低的拿
build coalesce simplify select Simplify 
45 
spill 
code 
x/z 
sum i 
y 
t 
假設有4個 Register 可用 
=> 邊少於 4 都可隨意著色 
n* 
3 
3 
4 4 
4 
發現沒邊少於4的點! 
=>挑一個Cost最低的拿拿到旁邊的Stack 
並標上記號 
更新每個點的邊數
build coalesce simplify select Simplify 
46 
spill 
code 
x/z 
sum i 
y 
假設有4個 Register 可用 
=> 邊少於 4 都可隨意著色 
t 
n* 
3 
3 3 
3
build coalesce simplify select Simplify 
47 
spill 
code 
x/z 
假設有4個 Register 可用 
=> 邊少於 4 都可隨意著色 
y sum i 
t 
n* 
2 2 
2
build coalesce simplify select Simplify 
48 
spill 
code 
假設有4個 Register 可用 
=> 邊少於 4 都可隨意著色 
x/z 
y sum i 
t 
n* 
1 1
build coalesce simplify select Simplify 
49 
spill 
code 
假設有4個 Register 可用 
=> 邊少於 4 都可隨意著色 
sum 
x/z 
y i 
t 
n* 
0
build coalesce simplify select Simplify 
50 
spill 
code 
假設有4個 Register 可用 
=> 邊少於 4 都可隨意著色 
i 
sum 
x/z 
y 
t 
n*
build coalesce simplify select Select 
51 
spill 
code 
進入著色階段, 從Stack中Pop 
出來並著上隨意顏色 
sum 
x/z 
y 
t 
n* 
i 
$r0 
$r1 
$r2 
$r3
build coalesce simplify select Select 
52 
spill 
code 
進入著色階段, 從Stack中Pop 
出來並著上隨意顏色 
x/z 
y 
t 
n* 
sum i 
$r0 
$r1 
$r2 
$r3
build coalesce simplify select Select 
53 
spill 
code 
進入著色階段, 從Stack中Pop 
出來並著上隨意顏色 
y 
t 
n* 
x/z 
sum i 
$r0 
$r1 
$r2 
$r3
build coalesce simplify select Select 
54 
spill 
code 
進入著色階段, 從Stack中Pop 
出來並著上隨意顏色 
t 
n* 
x/z 
sum i 
y 
$r0 
$r1 
$r2 
$r3
build coalesce simplify select Select 
55 
spill 
code 
進入著色階段, 從Stack中Pop 
出來並著上隨意顏色 
n* 
x/z 
sum i 
y 
$r0 t 
$r1 
$r2 
$r3
build coalesce simplify select Select 
56 
spill 
code 
進入著色階段, 從Stack中Pop 
出來並著上隨意顏色 
n* 
x/z 
sum i 
y 
t 
n 
$r0 
$r1 
$r2 
$r3 
發現 n 得不到任何可用顏色! 
進入 Spill 階段
Spill 
57 
• 當 Register 不足時, 則必須將值暫存 
到記憶體中, 必要時再從記憶體中取回
build coalesce simplify select Spill 
58 
spill 
code 
n = arg(n) 
x = n * 2 
store n 
z = x 
sum = 0 
i = 0 
relaod n 
i < n? 
y = i 
t = z + y 
sum = sum + t 
i = i + 1 
return sum 
n = arg(n) 
x = n * 2 
z = x 
sum = 0 
i = 0 
i < n? 
y = i 
t = z + y 
sum = sum + t 
i = i + 1 
return sum 
插入 
Spill Code
build coalesce simplify select Live Range 
59 
spill 
code 
n = arg(n) 
x = n * 2 
store n 
z = x 
sum = 0 
i = 0 
relaod n 
i < n? 
y = i 
t = z + y 
sum = sum + t 
i = i + 1 
return sum 
n x z sum i y t 
n = arg (n) 
x = n * 2 
store n 
z = x 
sum = 0 
i = 0 
reload n 
i < n? 
y = i 
t = z + y 
sum = sum + t 
i = i +1 
return sum 
n1 
n2
Interference 
Graph 
60 
spill 
code 
build coalesce simplify select 
n x z sum i y t 
n = arg (n) 
x = n * 2 
store n 
z = x 
sum = 0 
i = 0 
reload n 
i < n? 
y = i 
t = z + y 
sum = sum + t 
i = i +1 
return sum 
z 
x 
sum i 
y 
t 
n1 n2 
n1 
n2
build coalesce simplify select Coalesce 
61 
spill 
code 
n x z sum i y t 
n = arg (n) 
x = n * 2 
store n 
z = x 
sum = 0 
i = 0 
reload n 
i < n? 
y = i 
t = z + y 
sum = sum + t 
i = i +1 
return sum 
x/z 
sum i 
y 
t 
n1 n2 
n1 
n2
build coalesce simplify select Simplify 
62 
spill 
code 
1 3 
x/z 
sum i 
y 
t 
n1 n2 
3 
3 
5 5 
6
build coalesce simplify select Simplify 
63 
spill 
code 
x/z 
sum i 
y 
t 
n1 
n2 
3 
3 
3 
5 5 
5
build coalesce simplify select Simplify 
64 
spill 
code 
x/z 
sum i 
y 
t 
n1 
n2 
3 
3 
4 4 
4
build coalesce simplify select Simplify 
65 
spill 
code 
x/z 
n2 
3 
y sum i 
t 
n1 
3 3 
3
build coalesce simplify select Simplify 
66 
spill 
code 
x/z 
n2 
y sum i 
t 
n1 
2 2 
2
build coalesce simplify select Simplify 
67 
spill 
code 
x/z 
n2 
y sum i 
t 
n1 
1 1
build coalesce simplify select Simplify 
68 
spill 
code 
sum 
x/z 
n2 
y i 
t 
n1 
1
build coalesce simplify select Simplify 
69 
spill 
code 
i 
sum 
x/z 
n2 
y 
t 
n1
build coalesce simplify select Select 
70 
spill 
code 
i 
sum 
x/z 
n2 
y 
t 
n1 
$r0 
$r1 
$r2 
$r3
build coalesce simplify select Select 
71 
spill 
code 
sum 
x/z 
n2 
y 
t 
n1 
$r0 
$r1 
$r2 
$r3 
i
build coalesce simplify select Select 
72 
spill 
code 
x/z 
n2 
y 
t 
n1 
$r0 
$r1 
$r2 
$r3 
sum i
build coalesce simplify select Select 
73 
spill 
code 
n2 
y 
t 
n1 
$r0 
$r1 
$r2 
$r3 
x/z 
sum i
build coalesce simplify select Select 
74 
spill 
code 
y 
t 
n1 
$r0 
$r1 
$r2 
$r3 
x/z 
n2 
sum i
build coalesce simplify select Select 
75 
spill 
code 
t 
n1 
$r0 
$r1 
$r2 
$r3 
x/z 
sum i 
y 
n2
build coalesce simplify select Select 
76 
spill 
code 
n1 
$r0 
$r1 
$r2 
$r3 
x/z 
sum i 
y 
t 
n2
build coalesce simplify select Select 
77 
spill 
code 
$r0 
$r1 
$r2 
$r3 
x/z 
sum i 
y 
t 
n1 n2
build coalesce simplify select Select 
78 
spill 
code 
$r0 
$r1 
$r2 
$r3 
x/z 
sum i 
y 
t 
n1 n2 
n1 = $r0 
n2 = $r3 
x = $r2 
z = $r2 
sum = $r1 
t = $r3 
y = $r3 
i = $r0
Code Gen 
79 
n1 = $r0 
n2 = $r3 
x = $r2 
z = $r2 
sum = $r1 
t = $r3 
y = $r3 
i = $r0 
$r0 = arg(n) 
$r2 = $r0 * 2 
store $r0->n 
$r2 = $r2 
$r1 = 0 
$r0 = 0 
relaod $r3<-n 
$r0 < $r3? 
$r3 = $r0 
$r3 = $r2+$r3 
$r1 = $r1+$r3 
$r0 = $r0 + 1 
return $r1 
n = arg(n) 
x = n * 2 
store n 
z = x 
sum = 0 
i = 0 
relaod n 
i < n? 
y = i 
t = z + y 
sum = sum + t 
i = i + 1 
return sum
Code Gen 
80 
n1 = $r0 
n2 = $r3 
x = $r2 
z = $r2 
sum = $r1 
t = $r3 
y = $r3 
i = $r0 
$r0 = arg(n) 
$r2 = $r0 * 2 
store $r0->n 
$r2 = $r2 
$r1 = 0 
$r0 = 0 
relaod $r3<-n 
$r0 < $r3? 
$r3 = $r0 
$r3 = $r2+$r3 
$r1 = $r1+$r3 
$r0 = $r0 + 1 
return $r1 
n = arg(n) 
x = n * 2 
store n 
z = x 
sum = 0 
i = 0 
relaod n 
i < n? 
y = i 
t = z + y 
sum = sum + t 
i = i + 1 
return sum 
因Coalesce分配到同 
Register!
Code Gen 
81 
n1 = $r0 
n2 = $r3 
x = $r2 
z = $r2 
sum = $r1 
t = $r3 
y = $r3 
i = $r0 
$r0 = arg(n) 
$r2 = $r0 * 2 
store $r0->n 
$r1 = 0 
$r0 = 0 
relaod $r3<-n 
$r0 < $r3? 
$r3 = $r0 
$r3 = $r2+$r3 
$r1 = $r1+$r3 
$r0 = $r0 + 1 
return $r1 
n = arg(n) 
x = n * 2 
store n 
z = x 
sum = 0 
i = 0 
relaod n 
i < n? 
y = i 
t = z + y 
sum = sum + t 
i = i + 1 
return sum
Rematerialization 
82 
• Materialization: 實現化、具體化
Rematerialization 
83 
• Materialization: 實現化、具體化 
• ReMaterialization: 重現化
Rematerialization 
84 
• Materialization: 實現化、具體化 
• ReMaterialization: 重現化 
• 在 Briggs 論文中所提出的技術: 
–若其值可以很便宜的算出來, 則不進行 
Spill, 取而代之直接重算其結果 
– *便宜的定義因目標不同而意義不同: 
•-O3: Cycle 數少即便宜 
•-Os: Code size 小即便宜
Rematerialization 
85 
$r0 = arg(n) 
$r2 = $r0 * 2 
store $r0->n 
$r1 = 0 
$r0 = 0 
relaod $r3<-n 
$r0 < $r3? 
$r3 = $r0 
$r3 = $r2+$r3 
$r1 = $r1+$r3 
$r0 = $r0 + 1 
return $r1 
n = arg(n) 
x = n * 2 
store n 
z = x 
sum = 0 
i = 0 
relaod n 
i < n? 
y = i 
t = z + y 
sum = sum + t 
i = i + 1 
return sum 
$r0 = arg(n) 
$r2 = $r0 * 2 
$r1 = 0 
$r0 = 0 
$r3 = $r2 / 2 
$r0 < $r3? 
$r3 = $r0 
$r3 = $r2+$r3 
$r1 = $r1+$r3 
$r0 = $r0 + 1 
return $r1 
除以 2 可替 
換為 Shift 
=> 
比存取記憶體 
便宜 
x = n * 2 
=> 
n = x / 2 
=> 
n = z / 2
Split 
4 
86 
• 跟Coalesce相反, 也有將一個點拆成兩 
個的方法 
g f 
b a 
d c 
e 
4 
4 
4 
4 
4 
4 
6 
g f 
a' 
b 
d c 
e 
4 
4 
4 
4 
4 
a 3 
3 
變得容易用 
4個顏色上色
反思Coalesce 
4 
87 
• 不好的Coalesce決策會造成圖難以著色, 
並生出額外的 Spill Code 
g f 
b z/y 
d c 
e 
4 
4 
4 
4 
4 
4 
6 
g f 
x 
b 
d c 
e 
4 
4 
4 
4 
4 
y 3 
3 
變得不容易用 
4個顏色上色
反思Coalesce 
4 
88 
• 不好的Coalesce決策會造成圖難以著色, 
並生出額外的 Spill Code 
• 但不好的Split決策相對會造成多餘Move 
g f 
b z/y 
d c 
e 
4 
4 
4 
4 
4 
4 
6 
g f 
x 
b 
d c 
e 
4 
4 
4 
4 
4 
y 3 
3 
變得不容易用 
4個顏色上色
Split or Coalesce 
To be or not to be, that is the question 
89 
• RA的學術研究上, 如何Split及 
Coalesce都可單獨成為一篇論文... 
• Coalesce在CGO'07[1]時被證明本身也是 
NP-Complete 
[1] Florent Bouchez, Alain Darte, and Fabrice Rastello. On the Complexity of Register 
Coalescing. In Proceedings of the International Symposium on Code Generation 
and Optimization, CGO ’07, pages 102–114, Washington, DC, USA, 2007. IEEE 
Computer Society.
90 
從理論到現實
從理論到現實 
91 
• 演算法面看似很容易理解 
– 帶入現實情況就會發現基礎的Graph Coloring 
沒涵蓋一些問題... 
•Caller-save reg, Callee-save reg 
•參數/回傳值放置位置 
•不同的Register Class 
•Register Pair 
•累加器(Accumulator) 
•Memory Operand
Caller-save Register or 
Callee-save Register 
92 
x = 10 
... 
foo() 
... 
y = 20 
z = x + y 
若 x 被分配到 
Caller-save Register x = 10 
... 
store x 
foo() 
reload x 
... 
y = 20 
z = x + y 
跨越 Function Call 
就必須store/reload
Caller-save Register or 
Callee-save Register 
93 
但用到 
Callee-save Register 
也必須在 Function 開頭 
及結尾儲存跟復原 
foo: 
store $r3 
... 
... 
... 
... 
restore $r3 
return
Caller-save Register or 
Callee-save Register 
94 
• 每個 Register 的使用成本隨著使用情境 
不同 
• 著色時並非只看能否成功著色, 選合適的 
Register 也是相當重要的一件事
參數/回傳值放置位置 
95 
• ABI 會規定參數與回傳值放置的方式 
– 以nds32為例:第一個參數放 $r0, 回傳值也放 $r0 
z 
x 
sum i 
y 
t 
n = arg(n) n 
x = n * 2 
z = x 
sum = 0 
i = 0 
i < n? 
y = i 
t = z + y 
sum = sum + t 
i = i + 1 
return sum 
以前面範例來看: 
間接的使得 n 以及 sum 
都必須使用 $r0, 否則需 
要額外 move 指令 
這項限制將會使得上色更 
為困難, 以這張圖為例, 
目前已經是不合法的著色
不同的Register Class 
96 
• 最常見是分為兩大類: 
– Floating Point Register 
– General Purpose Register 
int x = 10; 
float y = 3.144444; 
... 
... 
y 
x 
看起來好像有 
Interference 
但因為 Class 不同 
所以沒影響 
衍生思考: 只算帳面 Edge 數 
無法正確判定是否可簡單著色
Register Pair 
97 
• 在許多架構中兩個 Floating Point Register 
可合併成一個 double precision floating 
point 
double x = 1.6188888; 
float y = 3.144444; 
... 
... 
y 
x 
看起來好像只有一條 
Edge 但 x 會吃掉兩個 
Register 
再次思考: 只算帳面 Edge 數 
真的不太可靠
累加器(Accumulator) 
98 
• 在 x86 架構中有累加器, 其常見格式為 
– a = a op b 
–回想前面例子, 其分配結果無法直接套 
用在累加器架構下... 
– CB 演算法提出背景是RISC GPR架構!
Memory Operand 
99 
• 在 CISC 架構中有許多指令 Operand 可 
間接定址並且運算 
– addl %eax, -4(%ebp) ! *(%ebp-4) += %eax 
n = arg(n) 
x = n * 2 
z = x 
sum = 0 
i = 0 
i < n? 
y = i 
t = z + y 
sum = sum + t 
i = i + 1 
return sum 
sum 若採用 memory operand, 建 
構出來的 Graph 會相當不一樣
Graph Coloring 外的選擇 
100 
• Linear Scan 
• PBQP 
• Puzzle 
• Network Flow 
• ILP
101 
Register Allocation 
in 
GCC
GCC RA演進 
102 
• Local + Global + Reload 
• IRA + Reload 
– GCC 4.4 
• IRA + LRA 
– GCC 4.8
Local + Global + Reload 
103 
Ref: Vladimir. N. .Makarov “Fighting register pressure in GCC”, GCC Submit 2004
Reload 
104 
• http://gcc.gnu.org/wiki/reload
Reload 
105 
• http://gcc.gnu.org/wiki/reload
Reload 
• Spill code generation 
• Instruction/register constraint validation 
• Constant pool building 
• Turning non-strict RTL into strict RTL (doing more 
of the above in evil ways) 
• Register elimination--changing frame pointer 
references to stack pointer references 
• Reload inheritance--essentially a builtin CSE pass 
on spill code 
106
Reload 
107 
• 產生 Spill code 
• 驗證所有指令跟暫存器的 Constraint 
• 建立 Constant pool 
• 把所有 non-strict RTL 轉成 strict RTL 
– 用超多邪惡的方法!!! 
• 針對 frame pointer 跟 stack pointer 進行大融合 
• 內建小型 CSE
Reload 
108 
• 產生 Spill code 
• 驗證所有指令跟暫存器的 Constraint 
• 建立 Constant pool 
• 把所有 non-strict RTL 轉成 strict RTL 
– 用超多邪惡的方法!!! 
• 針對 frame pointer 跟 stack pointer 進行大融合 
• 內建小型 CSE 
Reload 最大問題在於所有東西黏在一團, 
難以修改維護...加新功能?別鬧了... 
git blame reload1.c |awk '{ print $3}' | awk -F - '{ print $1}' | sort | uniq -c 
git blame reload.c |awk '{ print $3}' | awk -F - '{ print $1}' | sort | uniq -c 
可透過這兩道指令觀察到 reload 大約都是199x年遺留下來的....
IRA + Reload 
109 
Ref: Vladimir. N. .Makarov, “The top-down regional register allocator for irregular register file architectures”
IRA 
110 
Ref: Vladimir. N. .Makarov, “The top-down regional register allocator for irregular register file architectures”
IRA 
111 
Ref: Vladimir. N. .Makarov, “The top-down regional register allocator for irregular register file architectures”
IRA 
112 
Ref: Vladimir. N. .Makarov, “The top-down regional register allocator for irregular register file architectures”
IRA 
113 
• CB 演算法為迭代式(Iterative Style), 
但 IRA 整個流程只走一次 
– 時間與品質的取捨 
– 尚未完整整合Reload
IRA 
114 
• Region-based Graph Coloring Register 
allocation 
– 主要參考 C ALLAHAN , D., AND KOBLENZ , B. 
Register allocation via hierarchical graph 
coloring. SIGPLAN 26, 6 (1991),192–203. 
– 要掌握 IRA 理論面, 至少要讀完 ira.c 註解 
上列的幾篇參考文獻
深入觀察 IRA 
115 
• 以 nds32 target 為實驗平台 
• 透過其 dump 資訊來研究 IRA: 
– nds32le-elf-gcc foo.c -O0 -fdump-rtl-ira 
-fira-verbose=9 -fomit-frame-pointer 
•-O0:避免程式在Middle-End時, 被最佳化 
打亂 
•-fdump-rtl-ira: 吐出 IRA 的 dump 
•-fira-verbose=9: 吐出最多的 IRA 資訊 
•-fomit-frame-pointer: 不用fp, 以簡化 
輸出
Let's hack IRA! 
避免IRA便宜 
行事, 導致無 
法觀察 
116 
diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c 
index f6da5d6..9f96d25 100644 
--- a/gcc/cfgexpand.c 
+++ b/gcc/cfgexpand.c 
@@ -5626,7 +5626,8 @@ pass_expand::execute (function *fun) 
edge e; 
rtx var_seq, var_ret_seq; 
unsigned i; 
+ int saved_optimize = optimize; 
+ optimize = 2; 
timevar_push (TV_OUT_OF_SSA); 
rewrite_out_of_ssa (&SA); 
timevar_pop (TV_OUT_OF_SSA); 
@@ -5999,6 +6000,7 @@ pass_expand::execute (function *fun) 
timevar_pop (TV_POST_EXPAND); 
+ optimize = saved_optimize; 
return 0; 
} 
diff --git a/gcc/ira.c b/gcc/ira.c 
index ccc6c79..16d3e55 100644 
--- a/gcc/ira.c 
+++ b/gcc/ira.c 
@@ -5032,7 +5032,8 @@ ira (FILE *f) 
int rebuild_p; 
bool saved_flag_caller_saves = flag_caller_saves; 
enum ira_region saved_flag_ira_region = flag_ira_region; 
+ optimize = 2; 
+ flag_ira_region = IRA_REGION_ALL; 
+ flag_expensive_optimizations = true; 
ira_conflicts_p = optimize > 0; 
ira_use_lra_p = targetm.lra_p (); 
diff --git a/gcc/ira-build.c b/gcc/ira-build.c 
index ee20c09..1a64748 100644 
--- a/gcc/ira-build.c 
+++ b/gcc/ira-build.c 
@@ -2608,6 +2608,8 @@ remove_low_level_allocnos (void) 
static void 
remove_unnecessary_regions (bool all_p) 
{ 
+ if (!all_p) 
+ return; 
if (current_loops == NULL) 
return; 
if (all_p) 
避免GCC, GIMPLE->RTL 因-O0 
而產生過度冗餘的 Code 
避免IRA進行 
Region 融合
Let's hack IRA! diff --git a/gcc/config/nds32/nds32.h b/gcc/config/nds32/nds32.h 
117 
index bbcf100..a345638 100644 
--- a/gcc/config/nds32/nds32.h 
+++ b/gcc/config/nds32/nds32.h 
@@ -503,13 +503,13 @@ enum nds32_builtins 
reserved for other use : $r24, $r25, $r26, $r27 */ 
#define FIXED_REGISTERS  
{ /* r0 r1 r2 r3 r4 r5 r6 r7 */  
- 0, 0, 0, 0, 0, 0, 0, 0,  
+ 0, 0, 0, 0, 1, 1, 1, 1,  
/* r8 r9 r10 r11 r12 r13 r14 r15 */  
- 0, 0, 0, 0, 0, 0, 0, 1,  
+ 1, 1, 1, 1, 1, 1, 1, 1,  
/* r16 r17 r18 r19 r20 r21 r22 r23 */  
- 0, 0, 0, 0, 0, 0, 0, 0,  
+ 1, 1, 1, 1, 1, 1, 1, 1,  
/* r24 r25 r26 r27 r28 r29 r30 r31 */  
- 1, 1, 1, 1, 0, 1, 0, 1,  
+ 1, 1, 1, 1, 1, 1, 1, 1,  
/* ARG_POINTER:32 */  
1,  
/* FRAME_POINTER:33 */  
@@ -524,13 +524,13 @@ enum nds32_builtins 
1 : caller-save registers */ 
#define CALL_USED_REGISTERS  
{ /* r0 r1 r2 r3 r4 r5 r6 r7 */  
- 1, 1, 1, 1, 1, 1, 0, 0,  
+ 1, 1, 1, 1, 1, 1, 1, 1,  
/* r8 r9 r10 r11 r12 r13 r14 r15 */  
- 0, 0, 0, 0, 0, 0, 0, 1,  
+ 1, 1, 1, 1, 1, 1, 1, 1,  
/* r16 r17 r18 r19 r20 r21 r22 r23 */  
1, 1, 1, 1, 1, 1, 1, 1,  
/* r24 r25 r26 r27 r28 r29 r30 r31 */  
- 1, 1, 1, 1, 0, 1, 0, 1,  
+ 1, 1, 1, 1, 1, 1, 1, 1,  
/* ARG_POINTER:32 */  
1,  
/* FRAME_POINTER:33 */  
為方便小程式就會 Spill, 將 
nds32 改成只有 4 個 Register
實驗環境準備 
118 
• git clone git://gcc.gnu.org/git/gcc.git ~/gcc-ra-test/ 
src 
• # 套上前兩頁的修改 
• mkdir ~/build-gcc-ra-test 
• cd ~/build-gcc-ra-test 
• ~/gcc-ra-test/src/configure --prefix=$HOME/gcc-ra-test 
--target=nds32le-elf 
• make all-gcc -j8 && make install-gcc
IRA 
119 
int foo(int n) 
{ 
int i, y, t, z; 
int x = n * 2; 
z = x; 
int sum = 0; 
for (i=0; i<n; ++i) { 
y = i; 
t = z + y; 
sum = sum + t; 
} 
return sum; 
}
RTL 
120 
(insn 2 4 3 2 (set (reg/v:SI 48 [ n ]) 
(reg:SI 0 $r0 [ n ]))) 
(insn 6 3 7 2 (set (reg/v:SI 42 [ x ]) 
(ashift:SI (reg/v:SI 48 [ n ]) 
(const_int 1 [0x1])))) 
(insn 7 6 8 2 (set (reg/v:SI 43 [ z ]) 
(reg/v:SI 42 [ x ]))) 
(insn 8 7 9 2 (set (reg/v:SI 41 [ sum ]) 
(const_int 0 [0]))) 
(insn 9 8 33 2 (set (reg/v:SI 40 [ i ]) 
(const_int 0 [0]))) 
(jump_insn 33 9 34 2 (set (pc) 
(label_ref 17))) 
(code_label 19 34 12 3 3) 
(insn 13 12 14 3 (set (reg/v:SI 44 [ y ]) 
(reg/v:SI 40 [ i ]))) 
(insn 14 13 15 3 (set (reg/v:SI 45 [ t ]) 
(plus:SI (reg/v:SI 43 [ z ]) 
(reg/v:SI 44 [ y ])))) 
(insn 15 14 16 3 (set (reg/v:SI 41 [ sum ]) 
(plus:SI (reg/v:SI 41 [ sum ]) 
(reg/v:SI 45 [ t ])))) 
(insn 16 15 17 3 (set (reg/v:SI 40 [ i ]) 
(plus:SI (reg/v:SI 40 [ i ]) 
(const_int 1 [0x1])))) 
(code_label 17 16 18 4 2) 
(insn 20 18 21 4 (set (reg:SI 15 $ta) 
(lt:SI (reg/v:SI 40 [ i ]) 
(reg/v:SI 48 [ n ])))) 
(jump_insn 21 20 22 4 (set (pc) 
(if_then_else (ne (reg:SI 15 $ta) 
(const_int 0 [0])) 
(label_ref 19) 
(pc)))) 
(insn 23 22 26 5 (set (reg:SI 46 [ D.1385 ]) 
(reg/v:SI 41 [ sum ]))) 
(insn 26 23 30 5 (set (reg:SI 47 [ <retval> ]) 
(reg:SI 46 [ D.1385 ]))) 
(insn 30 26 31 5 (set (reg/i:SI 0 $r0) 
(reg:SI 47 [ <retval> ]))) 
(insn 31 30 0 5 (use (reg/i:SI 0 $r0)))
RTL 
121 
n(r48) = n($r0) 
x(r42) = n(r8) * 2 
z(r43) = x(r42) 
sum(r41) = 0 
i(r40) = 0 
i(r40) < n(r48)? 
y(r44) = i(r40) 
t(r45) = z(r43) + y(r44) 
sum(r41) = sum(r41) + t(r45) 
i(r40) = i(r40) + 1 
D.1385(r46) = sum(r41) 
<retval>(r47) = D.1385(r46) 
$r0 = <retval>(r47) 
上一頁那沱 RTL 大致等價 
於右邊的 CFG
Pseudo Register 
122 
• GCC RTL中在RA前有無限多 Pseudo Register 
– GCC Pseudo Register 不是 SSA Form 
(set (reg:SI 45 [ t ]) 
(plus:SI (reg:SI 43 [ z ]) 
(reg:SI 44 [ y ]))) 
• 相對應的就是 Hard Register 
會保留在上層 
Pseudo Register Number IR(GIMPLE)的變數名稱 
(set (reg/v:SI 48 [ n ]) 
(reg:SI 0 $r0 [ n ]))
Pseudo<->Var 
123 
Variable Name Pseudo Register 
Number 
Comment 
i 40 
sum 41 
x 42 
y 44 
z 43 
t 45 
n 48 
D.1385 46 Temp variable create by expand 
retval 47 Return Value
ABI 
124 
• Expand (Gimple->RTL) 階段時, 會針對參數及 
回傳值插入額外 move 指令 
# 參數 
(insn 2 4 3 2 (set (reg/v:SI 48 [ n ]) 
(reg:SI 0 $r0 [ n ])) foo.c:2 27 {*movsi} 
(nil)) 
... 
# 回傳值 
(insn 23 22 26 5 (set (reg:SI 46 [ D.1385 ]) 
(reg/v:SI 41 [ sum ])) foo.c:12 27 {*movsi} 
(nil)) 
(insn 26 23 30 5 (set (reg:SI 47 [ <retval> ]) 
(reg:SI 46 [ D.1385 ])) foo.c:12 27 {*movsi} 
(nil)) 
(insn 30 26 31 5 (set (reg/i:SI 0 $r0) 
(reg:SI 47 [ <retval> ])) foo.c:13 27 {*movsi} 
(nil)) 
第一個參數放 $r0 
回傳值也放 $r0
Region 
125 
int foo(int n) 
{ 
int i, y, t, z; 
int x = n * 2; 
z = x; 
int sum = 0; 
for (i=0; i<n; ++i) { 
y = i; 
t = z + y; 
sum = sum + t; 
} 
Region 1 
return sum; 
} Region 0 
t(r45) = z(r43) + y(r44) 
sum(r41) = sum(r41) + t(r45) 
根據 Loop Structure 建立 Region, 
若該 Loop Register Pressure 較低則會與 
上層 Region 合併 
n(r48) = n($r0) 
x(r42) = n(r8) * 2 
z(r43) = x(r42) 
sum(r41) = 0 
i(r40) = 0 
i(r40) < n(r48)? 
y(r44) = i(r40) 
i(r40) = i(r40) + 1 
D.1385(r46) = sum(r41) 
<retval>(r47) = D.1385(r46) 
$r0 = <retval>(r47)
Allocno 
126 
• 在IRA 中 Pseudo Register 並不是點的 
單位, Allocno 在IRA才是等同於圖中的 
點– 
ira_allcno_t 
• 在不同 Region 中一個 Pseudo Register 都有 
各自的 Allocno 
– Split by region
Allocno 
127 
• 一個 Allocno 由數個 Live Range 組成 
• Live Range 在 IRA 中相對應的結構是 
live_range_t 
– live_range_t = start/end program point
Build Allocno 
128 
IRA DUMP 
... 
a0(r47,l0) costs: LOW_REGS:0,0 MIDDLE_REGS:0,0 GENERAL_REGS:0,0 ALL_REGS:0,0 MEM:2,2 
a1(r46,l0) costs: LOW_REGS:0,0 MIDDLE_REGS:0,0 GENERAL_REGS:0,0 ALL_REGS:0,0 MEM:2,2 
a2(r41,l0) costs: LOW_REGS:0,0 MIDDLE_REGS:0,0 GENERAL_REGS:0,0 ALL_REGS:0,0 MEM:6,22 
a3(r40,l0) costs: LOW_REGS:0,0 MIDDLE_REGS:0,0 GENERAL_REGS:0,0 ALL_REGS:0,0 MEM:5,30 
a4(r43,l0) costs: LOW_REGS:0,0 MIDDLE_REGS:0,0 GENERAL_REGS:0,0 ALL_REGS:0,0 MEM:1,9 
a5(r42,l0) costs: LOW_REGS:0,0 MIDDLE_REGS:0,0 GENERAL_REGS:0,0 ALL_REGS:0,0 MEM:9,9 
a6(r48,l0) costs: LOW_REGS:0,0 MIDDLE_REGS:0,0 GENERAL_REGS:0,0 ALL_REGS:0,0 MEM:9,17 
a7(r40,l1) costs: LOW_REGS:0,0 MIDDLE_REGS:0,0 GENERAL_REGS:0,0 ALL_REGS:0,0 MEM:25,25 
a8(r41,l1) costs: LOW_REGS:0,0 MIDDLE_REGS:0,0 GENERAL_REGS:0,0 ALL_REGS:0,0 MEM:16,16 
a9(r43,l1) costs: LOW_REGS:0,0 MIDDLE_REGS:0,0 GENERAL_REGS:0,0 ALL_REGS:0,0 MEM:8,8 
a10(r48,l1) costs: LOW_REGS:0,0 MIDDLE_REGS:0,0 GENERAL_REGS:0,0 ALL_REGS:0,0 MEM:8,8 
a11(r45,l1) costs: LOW_REGS:0,0 MIDDLE_REGS:0,0 GENERAL_REGS:0,0 ALL_REGS:0,0 MEM:16,16 
a12(r44,l1) costs: LOW_REGS:0,0 MIDDLE_REGS:0,0 GENERAL_REGS:0,0 ALL_REGS:0,0 MEM:9,9 
... 
a12(r44,l1) 
Allocno 
Pseudo Register 
Region
Build Allocno 
129 
n(r48/a3) = n($r0) 
x(r42/a5) = n(r8/a6) * 2 
z(r43/a4) = x(r42/a5) 
sum(r41/a2) = 0 
i(r40/a3) = 0 
i(r40/a7) < n(r48/a10)? 
y(r44/a12) = i(r40/a7) 
t(r45/a11) = z(r43/a9) + y(r44/a12) 
sum(r41/a8) = 
sum(r41/a8) + t(r45/a11) 
i(r40/a7) = i(r40/a7) + 1 
D.1385(r46/a1) = sum(r41/a2) 
<retval>(r47/a0) = D.1385(r46/a1) 
$r0 = <retval>(r47/a0) 
Variable 
Name 
Pseudo 
Register 
Number 
Region 0 Region 1 
i 40 a3 a7 
sum 41 a2 a8 
x 42 a5 
y 44 a12 
z 43 a4 a9 
t 45 a11 
n 48 a6 a10 
D.1385 46 a1 
retval 47 a0
Build Cap 
130 
Variable 
Name 
Pseudo 
Register 
Number 
Region 0 Region 1 
i 40 a3 a7 
sum 41 a2 a8 
x 42 a5 
y 44 a13 a12 
z 43 a4 a9 
t 45 a14 a11 
n 48 a6 a10 
D.1385 46 a1 
retval 47 a0 
n(r48/a6) = n($r0) 
x(r42/a5) = n(r8/a6) * 2 
z(r43/a4) = x(r42/a5) 
sum(r41/a2) = 0 
i(r40/a3) = 0 
i(r40/a7) < n(r48/a10)? 
y(r44/a12) = i(r40/a7) 
t(r45/a11) = z(r43/a9) + y(r44/a12) 
sum(r41/a8) = 
sum(r41/a8) + t(r45/a11) 
i(r40/a7) = i(r40/a7) + 1 
D.1385(r46/a1) = sum(r41/a2) 
<retval>(r47/a0) = D.1385(r46/a1) 
$r0 = <retval>(r47/a0)
Cap 
131 
• Allocno 的一種, 代表內層迴圈的變數 
–用來輔助計算外層迴圈的Allocno Cost
Build Copy 
132 
n(r48/a6) = n($r0) 
x(r42/a5) = n(r8/a6) * 2 
z(r43/a4) = x(r42/a5) 
sum(r41/a2) = 0 
i(r40/a3) = 0 
i(r40/a7) < n(r48/a10)? 
y(r44/a12) = i(r40/a7) 
t(r45/a11) = z(r43/a9) + y(r44/a12) 
sum(r41/a8) = 
sum(r41/a8) + t(r45/a11) 
i(r40/a7) = i(r40/a7) + 1 
D.1385(r46/a1) = sum(r41/a2) 
<retval>(r47/a0) = D.1385(r46/a1) 
$r0 = <retval>(r47/a0) 
... 
cp0:a1(r46)<->a2(r41)@1:move 
cp1:a0(r47)<->a1(r46)@1:move 
cp2:a4(r43)<->a5(r42)@1:move 
cp3:a11(r45)<->a12(r44)@1:shuffle 
cp4:a13(r45)<->a14(r44)@1:shuffle 
...
Copy 
133 
• 除了原本 `a = b` 這類指令外 IRA 也 
引進另一類型的coalesce 
– `a = b op c` 
–若a與b或c無Interference 則也可 
coalesce 
–可減少 Register Pressure
Register 
Preference 
134 
n(r48/a6) = n($r0) 
x(r42/a5) = n(r8/a6) * 2 
z(r43/a4) = x(r42/a5) 
sum(r41/a2) = 0 
i(r40/a3) = 0 
i(r40/a7) < n(r48/a10)? 
y(r44/a12) = i(r40/a7) 
t(r45/a11) = z(r43/a9) + y(r44/a12) 
sum(r41/a8) = 
sum(r41/a8) + t(r45/a11) 
i(r40/a7) = i(r40/a7) + 1 
D.1385(r46/a1) = sum(r41/a2) 
<retval>(r47/a0) = D.1385(r46/a1) 
$r0 = <retval>(r47/a0) 
... 
pref0:a0(r47)<-hr0@2 
pref1:a6(r48)<-hr0@2 
...
Register 
Preference 
135 
• 某些 Allocno 若分配到特定暫存器則可 
減少 Move 指令 
–主要來自ABI的限制
Cost 
136 
ira-costs.c 
算 Cost 的邏輯都放這!
Cost 
137 
• 參考 Profile feed back 資訊 
– 沒Profile資訊GCC會自己猜機率(predict.c) 
• [1] "Branch Prediction for Free" Ball and Larus; PLDI '93. 
• [2] "Static Branch Frequency and Program Profile Analysis" Wu and Larus; MICRO-27. 
• [3] "Corpus-based Static Branch Prediction" Calder, Grunwald, Lindsay, Martin, 
Mozer, and Zorn; PLDI '95. */ 
• 由上往下(Top-Down)的Region算下去
Preferred 
Register Class 
138 
• 在記算 Cost 階段時也會計算每個 
Allocno 所偏好的 Register Class 
–參考Register Preference
Preferred 
Register Class 
139 
• IRA 在這邊有作一些特殊處理 
– 以 x86 為例: 
•若 Allocno a 有 8 use, 1 def:若其中 
只有一個地方一定要 EBX, 而其它地方需 
要 EAX, 那該 Allocno 一樣會偏好 EAX 
•需要 EBX 的地方則留給 Reload 處理 
•在一般文獻中會採用所有使用到的Register 
Class 的交集(Intersect)
Coloring 
Region 0 
140 
Loop 0 (parent -1, header bb2, depth 0) 
... 
Forming thread by copy 0:a1r46-a2r41 (freq=1): 
Result (freq=6): a1r46(2) a2r41(4) 
Forming thread by copy 1:a0r47-a1r46 (freq=1): 
Result (freq=8): a0r47(2) a1r46(2) a2r41(4) 
Pushing a5(r42,l0)(cost 0) 
Pushing a1(r46,l0)(cost 0) 
Pushing a0(r47,l0)(cost 0) 
Pushing a4(r43,l0)(potential spill: pri=3, cost=16) 
Making a13(r45,l0: a11(r45,l1)) colorable 
Forming thread by copy 4:a13r45-a14r44 (freq=1): 
Result (freq=4): a13r45(2) a14r44(2) 
Making a14(r44,l0: a12(r44,l1)) colorable 
Pushing a14(r44,l0: a12(r44,l1))(cost 16) 
Making a2(r41,l0) colorable 
Making a3(r40,l0) colorable 
Making a6(r48,l0) colorable 
Pushing a6(r48,l0)(cost 28) 
Pushing a13(r45,l0: a11(r45,l1))(cost 16) 
Pushing a3(r40,l0)(cost 40) 
Pushing a2(r41,l0)(cost 32) 
Popping a2(r41,l0) -- assign reg 1 
Popping a3(r40,l0) -- assign reg 2 
Popping a13(r45,l0: a11(r45,l1)) -- assign reg 3 
Popping a6(r48,l0) -- assign reg 0 
Popping a14(r44,l0: a12(r44,l1)) -- assign reg 3 
Popping a4(r43,l0) -- spill 
Popping a0(r47,l0) -- assign reg 0 
Popping a1(r46,l0) -- assign reg 0 
Popping a5(r42,l0) -- assign reg 1 
n(r48/a6) = n($r0) 
x(r42/a5) = n(r8/a6) * 2 
z(r43/a4) = x(r42/a5) 
sum(r41/a2) = 0 
i(r40/a3) = 0 
i(r40/a7) < n(r48/a10)? 
y(r44/a12) = i(r40/a7) 
t(r45/a11) = z(r43/a9) + y(r44/a12) 
sum(r41/a8) = 
sum(r41/a8) + t(r45/a11) 
i(r40/a7) = i(r40/a7) + 1 
D.1385(r46/a1) = sum(r41/a2) 
<retval>(r47/a0) = D.1385(r46/a1) 
$r0 = <retval>(r47/a0)
Coloring 
Region 0 
141 
n(r48/a6) = n($r0) 
x(r42/a5) = n(r8/a6) * 2 
z(r43/a4) = x(r42/a5) 
sum(r41/a2) = 0 
i(r40/a3) = 0 
i(r40/a7) < n(r48/a10)? 
y(r44/a12) = i(r40/a7) 
t(r45/a11) = z(r43/a9) + y(r44/a12) 
sum(r41/a8) = 
sum(r41/a8) + t(r45/a11) 
i(r40/a7) = i(r40/a7) + 1 
D.1385(r46/a1) = sum(r41/a2) 
<retval>(r47/a0) = D.1385(r46/a1) 
$r0 = <retval>(r47/a0) 
x/a5 
z/a4 
t/ 
a14 
sum/ 
a2 i/a3 
y/ 
a13 
n/a6 
D.1385/ 
a1 
retval/ 
a0
Coloring 
Region 0 
142 
x/a5 
z/a4 
t/ 
a14 
sum/ 
a2 i/a3 
y/ 
a13 
n/a6 
D.1385/ 
a1 
retval/ 
a0 
ira中coalesce並不會將點 
合併而是用一種稱為Thread 
的方式把copy相關的點一起 
推到 Stack
Colorable 
143 
• 計算Colorable的方式並非計算邊數: 
–以x86為例: 
•若Allocno a可用EAX及EBX,且有3個邊, 
但其它相鄰的點都僅可使用EAX, 這樣一樣 
是 Colorable 
a 
{EAX,EBX} 
b 
{EAX} 
c 
{EAX} 
d 
{EAX}
Coloring 
Region 0 
144 
x/a5 
z/a4 
t/ 
a14 
sum/ 
a2 i/a3 
y/ 
a13 
n/a6 
D.1385/ 
a1 
retval/ 
a0 
IRA 會先依照初始的邊數 
將所有 Allocno 分兩堆 
可簡單上色的一堆(邊小於4的) 
不可簡單上色的一堆(邊大於等於4的) 
colorable 
bucket 
uncolorable 
bucket
Coloring 
Region 0 
145 
t/ 
a14 
y/ 
a13 
n/a6 
x/a5 z/a4 
i/a3 
sum/ 
a2 
D.1385/ 
a1 
retval/ 
a0 
colorable 
bucket 
uncolorable 
bucket 
x/a5 
z/a4 
t/ 
a14 
sum/ 
a2 i/a3 
y/ 
a13 
n/a6 
D.1385/ 
a1 
retval/ 
a0
Coloring 
Region 0 
146 
z/a4 
x/a5 
n/a6 
sum/ 
a2 
i/a3 
y/ 
a13 
t/ 
a14 
D.1385/ 
a1 
retval/ 
a0 
colorable 
bucket 
uncolorable 
bucket 
sort uncolorable bucket by spill cost 
x/a5 
z/a4 
t/ 
a14 
sum/ 
a2 i/a3 
y/ 
a13 
n/a6 
D.1385/ 
a1 
retval/ 
a0
Coloring 
Region 0 
147 
先將所有 Colorable bucket 的推進 stack 
z/a4 
x/a5 
n/a6 
sum/ 
a2 
i/a3 
y/ 
a13 
t/ 
a14 
D.1385/ 
a1 
retval/ 
a0 
colorable 
bucket 
uncolorable 
bucket 
Stack 
x/a5 
z/a4 
t/ 
a14 
sum/ 
a2 i/a3 
y/ 
a13 
n/a6 
D.1385/ 
a1 
retval/ 
a0 
1 
5 
6 
4 
4 
5 5 
0 0
Coloring 
Region 0 
148 
先將所有 Colorable bucket 的推進 stack 
z/a4 
x/a5 
n/a6 
sum/ 
a2 
i/a3 
y/ 
a13 
t/ 
a14 
D.1385/ 
a1 
retval/ 
a0 
colorable 
bucket 
uncolorable 
bucket 
Stack 
z/a4 
t/ 
a14 
sum/ 
a2 i/a3 
y/ 
a13 
n/a6 
D.1385/ 
a1 
retval/ 
a0 
5 
5 
4 
4 
5 5 
0 0
Coloring 
Region 0 
149 
先將所有 Colorable bucket 的推進 stack 
z/a4 
x/a5 
n/a6 
sum/ 
a2 
i/a3 
y/ 
a13 
t/ 
a14 
D.1385/ 
a1 
retval/ 
a0 
colorable 
bucket 
uncolorable 
bucket 
Stack 
z/a4 
t/ 
a14 
sum/ 
a2 i/a3 
y/ 
a13 
n/a6 
retval/ 
a0 
5 
5 
4 
4 
5 5 
0
Coloring 
Region 0 
150 
先將所有 Colorable bucket 的推進 stack 
z/a4 
retval/ 
a0 
x/a5 
n/a6 
sum/ 
a2 
i/a3 
y/ 
a13 
t/ 
a14 
D.1385/ 
a1 
colorable 
bucket 
uncolorable 
bucket 
Stack 
z/a4 
t/ 
a14 
sum/ 
a2 i/a3 
y/ 
a13 
n/a6 
5 
5 
4 
4 
5 5
Coloring 
Region 0 
151 
先將所有 Colorable bucket 的推進 stack 
z/a4標為 
Spill 候選人 
z/a4 
retval/ 
a0 
x/a5 
n/a6 
sum/ 
a2 
i/a3 
y/ 
a13 
t/ 
a14 
D.1385/ 
a1 
colorable 
bucket 
uncolorable 
bucket 
Stack 
t/ 
a14 
sum/ 
a2 i/a3 
y/ 
a13 
n/a6 
4 
3 
3 
4 4
Coloring 
Region 0 
152 
先將所有 Colorable bucket 的推進 stack 
z/a4標為 
Spill 候選人 
z/a4 
retval/ 
a0 
x/a5 
n/a6 
sum/ 
a2 
t/ 
a14 
y/ i/a3 
a13 
D.1385/ 
a1 
colorable 
bucket 
uncolorable 
bucket 
Stack 
t/ 
a14 
sum/ 
a2 i/a3 
y/ 
a13 
n/a6 
4 
3 
3 
4 4 
y/a13及t/a14 
因此變成 
colorable
Coloring 
Region 0 
153 
先將所有 Colorable bucket 的推進 stack 
z/a4 
retval/ 
a0 
x/a5 
y/ 
a13 
i/a3 
sum/ 
a2 
t/ 
a14 
n/a6 
D.1385/ 
a1 
colorable 
bucket 
uncolorable 
bucket 
Stack 
sum/ 
a2 i/a3 
y/ 
a13 
n/a6 
3 
3 
3 3 
uncolorable 
移到colorable 
會照某種順序排入
Coloring 
Region 0 
154 
先將所有 Colorable bucket 的推進 stack 
n/a6 
z/a4 
retval/ 
a0 
x/a5 
y/ 
a13 
i/a3 
sum/ 
a2 
t/ 
a14 
D.1385/ 
a1 
colorable 
bucket 
uncolorable 
bucket 
Stack 
sum/ 
a2 i/a3 
y/ 
a13 
2 
2 2
Coloring 
Region 0 
155 
先將所有 Colorable bucket 的推進 stack 
n/a6 
z/a4 
retval/ 
a0 
i/a3 y/ 
x/a5 
sum/ 
a2 
a13 
t/ 
a14 
D.1385/ 
a1 
colorable 
bucket 
uncolorable 
bucket 
Stack 
sum/ 
a2 i/a3 1 1
Coloring 
Region 0 
156 
先將所有 Colorable bucket 的推進 stack 
n/a6 
z/a4 
retval/ 
a0 
x/a5 
sum/ 
a2 
i/a3 
y/ 
a13 
t/ 
a14 
D.1385/ 
a1 
colorable 
bucket 
uncolorable 
bucket 
Stack 
sum/ 
0 a2
Coloring 
Region 0 
157 
先將所有 Colorable bucket 的推進 stack 
n/a6 
z/a4 
retval/ 
a0 
x/a5 
sum/ 
a2 
i/a3 
y/ 
a13 
t/ 
a14 
D.1385/ 
a1 
colorable 
bucket 
uncolorable 
bucket 
Stack
Coloring 
Region 0 
158 
n/a6 
z/a4 
retval/ 
a0 
x/a5 
i/a3 
y/ 
a13 
t/ 
a14 
D.1385/ 
a1 
Stack 
$r0 
$r1 
$r2 
$r3 
sum/ 
a2
Coloring 
Region 0 
159 
n/a6 
z/a4 
retval/ 
a0 
x/a5 
y/ 
a13 
t/ 
a14 
D.1385/ 
a1 
Stack 
$r0 
$r1 
$r2 
$r3 
sum/ 
a2 i/a3
Coloring 
Region 0 
160 
n/a6 
t/ 
a14 
z/a4 
retval/ 
a0 
D.1385/ 
a1 
x/a5 
Stack 
$r0 
$r1 
$r2 
$r3 
sum/ 
a2 i/a3 
y/ 
a13
Coloring 
Region 0 
161 
t/ 
a14 
z/a4 
retval/ 
a0 
D.1385/ 
a1 
x/a5 
Stack 
$r0 
$r1 
$r2 
$r3 
sum/ 
a2 i/a3 
y/ 
a13 
n/a6 
D.1385/ 
a1 
retval/ 
a0
Coloring 
Region 0 
162 
z/a4 
retval/ 
a0 
D.1385/ 
a1 
x/a5 
Stack 
$r0 
$r1 
$r2 
$r3 
t/ 
a14 
sum/ 
a2 i/a3 
y/ 
a13 
n/a6
Coloring 
Region 0 
163 
retval/ 
a0 
D.1385/ 
a1 
x/a5 
Stack 
$r0 
$r1 
$r2 
$r3 
z/a4 
t/ 
a14 
sum/ 
a2 i/a3 
y/ 
a13 
n/a6 
z/a4拿不到 
Register 
標為Spill 
但繼續著色
Coloring 
Region 0 
164 
D.1385/ 
a1 
x/a5 
Stack 
$r0 
$r1 
$r2 
$r3 
z/a4 
t/ 
a14 
sum/ 
a2 i/a3 
y/ 
a13 
n/a6 
retval/ 
a0
Coloring 
Region 0 
165 
x/a5 
Stack 
$r0 
$r1 
$r2 
$r3 
z/a4 
t/ 
a14 
sum/ 
a2 i/a3 
y/ 
a13 
n/a6 
D.1385/ 
a1 
retval/ 
a0
Coloring 
Region 0 
166 
Stack 
$r0 
$r1 
$r2 
$r3 
x/a5 
z/a4 
t/ 
a14 
sum/ 
a2 i/a3 
y/ 
a13 
n/a6 
D.1385/ 
a1 
retval/ 
a0
Coloring 
Region 1 
167 
n(r48/a6) = n($r0) 
x(r42/a5) = n(r8/a6) * 2 
z(r43/a4) = x(r42/a5) 
sum(r41/a2) = 0 
i(r40/a3) = 0 
i(r40/a7) < n(r48/a10)? 
y(r44/a12) = i(r40/a7) 
t(r45/a11) = z(r43/a9) + y(r44/a12) 
sum(r41/a8) = 
sum(r41/a8) + t(r45/a11) 
i(r40/a7) = i(r40/a7) + 1 
D.1385(r46/a1) = sum(r41/a2) 
<retval>(r47/a0) = D.1385(r46/a1) 
$r0 = <retval>(r47/a0) 
Loop 1 (parent 0, header bb4, depth 1) 
... 
Forming thread by copy 3:a11r45-a12r44 (freq=1): 
Result (freq=4): a11r45(2) a12r44(2) 
Pushing a12(r44,l1)(cost 0) 
Making a7(r40,l1) colorable 
Making a8(r41,l1) colorable 
Making a10(r48,l1) colorable 
Pushing a10(r48,l1)(cost 40) 
Pushing a8(r41,l1)(cost 48) 
Pushing a11(r45,l1)(cost 0) 
Pushing a7(r40,l1)(cost 64) 
Popping a7(r40,l1) -- assign reg 2 
Popping a11(r45,l1) -- assign reg 3 
Popping a8(r41,l1) -- assign reg 1 
Popping a10(r48,l1) -- assign reg 0 
Popping a12(r44,l1) -- assign reg 3
Coloring 
Region 1 
168 
n(r48/a6) = n($r0) 
x(r42/a5) = n(r8/a6) * 2 
z(r43/a4) = x(r42/a5) 
sum(r41/a2) = 0 
i(r40/a3) = 0 
i(r40/a7) < n(r48/a10)? 
y(r44/a12) = i(r40/a7) 
t(r45/a11) = z(r43/a9) + y(r44/a12) 
sum(r41/a8) = 
sum(r41/a8) + t(r45/a11) 
i(r40/a7) = i(r40/a7) + 1 
D.1385(r46/a1) = sum(r41/a2) 
<retval>(r47/a0) = D.1385(r46/a1) 
$r0 = <retval>(r47/a0) 
z/a9 
t/ 
a11 
sum/ 
a8 i/a7 
y/ 
a12 
n/ 
a10
Coloring 
Region 1 
169 
z/a9 
t/ 
a11 
sum/ 
a8 i/a7 
y/ 
a12 
n/ 
a10 
$r0 
$r1 
$r2 
$r3 z/a4拿不到 
Register, 
z/a9 
繼續拿不到
Coloring 
Region 1 
170 
z/a9 
t/ 
a11 
sum/ 
a8 i/a7 
y/ 
a12 
n/ 
a10 
$r0 
$r1 
$r2 
$r3 z/a4拿不到 
Register, 
z/a9 
繼續拿不到 
所有Spill的Register會在下階段處理
Insert 
Move 
171 
n(r48) = n($r0) 
x(r42) = n(r8) * 2 
z(r43) = x(r42) 
sum(r41) = 0 
i(r40) = 0 
i(r50) = i(r40) 
sum(r51) = sum(r41) 
n(r52) = n(r48) 
i(r50) < n(r52)? 
y(r44) = i(r50) 
t(r45) = z(r43) + y(r44) 
sum(r51) = sum(r51) + t(r45) 
i(r50) = i(r50) + 1 
sum(r41) = sum(r51) 
D.1385(r46) = sum(r41) 
<retval>(r47) = D.1385(r46) 
$r0 = <retval>(r47)
RELOAD 
172 
n(r48) = n($r0) 
x(r42) = n(r8) * 2 
z(r43) = x(r42) 
sum(r41) = 0 
i(r40) = 0 
i(r50) = i(r40) 
sum(r51) = sum(r41) 
n(r52) = n(r48) 
RELOAD 
i(r50) < n(r52)? 
y(r44) = i(r50) 
t(r45) = z(r43) + y(r44) 
sum(r51) = sum(r51) + t(r45) 
i(r50) = i(r50) + 1 
sum(r41) = sum(r51) 
D.1385(r46) = sum(r41) 
<retval>(r47) = D.1385(r46) 
$r0 = <retval>(r47)
RELOAD 
• nds32無實作Reload相關Hook因此不展示 
Reload相關流程 
173
RELOAD 
• nds32無實作Reload相關Hook因此不展示 
Reload相關流程 
• IRA中使用LRA或Reload, 流程會有所不 
同 
174
IRA + LRA 
175 
LRA 
Ref: Vladimir. N. .Makarov, “The top-down regional register allocator for irregular register file architectures”
LRA 
New (and old) 
pseudo assignment 
176 
Remove 
Scratches 
Update virtual 
register 
displacements 
No Change 
Constraints: 
RTL 
transformations 
Inheritance/split 
transformations 
in EBB scope 
Undo inheritance 
for 
spilled pseudos 
Memory-Memory 
move coalesce 
Spilled pseudo 
to memory 
substitution 
Hard reg 
substitution, 
devirtalization 
and restoring scratches 
New pseudos 
or insns 
Start 
Finish 
Ref: Vladimir. N. .Makarov, “The Local Register Allocator Project”, GNU Tools Cauldron 2012
LRA 
New (and old) 
pseudo assignment 
177 
Remove 
Scratches 
Update virtual 
register 
displacements 
No Change 
Constraints: 
RTL 
transformations 
Inheritance/split 
transformations 
in EBB scope 
Undo inheritance 
for 
spilled pseudos 
Memory-Memory 
move coalesce 
Spilled pseudo 
to memory 
substitution 
Hard reg 
substitution, 
devirtalization 
and restoring scratches 
New pseudos 
or insns 
Start 
Finish
Remove Scratches 
178 
處理 match_scratch 
讓它暫時先變 pseudo register 
以利後續處理
LRA 
New (and old) 
pseudo assignment 
179 
Remove 
Scratches 
Update virtual 
register 
displacements 
No Change 
Constraints: 
RTL 
transformations 
Inheritance/split 
transformations 
in EBB scope 
Undo inheritance 
for 
spilled pseudos 
Memory-Memory 
move coalesce 
Spilled pseudo 
to memory 
substitution 
Hard reg 
substitution, 
devirtalization 
and restoring scratches 
New pseudos 
or insns 
Start 
Finish
Update virtual register 
displacements 
180 
• 分配 Stack slot 空間 
– eg: $sp + 0, $sp + 4, ... 
• 融合 $sp 跟 $fp
LRA 
New (and old) 
pseudo assignment 
181 
Remove 
Scratches 
Update virtual 
register 
displacements 
No Change 
Constraints: 
RTL 
transformations 
Inheritance/split 
transformations 
in EBB scope 
Undo inheritance 
for 
spilled pseudos 
Memory-Memory 
move coalesce 
Spilled pseudo 
to memory 
substitution 
Hard reg 
substitution, 
devirtalization 
and restoring scratches 
New pseudos 
or insns 
Start 
Finish
Constraint 
182 
(define_insn "addsi" 
[(set (match_operand:SI 0 "register_operand", "r, r") 
(plus:SI (match_operand:SI 1 "register_operand", "r, r") 
(match_operand:SI 2 "register_operand", "r, Is15")))] 
"" 
... 
Constraint是gcc在md 中用來 
描述指令中Operand的資訊 
以上面為例, 上面描述add可接受兩種格式: 
add $ra, $rb, $rc 
add $ra, $rb, #simm15
LRA Constraints #0 
183 
... 
alt=0,overall=0,losers=0,rld_nregs=0 
Choosing alt 0 in insn 6: (0) =l (1) l (2) Iu03 {ashlsi3} 
0 Non input pseudo reload: reject++ 
alt=0,overall=607,losers=1,rld_nregs=1 
0 Non input pseudo reload: reject++ 
alt=1,overall=607,losers=1,rld_nregs=1 
0 Non pseudo reload: reject++ 
alt=2,overall=1,losers=0,rld_nregs=0 
Choosing alt 2 in insn 7: (0) U45 (1) l {*movsi} 
... 
檢查每道指令是否符合任何一組 Constraint
LRA Constraints #0 ... 
t(r45/r3) = z(r53/X) + y(r44/r3) 
sum(r51/r1) = sum(r51/r1) + t(r45/r3) 
184 
1 Matching alt: reject+=2 
alt=0: Bad operand -- refuse 
alt=1: Bad operand -- refuse 
1 Matching alt: reject+=2 
alt=2: Bad operand -- refuse 
alt=3: Bad operand -- refuse 
1 Matching alt: reject+=2 
alt=4,overall=8,losers=1,rld_nregs=1 
alt=5,overall=6,losers=1,rld_nregs=1 
alt=6: Bad operand -- refuse 
alt=7: Bad operand -- refuse 
alt=8: Bad operand -- refuse 
alt=9,overall=6,losers=1,rld_nregs=1 
alt=0: Bad operand -- refuse 
alt=1: Bad operand -- refuse 
alt=2: Bad operand -- refuse 
alt=3: Bad operand -- refuse 
alt=4,overall=6,losers=1,rld_nregs=1 
alt=5,overall=6,losers=1,rld_nregs=1 
alt=6: Bad operand -- refuse 
alt=7: Bad operand -- refuse 
alt=8: Bad operand -- refuse 
alt=9,overall=6,losers=1,rld_nregs=1 
找不到一組 Constraint 可用 
時會進行 Split 
Commutative operand exchange in insn 14 
Choosing alt 4 in insn 14: (0) d (1) 0 (2) r {addsi3} 
y(r44/r3) = i(r50/r2) 
z(r53/X) = z(r43/M) 
Creating newreg=53 from oldreg=43, assigning class GENERAL_REGS to r53 
14: r45:SI=r44:SI+r53:SI 
REG_DEAD r44:SI 
Inserting insn reload before: 
40: r53:SI=r43:SI 
... 
n(r48/r0) = n($r0) 
x(r42/r1) = n(r8/r0) * 2 
z(r43/M) = x(r42/r1) 
sum(r41/r1) = 0 
i(r40/r2) = 0 
i(r50/r2) = i(r40/r2) 
sum(r51/r1) = sum(r41/r1) 
n(r52/r0) = n(r48/r0) 
i(r50/r2) < n(r52/r0)? 
i(r50/r2) = i(r50/r2) + 1 
sum(r41/r1) = sum(r51/r1) 
D.1385(r46/r1) = sum(r41/r1) 
<retval>(r47/r1) = D.1385(r46/r1) 
$r0 = <retval>(r47/r1)
LRA 
New (and old) 
pseudo assignment 
185 
Remove 
Scratches 
Update virtual 
register 
displacements 
No Change 
Constraints: 
RTL 
transformations 
Inheritance/split 
transformations 
in EBB scope 
Undo inheritance 
for 
spilled pseudos 
Memory-Memory 
move coalesce 
Spilled pseudo 
to memory 
substitution 
Hard reg 
substitution, 
devirtalization 
and restoring scratches 
New pseudos 
or insns 
Start 
Finish 
lra-constraints.c:lra_inheritance()
Inheritance/Split 
186 
• 這部份比較複雜, LRA嘗試最佳化的部份 
–時間關係略過 
– lra-constraints.c:lra_inheritance()
LRA 
New (and old) 
pseudo assignment 
187 
Remove 
Scratches 
Update virtual 
register 
displacements 
No Change 
Constraints: 
RTL 
transformations 
Inheritance/split 
transformations 
in EBB scope 
Undo inheritance 
for 
spilled pseudos 
Memory-Memory 
move coalesce 
Spilled pseudo 
to memory 
substitution 
Hard reg 
substitution, 
devirtalization 
and restoring scratches 
New pseudos 
or insns 
Start 
Finish
LRA Assign #1 
Assigning to 53 (cl=GENERAL_REGS, orig=43, freq=2, tfirst=53, tfreq=2)... 
Trying 0: spill 52(freq=2) Now best 0(cost=0) 
Trying 1: spill 51(freq=3) 
Trying 2: spill 50(freq=4) 
Trying 3: spill 44(freq=2) 
188 
Spill r52(hr=0, freq=2) for r53 
Assign 0 to reload r53 (freq=2) 
要Spill時會先嘗試有沒有 Register, 
沒有的時候只好找個替死鬼...
LRA Assign #1 
Assigning to 53 (cl=GENERAL_REGS, orig=43, freq=2, tfirst=53, tfreq=2)... 
Trying 0: spill 52(freq=2) Now best 0(cost=0) 
Trying 1: spill 51(freq=3) 
Trying 2: spill 50(freq=4) 
Trying 3: spill 44(freq=2) 
t(r45/r3) = z(r53/X) + y(r44/r3) 
sum(r51/r1) = sum(r51/r1) + t(r45/r3) 
189 
Spill r52(hr=0, freq=2) for r53 
Assign 0 to reload r53 (freq=2) 
n(r48/r0) = n($r0) 
x(r42/r1) = n(r8/r0) * 2 
z(r43/M) = x(r42/r1) 
sum(r41/r1) = 0 
i(r40/r2) = 0 
i(r50/r2) = i(r40/r2) 
sum(r51/r1) = sum(r41/r1) 
n(r52/r0) = n(r48/r0) 
i(r50/r2) < n(r52/r0)? 
y(r44/r3) = i(r50/r2) 
z(r53/X) = z(r43/M) 
i(r50/r2) = i(r50/r2) + 1 
sum(r41/r1) = sum(r51/r1) 
D.1385(r46/r1) = sum(r41/r1) 
<retval>(r47/r1) = D.1385(r46/r1) 
$r0 = <retval>(r47/r1) 
分到$r0, 踢掉r52
LRA Assign #1 
Assigning to 53 (cl=GENERAL_REGS, orig=43, freq=2, tfirst=53, tfreq=2)... 
Trying 0: spill 52(freq=2) Now best 0(cost=0) 
Trying 1: spill 51(freq=3) 
Trying 2: spill 50(freq=4) 
Trying 3: spill 44(freq=2) 
t(r45/r3) = z(r53/r0) + y(r44/r3) 
sum(r51/r1) = sum(r51/r1) + t(r45/r3) 
190 
Spill r52(hr=0, freq=2) for r53 
Assign 0 to reload r53 (freq=2) 
n(r48/r0) = n($r0) 
x(r42/r1) = n(r8/r0) * 2 
z(r43/M) = x(r42/r1) 
sum(r41/r1) = 0 
i(r40/r2) = 0 
i(r50/r2) = i(r40/r2) 
sum(r51/r1) = sum(r41/r1) 
n(r52/M) = n(r48/r0) 
i(r50/r2) < n(r52/M)? 
y(r44/r3) = i(r50/r2) 
z(r53/r0) = z(r43/M) 
i(r50/r2) = i(r50/r2) + 1 
sum(r41/r1) = sum(r51/r1) 
D.1385(r46/r1) = sum(r41/r1) 
<retval>(r47/r1) = D.1385(r46/r1) 
$r0 = <retval>(r47/r1) 
分到$r0, 踢掉r52
LRA 
New (and old) 
pseudo assignment 
191 
Remove 
Scratches 
Update virtual 
register 
displacements 
No Change 
Constraints: 
RTL 
transformations 
Inheritance/split 
transformations 
in EBB scope 
Undo inheritance 
for 
spilled pseudos 
Memory-Memory 
move coalesce 
Spilled pseudo 
to memory 
substitution 
Hard reg 
substitution, 
devirtalization 
and restoring scratches 
New pseudos 
or insns 
Start 
Finish
LRA 
New (and old) 
pseudo assignment 
192 
Remove 
Scratches 
Update virtual 
register 
displacements 
No Change 
Constraints: 
RTL 
transformations 
Inheritance/split 
transformations 
in EBB scope 
Undo inheritance 
for 
spilled pseudos 
Memory-Memory 
move coalesce 
Spilled pseudo 
to memory 
substitution 
Hard reg 
substitution, 
devirtalization 
and restoring scratches 
New pseudos 
or insns 
Start 
Finish
Memory-Memory move coalesce 
• 假設一道Move指令的來源與目的的 
Allocno皆分配到Memory (Spill) 
193 
a(r53/Ma) = b(r43/Mb) 
... 
a(r55/$r1) = a(r53/Ma) 
... 
a(r55/$r1) = b(r43/Mb)
LRA 
New (and old) 
pseudo assignment 
194 
Remove 
Scratches 
Update virtual 
register 
displacements 
No Change 
Constraints: 
RTL 
transformations 
Inheritance/split 
transformations 
in EBB scope 
Undo inheritance 
for 
spilled pseudos 
Memory-Memory 
move coalesce 
Spilled pseudo 
to memory 
substitution 
Hard reg 
substitution, 
devirtalization 
and restoring scratches 
New pseudos 
or insns 
Start 
Finish
LRA 
New (and old) 
pseudo assignment 
195 
Remove 
Scratches 
Update virtual 
register 
displacements 
No Change 
Constraints: 
RTL 
transformations 
Inheritance/split 
transformations 
in EBB scope 
Undo inheritance 
for 
spilled pseudos 
Memory-Memory 
move coalesce 
Spilled pseudo 
to memory 
substitution 
Hard reg 
substitution, 
devirtalization 
and restoring scratches 
New pseudos 
or insns 
Start 
Finish
LRA Constraint #2 
t(r45/r3) = z(r53/r0) + y(r44/r3) 
sum(r51/r1) = sum(r51/r1) + t(r45/r3) 
196 
n(r48/r0) = n($r0) 
x(r42/r1) = n(r8/r0) * 2 
z(r43/M) = x(r42/r1) 
sum(r41/r1) = 0 
i(r40/r2) = 0 
i(r50/r2) = i(r40/r2) 
sum(r51/r1) = sum(r41/r1) 
n(r52/M) = n(r48/r0) 
i(r50/r2) < n(r52/M)? 
y(r44/r3) = i(r50/r2) 
z(r53/r0) = z(r43/M) 
i(r50/r2) = i(r50/r2) + 1 
sum(r41/r1) = sum(r51/r1) 
D.1385(r46/r1) = sum(r41/r1) 
<retval>(r47/r1) = D.1385(r46/r1) 
$r0 = <retval>(r47/r1) 
比較指令只能Reg跟Reg比 
不符合Constraint 
nds32中,比較指令只有 
Reg 與 Reg相比 
無 Reg 與 Mem 比
LRA Constraint #2 
t(r45/r3) = z(r53/r0) + y(r44/r3) 
sum(r51/r1) = sum(r51/r1) + t(r45/r3) 
197 
n(r48/r0) = n($r0) 
x(r42/r1) = n(r8/r0) * 2 
z(r43/M) = x(r42/r1) 
sum(r41/r1) = 0 
i(r40/r2) = 0 
i(r50/r2) = i(r40/r2) 
sum(r51/r1) = sum(r41/r1) 
n(r52/M) = n(r48/r0) 
n(r54/X) = n(r52/M) 
i(r50/r2) < n(r54/X)? 
y(r44/r3) = i(r50/r2) 
z(r53/r0) = z(r43/M) 
i(r50/r2) = i(r50/r2) + 1 
sum(r41/r1) = sum(r51/r1) 
D.1385(r46/r1) = sum(r41/r1) 
<retval>(r47/r1) = D.1385(r46/r1) 
$r0 = <retval>(r47/r1) 
比較指令只能Reg跟Reg比 
不符合Constraint
LRA 
New (and old) 
pseudo assignment 
198 
Remove 
Scratches 
Update virtual 
register 
displacements 
No Change 
Constraints: 
RTL 
transformations 
Inheritance/split 
transformations 
in EBB scope 
Undo inheritance 
for 
spilled pseudos 
Memory-Memory 
move coalesce 
Spilled pseudo 
to memory 
substitution 
Hard reg 
substitution, 
devirtalization 
and restoring scratches 
New pseudos 
or insns 
Start 
Finish
LRA 
New (and old) 
pseudo assignment 
199 
Remove 
Scratches 
Update virtual 
register 
displacements 
No Change 
Constraints: 
RTL 
transformations 
Inheritance/split 
transformations 
in EBB scope 
Undo inheritance 
for 
spilled pseudos 
Memory-Memory 
move coalesce 
Spilled pseudo 
to memory 
substitution 
Hard reg 
substitution, 
devirtalization 
and restoring scratches 
New pseudos 
or insns 
Start 
Finish
LRA Assign #2 
t(r45/r3) = z(r53/r0) + y(r44/r3) 
sum(r51/r1) = sum(r51/r1) + t(r45/r3) 
200 
n(r48/r0) = n($r0) 
x(r42/r1) = n(r8/r0) * 2 
z(r43/M) = x(r42/r1) 
sum(r41/r1) = 0 
i(r40/r2) = 0 
i(r50/r2) = i(r40/r2) 
sum(r51/r1) = sum(r41/r1) 
n(r52/M) = n(r48/r0) 
n(r54/r0) = n(r52/M) 
i(r50/r2) < n(r54/r0)? 
y(r44/r3) = i(r50/r2) 
z(r53/r0) = z(r43/M) 
i(r50/r2) = i(r50/r2) + 1 
sum(r41/r1) = sum(r51/r1) 
D.1385(r46/r1) = sum(r41/r1) 
<retval>(r47/r1) = D.1385(r46/r1) 
$r0 = <retval>(r47/r1) 
很幸運直接撿到剛被 
搶走的r0可以用
LRA 
New (and old) 
pseudo assignment 
201 
Remove 
Scratches 
Update virtual 
register 
displacements 
No Change 
Constraints: 
RTL 
transformations 
Inheritance/split 
transformations 
in EBB scope 
Undo inheritance 
for 
spilled pseudos 
Memory-Memory 
move coalesce 
Spilled pseudo 
to memory 
substitution 
Hard reg 
substitution, 
devirtalization 
and restoring scratches 
New pseudos 
or insns 
Start 
Finish
LRA 
New (and old) 
pseudo assignment 
202 
Remove 
Scratches 
Update virtual 
register 
displacements 
No Change 
Constraints: 
RTL 
transformations 
Inheritance/split 
transformations 
in EBB scope 
Undo inheritance 
for 
spilled pseudos 
Memory-Memory 
move coalesce 
Spilled pseudo 
to memory 
substitution 
Hard reg 
substitution, 
devirtalization 
and restoring scratches 
New pseudos 
or insns 
Start 
Finish
LRA 
New (and old) 
pseudo assignment 
203 
Remove 
Scratches 
Update virtual 
register 
displacements 
No Change 
Constraints: 
RTL 
transformations 
Inheritance/split 
transformations 
in EBB scope 
Undo inheritance 
for 
spilled pseudos 
Memory-Memory 
move coalesce 
Spilled pseudo 
to memory 
substitution 
Hard reg 
substitution, 
devirtalization 
and restoring scratches 
New pseudos 
or insns 
Start 
Finish
LRA 
New (and old) 
pseudo assignment 
204 
Remove 
Scratches 
Update virtual 
register 
displacements 
No Change 
Constraints: 
RTL 
transformations 
Inheritance/split 
transformations 
in EBB scope 
Undo inheritance 
for 
spilled pseudos 
Memory-Memory 
move coalesce 
Spilled pseudo 
to memory 
substitution 
Hard reg 
substitution, 
devirtalization 
and restoring scratches 
New pseudos 
or insns 
Start 
Finish
LRA Constraint #3 
t(r45/r3) = z(r53/r0) + y(r44/r3) 
sum(r51/r1) = sum(r51/r1) + t(r45/r3) 
205 
n(r48/r0) = n($r0) 
x(r42/r1) = n(r8/r0) * 2 
z(r43/M) = x(r42/r1) 
sum(r41/r1) = 0 
i(r40/r2) = 0 
i(r50/r2) = i(r40/r2) 
sum(r51/r1) = sum(r41/r1) 
n(r52/M) = n(r48/r0) 
n(r54/r0) = n(r52/M) 
i(r50/r2) < n(r54/r0)? 
y(r44/r3) = i(r50/r2) 
z(r53/r0) = z(r43/M) 
i(r50/r2) = i(r50/r2) + 1 
sum(r41/r1) = sum(r51/r1) 
D.1385(r46/r1) = sum(r41/r1) 
<retval>(r47/r1) = D.1385(r46/r1) 
$r0 = <retval>(r47/r1) 
整個掃描過後發現所有指令 
符合Constraint!
LRA 
New (and old) 
pseudo assignment 
206 
Remove 
Scratches 
Update virtual 
register 
displacements 
No Change 
Constraints: 
RTL 
transformations 
Inheritance/split 
transformations 
in EBB scope 
Undo inheritance 
for 
spilled pseudos 
Memory-Memory 
move coalesce 
Spilled pseudo 
to memory 
substitution 
Hard reg 
substitution, 
devirtalization 
and restoring scratches 
New pseudos 
or insns 
Start 
Finish
LRA Memory Substitution 
t(r45/r3) = z(r53/r0) + y(r44/r3) 
sum(r51/r1) = sum(r51/r1) + t(r45/r3) 
207 
n(r48/r0) = n($r0) 
x(r42/r1) = n(r8/r0) * 2 
z([$sp + 0]) = x(r42/r1) 
sum(r41/r1) = 0 
i(r40/r2) = 0 
i(r50/r2) = i(r40/r2) 
sum(r51/r1) = sum(r41/r1) 
n([$sp + 4]) = n(r48/r0) 
n(r54/r0) = n([$sp + 4]) 
i(r50/r2) < n(r54/r0)? 
y(r44/r3) = i(r50/r2) 
z(r53/r0) = z([$sp + 0]) 
i(r50/r2) = i(r50/r2) + 1 
sum(r41/r1) = sum(r51/r1) 
D.1385(r46/r1) = sum(r41/r1) 
<retval>(r47/r1) = D.1385(r46/r1) 
$r0 = <retval>(r47/r1) 
將所有 Spill 的 Pseudo 
Register 替換成 Memory!
LRA 
New (and old) 
pseudo assignment 
208 
Remove 
Scratches 
Update virtual 
register 
displacements 
No Change 
Constraints: 
RTL 
transformations 
Inheritance/split 
transformations 
in EBB scope 
Undo inheritance 
for 
spilled pseudos 
Memory-Memory 
move coalesce 
Spilled pseudo 
to memory 
substitution 
Hard reg 
substitution, 
devirtalization 
and restoring scratches 
New pseudos 
or insns 
Start 
Finish
LRA Register Substitution 
209 
n(r0) = n($r0) 
x(r1) = n(r0) * 2 
z([$sp + 0]) = x(r1) 
sum(r1) = 0 
i(r2) = 0 
i(r2) = i(r2) 
sum(r1) = sum(r1) 
n([$sp + 4]) = n(rr0) 
n(r0) = n([$sp + 4]) 
i(r2) < n(r0)? 
y(r3) = i(r2) 
z(r0) = z([$sp + 0]) 
t(r3) = z(r0) + y(r3) 
sum(r1) = sum(r1) + t(r3) 
i(r2) = i(r2) + 1 
sum(r1) = sum(r1) 
D.1385(r1) = sum(r1) 
<retval>(r1) = D.1385(r1) 
$r0 = <retval>(r1) 
將所有 Pseudo Register 
替換成 Hard Register!
210 
reload_completed 
!= 0 
在 gcc 中, 當reload_complete不等於0時 
代表Register Allocation 完畢, 
並且所有RTL都必須是Strict RTL 
隨時都可以輸出成 Assembly Language!
總結 
• IRA/LRA 雖然龐大, 但循著理論走會容易 
理解很多 
• RA在整個編譯階段末段, 但RA結果的優 
劣對於效能與程式大小有相當大的影響 
211
212

More Related Content

What's hot

High-Performance Networking Using eBPF, XDP, and io_uring
High-Performance Networking Using eBPF, XDP, and io_uringHigh-Performance Networking Using eBPF, XDP, and io_uring
High-Performance Networking Using eBPF, XDP, and io_uringScyllaDB
 
Linux Profiling at Netflix
Linux Profiling at NetflixLinux Profiling at Netflix
Linux Profiling at NetflixBrendan Gregg
 
LLVM Register Allocation
LLVM Register AllocationLLVM Register Allocation
LLVM Register AllocationWang Hsiangkai
 
Intel DPDK Step by Step instructions
Intel DPDK Step by Step instructionsIntel DPDK Step by Step instructions
Intel DPDK Step by Step instructionsHisaki Ohara
 
Kernel Recipes 2017 - Understanding the Linux kernel via ftrace - Steven Rostedt
Kernel Recipes 2017 - Understanding the Linux kernel via ftrace - Steven RostedtKernel Recipes 2017 - Understanding the Linux kernel via ftrace - Steven Rostedt
Kernel Recipes 2017 - Understanding the Linux kernel via ftrace - Steven RostedtAnne Nicolas
 
BPF Internals (eBPF)
BPF Internals (eBPF)BPF Internals (eBPF)
BPF Internals (eBPF)Brendan Gregg
 
Debug dpdk process bottleneck & painpoints
Debug dpdk process bottleneck & painpointsDebug dpdk process bottleneck & painpoints
Debug dpdk process bottleneck & painpointsVipin Varghese
 
2015 FOSDEM - OVS Stateful Services
2015 FOSDEM - OVS Stateful Services2015 FOSDEM - OVS Stateful Services
2015 FOSDEM - OVS Stateful ServicesThomas Graf
 
淺談編譯器最佳化技術
淺談編譯器最佳化技術淺談編譯器最佳化技術
淺談編譯器最佳化技術Kito Cheng
 
icecream / icecc:分散式編譯系統簡介
icecream / icecc:分散式編譯系統簡介icecream / icecc:分散式編譯系統簡介
icecream / icecc:分散式編譯系統簡介Kito Cheng
 
以 eBPF 構建一個更為堅韌的 Kubernetes 叢集
以 eBPF 構建一個更為堅韌的 Kubernetes 叢集以 eBPF 構建一個更為堅韌的 Kubernetes 叢集
以 eBPF 構建一個更為堅韌的 Kubernetes 叢集HungWei Chiu
 
Using eBPF for High-Performance Networking in Cilium
Using eBPF for High-Performance Networking in CiliumUsing eBPF for High-Performance Networking in Cilium
Using eBPF for High-Performance Networking in CiliumScyllaDB
 
DPDK: Multi Architecture High Performance Packet Processing
DPDK: Multi Architecture High Performance Packet ProcessingDPDK: Multi Architecture High Performance Packet Processing
DPDK: Multi Architecture High Performance Packet ProcessingMichelle Holley
 
Instruction Combine in LLVM
Instruction Combine in LLVMInstruction Combine in LLVM
Instruction Combine in LLVMWang Hsiangkai
 
Introduction to eBPF
Introduction to eBPFIntroduction to eBPF
Introduction to eBPFRogerColl2
 
UM2019 Extended BPF: A New Type of Software
UM2019 Extended BPF: A New Type of SoftwareUM2019 Extended BPF: A New Type of Software
UM2019 Extended BPF: A New Type of SoftwareBrendan Gregg
 
How Linux Processes Your Network Packet - Elazar Leibovich
How Linux Processes Your Network Packet - Elazar LeibovichHow Linux Processes Your Network Packet - Elazar Leibovich
How Linux Processes Your Network Packet - Elazar LeibovichDevOpsDays Tel Aviv
 
Introduction to Polyhedral Compilation
Introduction to Polyhedral CompilationIntroduction to Polyhedral Compilation
Introduction to Polyhedral CompilationAkihiro Hayashi
 

What's hot (20)

High-Performance Networking Using eBPF, XDP, and io_uring
High-Performance Networking Using eBPF, XDP, and io_uringHigh-Performance Networking Using eBPF, XDP, and io_uring
High-Performance Networking Using eBPF, XDP, and io_uring
 
Linux Profiling at Netflix
Linux Profiling at NetflixLinux Profiling at Netflix
Linux Profiling at Netflix
 
LLVM Register Allocation
LLVM Register AllocationLLVM Register Allocation
LLVM Register Allocation
 
Intel DPDK Step by Step instructions
Intel DPDK Step by Step instructionsIntel DPDK Step by Step instructions
Intel DPDK Step by Step instructions
 
Kernel Recipes 2017 - Understanding the Linux kernel via ftrace - Steven Rostedt
Kernel Recipes 2017 - Understanding the Linux kernel via ftrace - Steven RostedtKernel Recipes 2017 - Understanding the Linux kernel via ftrace - Steven Rostedt
Kernel Recipes 2017 - Understanding the Linux kernel via ftrace - Steven Rostedt
 
BPF Internals (eBPF)
BPF Internals (eBPF)BPF Internals (eBPF)
BPF Internals (eBPF)
 
Debug dpdk process bottleneck & painpoints
Debug dpdk process bottleneck & painpointsDebug dpdk process bottleneck & painpoints
Debug dpdk process bottleneck & painpoints
 
2015 FOSDEM - OVS Stateful Services
2015 FOSDEM - OVS Stateful Services2015 FOSDEM - OVS Stateful Services
2015 FOSDEM - OVS Stateful Services
 
淺談編譯器最佳化技術
淺談編譯器最佳化技術淺談編譯器最佳化技術
淺談編譯器最佳化技術
 
icecream / icecc:分散式編譯系統簡介
icecream / icecc:分散式編譯系統簡介icecream / icecc:分散式編譯系統簡介
icecream / icecc:分散式編譯系統簡介
 
以 eBPF 構建一個更為堅韌的 Kubernetes 叢集
以 eBPF 構建一個更為堅韌的 Kubernetes 叢集以 eBPF 構建一個更為堅韌的 Kubernetes 叢集
以 eBPF 構建一個更為堅韌的 Kubernetes 叢集
 
How A Compiler Works: GNU Toolchain
How A Compiler Works: GNU ToolchainHow A Compiler Works: GNU Toolchain
How A Compiler Works: GNU Toolchain
 
Using eBPF for High-Performance Networking in Cilium
Using eBPF for High-Performance Networking in CiliumUsing eBPF for High-Performance Networking in Cilium
Using eBPF for High-Performance Networking in Cilium
 
llvm入門
llvm入門llvm入門
llvm入門
 
DPDK: Multi Architecture High Performance Packet Processing
DPDK: Multi Architecture High Performance Packet ProcessingDPDK: Multi Architecture High Performance Packet Processing
DPDK: Multi Architecture High Performance Packet Processing
 
Instruction Combine in LLVM
Instruction Combine in LLVMInstruction Combine in LLVM
Instruction Combine in LLVM
 
Introduction to eBPF
Introduction to eBPFIntroduction to eBPF
Introduction to eBPF
 
UM2019 Extended BPF: A New Type of Software
UM2019 Extended BPF: A New Type of SoftwareUM2019 Extended BPF: A New Type of Software
UM2019 Extended BPF: A New Type of Software
 
How Linux Processes Your Network Packet - Elazar Leibovich
How Linux Processes Your Network Packet - Elazar LeibovichHow Linux Processes Your Network Packet - Elazar Leibovich
How Linux Processes Your Network Packet - Elazar Leibovich
 
Introduction to Polyhedral Compilation
Introduction to Polyhedral CompilationIntroduction to Polyhedral Compilation
Introduction to Polyhedral Compilation
 

Similar to Design and Implementation of GCC Register Allocation

Matlab complex numbers
Matlab complex numbersMatlab complex numbers
Matlab complex numbersAmeen San
 
06 Recursion in C.pptx
06 Recursion in C.pptx06 Recursion in C.pptx
06 Recursion in C.pptxMouDhara1
 
101 math short cuts [www.onlinebcs.com]
101 math short cuts [www.onlinebcs.com]101 math short cuts [www.onlinebcs.com]
101 math short cuts [www.onlinebcs.com]Itmona
 
101-maths short cut tips and tricks for apptitude
101-maths short cut tips and tricks for apptitude101-maths short cut tips and tricks for apptitude
101-maths short cut tips and tricks for apptitudePODILAPRAVALLIKA0576
 
Basic python programs
Basic python programsBasic python programs
Basic python programsRaginiJain21
 
PVS-Studio team experience: checking various open source projects, or mistake...
PVS-Studio team experience: checking various open source projects, or mistake...PVS-Studio team experience: checking various open source projects, or mistake...
PVS-Studio team experience: checking various open source projects, or mistake...Andrey Karpov
 
C programs
C programsC programs
C programsMinu S
 
Traveling salesman problem: Game Scheduling Problem Solution: Ant Colony Opti...
Traveling salesman problem: Game Scheduling Problem Solution: Ant Colony Opti...Traveling salesman problem: Game Scheduling Problem Solution: Ant Colony Opti...
Traveling salesman problem: Game Scheduling Problem Solution: Ant Colony Opti...Soumen Santra
 
Python for Scientific Computing -- Ricardo Cruz
Python for Scientific Computing -- Ricardo CruzPython for Scientific Computing -- Ricardo Cruz
Python for Scientific Computing -- Ricardo Cruzrpmcruz
 
Asymptotic analysis
Asymptotic analysisAsymptotic analysis
Asymptotic analysisSoujanya V
 
Unit-1 DAA_Notes.pdf
Unit-1 DAA_Notes.pdfUnit-1 DAA_Notes.pdf
Unit-1 DAA_Notes.pdfAmayJaiswal4
 
dynamic programming Rod cutting class
dynamic programming Rod cutting classdynamic programming Rod cutting class
dynamic programming Rod cutting classgiridaroori
 
openMP loop parallelization
openMP loop parallelizationopenMP loop parallelization
openMP loop parallelizationAlbert DeFusco
 
When Computers Don't Compute and Other Fun with Numbers
When Computers Don't Compute and Other Fun with NumbersWhen Computers Don't Compute and Other Fun with Numbers
When Computers Don't Compute and Other Fun with NumbersPDE1D
 
Asymptotic Notation and Complexity
Asymptotic Notation and ComplexityAsymptotic Notation and Complexity
Asymptotic Notation and ComplexityRajandeep Gill
 

Similar to Design and Implementation of GCC Register Allocation (20)

Matlab complex numbers
Matlab complex numbersMatlab complex numbers
Matlab complex numbers
 
06 Recursion in C.pptx
06 Recursion in C.pptx06 Recursion in C.pptx
06 Recursion in C.pptx
 
101 math short cuts [www.onlinebcs.com]
101 math short cuts [www.onlinebcs.com]101 math short cuts [www.onlinebcs.com]
101 math short cuts [www.onlinebcs.com]
 
Appt and reasoning
Appt and reasoningAppt and reasoning
Appt and reasoning
 
Boosting Developer Productivity with Clang
Boosting Developer Productivity with ClangBoosting Developer Productivity with Clang
Boosting Developer Productivity with Clang
 
101-maths short cut tips and tricks for apptitude
101-maths short cut tips and tricks for apptitude101-maths short cut tips and tricks for apptitude
101-maths short cut tips and tricks for apptitude
 
Basic python programs
Basic python programsBasic python programs
Basic python programs
 
PVS-Studio team experience: checking various open source projects, or mistake...
PVS-Studio team experience: checking various open source projects, or mistake...PVS-Studio team experience: checking various open source projects, or mistake...
PVS-Studio team experience: checking various open source projects, or mistake...
 
C programs
C programsC programs
C programs
 
Counting trees.pptx
Counting trees.pptxCounting trees.pptx
Counting trees.pptx
 
Traveling salesman problem: Game Scheduling Problem Solution: Ant Colony Opti...
Traveling salesman problem: Game Scheduling Problem Solution: Ant Colony Opti...Traveling salesman problem: Game Scheduling Problem Solution: Ant Colony Opti...
Traveling salesman problem: Game Scheduling Problem Solution: Ant Colony Opti...
 
Time complexity
Time complexityTime complexity
Time complexity
 
Python for Scientific Computing -- Ricardo Cruz
Python for Scientific Computing -- Ricardo CruzPython for Scientific Computing -- Ricardo Cruz
Python for Scientific Computing -- Ricardo Cruz
 
SCIPY-SYMPY.pdf
SCIPY-SYMPY.pdfSCIPY-SYMPY.pdf
SCIPY-SYMPY.pdf
 
Asymptotic analysis
Asymptotic analysisAsymptotic analysis
Asymptotic analysis
 
Unit-1 DAA_Notes.pdf
Unit-1 DAA_Notes.pdfUnit-1 DAA_Notes.pdf
Unit-1 DAA_Notes.pdf
 
dynamic programming Rod cutting class
dynamic programming Rod cutting classdynamic programming Rod cutting class
dynamic programming Rod cutting class
 
openMP loop parallelization
openMP loop parallelizationopenMP loop parallelization
openMP loop parallelization
 
When Computers Don't Compute and Other Fun with Numbers
When Computers Don't Compute and Other Fun with NumbersWhen Computers Don't Compute and Other Fun with Numbers
When Computers Don't Compute and Other Fun with Numbers
 
Asymptotic Notation and Complexity
Asymptotic Notation and ComplexityAsymptotic Notation and Complexity
Asymptotic Notation and Complexity
 

Recently uploaded

SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDGMarianaLemus7
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentationphoebematthew05
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 

Recently uploaded (20)

SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort ServiceHot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDG
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentation
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 

Design and Implementation of GCC Register Allocation

  • 1. Design and Implementation of GCC Register Allocation HelloGCC'2014 Date : Sep 13th, 2014 Kito Cheng kito.cheng@gmail.com
  • 2. 2 Register Allocation foo: addi $sp, $sp, -8 slli $r1, $r0, 1 swi $r1, [$sp] movi $r1, 0 movi $r2, 0 swi $r0, [$sp + (4)] j .L2 .L3: mov $r3, $r2 lwi $r0, [$sp] add $r3, $r0, $r3 add $r1, $r1, $r3 addi $r2, $r2, 1 .L2: lwi $r0, [$sp + (4)] slts $ta, $r2, $r0 bnez $ta, .L3 mov $r0, $r1 addi $sp, $sp, 8 ret int foo(int n) { int i, y, t, z; int x = n * 2; z = x; int sum = 0; for (i=0; i<n; ++i) { y = i; t = z + y; sum = sum + t; } return sum; }
  • 3. 3 Register Allocation foo: addi $sp, $sp, -8 slli $r1, $r0, 1 swi $r1, [$sp] movi $r1, 0 movi $r2, 0 swi $r0, [$sp + (4)] j .L2 .L3: mov $r3, $r2 lwi $r0, [$sp] add $r3, $r0, $r3 add $r1, $r1, $r3 addi $r2, $r2, 1 .L2: lwi $r0, [$sp + (4)] slts $ta, $r2, $r0 bnez $ta, .L3 mov $r0, $r1 addi $sp, $sp, 8 ret int foo(int n) { int i, y, t, z; int x = n * 2; z = x; int sum = 0; for (i=0; i<n; ++i) { y = i; t = z + y; sum = sum + t; } return sum; }
  • 4. 4 Graph Coloring 基礎理論
  • 5. 5 Graph-Coloring Graph (圖) = Vertex/Node (點) + Edge (邊)
  • 6. 6 Graph-Coloring 這是點 Graph (圖) = Vertex/Node (點) + Edge (邊)
  • 7. 7 Graph-Coloring 這是邊 Graph (圖) = Vertex/Node (點) + Edge (邊)
  • 8. 8 Graph-Coloring 上色問題就是把點上顏色, 並且不能與相鄰的點同顏色
  • 9. 9 Graph-Coloring 用最經典的演算法上色: 假設有 N 個顏色, 若邊小於 N 個則可隨意上色
  • 16. Graph-Coloring 16 那跟 Register Allocation 的關聯?
  • 17. Interference Graph 17 那跟 Register Allocation 的關聯? c b a 以 c = a + b 為例 int a = 10, b = 20; int c = a + b;
  • 18. Interference Graph 18 那跟 Register Allocation 的關聯? c b a a 跟 b 必須同時存活(Live) 所以不能使用同一個 Reg int a = 10, b = 20; int c = a + b; Live Range a b c a=10, b=20 c = a + b
  • 19. Interference Graph 19 c b a 顏色 = Register n個顏色可用 = n個暫存器可用 a 跟 b 必須同時存活(Live) 所以不能使用同一個 Reg int a = 10, b = 20; int c = a + b;
  • 20. Interference Graph 20 c b a int a = 10, b = 20; int c = a + b; 任意上色一下
  • 21. Interference Graph 21 c b a int a = 10, b = 20; int c = a + b; 假設藍色代表$r0, 綠色代表 $r1 亦即此上色結果所產生的程式碼為 add $r0, $r0, $r1
  • 22. 22 假設有 N 個顏色, 若邊小於 N 個則可隨意上色
  • 23. 23 假設有 N 個顏色, 若邊小於 N 個則可隨意上色 假設有 N 個Register, 若邊小於 N 個則可隨意Allocation
  • 24. 24 以上就是 Graph Coloring-based Register Allocation 的概念
  • 25. 25 經典 Graph Coloring Register Allocation
  • 26. Chaitin-Briggs Algorithm 26 • 在深入 GCC 前先複習下最經典的演算法:
  • 27. Chaitin-Briggs Algorithm 27 spill code build coalesce simplify select • 在深入 GCC 前先複習下最經典的演算法: spill code build coalesce simplify select 簡化成這幾個 步驟方便講解
  • 28. Example 28 build coalesce simplify select int foo(int n) { int i, y, t, z; int x = n * 2; z = x; int sum = 0; for (i=0; i<n; ++i) { y = i; t = z + y; sum = sum + t; } return sum; } spill code
  • 29. CFG 29 spill code int foo(int n) { int i, y, t, z; int x = n * 2; z = x; int sum = 0; for (i=0; i<n; ++i) { y = i; t = z + y; sum = sum + t; } return sum; } n = arg(n) x = n * 2 z = x sum = 0 i = 0 i < n? y = i t = z + y sum = sum + t i = i + 1 return sum build coalesce simplify select
  • 30. Live Range 30 n x z sum i y t build coalesce simplify select n = arg (n) x = n * 2 z = x sum = 0 i = 0 i < n? y = i t = z + y sum = sum + t i = i + 1 return sum spill code n = arg(n) x = n * 2 z = x sum = 0 i = 0 i < n? y = i t = z + y sum = sum + t i = i + 1 return sum
  • 31. Live Range 31 n x z sum i y t build coalesce simplify select n = arg (n) x = n * 2 z = x sum = 0 i = 0 i < n? y = i t = z + y sum = sum + t i = i + 1 return sum spill code n = arg(n) x = n * 2 z = x sum = 0 i = 0 i < n? y = i t = z + y sum = sum + t i = i + 1 return sum 每道指令有兩個 Program point, Read, Write
  • 32. Live Range 32 n x z sum i y t build coalesce simplify select n = arg (n) x = n * 2 z = x sum = 0 i = 0 i < n? y = i t = z + y sum = sum + t i = i + 1 return sum spill code n = arg(n) x = n * 2 z = x sum = 0 i = 0 i < n? y = i t = z + y sum = sum + t i = i + 1 return sum y 於 y = i 的 Write 時 Live 於 t = z + y 的 Read 時 最後使用
  • 33. Live Range 33 n x z sum i y t build coalesce simplify select n = arg (n) x = n * 2 z = x sum = 0 i = 0 i < n? y = i t = z + y sum = sum + t i = i + 1 return sum spill code n = arg(n) x = n * 2 z = x sum = 0 i = 0 i < n? y = i t = z + y sum = sum + t i = i + 1 return sum
  • 34. Interference build coalesce simplify select Graph n x z sum i y t 34 n = arg (n) x = n * 2 z = x sum = 0 i = 0 i < n? y = i t = z + y sum = sum + t i = i + 1 return sum spill code z x sum i y t n x 跟 n 必須 同時存活 又稱 x 跟 n 與 有Interference
  • 35. Interference build coalesce simplify select Graph n x z sum i y t 35 n = arg (n) x = n * 2 z = x sum = 0 i = 0 i < n? y = i t = z + y sum = sum + t i = i + 1 return sum spill code z x sum i y t n
  • 36. Interference build coalesce simplify select Graph n x z sum i y t 36 n = arg (n) x = n * 2 z = x sum = 0 i = 0 i < n? y = i t = z + y sum = sum + t i = i + 1 return sum spill code z x sum i y t n
  • 37. Coalesce 37 spill code build coalesce simplify select • Coalesce: 合併 • 若一道 move 指令的來源與目的無互相干 擾則可進行 Coalesce, 並將圖上對應的 點合併 –保證兩者分配到相同 Register – move $r0, $r0 •明顯可移除的指令
  • 38. Coalesce 38 build coalesce simplify select n x z sum i y t n = arg (n) x = n * 2 zz == xx sum = 0 i = 0 i < n? y = i t = z + y sum = sum + t i = i + 1 return sum spill code z x sum i y t n
  • 39. Coalesce 39 build coalesce simplify select n x z sum i y t n = arg (n) x = n * 2 zz == xx sum = 0 i = 0 i < n? y = i t = z + y sum = sum + t i = i + 1 return sum spill code x/z sum i y t n
  • 40. build coalesce simplify select Simplify 40 spill code x/z sum i y t n 假設有4個 Register 可用
  • 41. build coalesce simplify select Simplify 41 spill code x/z sum i y t n 假設有4個 Register 可用 => 邊少於 4 都可隨意著色
  • 42. build coalesce simplify select Simplify 42 spill code 假設有4個 Register 可用 => 邊少於 4 都可隨意著色5 x/z sum i y t n 4 4 5 5 5
  • 43. build coalesce simplify select Simplify 43 spill code 假設有4個 Register 可用 => 邊少於 4 都可隨意著色5 x/z sum i y t n 4 4 5 5 5 發現沒邊少於4的點!
  • 44. build coalesce simplify select Simplify 44 spill code 假設有4個 Register 可用 => 邊少於 4 都可隨意著色5 x/z sum i y t n 4 4 5 5 5 發現沒邊少於4的點! =>挑一個Cost最低的拿
  • 45. build coalesce simplify select Simplify 45 spill code x/z sum i y t 假設有4個 Register 可用 => 邊少於 4 都可隨意著色 n* 3 3 4 4 4 發現沒邊少於4的點! =>挑一個Cost最低的拿拿到旁邊的Stack 並標上記號 更新每個點的邊數
  • 46. build coalesce simplify select Simplify 46 spill code x/z sum i y 假設有4個 Register 可用 => 邊少於 4 都可隨意著色 t n* 3 3 3 3
  • 47. build coalesce simplify select Simplify 47 spill code x/z 假設有4個 Register 可用 => 邊少於 4 都可隨意著色 y sum i t n* 2 2 2
  • 48. build coalesce simplify select Simplify 48 spill code 假設有4個 Register 可用 => 邊少於 4 都可隨意著色 x/z y sum i t n* 1 1
  • 49. build coalesce simplify select Simplify 49 spill code 假設有4個 Register 可用 => 邊少於 4 都可隨意著色 sum x/z y i t n* 0
  • 50. build coalesce simplify select Simplify 50 spill code 假設有4個 Register 可用 => 邊少於 4 都可隨意著色 i sum x/z y t n*
  • 51. build coalesce simplify select Select 51 spill code 進入著色階段, 從Stack中Pop 出來並著上隨意顏色 sum x/z y t n* i $r0 $r1 $r2 $r3
  • 52. build coalesce simplify select Select 52 spill code 進入著色階段, 從Stack中Pop 出來並著上隨意顏色 x/z y t n* sum i $r0 $r1 $r2 $r3
  • 53. build coalesce simplify select Select 53 spill code 進入著色階段, 從Stack中Pop 出來並著上隨意顏色 y t n* x/z sum i $r0 $r1 $r2 $r3
  • 54. build coalesce simplify select Select 54 spill code 進入著色階段, 從Stack中Pop 出來並著上隨意顏色 t n* x/z sum i y $r0 $r1 $r2 $r3
  • 55. build coalesce simplify select Select 55 spill code 進入著色階段, 從Stack中Pop 出來並著上隨意顏色 n* x/z sum i y $r0 t $r1 $r2 $r3
  • 56. build coalesce simplify select Select 56 spill code 進入著色階段, 從Stack中Pop 出來並著上隨意顏色 n* x/z sum i y t n $r0 $r1 $r2 $r3 發現 n 得不到任何可用顏色! 進入 Spill 階段
  • 57. Spill 57 • 當 Register 不足時, 則必須將值暫存 到記憶體中, 必要時再從記憶體中取回
  • 58. build coalesce simplify select Spill 58 spill code n = arg(n) x = n * 2 store n z = x sum = 0 i = 0 relaod n i < n? y = i t = z + y sum = sum + t i = i + 1 return sum n = arg(n) x = n * 2 z = x sum = 0 i = 0 i < n? y = i t = z + y sum = sum + t i = i + 1 return sum 插入 Spill Code
  • 59. build coalesce simplify select Live Range 59 spill code n = arg(n) x = n * 2 store n z = x sum = 0 i = 0 relaod n i < n? y = i t = z + y sum = sum + t i = i + 1 return sum n x z sum i y t n = arg (n) x = n * 2 store n z = x sum = 0 i = 0 reload n i < n? y = i t = z + y sum = sum + t i = i +1 return sum n1 n2
  • 60. Interference Graph 60 spill code build coalesce simplify select n x z sum i y t n = arg (n) x = n * 2 store n z = x sum = 0 i = 0 reload n i < n? y = i t = z + y sum = sum + t i = i +1 return sum z x sum i y t n1 n2 n1 n2
  • 61. build coalesce simplify select Coalesce 61 spill code n x z sum i y t n = arg (n) x = n * 2 store n z = x sum = 0 i = 0 reload n i < n? y = i t = z + y sum = sum + t i = i +1 return sum x/z sum i y t n1 n2 n1 n2
  • 62. build coalesce simplify select Simplify 62 spill code 1 3 x/z sum i y t n1 n2 3 3 5 5 6
  • 63. build coalesce simplify select Simplify 63 spill code x/z sum i y t n1 n2 3 3 3 5 5 5
  • 64. build coalesce simplify select Simplify 64 spill code x/z sum i y t n1 n2 3 3 4 4 4
  • 65. build coalesce simplify select Simplify 65 spill code x/z n2 3 y sum i t n1 3 3 3
  • 66. build coalesce simplify select Simplify 66 spill code x/z n2 y sum i t n1 2 2 2
  • 67. build coalesce simplify select Simplify 67 spill code x/z n2 y sum i t n1 1 1
  • 68. build coalesce simplify select Simplify 68 spill code sum x/z n2 y i t n1 1
  • 69. build coalesce simplify select Simplify 69 spill code i sum x/z n2 y t n1
  • 70. build coalesce simplify select Select 70 spill code i sum x/z n2 y t n1 $r0 $r1 $r2 $r3
  • 71. build coalesce simplify select Select 71 spill code sum x/z n2 y t n1 $r0 $r1 $r2 $r3 i
  • 72. build coalesce simplify select Select 72 spill code x/z n2 y t n1 $r0 $r1 $r2 $r3 sum i
  • 73. build coalesce simplify select Select 73 spill code n2 y t n1 $r0 $r1 $r2 $r3 x/z sum i
  • 74. build coalesce simplify select Select 74 spill code y t n1 $r0 $r1 $r2 $r3 x/z n2 sum i
  • 75. build coalesce simplify select Select 75 spill code t n1 $r0 $r1 $r2 $r3 x/z sum i y n2
  • 76. build coalesce simplify select Select 76 spill code n1 $r0 $r1 $r2 $r3 x/z sum i y t n2
  • 77. build coalesce simplify select Select 77 spill code $r0 $r1 $r2 $r3 x/z sum i y t n1 n2
  • 78. build coalesce simplify select Select 78 spill code $r0 $r1 $r2 $r3 x/z sum i y t n1 n2 n1 = $r0 n2 = $r3 x = $r2 z = $r2 sum = $r1 t = $r3 y = $r3 i = $r0
  • 79. Code Gen 79 n1 = $r0 n2 = $r3 x = $r2 z = $r2 sum = $r1 t = $r3 y = $r3 i = $r0 $r0 = arg(n) $r2 = $r0 * 2 store $r0->n $r2 = $r2 $r1 = 0 $r0 = 0 relaod $r3<-n $r0 < $r3? $r3 = $r0 $r3 = $r2+$r3 $r1 = $r1+$r3 $r0 = $r0 + 1 return $r1 n = arg(n) x = n * 2 store n z = x sum = 0 i = 0 relaod n i < n? y = i t = z + y sum = sum + t i = i + 1 return sum
  • 80. Code Gen 80 n1 = $r0 n2 = $r3 x = $r2 z = $r2 sum = $r1 t = $r3 y = $r3 i = $r0 $r0 = arg(n) $r2 = $r0 * 2 store $r0->n $r2 = $r2 $r1 = 0 $r0 = 0 relaod $r3<-n $r0 < $r3? $r3 = $r0 $r3 = $r2+$r3 $r1 = $r1+$r3 $r0 = $r0 + 1 return $r1 n = arg(n) x = n * 2 store n z = x sum = 0 i = 0 relaod n i < n? y = i t = z + y sum = sum + t i = i + 1 return sum 因Coalesce分配到同 Register!
  • 81. Code Gen 81 n1 = $r0 n2 = $r3 x = $r2 z = $r2 sum = $r1 t = $r3 y = $r3 i = $r0 $r0 = arg(n) $r2 = $r0 * 2 store $r0->n $r1 = 0 $r0 = 0 relaod $r3<-n $r0 < $r3? $r3 = $r0 $r3 = $r2+$r3 $r1 = $r1+$r3 $r0 = $r0 + 1 return $r1 n = arg(n) x = n * 2 store n z = x sum = 0 i = 0 relaod n i < n? y = i t = z + y sum = sum + t i = i + 1 return sum
  • 82. Rematerialization 82 • Materialization: 實現化、具體化
  • 83. Rematerialization 83 • Materialization: 實現化、具體化 • ReMaterialization: 重現化
  • 84. Rematerialization 84 • Materialization: 實現化、具體化 • ReMaterialization: 重現化 • 在 Briggs 論文中所提出的技術: –若其值可以很便宜的算出來, 則不進行 Spill, 取而代之直接重算其結果 – *便宜的定義因目標不同而意義不同: •-O3: Cycle 數少即便宜 •-Os: Code size 小即便宜
  • 85. Rematerialization 85 $r0 = arg(n) $r2 = $r0 * 2 store $r0->n $r1 = 0 $r0 = 0 relaod $r3<-n $r0 < $r3? $r3 = $r0 $r3 = $r2+$r3 $r1 = $r1+$r3 $r0 = $r0 + 1 return $r1 n = arg(n) x = n * 2 store n z = x sum = 0 i = 0 relaod n i < n? y = i t = z + y sum = sum + t i = i + 1 return sum $r0 = arg(n) $r2 = $r0 * 2 $r1 = 0 $r0 = 0 $r3 = $r2 / 2 $r0 < $r3? $r3 = $r0 $r3 = $r2+$r3 $r1 = $r1+$r3 $r0 = $r0 + 1 return $r1 除以 2 可替 換為 Shift => 比存取記憶體 便宜 x = n * 2 => n = x / 2 => n = z / 2
  • 86. Split 4 86 • 跟Coalesce相反, 也有將一個點拆成兩 個的方法 g f b a d c e 4 4 4 4 4 4 6 g f a' b d c e 4 4 4 4 4 a 3 3 變得容易用 4個顏色上色
  • 87. 反思Coalesce 4 87 • 不好的Coalesce決策會造成圖難以著色, 並生出額外的 Spill Code g f b z/y d c e 4 4 4 4 4 4 6 g f x b d c e 4 4 4 4 4 y 3 3 變得不容易用 4個顏色上色
  • 88. 反思Coalesce 4 88 • 不好的Coalesce決策會造成圖難以著色, 並生出額外的 Spill Code • 但不好的Split決策相對會造成多餘Move g f b z/y d c e 4 4 4 4 4 4 6 g f x b d c e 4 4 4 4 4 y 3 3 變得不容易用 4個顏色上色
  • 89. Split or Coalesce To be or not to be, that is the question 89 • RA的學術研究上, 如何Split及 Coalesce都可單獨成為一篇論文... • Coalesce在CGO'07[1]時被證明本身也是 NP-Complete [1] Florent Bouchez, Alain Darte, and Fabrice Rastello. On the Complexity of Register Coalescing. In Proceedings of the International Symposium on Code Generation and Optimization, CGO ’07, pages 102–114, Washington, DC, USA, 2007. IEEE Computer Society.
  • 91. 從理論到現實 91 • 演算法面看似很容易理解 – 帶入現實情況就會發現基礎的Graph Coloring 沒涵蓋一些問題... •Caller-save reg, Callee-save reg •參數/回傳值放置位置 •不同的Register Class •Register Pair •累加器(Accumulator) •Memory Operand
  • 92. Caller-save Register or Callee-save Register 92 x = 10 ... foo() ... y = 20 z = x + y 若 x 被分配到 Caller-save Register x = 10 ... store x foo() reload x ... y = 20 z = x + y 跨越 Function Call 就必須store/reload
  • 93. Caller-save Register or Callee-save Register 93 但用到 Callee-save Register 也必須在 Function 開頭 及結尾儲存跟復原 foo: store $r3 ... ... ... ... restore $r3 return
  • 94. Caller-save Register or Callee-save Register 94 • 每個 Register 的使用成本隨著使用情境 不同 • 著色時並非只看能否成功著色, 選合適的 Register 也是相當重要的一件事
  • 95. 參數/回傳值放置位置 95 • ABI 會規定參數與回傳值放置的方式 – 以nds32為例:第一個參數放 $r0, 回傳值也放 $r0 z x sum i y t n = arg(n) n x = n * 2 z = x sum = 0 i = 0 i < n? y = i t = z + y sum = sum + t i = i + 1 return sum 以前面範例來看: 間接的使得 n 以及 sum 都必須使用 $r0, 否則需 要額外 move 指令 這項限制將會使得上色更 為困難, 以這張圖為例, 目前已經是不合法的著色
  • 96. 不同的Register Class 96 • 最常見是分為兩大類: – Floating Point Register – General Purpose Register int x = 10; float y = 3.144444; ... ... y x 看起來好像有 Interference 但因為 Class 不同 所以沒影響 衍生思考: 只算帳面 Edge 數 無法正確判定是否可簡單著色
  • 97. Register Pair 97 • 在許多架構中兩個 Floating Point Register 可合併成一個 double precision floating point double x = 1.6188888; float y = 3.144444; ... ... y x 看起來好像只有一條 Edge 但 x 會吃掉兩個 Register 再次思考: 只算帳面 Edge 數 真的不太可靠
  • 98. 累加器(Accumulator) 98 • 在 x86 架構中有累加器, 其常見格式為 – a = a op b –回想前面例子, 其分配結果無法直接套 用在累加器架構下... – CB 演算法提出背景是RISC GPR架構!
  • 99. Memory Operand 99 • 在 CISC 架構中有許多指令 Operand 可 間接定址並且運算 – addl %eax, -4(%ebp) ! *(%ebp-4) += %eax n = arg(n) x = n * 2 z = x sum = 0 i = 0 i < n? y = i t = z + y sum = sum + t i = i + 1 return sum sum 若採用 memory operand, 建 構出來的 Graph 會相當不一樣
  • 100. Graph Coloring 外的選擇 100 • Linear Scan • PBQP • Puzzle • Network Flow • ILP
  • 102. GCC RA演進 102 • Local + Global + Reload • IRA + Reload – GCC 4.4 • IRA + LRA – GCC 4.8
  • 103. Local + Global + Reload 103 Ref: Vladimir. N. .Makarov “Fighting register pressure in GCC”, GCC Submit 2004
  • 104. Reload 104 • http://gcc.gnu.org/wiki/reload
  • 105. Reload 105 • http://gcc.gnu.org/wiki/reload
  • 106. Reload • Spill code generation • Instruction/register constraint validation • Constant pool building • Turning non-strict RTL into strict RTL (doing more of the above in evil ways) • Register elimination--changing frame pointer references to stack pointer references • Reload inheritance--essentially a builtin CSE pass on spill code 106
  • 107. Reload 107 • 產生 Spill code • 驗證所有指令跟暫存器的 Constraint • 建立 Constant pool • 把所有 non-strict RTL 轉成 strict RTL – 用超多邪惡的方法!!! • 針對 frame pointer 跟 stack pointer 進行大融合 • 內建小型 CSE
  • 108. Reload 108 • 產生 Spill code • 驗證所有指令跟暫存器的 Constraint • 建立 Constant pool • 把所有 non-strict RTL 轉成 strict RTL – 用超多邪惡的方法!!! • 針對 frame pointer 跟 stack pointer 進行大融合 • 內建小型 CSE Reload 最大問題在於所有東西黏在一團, 難以修改維護...加新功能?別鬧了... git blame reload1.c |awk '{ print $3}' | awk -F - '{ print $1}' | sort | uniq -c git blame reload.c |awk '{ print $3}' | awk -F - '{ print $1}' | sort | uniq -c 可透過這兩道指令觀察到 reload 大約都是199x年遺留下來的....
  • 109. IRA + Reload 109 Ref: Vladimir. N. .Makarov, “The top-down regional register allocator for irregular register file architectures”
  • 110. IRA 110 Ref: Vladimir. N. .Makarov, “The top-down regional register allocator for irregular register file architectures”
  • 111. IRA 111 Ref: Vladimir. N. .Makarov, “The top-down regional register allocator for irregular register file architectures”
  • 112. IRA 112 Ref: Vladimir. N. .Makarov, “The top-down regional register allocator for irregular register file architectures”
  • 113. IRA 113 • CB 演算法為迭代式(Iterative Style), 但 IRA 整個流程只走一次 – 時間與品質的取捨 – 尚未完整整合Reload
  • 114. IRA 114 • Region-based Graph Coloring Register allocation – 主要參考 C ALLAHAN , D., AND KOBLENZ , B. Register allocation via hierarchical graph coloring. SIGPLAN 26, 6 (1991),192–203. – 要掌握 IRA 理論面, 至少要讀完 ira.c 註解 上列的幾篇參考文獻
  • 115. 深入觀察 IRA 115 • 以 nds32 target 為實驗平台 • 透過其 dump 資訊來研究 IRA: – nds32le-elf-gcc foo.c -O0 -fdump-rtl-ira -fira-verbose=9 -fomit-frame-pointer •-O0:避免程式在Middle-End時, 被最佳化 打亂 •-fdump-rtl-ira: 吐出 IRA 的 dump •-fira-verbose=9: 吐出最多的 IRA 資訊 •-fomit-frame-pointer: 不用fp, 以簡化 輸出
  • 116. Let's hack IRA! 避免IRA便宜 行事, 導致無 法觀察 116 diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c index f6da5d6..9f96d25 100644 --- a/gcc/cfgexpand.c +++ b/gcc/cfgexpand.c @@ -5626,7 +5626,8 @@ pass_expand::execute (function *fun) edge e; rtx var_seq, var_ret_seq; unsigned i; + int saved_optimize = optimize; + optimize = 2; timevar_push (TV_OUT_OF_SSA); rewrite_out_of_ssa (&SA); timevar_pop (TV_OUT_OF_SSA); @@ -5999,6 +6000,7 @@ pass_expand::execute (function *fun) timevar_pop (TV_POST_EXPAND); + optimize = saved_optimize; return 0; } diff --git a/gcc/ira.c b/gcc/ira.c index ccc6c79..16d3e55 100644 --- a/gcc/ira.c +++ b/gcc/ira.c @@ -5032,7 +5032,8 @@ ira (FILE *f) int rebuild_p; bool saved_flag_caller_saves = flag_caller_saves; enum ira_region saved_flag_ira_region = flag_ira_region; + optimize = 2; + flag_ira_region = IRA_REGION_ALL; + flag_expensive_optimizations = true; ira_conflicts_p = optimize > 0; ira_use_lra_p = targetm.lra_p (); diff --git a/gcc/ira-build.c b/gcc/ira-build.c index ee20c09..1a64748 100644 --- a/gcc/ira-build.c +++ b/gcc/ira-build.c @@ -2608,6 +2608,8 @@ remove_low_level_allocnos (void) static void remove_unnecessary_regions (bool all_p) { + if (!all_p) + return; if (current_loops == NULL) return; if (all_p) 避免GCC, GIMPLE->RTL 因-O0 而產生過度冗餘的 Code 避免IRA進行 Region 融合
  • 117. Let's hack IRA! diff --git a/gcc/config/nds32/nds32.h b/gcc/config/nds32/nds32.h 117 index bbcf100..a345638 100644 --- a/gcc/config/nds32/nds32.h +++ b/gcc/config/nds32/nds32.h @@ -503,13 +503,13 @@ enum nds32_builtins reserved for other use : $r24, $r25, $r26, $r27 */ #define FIXED_REGISTERS { /* r0 r1 r2 r3 r4 r5 r6 r7 */ - 0, 0, 0, 0, 0, 0, 0, 0, + 0, 0, 0, 0, 1, 1, 1, 1, /* r8 r9 r10 r11 r12 r13 r14 r15 */ - 0, 0, 0, 0, 0, 0, 0, 1, + 1, 1, 1, 1, 1, 1, 1, 1, /* r16 r17 r18 r19 r20 r21 r22 r23 */ - 0, 0, 0, 0, 0, 0, 0, 0, + 1, 1, 1, 1, 1, 1, 1, 1, /* r24 r25 r26 r27 r28 r29 r30 r31 */ - 1, 1, 1, 1, 0, 1, 0, 1, + 1, 1, 1, 1, 1, 1, 1, 1, /* ARG_POINTER:32 */ 1, /* FRAME_POINTER:33 */ @@ -524,13 +524,13 @@ enum nds32_builtins 1 : caller-save registers */ #define CALL_USED_REGISTERS { /* r0 r1 r2 r3 r4 r5 r6 r7 */ - 1, 1, 1, 1, 1, 1, 0, 0, + 1, 1, 1, 1, 1, 1, 1, 1, /* r8 r9 r10 r11 r12 r13 r14 r15 */ - 0, 0, 0, 0, 0, 0, 0, 1, + 1, 1, 1, 1, 1, 1, 1, 1, /* r16 r17 r18 r19 r20 r21 r22 r23 */ 1, 1, 1, 1, 1, 1, 1, 1, /* r24 r25 r26 r27 r28 r29 r30 r31 */ - 1, 1, 1, 1, 0, 1, 0, 1, + 1, 1, 1, 1, 1, 1, 1, 1, /* ARG_POINTER:32 */ 1, /* FRAME_POINTER:33 */ 為方便小程式就會 Spill, 將 nds32 改成只有 4 個 Register
  • 118. 實驗環境準備 118 • git clone git://gcc.gnu.org/git/gcc.git ~/gcc-ra-test/ src • # 套上前兩頁的修改 • mkdir ~/build-gcc-ra-test • cd ~/build-gcc-ra-test • ~/gcc-ra-test/src/configure --prefix=$HOME/gcc-ra-test --target=nds32le-elf • make all-gcc -j8 && make install-gcc
  • 119. IRA 119 int foo(int n) { int i, y, t, z; int x = n * 2; z = x; int sum = 0; for (i=0; i<n; ++i) { y = i; t = z + y; sum = sum + t; } return sum; }
  • 120. RTL 120 (insn 2 4 3 2 (set (reg/v:SI 48 [ n ]) (reg:SI 0 $r0 [ n ]))) (insn 6 3 7 2 (set (reg/v:SI 42 [ x ]) (ashift:SI (reg/v:SI 48 [ n ]) (const_int 1 [0x1])))) (insn 7 6 8 2 (set (reg/v:SI 43 [ z ]) (reg/v:SI 42 [ x ]))) (insn 8 7 9 2 (set (reg/v:SI 41 [ sum ]) (const_int 0 [0]))) (insn 9 8 33 2 (set (reg/v:SI 40 [ i ]) (const_int 0 [0]))) (jump_insn 33 9 34 2 (set (pc) (label_ref 17))) (code_label 19 34 12 3 3) (insn 13 12 14 3 (set (reg/v:SI 44 [ y ]) (reg/v:SI 40 [ i ]))) (insn 14 13 15 3 (set (reg/v:SI 45 [ t ]) (plus:SI (reg/v:SI 43 [ z ]) (reg/v:SI 44 [ y ])))) (insn 15 14 16 3 (set (reg/v:SI 41 [ sum ]) (plus:SI (reg/v:SI 41 [ sum ]) (reg/v:SI 45 [ t ])))) (insn 16 15 17 3 (set (reg/v:SI 40 [ i ]) (plus:SI (reg/v:SI 40 [ i ]) (const_int 1 [0x1])))) (code_label 17 16 18 4 2) (insn 20 18 21 4 (set (reg:SI 15 $ta) (lt:SI (reg/v:SI 40 [ i ]) (reg/v:SI 48 [ n ])))) (jump_insn 21 20 22 4 (set (pc) (if_then_else (ne (reg:SI 15 $ta) (const_int 0 [0])) (label_ref 19) (pc)))) (insn 23 22 26 5 (set (reg:SI 46 [ D.1385 ]) (reg/v:SI 41 [ sum ]))) (insn 26 23 30 5 (set (reg:SI 47 [ <retval> ]) (reg:SI 46 [ D.1385 ]))) (insn 30 26 31 5 (set (reg/i:SI 0 $r0) (reg:SI 47 [ <retval> ]))) (insn 31 30 0 5 (use (reg/i:SI 0 $r0)))
  • 121. RTL 121 n(r48) = n($r0) x(r42) = n(r8) * 2 z(r43) = x(r42) sum(r41) = 0 i(r40) = 0 i(r40) < n(r48)? y(r44) = i(r40) t(r45) = z(r43) + y(r44) sum(r41) = sum(r41) + t(r45) i(r40) = i(r40) + 1 D.1385(r46) = sum(r41) <retval>(r47) = D.1385(r46) $r0 = <retval>(r47) 上一頁那沱 RTL 大致等價 於右邊的 CFG
  • 122. Pseudo Register 122 • GCC RTL中在RA前有無限多 Pseudo Register – GCC Pseudo Register 不是 SSA Form (set (reg:SI 45 [ t ]) (plus:SI (reg:SI 43 [ z ]) (reg:SI 44 [ y ]))) • 相對應的就是 Hard Register 會保留在上層 Pseudo Register Number IR(GIMPLE)的變數名稱 (set (reg/v:SI 48 [ n ]) (reg:SI 0 $r0 [ n ]))
  • 123. Pseudo<->Var 123 Variable Name Pseudo Register Number Comment i 40 sum 41 x 42 y 44 z 43 t 45 n 48 D.1385 46 Temp variable create by expand retval 47 Return Value
  • 124. ABI 124 • Expand (Gimple->RTL) 階段時, 會針對參數及 回傳值插入額外 move 指令 # 參數 (insn 2 4 3 2 (set (reg/v:SI 48 [ n ]) (reg:SI 0 $r0 [ n ])) foo.c:2 27 {*movsi} (nil)) ... # 回傳值 (insn 23 22 26 5 (set (reg:SI 46 [ D.1385 ]) (reg/v:SI 41 [ sum ])) foo.c:12 27 {*movsi} (nil)) (insn 26 23 30 5 (set (reg:SI 47 [ <retval> ]) (reg:SI 46 [ D.1385 ])) foo.c:12 27 {*movsi} (nil)) (insn 30 26 31 5 (set (reg/i:SI 0 $r0) (reg:SI 47 [ <retval> ])) foo.c:13 27 {*movsi} (nil)) 第一個參數放 $r0 回傳值也放 $r0
  • 125. Region 125 int foo(int n) { int i, y, t, z; int x = n * 2; z = x; int sum = 0; for (i=0; i<n; ++i) { y = i; t = z + y; sum = sum + t; } Region 1 return sum; } Region 0 t(r45) = z(r43) + y(r44) sum(r41) = sum(r41) + t(r45) 根據 Loop Structure 建立 Region, 若該 Loop Register Pressure 較低則會與 上層 Region 合併 n(r48) = n($r0) x(r42) = n(r8) * 2 z(r43) = x(r42) sum(r41) = 0 i(r40) = 0 i(r40) < n(r48)? y(r44) = i(r40) i(r40) = i(r40) + 1 D.1385(r46) = sum(r41) <retval>(r47) = D.1385(r46) $r0 = <retval>(r47)
  • 126. Allocno 126 • 在IRA 中 Pseudo Register 並不是點的 單位, Allocno 在IRA才是等同於圖中的 點– ira_allcno_t • 在不同 Region 中一個 Pseudo Register 都有 各自的 Allocno – Split by region
  • 127. Allocno 127 • 一個 Allocno 由數個 Live Range 組成 • Live Range 在 IRA 中相對應的結構是 live_range_t – live_range_t = start/end program point
  • 128. Build Allocno 128 IRA DUMP ... a0(r47,l0) costs: LOW_REGS:0,0 MIDDLE_REGS:0,0 GENERAL_REGS:0,0 ALL_REGS:0,0 MEM:2,2 a1(r46,l0) costs: LOW_REGS:0,0 MIDDLE_REGS:0,0 GENERAL_REGS:0,0 ALL_REGS:0,0 MEM:2,2 a2(r41,l0) costs: LOW_REGS:0,0 MIDDLE_REGS:0,0 GENERAL_REGS:0,0 ALL_REGS:0,0 MEM:6,22 a3(r40,l0) costs: LOW_REGS:0,0 MIDDLE_REGS:0,0 GENERAL_REGS:0,0 ALL_REGS:0,0 MEM:5,30 a4(r43,l0) costs: LOW_REGS:0,0 MIDDLE_REGS:0,0 GENERAL_REGS:0,0 ALL_REGS:0,0 MEM:1,9 a5(r42,l0) costs: LOW_REGS:0,0 MIDDLE_REGS:0,0 GENERAL_REGS:0,0 ALL_REGS:0,0 MEM:9,9 a6(r48,l0) costs: LOW_REGS:0,0 MIDDLE_REGS:0,0 GENERAL_REGS:0,0 ALL_REGS:0,0 MEM:9,17 a7(r40,l1) costs: LOW_REGS:0,0 MIDDLE_REGS:0,0 GENERAL_REGS:0,0 ALL_REGS:0,0 MEM:25,25 a8(r41,l1) costs: LOW_REGS:0,0 MIDDLE_REGS:0,0 GENERAL_REGS:0,0 ALL_REGS:0,0 MEM:16,16 a9(r43,l1) costs: LOW_REGS:0,0 MIDDLE_REGS:0,0 GENERAL_REGS:0,0 ALL_REGS:0,0 MEM:8,8 a10(r48,l1) costs: LOW_REGS:0,0 MIDDLE_REGS:0,0 GENERAL_REGS:0,0 ALL_REGS:0,0 MEM:8,8 a11(r45,l1) costs: LOW_REGS:0,0 MIDDLE_REGS:0,0 GENERAL_REGS:0,0 ALL_REGS:0,0 MEM:16,16 a12(r44,l1) costs: LOW_REGS:0,0 MIDDLE_REGS:0,0 GENERAL_REGS:0,0 ALL_REGS:0,0 MEM:9,9 ... a12(r44,l1) Allocno Pseudo Register Region
  • 129. Build Allocno 129 n(r48/a3) = n($r0) x(r42/a5) = n(r8/a6) * 2 z(r43/a4) = x(r42/a5) sum(r41/a2) = 0 i(r40/a3) = 0 i(r40/a7) < n(r48/a10)? y(r44/a12) = i(r40/a7) t(r45/a11) = z(r43/a9) + y(r44/a12) sum(r41/a8) = sum(r41/a8) + t(r45/a11) i(r40/a7) = i(r40/a7) + 1 D.1385(r46/a1) = sum(r41/a2) <retval>(r47/a0) = D.1385(r46/a1) $r0 = <retval>(r47/a0) Variable Name Pseudo Register Number Region 0 Region 1 i 40 a3 a7 sum 41 a2 a8 x 42 a5 y 44 a12 z 43 a4 a9 t 45 a11 n 48 a6 a10 D.1385 46 a1 retval 47 a0
  • 130. Build Cap 130 Variable Name Pseudo Register Number Region 0 Region 1 i 40 a3 a7 sum 41 a2 a8 x 42 a5 y 44 a13 a12 z 43 a4 a9 t 45 a14 a11 n 48 a6 a10 D.1385 46 a1 retval 47 a0 n(r48/a6) = n($r0) x(r42/a5) = n(r8/a6) * 2 z(r43/a4) = x(r42/a5) sum(r41/a2) = 0 i(r40/a3) = 0 i(r40/a7) < n(r48/a10)? y(r44/a12) = i(r40/a7) t(r45/a11) = z(r43/a9) + y(r44/a12) sum(r41/a8) = sum(r41/a8) + t(r45/a11) i(r40/a7) = i(r40/a7) + 1 D.1385(r46/a1) = sum(r41/a2) <retval>(r47/a0) = D.1385(r46/a1) $r0 = <retval>(r47/a0)
  • 131. Cap 131 • Allocno 的一種, 代表內層迴圈的變數 –用來輔助計算外層迴圈的Allocno Cost
  • 132. Build Copy 132 n(r48/a6) = n($r0) x(r42/a5) = n(r8/a6) * 2 z(r43/a4) = x(r42/a5) sum(r41/a2) = 0 i(r40/a3) = 0 i(r40/a7) < n(r48/a10)? y(r44/a12) = i(r40/a7) t(r45/a11) = z(r43/a9) + y(r44/a12) sum(r41/a8) = sum(r41/a8) + t(r45/a11) i(r40/a7) = i(r40/a7) + 1 D.1385(r46/a1) = sum(r41/a2) <retval>(r47/a0) = D.1385(r46/a1) $r0 = <retval>(r47/a0) ... cp0:a1(r46)<->a2(r41)@1:move cp1:a0(r47)<->a1(r46)@1:move cp2:a4(r43)<->a5(r42)@1:move cp3:a11(r45)<->a12(r44)@1:shuffle cp4:a13(r45)<->a14(r44)@1:shuffle ...
  • 133. Copy 133 • 除了原本 `a = b` 這類指令外 IRA 也 引進另一類型的coalesce – `a = b op c` –若a與b或c無Interference 則也可 coalesce –可減少 Register Pressure
  • 134. Register Preference 134 n(r48/a6) = n($r0) x(r42/a5) = n(r8/a6) * 2 z(r43/a4) = x(r42/a5) sum(r41/a2) = 0 i(r40/a3) = 0 i(r40/a7) < n(r48/a10)? y(r44/a12) = i(r40/a7) t(r45/a11) = z(r43/a9) + y(r44/a12) sum(r41/a8) = sum(r41/a8) + t(r45/a11) i(r40/a7) = i(r40/a7) + 1 D.1385(r46/a1) = sum(r41/a2) <retval>(r47/a0) = D.1385(r46/a1) $r0 = <retval>(r47/a0) ... pref0:a0(r47)<-hr0@2 pref1:a6(r48)<-hr0@2 ...
  • 135. Register Preference 135 • 某些 Allocno 若分配到特定暫存器則可 減少 Move 指令 –主要來自ABI的限制
  • 136. Cost 136 ira-costs.c 算 Cost 的邏輯都放這!
  • 137. Cost 137 • 參考 Profile feed back 資訊 – 沒Profile資訊GCC會自己猜機率(predict.c) • [1] "Branch Prediction for Free" Ball and Larus; PLDI '93. • [2] "Static Branch Frequency and Program Profile Analysis" Wu and Larus; MICRO-27. • [3] "Corpus-based Static Branch Prediction" Calder, Grunwald, Lindsay, Martin, Mozer, and Zorn; PLDI '95. */ • 由上往下(Top-Down)的Region算下去
  • 138. Preferred Register Class 138 • 在記算 Cost 階段時也會計算每個 Allocno 所偏好的 Register Class –參考Register Preference
  • 139. Preferred Register Class 139 • IRA 在這邊有作一些特殊處理 – 以 x86 為例: •若 Allocno a 有 8 use, 1 def:若其中 只有一個地方一定要 EBX, 而其它地方需 要 EAX, 那該 Allocno 一樣會偏好 EAX •需要 EBX 的地方則留給 Reload 處理 •在一般文獻中會採用所有使用到的Register Class 的交集(Intersect)
  • 140. Coloring Region 0 140 Loop 0 (parent -1, header bb2, depth 0) ... Forming thread by copy 0:a1r46-a2r41 (freq=1): Result (freq=6): a1r46(2) a2r41(4) Forming thread by copy 1:a0r47-a1r46 (freq=1): Result (freq=8): a0r47(2) a1r46(2) a2r41(4) Pushing a5(r42,l0)(cost 0) Pushing a1(r46,l0)(cost 0) Pushing a0(r47,l0)(cost 0) Pushing a4(r43,l0)(potential spill: pri=3, cost=16) Making a13(r45,l0: a11(r45,l1)) colorable Forming thread by copy 4:a13r45-a14r44 (freq=1): Result (freq=4): a13r45(2) a14r44(2) Making a14(r44,l0: a12(r44,l1)) colorable Pushing a14(r44,l0: a12(r44,l1))(cost 16) Making a2(r41,l0) colorable Making a3(r40,l0) colorable Making a6(r48,l0) colorable Pushing a6(r48,l0)(cost 28) Pushing a13(r45,l0: a11(r45,l1))(cost 16) Pushing a3(r40,l0)(cost 40) Pushing a2(r41,l0)(cost 32) Popping a2(r41,l0) -- assign reg 1 Popping a3(r40,l0) -- assign reg 2 Popping a13(r45,l0: a11(r45,l1)) -- assign reg 3 Popping a6(r48,l0) -- assign reg 0 Popping a14(r44,l0: a12(r44,l1)) -- assign reg 3 Popping a4(r43,l0) -- spill Popping a0(r47,l0) -- assign reg 0 Popping a1(r46,l0) -- assign reg 0 Popping a5(r42,l0) -- assign reg 1 n(r48/a6) = n($r0) x(r42/a5) = n(r8/a6) * 2 z(r43/a4) = x(r42/a5) sum(r41/a2) = 0 i(r40/a3) = 0 i(r40/a7) < n(r48/a10)? y(r44/a12) = i(r40/a7) t(r45/a11) = z(r43/a9) + y(r44/a12) sum(r41/a8) = sum(r41/a8) + t(r45/a11) i(r40/a7) = i(r40/a7) + 1 D.1385(r46/a1) = sum(r41/a2) <retval>(r47/a0) = D.1385(r46/a1) $r0 = <retval>(r47/a0)
  • 141. Coloring Region 0 141 n(r48/a6) = n($r0) x(r42/a5) = n(r8/a6) * 2 z(r43/a4) = x(r42/a5) sum(r41/a2) = 0 i(r40/a3) = 0 i(r40/a7) < n(r48/a10)? y(r44/a12) = i(r40/a7) t(r45/a11) = z(r43/a9) + y(r44/a12) sum(r41/a8) = sum(r41/a8) + t(r45/a11) i(r40/a7) = i(r40/a7) + 1 D.1385(r46/a1) = sum(r41/a2) <retval>(r47/a0) = D.1385(r46/a1) $r0 = <retval>(r47/a0) x/a5 z/a4 t/ a14 sum/ a2 i/a3 y/ a13 n/a6 D.1385/ a1 retval/ a0
  • 142. Coloring Region 0 142 x/a5 z/a4 t/ a14 sum/ a2 i/a3 y/ a13 n/a6 D.1385/ a1 retval/ a0 ira中coalesce並不會將點 合併而是用一種稱為Thread 的方式把copy相關的點一起 推到 Stack
  • 143. Colorable 143 • 計算Colorable的方式並非計算邊數: –以x86為例: •若Allocno a可用EAX及EBX,且有3個邊, 但其它相鄰的點都僅可使用EAX, 這樣一樣 是 Colorable a {EAX,EBX} b {EAX} c {EAX} d {EAX}
  • 144. Coloring Region 0 144 x/a5 z/a4 t/ a14 sum/ a2 i/a3 y/ a13 n/a6 D.1385/ a1 retval/ a0 IRA 會先依照初始的邊數 將所有 Allocno 分兩堆 可簡單上色的一堆(邊小於4的) 不可簡單上色的一堆(邊大於等於4的) colorable bucket uncolorable bucket
  • 145. Coloring Region 0 145 t/ a14 y/ a13 n/a6 x/a5 z/a4 i/a3 sum/ a2 D.1385/ a1 retval/ a0 colorable bucket uncolorable bucket x/a5 z/a4 t/ a14 sum/ a2 i/a3 y/ a13 n/a6 D.1385/ a1 retval/ a0
  • 146. Coloring Region 0 146 z/a4 x/a5 n/a6 sum/ a2 i/a3 y/ a13 t/ a14 D.1385/ a1 retval/ a0 colorable bucket uncolorable bucket sort uncolorable bucket by spill cost x/a5 z/a4 t/ a14 sum/ a2 i/a3 y/ a13 n/a6 D.1385/ a1 retval/ a0
  • 147. Coloring Region 0 147 先將所有 Colorable bucket 的推進 stack z/a4 x/a5 n/a6 sum/ a2 i/a3 y/ a13 t/ a14 D.1385/ a1 retval/ a0 colorable bucket uncolorable bucket Stack x/a5 z/a4 t/ a14 sum/ a2 i/a3 y/ a13 n/a6 D.1385/ a1 retval/ a0 1 5 6 4 4 5 5 0 0
  • 148. Coloring Region 0 148 先將所有 Colorable bucket 的推進 stack z/a4 x/a5 n/a6 sum/ a2 i/a3 y/ a13 t/ a14 D.1385/ a1 retval/ a0 colorable bucket uncolorable bucket Stack z/a4 t/ a14 sum/ a2 i/a3 y/ a13 n/a6 D.1385/ a1 retval/ a0 5 5 4 4 5 5 0 0
  • 149. Coloring Region 0 149 先將所有 Colorable bucket 的推進 stack z/a4 x/a5 n/a6 sum/ a2 i/a3 y/ a13 t/ a14 D.1385/ a1 retval/ a0 colorable bucket uncolorable bucket Stack z/a4 t/ a14 sum/ a2 i/a3 y/ a13 n/a6 retval/ a0 5 5 4 4 5 5 0
  • 150. Coloring Region 0 150 先將所有 Colorable bucket 的推進 stack z/a4 retval/ a0 x/a5 n/a6 sum/ a2 i/a3 y/ a13 t/ a14 D.1385/ a1 colorable bucket uncolorable bucket Stack z/a4 t/ a14 sum/ a2 i/a3 y/ a13 n/a6 5 5 4 4 5 5
  • 151. Coloring Region 0 151 先將所有 Colorable bucket 的推進 stack z/a4標為 Spill 候選人 z/a4 retval/ a0 x/a5 n/a6 sum/ a2 i/a3 y/ a13 t/ a14 D.1385/ a1 colorable bucket uncolorable bucket Stack t/ a14 sum/ a2 i/a3 y/ a13 n/a6 4 3 3 4 4
  • 152. Coloring Region 0 152 先將所有 Colorable bucket 的推進 stack z/a4標為 Spill 候選人 z/a4 retval/ a0 x/a5 n/a6 sum/ a2 t/ a14 y/ i/a3 a13 D.1385/ a1 colorable bucket uncolorable bucket Stack t/ a14 sum/ a2 i/a3 y/ a13 n/a6 4 3 3 4 4 y/a13及t/a14 因此變成 colorable
  • 153. Coloring Region 0 153 先將所有 Colorable bucket 的推進 stack z/a4 retval/ a0 x/a5 y/ a13 i/a3 sum/ a2 t/ a14 n/a6 D.1385/ a1 colorable bucket uncolorable bucket Stack sum/ a2 i/a3 y/ a13 n/a6 3 3 3 3 uncolorable 移到colorable 會照某種順序排入
  • 154. Coloring Region 0 154 先將所有 Colorable bucket 的推進 stack n/a6 z/a4 retval/ a0 x/a5 y/ a13 i/a3 sum/ a2 t/ a14 D.1385/ a1 colorable bucket uncolorable bucket Stack sum/ a2 i/a3 y/ a13 2 2 2
  • 155. Coloring Region 0 155 先將所有 Colorable bucket 的推進 stack n/a6 z/a4 retval/ a0 i/a3 y/ x/a5 sum/ a2 a13 t/ a14 D.1385/ a1 colorable bucket uncolorable bucket Stack sum/ a2 i/a3 1 1
  • 156. Coloring Region 0 156 先將所有 Colorable bucket 的推進 stack n/a6 z/a4 retval/ a0 x/a5 sum/ a2 i/a3 y/ a13 t/ a14 D.1385/ a1 colorable bucket uncolorable bucket Stack sum/ 0 a2
  • 157. Coloring Region 0 157 先將所有 Colorable bucket 的推進 stack n/a6 z/a4 retval/ a0 x/a5 sum/ a2 i/a3 y/ a13 t/ a14 D.1385/ a1 colorable bucket uncolorable bucket Stack
  • 158. Coloring Region 0 158 n/a6 z/a4 retval/ a0 x/a5 i/a3 y/ a13 t/ a14 D.1385/ a1 Stack $r0 $r1 $r2 $r3 sum/ a2
  • 159. Coloring Region 0 159 n/a6 z/a4 retval/ a0 x/a5 y/ a13 t/ a14 D.1385/ a1 Stack $r0 $r1 $r2 $r3 sum/ a2 i/a3
  • 160. Coloring Region 0 160 n/a6 t/ a14 z/a4 retval/ a0 D.1385/ a1 x/a5 Stack $r0 $r1 $r2 $r3 sum/ a2 i/a3 y/ a13
  • 161. Coloring Region 0 161 t/ a14 z/a4 retval/ a0 D.1385/ a1 x/a5 Stack $r0 $r1 $r2 $r3 sum/ a2 i/a3 y/ a13 n/a6 D.1385/ a1 retval/ a0
  • 162. Coloring Region 0 162 z/a4 retval/ a0 D.1385/ a1 x/a5 Stack $r0 $r1 $r2 $r3 t/ a14 sum/ a2 i/a3 y/ a13 n/a6
  • 163. Coloring Region 0 163 retval/ a0 D.1385/ a1 x/a5 Stack $r0 $r1 $r2 $r3 z/a4 t/ a14 sum/ a2 i/a3 y/ a13 n/a6 z/a4拿不到 Register 標為Spill 但繼續著色
  • 164. Coloring Region 0 164 D.1385/ a1 x/a5 Stack $r0 $r1 $r2 $r3 z/a4 t/ a14 sum/ a2 i/a3 y/ a13 n/a6 retval/ a0
  • 165. Coloring Region 0 165 x/a5 Stack $r0 $r1 $r2 $r3 z/a4 t/ a14 sum/ a2 i/a3 y/ a13 n/a6 D.1385/ a1 retval/ a0
  • 166. Coloring Region 0 166 Stack $r0 $r1 $r2 $r3 x/a5 z/a4 t/ a14 sum/ a2 i/a3 y/ a13 n/a6 D.1385/ a1 retval/ a0
  • 167. Coloring Region 1 167 n(r48/a6) = n($r0) x(r42/a5) = n(r8/a6) * 2 z(r43/a4) = x(r42/a5) sum(r41/a2) = 0 i(r40/a3) = 0 i(r40/a7) < n(r48/a10)? y(r44/a12) = i(r40/a7) t(r45/a11) = z(r43/a9) + y(r44/a12) sum(r41/a8) = sum(r41/a8) + t(r45/a11) i(r40/a7) = i(r40/a7) + 1 D.1385(r46/a1) = sum(r41/a2) <retval>(r47/a0) = D.1385(r46/a1) $r0 = <retval>(r47/a0) Loop 1 (parent 0, header bb4, depth 1) ... Forming thread by copy 3:a11r45-a12r44 (freq=1): Result (freq=4): a11r45(2) a12r44(2) Pushing a12(r44,l1)(cost 0) Making a7(r40,l1) colorable Making a8(r41,l1) colorable Making a10(r48,l1) colorable Pushing a10(r48,l1)(cost 40) Pushing a8(r41,l1)(cost 48) Pushing a11(r45,l1)(cost 0) Pushing a7(r40,l1)(cost 64) Popping a7(r40,l1) -- assign reg 2 Popping a11(r45,l1) -- assign reg 3 Popping a8(r41,l1) -- assign reg 1 Popping a10(r48,l1) -- assign reg 0 Popping a12(r44,l1) -- assign reg 3
  • 168. Coloring Region 1 168 n(r48/a6) = n($r0) x(r42/a5) = n(r8/a6) * 2 z(r43/a4) = x(r42/a5) sum(r41/a2) = 0 i(r40/a3) = 0 i(r40/a7) < n(r48/a10)? y(r44/a12) = i(r40/a7) t(r45/a11) = z(r43/a9) + y(r44/a12) sum(r41/a8) = sum(r41/a8) + t(r45/a11) i(r40/a7) = i(r40/a7) + 1 D.1385(r46/a1) = sum(r41/a2) <retval>(r47/a0) = D.1385(r46/a1) $r0 = <retval>(r47/a0) z/a9 t/ a11 sum/ a8 i/a7 y/ a12 n/ a10
  • 169. Coloring Region 1 169 z/a9 t/ a11 sum/ a8 i/a7 y/ a12 n/ a10 $r0 $r1 $r2 $r3 z/a4拿不到 Register, z/a9 繼續拿不到
  • 170. Coloring Region 1 170 z/a9 t/ a11 sum/ a8 i/a7 y/ a12 n/ a10 $r0 $r1 $r2 $r3 z/a4拿不到 Register, z/a9 繼續拿不到 所有Spill的Register會在下階段處理
  • 171. Insert Move 171 n(r48) = n($r0) x(r42) = n(r8) * 2 z(r43) = x(r42) sum(r41) = 0 i(r40) = 0 i(r50) = i(r40) sum(r51) = sum(r41) n(r52) = n(r48) i(r50) < n(r52)? y(r44) = i(r50) t(r45) = z(r43) + y(r44) sum(r51) = sum(r51) + t(r45) i(r50) = i(r50) + 1 sum(r41) = sum(r51) D.1385(r46) = sum(r41) <retval>(r47) = D.1385(r46) $r0 = <retval>(r47)
  • 172. RELOAD 172 n(r48) = n($r0) x(r42) = n(r8) * 2 z(r43) = x(r42) sum(r41) = 0 i(r40) = 0 i(r50) = i(r40) sum(r51) = sum(r41) n(r52) = n(r48) RELOAD i(r50) < n(r52)? y(r44) = i(r50) t(r45) = z(r43) + y(r44) sum(r51) = sum(r51) + t(r45) i(r50) = i(r50) + 1 sum(r41) = sum(r51) D.1385(r46) = sum(r41) <retval>(r47) = D.1385(r46) $r0 = <retval>(r47)
  • 174. RELOAD • nds32無實作Reload相關Hook因此不展示 Reload相關流程 • IRA中使用LRA或Reload, 流程會有所不 同 174
  • 175. IRA + LRA 175 LRA Ref: Vladimir. N. .Makarov, “The top-down regional register allocator for irregular register file architectures”
  • 176. LRA New (and old) pseudo assignment 176 Remove Scratches Update virtual register displacements No Change Constraints: RTL transformations Inheritance/split transformations in EBB scope Undo inheritance for spilled pseudos Memory-Memory move coalesce Spilled pseudo to memory substitution Hard reg substitution, devirtalization and restoring scratches New pseudos or insns Start Finish Ref: Vladimir. N. .Makarov, “The Local Register Allocator Project”, GNU Tools Cauldron 2012
  • 177. LRA New (and old) pseudo assignment 177 Remove Scratches Update virtual register displacements No Change Constraints: RTL transformations Inheritance/split transformations in EBB scope Undo inheritance for spilled pseudos Memory-Memory move coalesce Spilled pseudo to memory substitution Hard reg substitution, devirtalization and restoring scratches New pseudos or insns Start Finish
  • 178. Remove Scratches 178 處理 match_scratch 讓它暫時先變 pseudo register 以利後續處理
  • 179. LRA New (and old) pseudo assignment 179 Remove Scratches Update virtual register displacements No Change Constraints: RTL transformations Inheritance/split transformations in EBB scope Undo inheritance for spilled pseudos Memory-Memory move coalesce Spilled pseudo to memory substitution Hard reg substitution, devirtalization and restoring scratches New pseudos or insns Start Finish
  • 180. Update virtual register displacements 180 • 分配 Stack slot 空間 – eg: $sp + 0, $sp + 4, ... • 融合 $sp 跟 $fp
  • 181. LRA New (and old) pseudo assignment 181 Remove Scratches Update virtual register displacements No Change Constraints: RTL transformations Inheritance/split transformations in EBB scope Undo inheritance for spilled pseudos Memory-Memory move coalesce Spilled pseudo to memory substitution Hard reg substitution, devirtalization and restoring scratches New pseudos or insns Start Finish
  • 182. Constraint 182 (define_insn "addsi" [(set (match_operand:SI 0 "register_operand", "r, r") (plus:SI (match_operand:SI 1 "register_operand", "r, r") (match_operand:SI 2 "register_operand", "r, Is15")))] "" ... Constraint是gcc在md 中用來 描述指令中Operand的資訊 以上面為例, 上面描述add可接受兩種格式: add $ra, $rb, $rc add $ra, $rb, #simm15
  • 183. LRA Constraints #0 183 ... alt=0,overall=0,losers=0,rld_nregs=0 Choosing alt 0 in insn 6: (0) =l (1) l (2) Iu03 {ashlsi3} 0 Non input pseudo reload: reject++ alt=0,overall=607,losers=1,rld_nregs=1 0 Non input pseudo reload: reject++ alt=1,overall=607,losers=1,rld_nregs=1 0 Non pseudo reload: reject++ alt=2,overall=1,losers=0,rld_nregs=0 Choosing alt 2 in insn 7: (0) U45 (1) l {*movsi} ... 檢查每道指令是否符合任何一組 Constraint
  • 184. LRA Constraints #0 ... t(r45/r3) = z(r53/X) + y(r44/r3) sum(r51/r1) = sum(r51/r1) + t(r45/r3) 184 1 Matching alt: reject+=2 alt=0: Bad operand -- refuse alt=1: Bad operand -- refuse 1 Matching alt: reject+=2 alt=2: Bad operand -- refuse alt=3: Bad operand -- refuse 1 Matching alt: reject+=2 alt=4,overall=8,losers=1,rld_nregs=1 alt=5,overall=6,losers=1,rld_nregs=1 alt=6: Bad operand -- refuse alt=7: Bad operand -- refuse alt=8: Bad operand -- refuse alt=9,overall=6,losers=1,rld_nregs=1 alt=0: Bad operand -- refuse alt=1: Bad operand -- refuse alt=2: Bad operand -- refuse alt=3: Bad operand -- refuse alt=4,overall=6,losers=1,rld_nregs=1 alt=5,overall=6,losers=1,rld_nregs=1 alt=6: Bad operand -- refuse alt=7: Bad operand -- refuse alt=8: Bad operand -- refuse alt=9,overall=6,losers=1,rld_nregs=1 找不到一組 Constraint 可用 時會進行 Split Commutative operand exchange in insn 14 Choosing alt 4 in insn 14: (0) d (1) 0 (2) r {addsi3} y(r44/r3) = i(r50/r2) z(r53/X) = z(r43/M) Creating newreg=53 from oldreg=43, assigning class GENERAL_REGS to r53 14: r45:SI=r44:SI+r53:SI REG_DEAD r44:SI Inserting insn reload before: 40: r53:SI=r43:SI ... n(r48/r0) = n($r0) x(r42/r1) = n(r8/r0) * 2 z(r43/M) = x(r42/r1) sum(r41/r1) = 0 i(r40/r2) = 0 i(r50/r2) = i(r40/r2) sum(r51/r1) = sum(r41/r1) n(r52/r0) = n(r48/r0) i(r50/r2) < n(r52/r0)? i(r50/r2) = i(r50/r2) + 1 sum(r41/r1) = sum(r51/r1) D.1385(r46/r1) = sum(r41/r1) <retval>(r47/r1) = D.1385(r46/r1) $r0 = <retval>(r47/r1)
  • 185. LRA New (and old) pseudo assignment 185 Remove Scratches Update virtual register displacements No Change Constraints: RTL transformations Inheritance/split transformations in EBB scope Undo inheritance for spilled pseudos Memory-Memory move coalesce Spilled pseudo to memory substitution Hard reg substitution, devirtalization and restoring scratches New pseudos or insns Start Finish lra-constraints.c:lra_inheritance()
  • 186. Inheritance/Split 186 • 這部份比較複雜, LRA嘗試最佳化的部份 –時間關係略過 – lra-constraints.c:lra_inheritance()
  • 187. LRA New (and old) pseudo assignment 187 Remove Scratches Update virtual register displacements No Change Constraints: RTL transformations Inheritance/split transformations in EBB scope Undo inheritance for spilled pseudos Memory-Memory move coalesce Spilled pseudo to memory substitution Hard reg substitution, devirtalization and restoring scratches New pseudos or insns Start Finish
  • 188. LRA Assign #1 Assigning to 53 (cl=GENERAL_REGS, orig=43, freq=2, tfirst=53, tfreq=2)... Trying 0: spill 52(freq=2) Now best 0(cost=0) Trying 1: spill 51(freq=3) Trying 2: spill 50(freq=4) Trying 3: spill 44(freq=2) 188 Spill r52(hr=0, freq=2) for r53 Assign 0 to reload r53 (freq=2) 要Spill時會先嘗試有沒有 Register, 沒有的時候只好找個替死鬼...
  • 189. LRA Assign #1 Assigning to 53 (cl=GENERAL_REGS, orig=43, freq=2, tfirst=53, tfreq=2)... Trying 0: spill 52(freq=2) Now best 0(cost=0) Trying 1: spill 51(freq=3) Trying 2: spill 50(freq=4) Trying 3: spill 44(freq=2) t(r45/r3) = z(r53/X) + y(r44/r3) sum(r51/r1) = sum(r51/r1) + t(r45/r3) 189 Spill r52(hr=0, freq=2) for r53 Assign 0 to reload r53 (freq=2) n(r48/r0) = n($r0) x(r42/r1) = n(r8/r0) * 2 z(r43/M) = x(r42/r1) sum(r41/r1) = 0 i(r40/r2) = 0 i(r50/r2) = i(r40/r2) sum(r51/r1) = sum(r41/r1) n(r52/r0) = n(r48/r0) i(r50/r2) < n(r52/r0)? y(r44/r3) = i(r50/r2) z(r53/X) = z(r43/M) i(r50/r2) = i(r50/r2) + 1 sum(r41/r1) = sum(r51/r1) D.1385(r46/r1) = sum(r41/r1) <retval>(r47/r1) = D.1385(r46/r1) $r0 = <retval>(r47/r1) 分到$r0, 踢掉r52
  • 190. LRA Assign #1 Assigning to 53 (cl=GENERAL_REGS, orig=43, freq=2, tfirst=53, tfreq=2)... Trying 0: spill 52(freq=2) Now best 0(cost=0) Trying 1: spill 51(freq=3) Trying 2: spill 50(freq=4) Trying 3: spill 44(freq=2) t(r45/r3) = z(r53/r0) + y(r44/r3) sum(r51/r1) = sum(r51/r1) + t(r45/r3) 190 Spill r52(hr=0, freq=2) for r53 Assign 0 to reload r53 (freq=2) n(r48/r0) = n($r0) x(r42/r1) = n(r8/r0) * 2 z(r43/M) = x(r42/r1) sum(r41/r1) = 0 i(r40/r2) = 0 i(r50/r2) = i(r40/r2) sum(r51/r1) = sum(r41/r1) n(r52/M) = n(r48/r0) i(r50/r2) < n(r52/M)? y(r44/r3) = i(r50/r2) z(r53/r0) = z(r43/M) i(r50/r2) = i(r50/r2) + 1 sum(r41/r1) = sum(r51/r1) D.1385(r46/r1) = sum(r41/r1) <retval>(r47/r1) = D.1385(r46/r1) $r0 = <retval>(r47/r1) 分到$r0, 踢掉r52
  • 191. LRA New (and old) pseudo assignment 191 Remove Scratches Update virtual register displacements No Change Constraints: RTL transformations Inheritance/split transformations in EBB scope Undo inheritance for spilled pseudos Memory-Memory move coalesce Spilled pseudo to memory substitution Hard reg substitution, devirtalization and restoring scratches New pseudos or insns Start Finish
  • 192. LRA New (and old) pseudo assignment 192 Remove Scratches Update virtual register displacements No Change Constraints: RTL transformations Inheritance/split transformations in EBB scope Undo inheritance for spilled pseudos Memory-Memory move coalesce Spilled pseudo to memory substitution Hard reg substitution, devirtalization and restoring scratches New pseudos or insns Start Finish
  • 193. Memory-Memory move coalesce • 假設一道Move指令的來源與目的的 Allocno皆分配到Memory (Spill) 193 a(r53/Ma) = b(r43/Mb) ... a(r55/$r1) = a(r53/Ma) ... a(r55/$r1) = b(r43/Mb)
  • 194. LRA New (and old) pseudo assignment 194 Remove Scratches Update virtual register displacements No Change Constraints: RTL transformations Inheritance/split transformations in EBB scope Undo inheritance for spilled pseudos Memory-Memory move coalesce Spilled pseudo to memory substitution Hard reg substitution, devirtalization and restoring scratches New pseudos or insns Start Finish
  • 195. LRA New (and old) pseudo assignment 195 Remove Scratches Update virtual register displacements No Change Constraints: RTL transformations Inheritance/split transformations in EBB scope Undo inheritance for spilled pseudos Memory-Memory move coalesce Spilled pseudo to memory substitution Hard reg substitution, devirtalization and restoring scratches New pseudos or insns Start Finish
  • 196. LRA Constraint #2 t(r45/r3) = z(r53/r0) + y(r44/r3) sum(r51/r1) = sum(r51/r1) + t(r45/r3) 196 n(r48/r0) = n($r0) x(r42/r1) = n(r8/r0) * 2 z(r43/M) = x(r42/r1) sum(r41/r1) = 0 i(r40/r2) = 0 i(r50/r2) = i(r40/r2) sum(r51/r1) = sum(r41/r1) n(r52/M) = n(r48/r0) i(r50/r2) < n(r52/M)? y(r44/r3) = i(r50/r2) z(r53/r0) = z(r43/M) i(r50/r2) = i(r50/r2) + 1 sum(r41/r1) = sum(r51/r1) D.1385(r46/r1) = sum(r41/r1) <retval>(r47/r1) = D.1385(r46/r1) $r0 = <retval>(r47/r1) 比較指令只能Reg跟Reg比 不符合Constraint nds32中,比較指令只有 Reg 與 Reg相比 無 Reg 與 Mem 比
  • 197. LRA Constraint #2 t(r45/r3) = z(r53/r0) + y(r44/r3) sum(r51/r1) = sum(r51/r1) + t(r45/r3) 197 n(r48/r0) = n($r0) x(r42/r1) = n(r8/r0) * 2 z(r43/M) = x(r42/r1) sum(r41/r1) = 0 i(r40/r2) = 0 i(r50/r2) = i(r40/r2) sum(r51/r1) = sum(r41/r1) n(r52/M) = n(r48/r0) n(r54/X) = n(r52/M) i(r50/r2) < n(r54/X)? y(r44/r3) = i(r50/r2) z(r53/r0) = z(r43/M) i(r50/r2) = i(r50/r2) + 1 sum(r41/r1) = sum(r51/r1) D.1385(r46/r1) = sum(r41/r1) <retval>(r47/r1) = D.1385(r46/r1) $r0 = <retval>(r47/r1) 比較指令只能Reg跟Reg比 不符合Constraint
  • 198. LRA New (and old) pseudo assignment 198 Remove Scratches Update virtual register displacements No Change Constraints: RTL transformations Inheritance/split transformations in EBB scope Undo inheritance for spilled pseudos Memory-Memory move coalesce Spilled pseudo to memory substitution Hard reg substitution, devirtalization and restoring scratches New pseudos or insns Start Finish
  • 199. LRA New (and old) pseudo assignment 199 Remove Scratches Update virtual register displacements No Change Constraints: RTL transformations Inheritance/split transformations in EBB scope Undo inheritance for spilled pseudos Memory-Memory move coalesce Spilled pseudo to memory substitution Hard reg substitution, devirtalization and restoring scratches New pseudos or insns Start Finish
  • 200. LRA Assign #2 t(r45/r3) = z(r53/r0) + y(r44/r3) sum(r51/r1) = sum(r51/r1) + t(r45/r3) 200 n(r48/r0) = n($r0) x(r42/r1) = n(r8/r0) * 2 z(r43/M) = x(r42/r1) sum(r41/r1) = 0 i(r40/r2) = 0 i(r50/r2) = i(r40/r2) sum(r51/r1) = sum(r41/r1) n(r52/M) = n(r48/r0) n(r54/r0) = n(r52/M) i(r50/r2) < n(r54/r0)? y(r44/r3) = i(r50/r2) z(r53/r0) = z(r43/M) i(r50/r2) = i(r50/r2) + 1 sum(r41/r1) = sum(r51/r1) D.1385(r46/r1) = sum(r41/r1) <retval>(r47/r1) = D.1385(r46/r1) $r0 = <retval>(r47/r1) 很幸運直接撿到剛被 搶走的r0可以用
  • 201. LRA New (and old) pseudo assignment 201 Remove Scratches Update virtual register displacements No Change Constraints: RTL transformations Inheritance/split transformations in EBB scope Undo inheritance for spilled pseudos Memory-Memory move coalesce Spilled pseudo to memory substitution Hard reg substitution, devirtalization and restoring scratches New pseudos or insns Start Finish
  • 202. LRA New (and old) pseudo assignment 202 Remove Scratches Update virtual register displacements No Change Constraints: RTL transformations Inheritance/split transformations in EBB scope Undo inheritance for spilled pseudos Memory-Memory move coalesce Spilled pseudo to memory substitution Hard reg substitution, devirtalization and restoring scratches New pseudos or insns Start Finish
  • 203. LRA New (and old) pseudo assignment 203 Remove Scratches Update virtual register displacements No Change Constraints: RTL transformations Inheritance/split transformations in EBB scope Undo inheritance for spilled pseudos Memory-Memory move coalesce Spilled pseudo to memory substitution Hard reg substitution, devirtalization and restoring scratches New pseudos or insns Start Finish
  • 204. LRA New (and old) pseudo assignment 204 Remove Scratches Update virtual register displacements No Change Constraints: RTL transformations Inheritance/split transformations in EBB scope Undo inheritance for spilled pseudos Memory-Memory move coalesce Spilled pseudo to memory substitution Hard reg substitution, devirtalization and restoring scratches New pseudos or insns Start Finish
  • 205. LRA Constraint #3 t(r45/r3) = z(r53/r0) + y(r44/r3) sum(r51/r1) = sum(r51/r1) + t(r45/r3) 205 n(r48/r0) = n($r0) x(r42/r1) = n(r8/r0) * 2 z(r43/M) = x(r42/r1) sum(r41/r1) = 0 i(r40/r2) = 0 i(r50/r2) = i(r40/r2) sum(r51/r1) = sum(r41/r1) n(r52/M) = n(r48/r0) n(r54/r0) = n(r52/M) i(r50/r2) < n(r54/r0)? y(r44/r3) = i(r50/r2) z(r53/r0) = z(r43/M) i(r50/r2) = i(r50/r2) + 1 sum(r41/r1) = sum(r51/r1) D.1385(r46/r1) = sum(r41/r1) <retval>(r47/r1) = D.1385(r46/r1) $r0 = <retval>(r47/r1) 整個掃描過後發現所有指令 符合Constraint!
  • 206. LRA New (and old) pseudo assignment 206 Remove Scratches Update virtual register displacements No Change Constraints: RTL transformations Inheritance/split transformations in EBB scope Undo inheritance for spilled pseudos Memory-Memory move coalesce Spilled pseudo to memory substitution Hard reg substitution, devirtalization and restoring scratches New pseudos or insns Start Finish
  • 207. LRA Memory Substitution t(r45/r3) = z(r53/r0) + y(r44/r3) sum(r51/r1) = sum(r51/r1) + t(r45/r3) 207 n(r48/r0) = n($r0) x(r42/r1) = n(r8/r0) * 2 z([$sp + 0]) = x(r42/r1) sum(r41/r1) = 0 i(r40/r2) = 0 i(r50/r2) = i(r40/r2) sum(r51/r1) = sum(r41/r1) n([$sp + 4]) = n(r48/r0) n(r54/r0) = n([$sp + 4]) i(r50/r2) < n(r54/r0)? y(r44/r3) = i(r50/r2) z(r53/r0) = z([$sp + 0]) i(r50/r2) = i(r50/r2) + 1 sum(r41/r1) = sum(r51/r1) D.1385(r46/r1) = sum(r41/r1) <retval>(r47/r1) = D.1385(r46/r1) $r0 = <retval>(r47/r1) 將所有 Spill 的 Pseudo Register 替換成 Memory!
  • 208. LRA New (and old) pseudo assignment 208 Remove Scratches Update virtual register displacements No Change Constraints: RTL transformations Inheritance/split transformations in EBB scope Undo inheritance for spilled pseudos Memory-Memory move coalesce Spilled pseudo to memory substitution Hard reg substitution, devirtalization and restoring scratches New pseudos or insns Start Finish
  • 209. LRA Register Substitution 209 n(r0) = n($r0) x(r1) = n(r0) * 2 z([$sp + 0]) = x(r1) sum(r1) = 0 i(r2) = 0 i(r2) = i(r2) sum(r1) = sum(r1) n([$sp + 4]) = n(rr0) n(r0) = n([$sp + 4]) i(r2) < n(r0)? y(r3) = i(r2) z(r0) = z([$sp + 0]) t(r3) = z(r0) + y(r3) sum(r1) = sum(r1) + t(r3) i(r2) = i(r2) + 1 sum(r1) = sum(r1) D.1385(r1) = sum(r1) <retval>(r1) = D.1385(r1) $r0 = <retval>(r1) 將所有 Pseudo Register 替換成 Hard Register!
  • 210. 210 reload_completed != 0 在 gcc 中, 當reload_complete不等於0時 代表Register Allocation 完畢, 並且所有RTL都必須是Strict RTL 隨時都可以輸出成 Assembly Language!
  • 211. 總結 • IRA/LRA 雖然龐大, 但循著理論走會容易 理解很多 • RA在整個編譯階段末段, 但RA結果的優 劣對於效能與程式大小有相當大的影響 211
  • 212. 212