HKG15-405: Redundant zero/sign-extension elimination in GCC
Presented by: Kugan Vivekanandarajah, Linaro Toolchain Working Group
Date: February 12, 2015
Event: Linaro Connect HKG15
Overview
● What is zero/sign extension
● Why GCC generates zero/sign extensions
● How GCC minimizes redundant zero/sign extensions
● Shortcomings in current approach
● Using value range (VR) to remove redundant extensions
● Register promotion / widening computations to remove redundant extensions
● Conclusion
Zero/sign extension operation
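● Zero-extension widens an unsigned value to a wider mode by filling the new upper bits with zeros
● Sign-extension widens a signed value to a wider mode by copying its sign bit into the new upper bits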
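The two operations can be modelled in C on a value held in a 32-bit register. This is a minimal sketch: the helper names `zext8`, `sext8`, `zext16` and `sext16` are hypothetical, and the sign-extension uses an xor/subtract idiom so that no implementation-defined narrowing conversion is involved.

```c
#include <stdint.h>

/* Zero-extend the low 8 bits of x: bits 8..31 of the result are 0 (like UXTB). */
uint32_t zext8(uint32_t x) { return x & 0xFFu; }

/* Sign-extend the low 8 bits of x: bit 7 is replicated into bits 8..31 (like SXTB).
   Flipping bit 7 and subtracting 0x80 maps 0x00..0xFF onto -128..127. */
int32_t sext8(uint32_t x) { return (int32_t)((x & 0xFFu) ^ 0x80u) - 0x80; }

/* 16-bit variants (like UXTH / SXTH). */
uint32_t zext16(uint32_t x) { return x & 0xFFFFu; }
int32_t sext16(uint32_t x) { return (int32_t)((x & 0xFFFFu) ^ 0x8000u) - 0x8000; }
```

For example, the byte 0xF0 zero-extends to 240 (0x000000F0) but sign-extends to -16 (0xFFFFFFF0).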
Zero/sign extension instructions in AArch32 and AArch64
● Explicit extend instructions
○ SXTB Sign-extend byte / UXTB Unsigned extend byte
○ SXTH Sign-extend halfword / UXTH Unsigned extend halfword
● Extend operations as part of other instructions
○ In AArch64, the extended-register forms of instructions such as ADD optionally sign- or zero-extend their last register operand
■ ADD X1, X2, W3, UXTB #2 (X1 = X2 + (zero-extended low byte of W3 << 2))
○ On some modern CPUs, an ALU operation with a shifted or extended operand may take an extra cycle
● Load and extend are also combined
○ LDRB/LDRH zero-extend and LDRSB/LDRSH sign-extend the loaded byte/halfword
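The combined forms can be sketched in C as plain arithmetic on the register contents. The function names are hypothetical; the sketch assumes the architectural definition of the extended-register ADD (extend the low byte of the W register, shift left by the immediate, then add) and models the byte loads with ordinary C loads.

```c
#include <stdint.h>

/* Models ADD Xd, Xn, Wm, UXTB #shift: the low byte of Wm is zero-extended
   to 64 bits, shifted left by 0..4, then added to Xn (modulo 2^64). */
uint64_t add_uxtb(uint64_t xn, uint32_t wm, unsigned shift) {
    uint64_t ext = (uint64_t)(wm & 0xFFu);        /* UXTB */
    return xn + (ext << shift);                   /* LSL #shift, then ADD */
}

/* Models the SXTB form: the low byte is sign-extended before the shift.
   Shifting the unsigned bit pattern keeps the arithmetic well defined. */
uint64_t add_sxtb(uint64_t xn, uint32_t wm, unsigned shift) {
    int64_t ext = (int64_t)((wm & 0xFFu) ^ 0x80u) - 0x80;   /* SXTB */
    return xn + ((uint64_t)ext << shift);
}

/* A byte load either zero-extends (LDRB) or sign-extends (LDRSB) for free. */
uint32_t load_ldrb(const uint8_t *p) { return *p; }
int32_t load_ldrsb(const uint8_t *p) { return (int32_t)(*p ^ 0x80u) - 0x80; }
```

Because the extension is folded into the ADD or the load, no separate UXTB/SXTB instruction is needed when the pattern matches.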
Why extensions are needed
● Extensions are part of the ABI
○ AAPCS32: "A Fundamental Data Type that is smaller than 4 bytes is zero- or sign-extended to a word and returned in r0"
○ AAPCS64: "When an argument is assigned to a register any unused bits in the register have unspecified value."
char foo (char val1, char val2)
{
return val1 + val2;
}
aarch32:
foo:
add r0, r0, r1
uxtb r0, r0
bx lr
aarch64:
foo:
uxtb w1, w1
add w0, w1, w0, uxtb
ret
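A minimal C model of where the extension lands for foo() under each ABI, treating each argument register as a 32-bit value. It assumes plain char is unsigned (as in both ARM procedure call standards), which is why UXTB rather than SXTB appears; the function names are hypothetical and the bodies only mirror the instruction sequences above.

```c
#include <stdint.h>

/* AArch64: bits 8..31 of w0/w1 are unspecified, so the callee must extend
   both operands (uxtb w1, w1 ; add w0, w1, w0, uxtb). The upper bits of a
   returned char are also unspecified, so the result is not extended. */
uint32_t foo_aapcs64(uint32_t w0, uint32_t w1) {
    return (w1 & 0xFFu) + (w0 & 0xFFu);
}

/* AArch32: the caller has already extended r0/r1, but the callee must
   zero-extend the char result to a word (add r0, r0, r1 ; uxtb r0, r0). */
uint32_t foo_aapcs32(uint32_t r0, uint32_t r1) {
    return (r0 + r1) & 0xFFu;
}
```

Both versions agree on the low byte, which is all a caller may rely on for a char result under AAPCS64.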
Why extensions are needed
● Conversion between types
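○ Because the AArch32 and AArch64 ABIs differ, the extension ends up in different places for these simple examples: on AArch32 the callee extends the char return value (foo), while on AArch64 it extends the char argument (bar)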
char foo (int ch)
{
return ch;
}
int bar (char ch)
{
return ch;
}
aarch32:
foo:
uxtb r0, r0
bx lr
bar:
bx lr
aarch64:
foo:
ret
bar:
uxtb w0, w0
ret
Redundant extensions in GCC
● SUBREGs and extensions are generated while converting from tree to RTL
○ Type conversions
○ Some of the extensions may be redundant
● Subsequent passes remove/optimize redundant extensions
○ combine: combines the extension with its use into a single instruction
○ ree: the redundant extension elimination pass
○ RTX simplifications
■ Using the SRP_SIGNED and SRP_UNSIGNED annotations
Combine example (AArch64)
● Combines a zero/sign-extension with its use if the target has a matching instruction
● Not specific to zero/sign-extension, but helps here
● Works only within basic blocks
void bar (int *ch, char a)
{
*ch = *ch + a;
}
3: r77:SI=zero_extend(x1:QI)
8: r78:SI=r79:SI+r77:SI
Combines to
8: r78:SI=zero_extend(x1:QI)+r79:SI
add w1, w2, w1, uxtb
ree pass (enabled for AArch64)
● Moves an extension to the definition of its source operand
○ If the machine description supports such an instruction
● Runs after register allocation
○ Does not help in reducing register pressure
● Combines the extension with its reaching definitions
○ Not with its single use
○ The ABI-mandated extension of parameters on AArch64 cannot be combined (i.e. an extension of a parameter followed by its use)
Before:
11: x19:HI=x0:HI
13: x1:DI=x20:DI
14: x0:DI=x21:DI
15: x0:HI=call [`calc_func'] argc:0
19: x19:SI=sign_extend(x19:HI)
After:
11: x19:SI=sign_extend(x0:HI)
13: x1:DI=x20:DI
14: x0:DI=x21:DI
15: x0:HI=call [`calc_func'] argc:0
Shortcomings in current approach
● Challenges in removing redundant extensions in RTL
○ Combining with other patterns (to generate a single instruction) has its limits
○ SUBREG simplification does not have all the information it needs in RTL
■ Relies on SRP_SIGNED and SRP_UNSIGNED
■ Sign information is lost in RTL, making it hard to remove some redundant extensions
● What can we do?
○ Provide additional information to RTL
○ Change the tree such that extensions are minimized when converting to RTL
Using value range to remove redundant extensions
● Value range is used to annotate RTL SUBREGs with SRP_SIGNED_AND_UNSIGNED
○ When this holds, the extension at RTL level is redundant and is optimized away
● Example
○ _6 = (short int) _5;
○ Value range of _6: [3, 10]
(insn 13 12 0 (set (reg:SI 73 [ D.2640 ])
(sign_extend:SI (subreg:HI (reg:SI 81) 0))) t.c:6 -1
(nil))
○ The value range of _6 means reg:SI 81 is already both sign- and zero-extended (SRP_SIGNED_AND_UNSIGNED)
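The reasoning behind the annotation can be sketched as a predicate on the value range of the whole register. This is an illustration, not GCC's implementation, and the function names are hypothetical: extending the low `bits` bits is a no-op for both zero- and sign-extension exactly when every value in [lo, hi] lies in [0, 2^(bits-1) - 1]. It also assumes the range describes the full register contents, which is the WRAP issue listed under Challenges.

```c
#include <stdbool.h>
#include <stdint.h>

/* Both sign- and zero-extending the low `bits` bits (1 <= bits <= 32) leave
   the register unchanged iff every value in [lo, hi] has bit (bits-1) and
   all higher bits clear, i.e. lies in [0, 2^(bits-1) - 1]. */
bool range_implies_signed_and_unsigned(int64_t lo, int64_t hi, unsigned bits) {
    int64_t max_positive = ((int64_t)1 << (bits - 1)) - 1;
    return lo >= 0 && hi <= max_positive;
}

/* Zero-extension alone is a no-op when the range fits in [0, 2^bits - 1]. */
bool range_implies_unsigned(int64_t lo, int64_t hi, unsigned bits) {
    int64_t max_unsigned = ((int64_t)1 << bits) - 1;
    return lo >= 0 && hi <= max_unsigned;
}
```

For _6 with range [3, 10] and an HImode SUBREG (bits = 16), the predicate holds, so the sign_extend in insn 13 is redundant.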
Example
short foo (unsigned char c)
{
c = c & (unsigned char)0x0F;
if (c > 7)
return ((short)(c - 5));
else
return ((short)c);
}
Without value range information:
foo:
and w0, w0, 15
cmp w0, 7
bhi .L5
sxth w0, w0
ret
.p2align 3
.L5:
sub w0, w0, #5
sxth w0, w0
ret
Using value range information:
foo:
and w0, w0, 15
cmp w0, 7
bhi .L5
ret
.p2align 3
.L5:
sub w0, w0, #5
ret
VRP dump at tree level, RTL expansion using VRP, and normal RTL expansion
short foo (unsigned char c)
{
c = c & (unsigned char)0x0F;
if (c > 7)
return ((short)(c - 5));
else
return ((short)c);
}
c_3: [0, 15]
_4: [8, 15]
_6: [3, 10]
_7: [0, 7]
<bb 2>:
c_3 = c_2(D) & 15;
if (c_3 > 7)
goto <bb 3>;
else
goto <bb 4>;
<bb 3>:
_4 = (unsigned short) c_3;
_5 = _4 + 65531;
_6 = (short int) _5;
goto <bb 5>;
<bb 4>:
_7 = (short int) c_3;
<bb 5>:
# _1 = PHI <_6(3), _7(4)>
return _1;
RTL expansion using VRP:
2: r78:SI=zero_extend(x0:QI)
6: r79:SI=r78:SI&0xf
7: r74:SI=r79:SI
8: cc:CC=cmp(r74:SI,0x7)
9: pc={(leu(cc:CC,0))?L16:pc}
REG_BR_PROB 6100
11: r80:HI=r74:SI#0
12: r81:SI=r80:HI#0-0x5
13: r73:SI=r81:SI
14: pc=L19
15: barrier
16: L16:
18: r73:SI=r74:SI
19: L19:
20: 21: r77:HI=r73:SI#0
25: x0:HI=r77:HI
26: use x0:HI
Normal RTL expansion:
2: r78:SI=zero_extend(x0:QI)
6: r79:SI=r78:SI&0xf
7: r74:SI=zero_extend(r79:SI#0)
8: cc:CC=cmp(r74:SI,0x7)
9: pc={(leu(cc:CC,0))?L16:pc}
REG_BR_PROB 6100
11: r80:HI=r74:SI#0
12: r81:SI=r80:HI#0-0x5
13: r73:SI=sign_extend(r81:SI#0)
14: pc=L19
15: barrier
16: L16:
18: r73:SI=sign_extend(r74:SI#0)
19: L19:
20: r77:HI=r73:SI#0
25: x0:HI=r77:HI
26: use x0:HI
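Because the argument has only 256 possible values, the ranges in the VRP dump can be checked exhaustively. The sketch below is a hypothetical C replay of the GIMPLE above (the variable names c3, t4, t5, t6, t7 stand for c_3, _4, _5, _6, _7), assuming a 16-bit unsigned short so that `+ 65531` wraps modulo 2^16.

```c
#include <stdbool.h>
#include <stdint.h>

/* Replays the GIMPLE of foo() for every unsigned char argument and checks
   that each SSA value stays inside its reported range:
   c_3: [0, 15], _4: [8, 15], _6: [3, 10], _7: [0, 7]. */
bool vrp_ranges_hold(void) {
    for (unsigned int arg = 0; arg <= 0xFF; arg++) {
        unsigned int c3 = arg & 15u;                    /* c_3 = c_2(D) & 15 */
        if (c3 > 15u)
            return false;
        if (c3 > 7u) {
            unsigned int t4 = (uint16_t)c3;             /* _4 = (unsigned short) c_3 */
            unsigned int t5 = (uint16_t)(t4 + 65531u);  /* _5 = _4 + 65531, mod 2^16 */
            int t6 = (int)t5;  /* _6 = (short int) _5; any _5 outside [3, 10] fails either way */
            if (t4 < 8u || t4 > 15u || t6 < 3 || t6 > 10)
                return false;
        } else {
            int t7 = (int)c3;                           /* _7 = (short int) c_3 */
            if (t7 < 0 || t7 > 7)
                return false;
        }
    }
    return true;
}
```

Every range lies within [0, 32767], which is why both the zero_extend in insn 7 and the sign_extends in insns 13 and 18 of the normal expansion can be dropped.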
Challenges
● Value ranges propagated are for the type of the SSA_NAME
○ If the operation can wrap, the higher bits of the register in PROMOTE_MODE can be unpredictable
○ An additional WRAP attribute is needed
○ Example (on Alpha, where registers are 64 bits wide and _344 is a 32-bit unsigned variable):
_343 = ivtmp.179_52 + 2147483645; [0x80000004, 0x80000043]
_344 = _343 * 2; [0x8, 0x86]
_345 = (integer(kind=4)) _344; [0x8, 0x86]
● Propagated value ranges are sometimes pessimistic
○ Unlike our case, most value-range-based optimizations are done in the VRP pass itself, where more precise data is available
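The effect of wrap-around can be reproduced in C. This sketch assumes, for illustration only, that the promoted 64-bit register computation is carried out at full width without re-truncating after each operation; the function names are hypothetical. Arguments between 7 and 0x46 give the ranges shown (0x7FFFFFFD + 7 = 0x80000004), and the multiplication wraps in 32 bits but not in 64 bits.

```c
#include <stdint.h>

/* 32-bit semantics of the GIMPLE: _343 and _344 wrap modulo 2^32. */
uint32_t ssa_344(uint32_t ivtmp) {
    uint32_t t343 = ivtmp + 2147483645u;   /* + 0x7FFFFFFD */
    return t343 * 2u;                      /* 0x80000004 * 2 wraps to 0x8 */
}

/* Assumed promoted computation: the same operations in a 64-bit register
   with no truncation, so the carry out of bit 31 is kept in bit 32. */
uint64_t reg_344(uint32_t ivtmp) {
    uint64_t t343 = (uint64_t)ivtmp + 2147483645u;
    return t343 * 2u;
}
```

The low 32 bits agree, so the range [0x8, 0x86] is correct for _344 as a 32-bit value, yet the register holds 0x100000008: treating the zero-extension as redundant on the basis of that range alone would be wrong.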
Register promotion in tree to remove redundant extensions
● Promote operations to word mode so that SUBREGs and extensions are minimized in RTL
● Fix-ups (conversions) are still needed
○ to preserve the semantics of the program
;; c_3 = c_2(D) & 15; // char type
(insn 6 5 7 (set (reg:SI 116)
(and:SI (reg/v:SI 115 [ c ])
(const_int 15 [0xf]))) t.c:3 -1
(nil))
(insn 7 6 0 (set (reg/v:SI 111 [ c ])
(zero_extend:SI (subreg:QI (reg:SI 116) 0))) t.c:3 -1 (nil))
;; c_3 = c_2(D) & 15; // int type
(insn 6 5 0 (set (reg/v:SI 111 [ c ])
(and:SI (reg/v:SI 115 [ c ])
(const_int 15 [0xf]))) t.c:3 -1
(nil))
short foo (unsigned char c)
{
c = c & (unsigned char)0x0F;
if (c > 7)
return ((short)(c - 5));
else
return ((short)c);
}
Before promotion:
foo (unsigned char c)
{
short int _1;
unsigned short _4;
unsigned short _5;
short int _6;
short int _7;
<bb 2>:
c_3 = c_2(D) & 15;
if (c_3 > 7)
goto <bb 3>;
else
goto <bb 4>;
<bb 3>:
_4 = (unsigned short) c_3;
_5 = _4 + 65531;
_6 = (short int) _5;
goto <bb 5>;
<bb 4>:
_7 = (short int) c_3;
<bb 5>:
# _1 = PHI <_6(3), _7(4)>
return _1;
}
After promotion:
foo (unsigned int c)
{
int _5, _6, _7;
unsigned int _4, _9, _10;
short int _11;
<bb 2>:
c_3 = c_2(D) & 15;
if (c_3 > 7)
goto <bb 3>;
else
goto <bb 4>;
<bb 3>:
_10 = (unsigned int) c_3;
_4 = _10 + 4294967291;
_9 = _4 & 65535;
_5 = (int) _9;
goto <bb 5>;
<bb 4>:
_6 = (int) c_3;
<bb 5>:
# _7 = PHI <_5(3), _6(4)>
_11 = (short int) _7;
return _11;
}
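The promoted GIMPLE can be rendered back into C to check that the fix-up preserves the original semantics. This is a hypothetical transcription (`foo_promoted` and `promotion_preserves_semantics` are illustrative names, and t4, t7, t9 stand for _4, _7, _9); it assumes a 32-bit unsigned int and a 16-bit short, and compares both versions over all 256 possible arguments.

```c
#include <stdbool.h>

/* Source-level foo() from the example. */
short foo(unsigned char c) {
    c = c & (unsigned char)0x0F;
    if (c > 7)
        return ((short)(c - 5));
    else
        return ((short)c);
}

/* Promoted version: arithmetic in 32 bits, with "_9 = _4 & 65535" as the
   fix-up that reproduces the unsigned short wrap of "_4 + 65531". */
short foo_promoted(unsigned int c) {
    unsigned int c_3 = c & 15u;
    int t7;
    if (c_3 > 7) {
        unsigned int t4 = c_3 + 4294967291u;   /* c_3 - 5 modulo 2^32 */
        unsigned int t9 = t4 & 65535u;         /* fix-up back to 16 bits */
        t7 = (int)t9;
    } else {
        t7 = (int)c_3;
    }
    return (short)t7;                          /* single conversion at the end */
}

/* Exhaustively compare both versions over every unsigned char argument. */
bool promotion_preserves_semantics(void) {
    for (unsigned int arg = 0; arg <= 0xFF; arg++)
        if (foo((unsigned char)arg) != foo_promoted(arg))
            return false;
    return true;
}
```

Only one narrowing conversion (_11) remains, after the PHI, instead of separate conversions on each path.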
Example
short foo (unsigned char c)
{
c = c & (unsigned char)0x0F;
if (c > 7)
return ((short)(c - 5));
else
return ((short)c);
}
Without register promotion:
foo:
and w0, w0, 15
cmp w0, 7
bhi .L5
sxth w0, w0
ret
.p2align 3
.L5:
sub w0, w0, #5
sxth w0, w0
ret
With register promotion:
foo:
and w0, w0, 15
cmp w0, 7
bhi .L5
ret
.p2align 3
.L5:
sub w0, w0, #5
ret
Challenges
● Inserting fix-ups and conversions to preserve the semantics of the program
● Making sure subsequent passes optimize away the resulting redundancies
Conclusion
● Redundant zero/sign extensions result in performance and code-size penalties
○ Excess instructions
○ Increased register pressure
● Removing redundant zero/sign extensions is a challenge and has to be approached at multiple levels.
● Patches for the discussed optimizations are currently being reviewed/upstreamed.
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...
 
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainlineHKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
 
HKG18-100K1 - George Grey: Opening Keynote
HKG18-100K1 - George Grey: Opening KeynoteHKG18-100K1 - George Grey: Opening Keynote
HKG18-100K1 - George Grey: Opening Keynote
 
HKG18-318 - OpenAMP Workshop
HKG18-318 - OpenAMP WorkshopHKG18-318 - OpenAMP Workshop
HKG18-318 - OpenAMP Workshop
 
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainlineHKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
 
HKG18-315 - Why the ecosystem is a wonderful thing, warts and all
HKG18-315 - Why the ecosystem is a wonderful thing, warts and allHKG18-315 - Why the ecosystem is a wonderful thing, warts and all
HKG18-315 - Why the ecosystem is a wonderful thing, warts and all
 
HKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
HKG18- 115 - Partitioning ARM Systems with the Jailhouse HypervisorHKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
HKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
 
HKG18-TR08 - Upstreaming SVE in QEMU
HKG18-TR08 - Upstreaming SVE in QEMUHKG18-TR08 - Upstreaming SVE in QEMU
HKG18-TR08 - Upstreaming SVE in QEMU
 
HKG18-113- Secure Data Path work with i.MX8M
HKG18-113- Secure Data Path work with i.MX8MHKG18-113- Secure Data Path work with i.MX8M
HKG18-113- Secure Data Path work with i.MX8M
 
HKG18-120 - Devicetree Schema Documentation and Validation
HKG18-120 - Devicetree Schema Documentation and Validation HKG18-120 - Devicetree Schema Documentation and Validation
HKG18-120 - Devicetree Schema Documentation and Validation
 
HKG18-223 - Trusted FirmwareM: Trusted boot
HKG18-223 - Trusted FirmwareM: Trusted bootHKG18-223 - Trusted FirmwareM: Trusted boot
HKG18-223 - Trusted FirmwareM: Trusted boot
 

Recently uploaded

Quarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden ExtensionsQuarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden Extensions
Max Andersen
 
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoamOpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
takuyayamamoto1800
 
Into the Box 2024 - Keynote Day 2 Slides.pdf
Into the Box 2024 - Keynote Day 2 Slides.pdfInto the Box 2024 - Keynote Day 2 Slides.pdf
Into the Box 2024 - Keynote Day 2 Slides.pdf
Ortus Solutions, Corp
 
Accelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with PlatformlessAccelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with Platformless
WSO2
 
De mooiste recreatieve routes ontdekken met RouteYou en FME
De mooiste recreatieve routes ontdekken met RouteYou en FMEDe mooiste recreatieve routes ontdekken met RouteYou en FME
De mooiste recreatieve routes ontdekken met RouteYou en FME
Jelle | Nordend
 
Lecture 1 Introduction to games development
Lecture 1 Introduction to games developmentLecture 1 Introduction to games development
Lecture 1 Introduction to games development
abdulrafaychaudhry
 
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...
Anthony Dahanne
 
Visitor Management System in India- Vizman.app
Visitor Management System in India- Vizman.appVisitor Management System in India- Vizman.app
Visitor Management System in India- Vizman.app
NaapbooksPrivateLimi
 
Designing for Privacy in Amazon Web Services
Designing for Privacy in Amazon Web ServicesDesigning for Privacy in Amazon Web Services
Designing for Privacy in Amazon Web Services
KrzysztofKkol1
 
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Shahin Sheidaei
 
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital TransformationWSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2
 
Multiple Your Crypto Portfolio with the Innovative Features of Advanced Crypt...
Multiple Your Crypto Portfolio with the Innovative Features of Advanced Crypt...Multiple Your Crypto Portfolio with the Innovative Features of Advanced Crypt...
Multiple Your Crypto Portfolio with the Innovative Features of Advanced Crypt...
Hivelance Technology
 
Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024
Globus
 
Vitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume MontevideoVitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume Montevideo
Vitthal Shirke
 
GlobusWorld 2024 Opening Keynote session
GlobusWorld 2024 Opening Keynote sessionGlobusWorld 2024 Opening Keynote session
GlobusWorld 2024 Opening Keynote session
Globus
 
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdfDominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
AMB-Review
 
Prosigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology SolutionsProsigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology Solutions
Prosigns
 
Corporate Management | Session 3 of 3 | Tendenci AMS
Corporate Management | Session 3 of 3 | Tendenci AMSCorporate Management | Session 3 of 3 | Tendenci AMS
Corporate Management | Session 3 of 3 | Tendenci AMS
Tendenci - The Open Source AMS (Association Management Software)
 
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERRORTROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
Tier1 app
 
2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx
Georgi Kodinov
 

Recently uploaded (20)

Quarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden ExtensionsQuarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden Extensions
 
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoamOpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
 
Into the Box 2024 - Keynote Day 2 Slides.pdf
Into the Box 2024 - Keynote Day 2 Slides.pdfInto the Box 2024 - Keynote Day 2 Slides.pdf
Into the Box 2024 - Keynote Day 2 Slides.pdf
 
Accelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with PlatformlessAccelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with Platformless
 
De mooiste recreatieve routes ontdekken met RouteYou en FME
De mooiste recreatieve routes ontdekken met RouteYou en FMEDe mooiste recreatieve routes ontdekken met RouteYou en FME
De mooiste recreatieve routes ontdekken met RouteYou en FME
 
Lecture 1 Introduction to games development
Lecture 1 Introduction to games developmentLecture 1 Introduction to games development
Lecture 1 Introduction to games development
 
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...
 
Visitor Management System in India- Vizman.app
Visitor Management System in India- Vizman.appVisitor Management System in India- Vizman.app
Visitor Management System in India- Vizman.app
 
Designing for Privacy in Amazon Web Services
Designing for Privacy in Amazon Web ServicesDesigning for Privacy in Amazon Web Services
Designing for Privacy in Amazon Web Services
 
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
 
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital TransformationWSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
 
Multiple Your Crypto Portfolio with the Innovative Features of Advanced Crypt...
Multiple Your Crypto Portfolio with the Innovative Features of Advanced Crypt...Multiple Your Crypto Portfolio with the Innovative Features of Advanced Crypt...
Multiple Your Crypto Portfolio with the Innovative Features of Advanced Crypt...
 
Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024
 
Vitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume MontevideoVitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume Montevideo
 
GlobusWorld 2024 Opening Keynote session
GlobusWorld 2024 Opening Keynote sessionGlobusWorld 2024 Opening Keynote session
GlobusWorld 2024 Opening Keynote session
 
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdfDominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
 
Prosigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology SolutionsProsigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology Solutions
 
Corporate Management | Session 3 of 3 | Tendenci AMS
Corporate Management | Session 3 of 3 | Tendenci AMSCorporate Management | Session 3 of 3 | Tendenci AMS
Corporate Management | Session 3 of 3 | Tendenci AMS
 
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERRORTROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
 
2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx
 

HKG15-405: Redundant zero/sign-extension elimination in GCC

  • 1. HKG15-405: Redundant zero/sign-extension elimination in GCC
    Presented by: Kugan Vivekanandarajah, Linaro Toolchain Working Group
    Date: February 12, 2015
    Event: Linaro Connect HKG15
  • 2. Overview
    ● What is zero/sign extension
    ● Why GCC generates zero/sign extensions
    ● How GCC minimizes redundant zero/sign extensions
    ● Shortcomings in the current approach
    ● Using value range (VR) to remove redundant extensions
    ● Register promotion / widening computations to remove redundant extensions
    ● Conclusion
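  • 3. Zero/sign extension operation
    ● zero-extend extends an unsigned value to a wider mode by filling the new high-order bits with zeros
    ● sign-extend extends a signed value to a wider mode by copying its sign bit into the new high-order bits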
  • 4. Zero/sign extension instructions in AArch32 and AArch64
    ● Explicit extend instructions
      ○ SXTB (sign-extend byte) / UXTB (unsigned extend byte)
      ○ SXTH (sign-extend halfword) / UXTH (unsigned extend halfword)
    ● Extend operations as part of other instructions
      ○ In AArch64, the extended-register forms of instructions provide an optional sign or zero extension of the second source register
        ■ ADD X1, X2, W3, UXTB #2 (X1 = X2 + (zero-extended W3<7:0> << 2))
      ○ Modern CPUs may take an extra cycle to perform such an ALU-with-shift type operation
    ● Load and extend are also combined
      ○ The byte/halfword loads (LDRB/LDRH, and LDRSB/LDRSH for signed values) perform the extension as part of the load
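As a rough model of the extend instructions listed above, the following C sketch treats a W register as a `uint32_t`. The function names are hypothetical helpers, not compiler intrinsics, and `add_uxtb_lsl` only models the data path of `ADD X1, X2, W3, UXTB #2` (flags and other encodings are ignored).

```c
#include <assert.h>
#include <stdint.h>

/* UXTB/UXTH: keep the low 8/16 bits and clear everything above them. */
static uint32_t uxtb(uint32_t w) { return w & 0xFFu; }
static uint32_t uxth(uint32_t w) { return w & 0xFFFFu; }

/* SXTB/SXTH: copy bit 7/15 into all higher bits.  The xor/subtract trick
   avoids implementation-defined narrowing casts to int8_t/int16_t. */
static int32_t sxtb(uint32_t w) { return (int32_t)((w & 0xFFu) ^ 0x80u) - 0x80; }
static int32_t sxth(uint32_t w) { return (int32_t)((w & 0xFFFFu) ^ 0x8000u) - 0x8000; }

/* Data path of ADD Xd, Xn, Wm, UXTB #shift: Xn + (zero_extend(Wm<7:0>) << shift). */
static uint64_t add_uxtb_lsl(uint64_t xn, uint32_t wm, unsigned shift) {
    assert(shift <= 4); /* the extended-register form allows shift amounts 0-4 */
    return xn + ((uint64_t)(wm & 0xFFu) << shift);
}
```

For example, `sxtb(0x80)` gives -128 while `uxtb(0x80)` gives 128: the same byte, extended two different ways.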
  • 5. Why extensions are needed
    ● Extensions are part of the ABI
      ○ aapcs32: "A Fundamental Data Type that is smaller than 4 bytes is zero- or sign-extended to a word and returned in r0"
      ○ aapcs64: "When an argument is assigned to a register any unused bits in the register have unspecified value."
    ● Example (plain char is unsigned in both ABIs, hence UXTB rather than SXTB):
      char foo (char val1, char val2)
      {
        return val1 + val2;
      }
      aarch32:
      foo:
        add r0, r0, r1
        uxtb r0, r0
        bx lr
      aarch64:
      foo:
        uxtb w1, w1
        add w0, w1, w0, uxtb
        ret
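The following sketch models the two `foo` listings above in C, treating registers as `uint32_t`. It assumes plain char is unsigned (as in both AAPCS variants) and is only meant to show why each UXTB is there; the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* AArch32: r0/r1 arrive zero-extended, but the 32-bit ADD can carry into
   bit 8, so UXTB is needed to return a zero-extended char in r0. */
static uint32_t foo_add_aapcs32(uint32_t r0, uint32_t r1) {
    r0 = r0 + r1;     /* add r0, r0, r1 */
    r0 = r0 & 0xFFu;  /* uxtb r0, r0    */
    return r0;        /* bx lr          */
}

/* AArch64: the upper bits of w0/w1 are unspecified on entry, so both
   operands are zero-extended before the add; the result is not narrowed,
   so only its low byte is meaningful as a char. */
static uint32_t foo_add_aapcs64(uint32_t w0, uint32_t w1) {
    w1 = w1 & 0xFFu;            /* uxtb w1, w1          */
    return w1 + (w0 & 0xFFu);   /* add w0, w1, w0, uxtb */
}
```

For `200 + 100`, the AArch32 model returns 44 because UXTB drops the carry into bit 8, while the AArch64 model returns 300 and only its low byte (44) is the char value.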
  • 6. Why extensions are needed
    ● Conversion between types
      ○ Because of the ABI difference between aarch32 and aarch64, the extension ends up in a different function for this simple example: aarch32 extends the char result in foo, while aarch64 extends the char argument in bar
      char foo (int ch)
      {
        return ch;
      }
      int bar (char ch)
      {
        return ch;
      }
      aarch32:
      foo:
        uxtb r0, r0
        bx lr
      bar:
        bx lr
      aarch64:
      foo:
        ret
      bar:
        uxtb w0, w0
        ret
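A similar register-level sketch of where each ABI puts the extension for `foo` and `bar`. The junk in the upper bits of the test inputs stands in for the unspecified bits of the aapcs64 rule; the function names are hypothetical, and the treatment of foo's aarch64 result simply follows the listing above.

```c
#include <assert.h>
#include <stdint.h>

/* AAPCS32: a char result must be zero-extended in r0 and char arguments
   arrive already zero-extended, so only foo needs an extension. */
static uint32_t foo_aapcs32(uint32_t r0) { return r0 & 0xFFu; } /* uxtb r0, r0 */
static uint32_t bar_aapcs32(uint32_t r0) { return r0; }         /* bx lr       */

/* AAPCS64: unused bits of a narrow register value are unspecified, so foo
   returns w0 untouched while bar must extend its char argument. */
static uint32_t foo_aapcs64(uint32_t w0) { return w0; }         /* ret         */
static uint32_t bar_aapcs64(uint32_t w0) { return w0 & 0xFFu; } /* uxtb w0, w0 */
```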
  • 7. Redundant extensions in GCC
    ● SUBREGs and extensions are generated while converting from tree to RTL
      ○ Type conversions
      ○ Some of the extensions may be redundant
    ● Subsequent passes remove or optimize redundant extensions
      ○ combine: merges an extension with its use into a single instruction
      ○ ree: the redundant extension elimination pass
      ○ RTX simplifications
        ■ Using the SRP_SIGNED and SRP_UNSIGNED promotion flags on SUBREGs
  • 8. Combine example (AArch64)
    ● Combines a zero/sign extension with its use if the target has a matching instruction
    ● Not specific to zero/sign extension, but helps here
    ● Works only within basic blocks
      void bar (int *ch, char a)
      {
        *ch = *ch + a;
      }
      3: r77:SI=zero_extend(x1:QI)
      8: r78:SI=r79:SI+r77:SI
      combines to
      8: r78:SI=zero_extend(x1:QI)+r79:SI
      add w1, w2, w1, uxtb
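To make the combine behaviour concrete, here is a toy peephole over a made-up three-operand IR (not GCC's RTL or its combine implementation). It assumes a single basic block, only looks at adjacent insn pairs, and fuses a zero-extension into the following add when the extended value has no other use, mirroring insns 3 and 8 above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical IR for one basic block. */
enum op {
    OP_ZEXT_QI,     /* dst = zero_extend(src1:QI); src2 unused (-1) */
    OP_ADD,         /* dst = src1 + src2 */
    OP_ADD_ZEXT_QI  /* dst = zero_extend(src1:QI) + src2, like add w, w, w, uxtb */
};

struct insn {
    enum op op;
    int dst, src1, src2;
};

/* Number of times reg is read in the block. */
static int count_uses(const struct insn *v, size_t n, int reg) {
    int uses = 0;
    for (size_t i = 0; i < n; i++) {
        if (v[i].src1 == reg)
            uses++;
        if (v[i].op != OP_ZEXT_QI && v[i].src2 == reg)
            uses++;
    }
    return uses;
}

/* Fuse "t = zext(x); d = y + t" (or "d = t + y") into "d = zext(x) + y"
   when t has no other use.  Writes to out (room for n insns) and
   returns the number of insns produced. */
static size_t combine_zext_add(const struct insn *in, size_t n, struct insn *out) {
    size_t m = 0;
    for (size_t i = 0; i < n; i++) {
        const struct insn *a = &in[i];
        const struct insn *b = (i + 1 < n) ? &in[i + 1] : NULL;
        if (b && a->op == OP_ZEXT_QI && b->op == OP_ADD
            && (b->src1 == a->dst || b->src2 == a->dst)
            && count_uses(in, n, a->dst) == 1) {
            int other = (b->src1 == a->dst) ? b->src2 : b->src1;
            out[m++] = (struct insn){ OP_ADD_ZEXT_QI, b->dst, a->src1, other };
            i++; /* the add has been absorbed into the fused insn */
        } else {
            out[m++] = *a;
        }
    }
    return m;
}
```

If r77 had a second use, fusing would still require the separate extension, so the sketch leaves both insns alone in that case.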
  • 9. ree pass (enabled for AArch64)
    ● Moves an extension to the definition of the extension's source
      ○ If the machine description (.md) supports such an instruction
    ● Runs after register allocation
      ○ Does not help in reducing register pressure
    ● Merges the extension into its reaching definitions
      ○ Not into a single use, as combine does
      ○ ABI-mandated extensions of aarch64 parameters cannot be combined (i.e. an extension of a parameter followed by the expression that uses it): the parameter is defined in the caller, so there is no definition inside the function to merge the extension into
      Before:
      11: x19:HI=x0:HI
      13: x1:DI=x20:DI
      14: x0:DI=x21:DI
      15: x0:HI=call [`calc_func'] argc:0
      19: x19:SI=sign_extend(x19:HI)
      After:
      11: x19:SI=sign_extend(x0:HI)
      13: x1:DI=x20:DI
      14: x0:DI=x21:DI
      15: x0:HI=call [`calc_func'] argc:0
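A toy version of the ree transformation above, again on a hypothetical IR rather than GCC's data structures. It assumes straight-line code, ignores call clobbers (x19 is callee-saved on AArch64, which is why it survives the call), and only handles the `r:SI = sign_extend(r:HI)` shape from the example. An extension with no definition in the block, such as one applied to an incoming parameter, is left alone, which is the limitation noted for ABI-mandated parameter extensions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical straight-line IR; not GCC's RTL. */
enum kind {
    K_MOVE_HI,          /* dst:HI = src:HI                    */
    K_MOVE_SEXT_HI_SI,  /* dst:SI = sign_extend(src:HI)       */
    K_SEXT_HI_SI,       /* dst:SI = sign_extend(dst:HI)       */
    K_OTHER             /* any other insn (moves, calls, ...) */
};

struct insn {
    enum kind kind;
    int dst;    /* register written */
    int src;    /* register read, or -1 */
    bool dead;  /* set when the insn has been eliminated */
};

/* For each "r:SI = sign_extend(r:HI)", find the reaching definition of r.
   If it is a plain HImode move and r is not read in between, turn that
   definition into a sign-extending move and delete the extension.
   Returns the number of surviving insns (compacted to the front of v). */
static size_t ree_sext(struct insn *v, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (v[i].kind != K_SEXT_HI_SI || v[i].src != v[i].dst)
            continue;
        int r = v[i].dst;
        long def = -1;
        bool read_between = false;
        for (long j = (long)i - 1; j >= 0; j--) {
            if (v[j].dead)
                continue;
            if (v[j].dst == r) { def = j; break; }
            if (v[j].src == r)
                read_between = true;
        }
        /* def < 0: no definition in the block (e.g. an incoming parameter). */
        if (def >= 0 && v[def].kind == K_MOVE_HI && !read_between) {
            v[def].kind = K_MOVE_SEXT_HI_SI;
            v[i].dead = true;
        }
    }
    size_t out = 0;
    for (size_t i = 0; i < n; i++)
        if (!v[i].dead)
            v[out++] = v[i];
    return out;
}
```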
  • 10. Shortcomings in the current approach
    ● Challenges in removing redundant extensions in RTL
      ○ Combining with other patterns (to generate a single instruction) has its limits
      ○ Simplification of SUBREGs does not have all the information it needs in RTL
        ■ It relies on SRP_SIGNED and SRP_UNSIGNED
        ■ Sign information is lost in RTL (modes such as SI and HI do not record signedness), making it hard to remove some redundant extensions
    ● What can we do?
      ○ Provide additional information to RTL
      ○ Change the tree such that extensions are minimized when converting to RTL
  • 11. Using value range to remove redundant extensions
    ● Value range information is used to annotate RTL SUBREGs with SRP_SIGNED_AND_UNSIGNED
      ○ When this flag is set, the extension at RTL level is redundant and is optimized away
    ● Example
      ○ _6 = (short int) _5;
      ○ Value range of _6: [3, 10]
        (insn 13 12 0 (set (reg:SI 73 [ D.2640 ])
                (sign_extend:SI (subreg:HI (reg:SI 81) 0))) t.c:6 -1
             (nil))
      ○ The value range of _6 means reg:SI 81 is already both sign- and zero-extended (SRP_SIGNED_AND_UNSIGNED): every value in [3, 10] is non-negative and fits in 15 bits, so the top bit of the HImode value is always clear
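The reasoning in the last bullet can be written as a small predicate. This is a hypothetical helper, not part of GCC's API: given the value range [lo, hi] of a `bits`-wide value, it reports whether sign- and zero-extension to a wider mode give the same result, which is the condition under which SRP_SIGNED_AND_UNSIGNED could be set.

```c
#include <assert.h>
#include <stdbool.h>

/* True when every value in [lo, hi] is non-negative and fits in bits-1 bits,
   i.e. the top bit of the narrow value is always clear, so sign- and
   zero-extending it produce identical wide values. */
static bool extension_is_signed_and_unsigned(long long lo, long long hi, int bits) {
    assert(bits > 0 && bits < 63 && lo <= hi);
    long long max_nonneg = (1LL << (bits - 1)) - 1;
    return lo >= 0 && hi <= max_nonneg;
}
```

For the slide's `_6`, the range [3, 10] of a 16-bit value satisfies the check, whereas a range such as [0, 40000] would not, because 40000 has bit 15 set.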
  • 12. Example
      short foo (unsigned char c)
      {
        c = c & (unsigned char) 0x0F;
        if (c > 7)
          return ((short)(c - 5));
        else
          return ((short)c);
      }
      Without the optimization:
      foo:
        and w0, w0, 15
        cmp w0, 7
        bhi .L5
        sxth w0, w0
        ret
        .p2align 3
      .L5:
        sub w0, w0, #5
        sxth w0, w0
        ret
      With the optimization:
      foo:
        and w0, w0, 15
        cmp w0, 7
        bhi .L5
        ret
        .p2align 3
      .L5:
        sub w0, w0, #5
        ret
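Because the argument is only an unsigned char, the claim that the `sxth` instructions are redundant can be checked exhaustively. The sketch below models the optimized AArch64 sequence on a `uint32_t` register (junk is placed in the unused upper bits of the incoming w0, per the aapcs64 rule) and compares it with the C semantics for all 256 inputs; the helper names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The C function from the slide. */
static short foo(unsigned char c) {
    c = c & (unsigned char) 0x0F;
    if (c > 7)
        return ((short)(c - 5));
    else
        return ((short)c);
}

/* The optimized sequence (no sxth), with w0 modelled as a uint32_t. */
static uint32_t foo_without_sxth(uint32_t w0) {
    w0 &= 15u;      /* and w0, w0, 15        */
    if (w0 > 7u)    /* cmp w0, 7 ; bhi .L5   */
        w0 -= 5u;   /* .L5: sub w0, w0, #5   */
    return w0;      /* ret                   */
}

/* Check that, for every unsigned char argument, the register already holds
   the short result sign-extended to 32 bits, i.e. what sxth would produce. */
static bool sxth_is_redundant(void) {
    for (uint32_t c = 0; c <= 0xFFu; c++) {
        uint32_t w0 = foo_without_sxth(0xABCDEF00u | c);
        int32_t expected = foo((unsigned char)c);
        if ((int32_t)w0 != expected)
            return false;
    }
    return true;
}
```

The check passes because the register value never leaves [0, 10], the same fact the value ranges on the next slide capture.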
  • 13. VRP dump at tree level / RTL expansion using VRP / Normal RTL expansion
      short foo (unsigned char c)
      {
        c = c & (unsigned char) 0x0F;
        if (c > 7)
          return ((short)(c - 5));
        else
          return ((short)c);
      }
      VRP dump at tree level:
      c_3: [0, 15]
      _4: [8, 15]
      _6: [3, 10]
      _7: [0, 7]
      <bb 2>:
      c_3 = c_2(D) & 15;
      if (c_3 > 7)
        goto <bb 3>;
      else
        goto <bb 4>;
      <bb 3>:
      _4 = (unsigned short) c_3;
      _5 = _4 + 65531;
      _6 = (short int) _5;
      goto <bb 5>;
      <bb 4>:
      _7 = (short int) c_3;
      <bb 5>:
      # _1 = PHI <_6(3), _7(4)>
      return _1;
      RTL expansion using VRP:
      2: r78:SI=zero_extend(x0:QI)
      6: r79:SI=r78:SI&0xf
      7: r74:SI=r79:SI
      8: cc:CC=cmp(r74:SI,0x7)
      9: pc={(leu(cc:CC,0))?L16:pc}
         REG_BR_PROB 6100
      11: r80:HI=r74:SI#0
      12: r81:SI=r80:HI#0-0x5
      13: r73:SI=r81:SI
      14: pc=L19
      15: barrier
      16: L16:
      18: r73:SI=r74:SI
      19: L19:
      20:
      21: r77:HI=r73:SI#0
      25: x0:HI=r77:HI
      26: use x0:HI
      Normal RTL expansion:
      2: r78:SI=zero_extend(x0:QI)
      6: r79:SI=r78:SI&0xf
      7: r74:SI=zero_extend(r79:SI#0)
      8: cc:CC=cmp(r74:SI,0x7)
      9: pc={(leu(cc:CC,0))?L16:pc}
         REG_BR_PROB 6100
      11: r80:HI=r74:SI#0
      12: r81:SI=r80:HI#0-0x5
      13: r73:SI=sign_extend(r81:SI#0)
      14: pc=L19
      15: barrier
      16: L16:
      18: r73:SI=sign_extend(r74:SI#0)
      19: L19:
      20: r77:HI=r73:SI#0
      25: x0:HI=r77:HI
      26: use x0:HI
      With the value ranges, the extensions at insns 7, 13 and 18 of the normal expansion become plain register moves.
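The ranges in the VRP dump can be reproduced with a few lines of interval arithmetic. This toy only illustrates the idea and is not how GCC's VRP is implemented: it assumes non-negative bounds, does not handle empty ranges, and uses a simple "both ends wrap together" rule for the unsigned short addition of 65531 (that is, -5 modulo 2^16).

```c
#include <assert.h>
#include <stdint.h>

/* Toy interval type (illustrative, not GCC's value_range). */
struct range { int64_t lo, hi; };

static struct range mk(int64_t lo, int64_t hi) { struct range r = { lo, hi }; return r; }

/* x & mask with x >= 0 and mask >= 0: the result lies in [0, min(hi, mask)]. */
static struct range range_and_const(struct range x, int64_t mask) {
    return mk(0, x.hi < mask ? x.hi : mask);
}

/* Ranges on the true and false edges of "if (x > k)". */
static struct range range_on_gt(struct range x, int64_t k) {
    return mk(x.lo > k + 1 ? x.lo : k + 1, x.hi);
}
static struct range range_on_le(struct range x, int64_t k) {
    return mk(x.lo, x.hi < k ? x.hi : k);
}

/* x + k in an unsigned type of the given width.  If both bounds wrap by the
   same multiple of 2^bits the interval stays contiguous; otherwise fall back
   to the whole range of the type. */
static struct range range_add_wrap(struct range x, int64_t k, int bits) {
    int64_t mod = (int64_t)1 << bits;
    int64_t lo = x.lo + k, hi = x.hi + k;
    if (lo / mod == hi / mod)
        return mk(lo % mod, hi % mod);
    return mk(0, mod - 1);
}

/* Unsigned to same-width signed conversion: exact if every value stays
   below 2^(bits-1), otherwise the full signed range. */
static struct range range_to_signed(struct range x, int bits) {
    int64_t smax = ((int64_t)1 << (bits - 1)) - 1;
    if (x.hi <= smax)
        return x;
    return mk(-smax - 1, smax);
}
```

Starting from [0, 255] for the unsigned char parameter, these steps give [0, 15] for c_3, [8, 15] for _4, [3, 10] for _6 and [0, 7] for _7, matching the dump.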
  • 14. Challenges
    ● Value ranges propagated are for the type of the SSA_NAME
      ○ If the operation can wrap, the higher bits of the promoted (PROMOTE_MODE) register can be unpredictable
      ○ An additional WRAP attribute is needed
      ○ Example (on Alpha, where the register width is 64 bits and _344 is a 32-bit unsigned variable):
        _343 = ivtmp.179_52 + 2147483645;   [0x80000004, 0x80000043]
        _344 = _343 * 2;                    [0x8, 0x86]
        _345 = (integer(kind=4)) _344;      [0x8, 0x86]
        The multiplication wraps in 32 bits (0x80000004 * 2 = 0x100000008), so [0x8, 0x86] holds for the 32-bit type but not necessarily for the full 64-bit register
    ● Value ranges propagated are pessimistic at times
      ○ Unlike our case, most value-range-based optimizations are done in the VRP pass itself and have more precise data
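The wrap-around problem can be seen with plain 32- and 64-bit arithmetic. This is a simplified model that assumes the 32-bit product ends up in a 64-bit register without being re-truncated; it is not a description of Alpha's exact instruction semantics, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* What VRP reasons about: the product in the 32-bit unsigned type. */
static uint32_t mul2_narrow(uint32_t x) { return x * 2u; }

/* What a 64-bit register could hold if the product is not re-truncated. */
static uint64_t mul2_wide(uint32_t x) { return (uint64_t)x * 2u; }
```

At both ends of the range [0x80000004, 0x80000043] the narrow result is 0x8 or 0x86, while the wide value carries an extra bit 32, so treating the register as already zero-extended would be wrong.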
  • 15. Register promotion in tree to remove redundant extensions
    ● Promote operations to word mode so that SUBREGs and extensions are minimized in RTL
    ● Fix-ups (conversions) are still needed
      ○ to preserve the semantics of the program
      ;; c_3 = c_2(D) & 15; // char type
      (insn 6 5 7 (set (reg:SI 116)
              (and:SI (reg/v:SI 115 [ c ])
                  (const_int 15 [0xf]))) t.c:3 -1
           (nil))
      (insn 7 6 0 (set (reg/v:SI 111 [ c ])
              (zero_extend:SI (subreg:QI (reg:SI 116) 0))) t.c:3 -1
           (nil))
      ;; c_3 = c_2(D) & 15; // int type
      (insn 6 5 0 (set (reg/v:SI 111 [ c ])
              (and:SI (reg/v:SI 115 [ c ])
                  (const_int 15 [0xf]))) t.c:3 -1
           (nil))
      With c promoted to int, the zero_extend of insn 7 is no longer generated.
  • 16. short foo (unsigned char c)
      {
        c = c & (unsigned char)0x0F;
        if (c > 7)
          return ((short)(c - 5));
        else
          return ((short)c);
      }
      Before promotion:
      foo (unsigned char c)
      {
        short int _1;
        unsigned short _4;
        unsigned short _5;
        short int _6;
        short int _7;
        <bb 2>:
        c_3 = c_2(D) & 15;
        if (c_3 > 7)
          goto <bb 3>;
        else
          goto <bb 4>;
        <bb 3>:
        _4 = (unsigned short) c_3;
        _5 = _4 + 65531;
        _6 = (short int) _5;
        goto <bb 5>;
        <bb 4>:
        _7 = (short int) c_3;
        <bb 5>:
        # _1 = PHI <_6(3), _7(4)>
        return _1;
      }
      After promotion:
      foo (unsigned int c)
      {
        int _5, _6, _7;
        unsigned int _4, _9, _10;
        short int _11;
        <bb 2>:
        c_3 = c_2(D) & 15;
        if (c_3 > 7)
          goto <bb 3>;
        else
          goto <bb 4>;
        <bb 3>:
        _10 = (unsigned int) c_3;
        _4 = _10 + 4294967291;
        _9 = _4 & 65535;
        _5 = (int) _9;
        goto <bb 5>;
        <bb 4>:
        _6 = (int) c_3;
        <bb 5>:
        # _7 = PHI <_5(3), _6(4)>
        _11 = (short int) _7;
        return _11;
      }
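To check that the fix-ups keep the semantics, here is a C transliteration of the promoted GIMPLE body above, compared exhaustively with the original function over all unsigned char inputs. It assumes 32-bit int, as on AArch64; the `t`-prefixed locals are illustrative stand-ins for the SSA names.

```c
#include <assert.h>
#include <stdbool.h>

/* Original function from the slide. */
static short foo(unsigned char c) {
    c = c & (unsigned char)0x0F;
    if (c > 7)
        return ((short)(c - 5));
    else
        return ((short)c);
}

/* Promoted version: arithmetic in unsigned int / int, with explicit fix-ups. */
static short foo_promoted(unsigned int c) {
    unsigned int c_3 = c & 15u;
    int t7;                                   /* _7 = PHI <_5(3), _6(4)>         */
    if (c_3 > 7u) {
        unsigned int t10 = c_3;               /* _10 = (unsigned int) c_3        */
        unsigned int t4 = t10 + 4294967291u;  /* _4 = _10 + 4294967291 (c_3 - 5) */
        unsigned int t9 = t4 & 65535u;        /* _9: fix-up, wrap like unsigned short */
        t7 = (int)t9;                         /* _5 = (int) _9                   */
    } else {
        t7 = (int)c_3;                        /* _6 = (int) c_3                  */
    }
    return (short)t7;                         /* _11 = (short int) _7: final fix-up */
}

static bool promoted_matches_original(void) {
    for (unsigned int c = 0; c <= 255u; c++)
        if (foo((unsigned char)c) != foo_promoted(c))
            return false;
    return true;
}
```

The `& 65535` and the final `(short int)` conversion are what keep the wider computation equivalent; the remaining redundancy is left for later passes to remove.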
  • 17. Example
      short foo (unsigned char c)
      {
        c = c & (unsigned char) 0x0F;
        if (c > 7)
          return ((short)(c - 5));
        else
          return ((short)c);
      }
      Without register promotion:
      foo:
        and w0, w0, 15
        cmp w0, 7
        bhi .L5
        sxth w0, w0
        ret
        .p2align 3
      .L5:
        sub w0, w0, #5
        sxth w0, w0
        ret
      With register promotion:
      foo:
        and w0, w0, 15
        cmp w0, 7
        bhi .L5
        ret
        .p2align 3
      .L5:
        sub w0, w0, #5
        ret
  • 18. Challenges
    ● Inserting fix-ups and conversions to preserve the semantics of the program
    ● Making subsequent passes optimize away the remaining redundancies
  • 19. Conclusion
    ● Redundant zero/sign extensions result in performance and code-size penalties
      ○ Excess instructions
      ○ Increased register pressure
    ● Removing redundant zero/sign extensions is a challenge and has to be approached at multiple levels
    ● Patches for the discussed optimizations are currently being reviewed/upstreamed