SlideShare a Scribd company logo
1 of 75
Download to read offline
Mediump support
in Mesa
Overview
• What is mediump?
• What does Mesa currently do?
• The plan
• Reducing conversion operations
• Changing types of variables
• Folding conversions
• Testing
• Code
• Questions?
What is mediump?
• Only in GLSL ES
• Available since the first version of GLSL ES.
• Used to tell the driver an operation in a shader can
be done with lower precision.
• Some hardware can take advantage of this to trade
off precision for speed.
• For example, an operation can be done with a 16-
bit float:
sign bit
exponent bits
fraction bits
largest number approximately 3 × 10³⁸
approximately 7 decimal digits of accuracy
32-bit float
sign bit
exponent bits
fraction bits
largest number 65504
approximately 3 decimal digits of accuracy
16-bit float
• GLSL ES has three available precisions:
• lowp, mediump and highp
• The spec specifies a minimum precision for each
of these.
• highp needs 16-bit fractional part.
• It will probably end up being a single-
precision float.
• mediump needs 10-bit fractional part.
• This can be represented as a half float.
• lowp has enough precision to store 8-bit colour
channels.
• The precision does not affect the visible storage of
a variable.
• For example a mediump float will still be stored
as 32-bit in a UBO.
• Only operations are affected.
• The precision requirements are only a minimum.
• Therefore a valid implementation could be to
just ignore the precision and do every operation
at highp.
• This is effectively what Mesa currently does.
• The precision for a variable can be specified
directly:
uniform mediump vec3 rect_color;
• Or it can be specified as a global default for each
type:
precision mediump float;
uniform vec3 rect_color;
• The compiler specifies global defaults for most
types except floats in the fragment shader.
• In GLSL ES 1.00 high precision support in fragment
shaders is optional.
• The precision of operands to an operation
determine the precision of the operation.
• Almost works like automatic float to double
promotion in C.
mediump float a, b;
highp float c = a * b;
• The precision of operands to an operation
determine the precision of the operation.
• Almost works like automatic float to double
promotion in C.
mediump float a, b;
highp float c = a * b;
This operation can be done
in mediump
All operands are mediump.
• The precision of operands to an operation
determine the precision of the operation.
• Almost works like automatic float to double
promotion in C.
mediump float a, b;
highp float c = a * b;
This operation can be done
in mediump
All operands are mediump.
precision of result
doesn’t matter
• Another example
mediump float a, b;
highp float c;
mediump float r = c * (a * b);
• Another example
mediump float a, b;
highp float c;
mediump float r = c * (a * b);
This operation can still
be done in mediump
• Another example
mediump float a, b;
highp float c;
mediump float r = c * (a * b);
This operation can still
be done in mediump
This outer operation
must be done at
highp
• Corner case
• Some things don’t have a precision, eg
constants.
mediump float diameter;
float circ = diameter * 3.141592;
• Corner case
• Some things don’t have a precision, eg
constants.
mediump float diameter;
float circ = diameter * 3.141592;
Constants have no precision
• Corner case
• Some things don’t have a precision, eg
constants.
mediump float diameter;
float circ = diameter * 3.141592;
Constants have no precision
Precision of multiplication
is mediump anyway
because one of the arguments
has a precision
• Extreme corner case
• Sometimes none of the operands have a
precision.
uniform bool should_pi;
mediump float result =
float(should_pi) * 3.141592;
• Extreme corner case
• Sometimes none of the operands have a
precision.
uniform bool should_pi;
mediump float result =
float(should_pi) * 3.141592;
Neither operand has a precision
• Extreme corner case
• Sometimes none of the operands have a
precision.
uniform bool should_pi;
mediump float result =
float(should_pi) * 3.141592;
Neither operand has a precision
Precision of operation can come from
outer expression, even the lvalue
of an assignment
What does Mesa
currently do?
• Mesa already has code to parse the precision
qualiers and store them in the IR tree.
• These currently aren’t used for anything except to
check for compile-time errors.
• For example redeclaring a variable with a
different precision.
• In desktop GL, the precision is always set to NONE.
• The precision usually doesn’t form part of the
glsl_type.
• Instead it is stored out-of-band as part of the
ir_variable.
enum {
GLSL_PRECISION_NONE = 0,
GLSL_PRECISION_HIGH,
GLSL_PRECISION_MEDIUM,
GLSL_PRECISION_LOW
};
class ir_variable : public ir_instruction {
/* … */
public:
struct ir_variable_data {
/* … */
/**
* Precision qualifier.
*
* In desktop GLSL we do not care about precision qualifiers at
* all, in fact, the spec says that precision qualifiers are
* ignored.
*
* To make things easy, we make it so that this field is always
* GLSL_PRECISION_NONE on desktop shaders. This way all the
* variables have the same precision value and the checks we add
* in the compiler for this field will never break a desktop
* shader compile.
*/
unsigned precision:2;
/* … */
};
};
• However this gets complicated for structs because
members can have their own precision.
uniform block {
mediump vec3 just_a_color;
highp mat4 important_matrix;
} things;
• In that case the precision does end up being part of
the glsl_type.
The plan
• The idea is to lower mediump operations to float16
types in NIR.
• We want to lower the actual operations instead of
the variables.
• This needs to be done at a high level in order to
implement the spec rules.
• Work being done by Hyunjun Ko and myself and
Igalia.
• Working on behalf of Google.
• Based on / inspired by patches by Topi Pohjolainen.
• Aiming specifically to make this work on the
Freedreno driver.
• Most of the work is reusable for any driver though.
• Currently this is done as a pass over the IR
representation.
uniform mediump float a, b;
void main()
{
gl_FragColor.r = a / b;
}
uniform mediump float a, b;
void main()
{
gl_FragColor.r = a / b;
}
These two variables
are mediump
uniform mediump float a, b;
void main()
{
gl_FragColor.r = a / b;
}
These two variables
are mediump
So this division can be
done at medium precision
• We only want to lower the division operation
without changing the type of the variables.
• The lowering pass will add a conversion to float16
around the variable dereferences and then add a
conversion back to float32 after the division.
• This minimises the modifications to the IR.
• IR tree before lowering pass
(assign (x) (var_ref gl_FragColor)
(swiz x (swiz xxxx (expression float /
(var_ref a)
(var_ref b)))))
• IR tree before lowering pass
(assign (x) (var_ref gl_FragColor)
(swiz x (swiz xxxx (expression float /
(var_ref a)
(var_ref b)))))
division operation
• IR tree before lowering pass
(assign (x) (var_ref gl_FragColor)
(swiz x (swiz xxxx (expression float /
(var_ref a)
(var_ref b)))))
division operation
type is
32-bit float
• Lowering pass finds sections of the tree involving
only mediump/lowp operations.
• Adds f2f16 conversion after variable derefs
• Adds f2f32 conversion at root of lowered branch
• IR tree after lowering pass
(assign (x) (var_ref gl_FragColor)
(expression float f162f
(swiz x (swiz xxxx
(expression float16_t /
(expression float16_t f2f16
(var_ref a))
(expression float16_t f2f16
(var_ref b)))))))
• IR tree after lowering pass
(assign (x) (var_ref gl_FragColor)
(expression float f162f
(swiz x (swiz xxxx
(expression float16_t /
(expression float16_t f2f16
(var_ref a))
(expression float16_t f2f16
(var_ref b)))))))
each var_ref is
converted to float16
• IR tree after lowering pass
(assign (x) (var_ref gl_FragColor)
(expression float f162f
(swiz x (swiz xxxx
(expression float16_t /
(expression float16_t f2f16
(var_ref a))
(expression float16_t f2f16
(var_ref b)))))))
division operation
is done in float16
• IR tree after lowering pass
(assign (x) (var_ref gl_FragColor)
(expression float f162f
(swiz x (swiz xxxx
(expression float16_t /
(expression float16_t f2f16
(var_ref a))
(expression float16_t f2f16
(var_ref b)))))))
Result is converted
back to float32
before storing in var
Reducing conversion
operations
• This will end up generating a lot of conversion
operations.
• Worse:
precision mediump float;
uniform mediump float a;
void main()
{
float scaled = a / 5.0;
gl_FragColor.r = scaled + 0.5;
}
• This will end up generating a lot of conversion
operations.
• Worse:
precision mediump float;
uniform mediump float a;
void main()
{
float scaled = a / 5.0;
gl_FragColor.r = scaled + 0.5;
}
operation will be done in mediump
then converted back to float32
to store in the variable
• This will end up generating a lot of conversion
operations.
• Worse:
precision mediump float;
uniform mediump float a;
void main()
{
float scaled = a / 5.0;
gl_FragColor.r = scaled + 0.5;
}
then the result will be
immediately converted back
to float16 for this operation
• Resulting NIR
vec1 32 ssa_1 = deref_var &a (uniform float)
vec1 32 ssa_2 = intrinsic load_deref (ssa_1)
vec1 16 ssa_3 = f2f16 ssa_2
vec1 16 ssa_6 = fdiv ssa_3, ssa_20
vec1 32 ssa_7 = f2f32 ssa_6
vec1 16 ssa_8 = f2f16 ssa_7
vec1 32 ssa_9 = f2f32 ssa_8
vec1 16 ssa_10 = f2f16 ssa_9
vec1 16 ssa_13 = fadd ssa_10, ssa_22
• Resulting NIR
vec1 32 ssa_1 = deref_var &a (uniform float)
vec1 32 ssa_2 = intrinsic load_deref (ssa_1)
vec1 16 ssa_3 = f2f16 ssa_2
vec1 16 ssa_6 = fdiv ssa_3, ssa_20
vec1 32 ssa_7 = f2f32 ssa_6
vec1 16 ssa_8 = f2f16 ssa_7
vec1 32 ssa_9 = f2f32 ssa_8
vec1 16 ssa_10 = f2f16 ssa_9
vec1 16 ssa_13 = fadd ssa_10, ssa_22
Lots of redundant
coversions!
• There is a NIR optimisation to remove redundant
conversions
• Only enabled for GLES because converting
f32→f16→f32 is not lossless
Changing types of
variables
• Normally we don’t want to change the type of
variables
• For example, this would break uniforms because
they are visible to the app
• Sometimes we can do it anyway though depending
on the hardware
• On Freedreno, we can change the type of the
fragment outputs if they are mediump.
• gl_FragColor is declared as mediump by default
• The variable type is not user-visible so it won’t
break the app.
• This removes a conversion.
• We have a specific pass for Freedreno to do this.
vec1 32 ssa_1 = load_const (0x00000000 /* 0.000000 */)
vec1 16 ssa_2 = intrinsic load_uniform (ssa_1) (0, 0, 0)
vec1 32 ssa_4 = load_const (0x00000001 /* 0.000000 */)
vec1 16 ssa_5 = intrinsic load_uniform (ssa_4) (0, 0, 0)
vec1 16 ssa_7 = frcp ssa_5
vec1 16 ssa_8 = fmul ssa_2, ssa_7
vec1 32 ssa_9 = f2f32 ssa_8
vec4 32 ssa_10 = vec4 ssa_9, ssa_0.y, ssa_0.z, ssa_0.w
intrinsic store_output (ssa_10, ssa_1) (0, 15, 0, 160)
vec1 32 ssa_1 = load_const (0x00000000 /* 0.000000 */)
vec1 16 ssa_2 = intrinsic load_uniform (ssa_1) (0, 0, 0)
vec1 32 ssa_4 = load_const (0x00000001 /* 0.000000 */)
vec1 16 ssa_5 = intrinsic load_uniform (ssa_4) (0, 0, 0)
vec1 16 ssa_7 = frcp ssa_5
vec1 16 ssa_8 = fmul ssa_2, ssa_7
vec1 32 ssa_9 = f2f32 ssa_8
vec4 32 ssa_10 = vec4 ssa_9, ssa_0.y, ssa_0.z, ssa_0.w
intrinsic store_output (ssa_10, ssa_1) (0, 15, 0, 144)
removes this conversion
vec1 32 ssa_1 = load_const (0x00000000 /* 0.000000 */)
vec1 16 ssa_2 = intrinsic load_uniform (ssa_1) (0, 0, 0)
vec1 32 ssa_4 = load_const (0x00000001 /* 0.000000 */)
vec1 16 ssa_5 = intrinsic load_uniform (ssa_4) (0, 0, 0)
vec1 16 ssa_7 = frcp ssa_5
vec1 16 ssa_8 = fmul ssa_2, ssa_7
vec4 16 ssa_10 = vec4 ssa_8, ssa_0.y, ssa_0.z, ssa_0.w
intrinsic store_output (ssa_10, ssa_1) (0, 15, 0, 160)
use 16-bit output directly
Folding conversions
• Consider this simple fragment shader
uniform highp float a, b;
void main()
{
gl_FragColor.r = a / b;
}
• Consider this simple fragment shader
uniform highp float a, b;
void main()
{
gl_FragColor.r = a / b;
}
operation is using highp
• Consider this simple fragment shader
uniform highp float a, b;
void main()
{
gl_FragColor.r = a / b;
}
gl_FragColor will converted
to a 16-bit output
• This can generate an IR3 disassembly like this:
mov.f32f32 r0.x, c0.y
(rpt5)nop
rcp r0.x, r0.x
(ss)mul.f r0.x, c0.x, r0.x
(rpt2)nop
cov.f32f16 hr0.x, r0.x
• This can generate an IR3 disassembly like this:
mov.f32f32 r0.x, c0.y
(rpt5)nop
rcp r0.x, r0.x
(ss)mul.f r0.x, c0.x, r0.x
(rpt2)nop
cov.f32f16 hr0.x, r0.x
32-bit float registers
for the multiplication
• This can generate an IR3 disassembly like this:
mov.f32f32 r0.x, c0.y
(rpt5)nop
rcp r0.x, r0.x
(ss)mul.f r0.x, c0.x, r0.x
(rpt2)nop
cov.f32f16 hr0.x, r0.x
result is converted
to half-float for output
• This last conversion shouldn’t be necessary.
• Adreno allows the destination register to have a
different size from the source registers.
• We can fold the conversion directly into the
multiplication.
• We have added a pass on the NIR that does this
folding.
• It requires changes the NIR validation to allow the
dest to have a different size.
• Only enabled for Freedreno.
vec1 32 ssa_1 = load_const (0x00000000 /* 0.000000 */)
vec1 32 ssa_2 = intrinsic load_uniform (ssa_1) (0, 0, 0)
vec1 32 ssa_3 = load_const (0x00000001 /* 0.000000 */)
vec1 32 ssa_4 = intrinsic load_uniform (ssa_3) (0, 0, 0)
vec1 32 ssa_5 = frcp ssa_4
vec1 32 ssa_6 = fmul ssa_2, ssa_5
vec1 16 ssa_7 = f2f16 ssa_6
vec4 16 ssa_8 = vec4 ssa_7, ssa_0.y, ssa_0.z, ssa_0.w
intrinsic store_output (ssa_8, ssa_1) (0, 15, 0, 144)
vec1 32 ssa_1 = load_const (0x00000000 /* 0.000000 */)
vec1 32 ssa_2 = intrinsic load_uniform (ssa_1) (0, 0, 0)
vec1 32 ssa_3 = load_const (0x00000001 /* 0.000000 */)
vec1 32 ssa_4 = intrinsic load_uniform (ssa_3) (0, 0, 0)
vec1 32 ssa_5 = frcp ssa_4
vec1 32 ssa_6 = fmul ssa_2, ssa_5
vec1 16 ssa_7 = f2f16 ssa_6
vec4 16 ssa_8 = vec4 ssa_7, ssa_0.y, ssa_0.z, ssa_0.w
intrinsic store_output (ssa_8, ssa_1) (0, 15, 0, 144)
remove this conversion
vec1 32 ssa_1 = load_const (0x00000000 /* 0.000000 */)
vec1 32 ssa_2 = intrinsic load_uniform (ssa_1) (0, 0, 0)
vec1 32 ssa_3 = load_const (0x00000001 /* 0.000000 */)
vec1 32 ssa_4 = intrinsic load_uniform (ssa_3) (0, 0, 0)
vec1 32 ssa_5 = frcp ssa_4
vec1 16 ssa_6 = fmul ssa_2, ssa_5
vec4 16 ssa_8 = vec4 ssa_6, ssa_0.y, ssa_0.z, ssa_0.w
intrinsic store_output (ssa_8, ssa_1) (0, 15, 0, 144)
vec1 32 ssa_1 = load_const (0x00000000 /* 0.000000 */)
vec1 32 ssa_2 = intrinsic load_uniform (ssa_1) (0, 0, 0)
vec1 32 ssa_3 = load_const (0x00000001 /* 0.000000 */)
vec1 32 ssa_4 = intrinsic load_uniform (ssa_3) (0, 0, 0)
vec1 32 ssa_5 = frcp ssa_4
vec1 16 ssa_6 = fmul ssa_2, ssa_5
vec4 16 ssa_8 = vec4 ssa_6, ssa_0.y, ssa_0.z, ssa_0.w
intrinsic store_output (ssa_8, ssa_1) (0, 15, 0, 144)
change destination type of multiplication
vec1 32 ssa_1 = load_const (0x00000000 /* 0.000000 */)
vec1 32 ssa_2 = intrinsic load_uniform (ssa_1) (0, 0, 0)
vec1 32 ssa_3 = load_const (0x00000001 /* 0.000000 */)
vec1 32 ssa_4 = intrinsic load_uniform (ssa_3) (0, 0, 0)
vec1 32 ssa_5 = frcp ssa_4
vec1 16 ssa_6 = fmul ssa_2, ssa_5
vec4 16 ssa_8 = vec4 ssa_6, ssa_0.y, ssa_0.z, ssa_0.w
intrinsic store_output (ssa_8, ssa_1) (0, 15, 0, 144)
source types
are still 32-bit
Testing
• We are writing Piglit tests that use mediump
• Most of them check that the result is less accurate
than if it was done at highp
• That way we catch regressions where we break the
lowering
• These tests couldn’t be merged into Piglit proper
because not lowering would be valid behaviour.
Code
• The code is at gitlab.freedesktop.org/zzoon on the
mediump branch
• There are also merge requests (1043, 1044, 1045).
• Piglit tests are at: https://github.com/Igalia/piglit/
• branch nroberts/wip/mediump-tests
Questions?

More Related Content

Similar to Mediump support in Mesa (XDC 2019)

Verilog Lecture2 thhts
Verilog Lecture2 thhtsVerilog Lecture2 thhts
Verilog Lecture2 thhtsBéo Tú
 
Slicing of Object-Oriented Programs
Slicing of Object-Oriented ProgramsSlicing of Object-Oriented Programs
Slicing of Object-Oriented ProgramsPraveen Penumathsa
 
Hpg2011 papers kazakov
Hpg2011 papers kazakovHpg2011 papers kazakov
Hpg2011 papers kazakovmistercteam
 
Slicing of Object-Oriented Programs
Slicing of Object-Oriented ProgramsSlicing of Object-Oriented Programs
Slicing of Object-Oriented ProgramsPraveen Penumathsa
 
Rethinking Attention with Performers
Rethinking Attention with PerformersRethinking Attention with Performers
Rethinking Attention with PerformersJoonhyung Lee
 
02 functions, variables, basic input and output of c++
02   functions, variables, basic input and output of c++02   functions, variables, basic input and output of c++
02 functions, variables, basic input and output of c++Manzoor ALam
 
Cranking Floating Point Performance To 11 On The iPhone
Cranking Floating Point Performance To 11 On The iPhoneCranking Floating Point Performance To 11 On The iPhone
Cranking Floating Point Performance To 11 On The iPhoneNoel Llopis
 
Mining quasi bicliques using giraph
Mining quasi bicliques using giraphMining quasi bicliques using giraph
Mining quasi bicliques using giraphHsiao-Fei Liu
 
To Loop or Not to Loop: Overcoming Roadblocks with FME
To Loop or Not to Loop: Overcoming Roadblocks with FMETo Loop or Not to Loop: Overcoming Roadblocks with FME
To Loop or Not to Loop: Overcoming Roadblocks with FMESafe Software
 
Fundamental of Information Technology - UNIT 6
Fundamental of Information Technology - UNIT 6Fundamental of Information Technology - UNIT 6
Fundamental of Information Technology - UNIT 6Shipra Swati
 
running stable diffusion on android
running stable diffusion on androidrunning stable diffusion on android
running stable diffusion on androidKoan-Sin Tan
 
GBM package in r
GBM package in rGBM package in r
GBM package in rmark_landry
 
Devel::NYTProf 2009-07 (OUTDATED, see 201008)
Devel::NYTProf 2009-07 (OUTDATED, see 201008)Devel::NYTProf 2009-07 (OUTDATED, see 201008)
Devel::NYTProf 2009-07 (OUTDATED, see 201008)Tim Bunce
 
Nagios Conference 2014 - Rob Seiwert - Graphing and Trend Prediction in Nagios
Nagios Conference 2014 - Rob Seiwert - Graphing and Trend Prediction in NagiosNagios Conference 2014 - Rob Seiwert - Graphing and Trend Prediction in Nagios
Nagios Conference 2014 - Rob Seiwert - Graphing and Trend Prediction in NagiosNagios
 
Functional Programming in Java
Functional Programming in JavaFunctional Programming in Java
Functional Programming in JavaJim Bethancourt
 

Similar to Mediump support in Mesa (XDC 2019) (20)

Verilog Lecture2 thhts
Verilog Lecture2 thhtsVerilog Lecture2 thhts
Verilog Lecture2 thhts
 
Rseminarp
RseminarpRseminarp
Rseminarp
 
Slicing of Object-Oriented Programs
Slicing of Object-Oriented ProgramsSlicing of Object-Oriented Programs
Slicing of Object-Oriented Programs
 
Hpg2011 papers kazakov
Hpg2011 papers kazakovHpg2011 papers kazakov
Hpg2011 papers kazakov
 
Slicing of Object-Oriented Programs
Slicing of Object-Oriented ProgramsSlicing of Object-Oriented Programs
Slicing of Object-Oriented Programs
 
Rethinking Attention with Performers
Rethinking Attention with PerformersRethinking Attention with Performers
Rethinking Attention with Performers
 
Should i Go there
Should i Go thereShould i Go there
Should i Go there
 
02 functions, variables, basic input and output of c++
02   functions, variables, basic input and output of c++02   functions, variables, basic input and output of c++
02 functions, variables, basic input and output of c++
 
C language
C languageC language
C language
 
Cranking Floating Point Performance To 11 On The iPhone
Cranking Floating Point Performance To 11 On The iPhoneCranking Floating Point Performance To 11 On The iPhone
Cranking Floating Point Performance To 11 On The iPhone
 
Mining quasi bicliques using giraph
Mining quasi bicliques using giraphMining quasi bicliques using giraph
Mining quasi bicliques using giraph
 
To Loop or Not to Loop: Overcoming Roadblocks with FME
To Loop or Not to Loop: Overcoming Roadblocks with FMETo Loop or Not to Loop: Overcoming Roadblocks with FME
To Loop or Not to Loop: Overcoming Roadblocks with FME
 
Fundamental of Information Technology - UNIT 6
Fundamental of Information Technology - UNIT 6Fundamental of Information Technology - UNIT 6
Fundamental of Information Technology - UNIT 6
 
Ndp Slides
Ndp SlidesNdp Slides
Ndp Slides
 
running stable diffusion on android
running stable diffusion on androidrunning stable diffusion on android
running stable diffusion on android
 
GBM package in r
GBM package in rGBM package in r
GBM package in r
 
Devel::NYTProf 2009-07 (OUTDATED, see 201008)
Devel::NYTProf 2009-07 (OUTDATED, see 201008)Devel::NYTProf 2009-07 (OUTDATED, see 201008)
Devel::NYTProf 2009-07 (OUTDATED, see 201008)
 
Nagios Conference 2014 - Rob Seiwert - Graphing and Trend Prediction in Nagios
Nagios Conference 2014 - Rob Seiwert - Graphing and Trend Prediction in NagiosNagios Conference 2014 - Rob Seiwert - Graphing and Trend Prediction in Nagios
Nagios Conference 2014 - Rob Seiwert - Graphing and Trend Prediction in Nagios
 
Lesson 5
Lesson 5Lesson 5
Lesson 5
 
Functional Programming in Java
Functional Programming in JavaFunctional Programming in Java
Functional Programming in Java
 

More from Igalia

A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Building End-user Applications on Embedded Devices with WPE
Building End-user Applications on Embedded Devices with WPEBuilding End-user Applications on Embedded Devices with WPE
Building End-user Applications on Embedded Devices with WPEIgalia
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Automated Testing for Web-based Systems on Embedded Devices
Automated Testing for Web-based Systems on Embedded DevicesAutomated Testing for Web-based Systems on Embedded Devices
Automated Testing for Web-based Systems on Embedded DevicesIgalia
 
Embedding WPE WebKit - from Bring-up to Maintenance
Embedding WPE WebKit - from Bring-up to MaintenanceEmbedding WPE WebKit - from Bring-up to Maintenance
Embedding WPE WebKit - from Bring-up to MaintenanceIgalia
 
Optimizing Scheduler for Linux Gaming.pdf
Optimizing Scheduler for Linux Gaming.pdfOptimizing Scheduler for Linux Gaming.pdf
Optimizing Scheduler for Linux Gaming.pdfIgalia
 
Running JS via WASM faster with JIT
Running JS via WASM      faster with JITRunning JS via WASM      faster with JIT
Running JS via WASM faster with JITIgalia
 
To crash or not to crash: if you do, at least recover fast!
To crash or not to crash: if you do, at least recover fast!To crash or not to crash: if you do, at least recover fast!
To crash or not to crash: if you do, at least recover fast!Igalia
 
Implementing a Vulkan Video Encoder From Mesa to GStreamer
Implementing a Vulkan Video Encoder From Mesa to GStreamerImplementing a Vulkan Video Encoder From Mesa to GStreamer
Implementing a Vulkan Video Encoder From Mesa to GStreamerIgalia
 
8 Years of Open Drivers, including the State of Vulkan in Mesa
8 Years of Open Drivers, including the State of Vulkan in Mesa8 Years of Open Drivers, including the State of Vulkan in Mesa
8 Years of Open Drivers, including the State of Vulkan in MesaIgalia
 
Introducción a Mesa. Caso específico dos dispositivos Raspberry Pi por Igalia
Introducción a Mesa. Caso específico dos dispositivos Raspberry Pi por IgaliaIntroducción a Mesa. Caso específico dos dispositivos Raspberry Pi por Igalia
Introducción a Mesa. Caso específico dos dispositivos Raspberry Pi por IgaliaIgalia
 
2023 in Chimera Linux
2023 in Chimera                    Linux2023 in Chimera                    Linux
2023 in Chimera LinuxIgalia
 
Building a Linux distro with LLVM
Building a Linux distro        with LLVMBuilding a Linux distro        with LLVM
Building a Linux distro with LLVMIgalia
 
turnip: Update on Open Source Vulkan Driver for Adreno GPUs
turnip: Update on Open Source Vulkan Driver for Adreno GPUsturnip: Update on Open Source Vulkan Driver for Adreno GPUs
turnip: Update on Open Source Vulkan Driver for Adreno GPUsIgalia
 
Graphics stack updates for Raspberry Pi devices
Graphics stack updates for Raspberry Pi devicesGraphics stack updates for Raspberry Pi devices
Graphics stack updates for Raspberry Pi devicesIgalia
 
Delegated Compositing - Utilizing Wayland Protocols for Chromium on ChromeOS
Delegated Compositing - Utilizing Wayland Protocols for Chromium on ChromeOSDelegated Compositing - Utilizing Wayland Protocols for Chromium on ChromeOS
Delegated Compositing - Utilizing Wayland Protocols for Chromium on ChromeOSIgalia
 
MessageFormat: The future of i18n on the web
MessageFormat: The future of i18n on the webMessageFormat: The future of i18n on the web
MessageFormat: The future of i18n on the webIgalia
 
Replacing the geometry pipeline with mesh shaders
Replacing the geometry pipeline with mesh shadersReplacing the geometry pipeline with mesh shaders
Replacing the geometry pipeline with mesh shadersIgalia
 
I'm not an AMD expert, but...
I'm not an AMD expert, but...I'm not an AMD expert, but...
I'm not an AMD expert, but...Igalia
 
Status of Vulkan on Raspberry
Status of Vulkan on RaspberryStatus of Vulkan on Raspberry
Status of Vulkan on RaspberryIgalia
 

More from Igalia (20)

A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Building End-user Applications on Embedded Devices with WPE
Building End-user Applications on Embedded Devices with WPEBuilding End-user Applications on Embedded Devices with WPE
Building End-user Applications on Embedded Devices with WPE
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Automated Testing for Web-based Systems on Embedded Devices
Automated Testing for Web-based Systems on Embedded DevicesAutomated Testing for Web-based Systems on Embedded Devices
Automated Testing for Web-based Systems on Embedded Devices
 
Embedding WPE WebKit - from Bring-up to Maintenance
Embedding WPE WebKit - from Bring-up to MaintenanceEmbedding WPE WebKit - from Bring-up to Maintenance
Embedding WPE WebKit - from Bring-up to Maintenance
 
Optimizing Scheduler for Linux Gaming.pdf
Optimizing Scheduler for Linux Gaming.pdfOptimizing Scheduler for Linux Gaming.pdf
Optimizing Scheduler for Linux Gaming.pdf
 
Running JS via WASM faster with JIT
Running JS via WASM      faster with JITRunning JS via WASM      faster with JIT
Running JS via WASM faster with JIT
 
To crash or not to crash: if you do, at least recover fast!
To crash or not to crash: if you do, at least recover fast!To crash or not to crash: if you do, at least recover fast!
To crash or not to crash: if you do, at least recover fast!
 
Implementing a Vulkan Video Encoder From Mesa to GStreamer
Implementing a Vulkan Video Encoder From Mesa to GStreamerImplementing a Vulkan Video Encoder From Mesa to GStreamer
Implementing a Vulkan Video Encoder From Mesa to GStreamer
 
8 Years of Open Drivers, including the State of Vulkan in Mesa
8 Years of Open Drivers, including the State of Vulkan in Mesa8 Years of Open Drivers, including the State of Vulkan in Mesa
8 Years of Open Drivers, including the State of Vulkan in Mesa
 
Introducción a Mesa. Caso específico dos dispositivos Raspberry Pi por Igalia
Introducción a Mesa. Caso específico dos dispositivos Raspberry Pi por IgaliaIntroducción a Mesa. Caso específico dos dispositivos Raspberry Pi por Igalia
Introducción a Mesa. Caso específico dos dispositivos Raspberry Pi por Igalia
 
2023 in Chimera Linux
2023 in Chimera                    Linux2023 in Chimera                    Linux
2023 in Chimera Linux
 
Building a Linux distro with LLVM
Building a Linux distro        with LLVMBuilding a Linux distro        with LLVM
Building a Linux distro with LLVM
 
turnip: Update on Open Source Vulkan Driver for Adreno GPUs
turnip: Update on Open Source Vulkan Driver for Adreno GPUsturnip: Update on Open Source Vulkan Driver for Adreno GPUs
turnip: Update on Open Source Vulkan Driver for Adreno GPUs
 
Graphics stack updates for Raspberry Pi devices
Graphics stack updates for Raspberry Pi devicesGraphics stack updates for Raspberry Pi devices
Graphics stack updates for Raspberry Pi devices
 
Delegated Compositing - Utilizing Wayland Protocols for Chromium on ChromeOS
Delegated Compositing - Utilizing Wayland Protocols for Chromium on ChromeOSDelegated Compositing - Utilizing Wayland Protocols for Chromium on ChromeOS
Delegated Compositing - Utilizing Wayland Protocols for Chromium on ChromeOS
 
MessageFormat: The future of i18n on the web
MessageFormat: The future of i18n on the webMessageFormat: The future of i18n on the web
MessageFormat: The future of i18n on the web
 
Replacing the geometry pipeline with mesh shaders
Replacing the geometry pipeline with mesh shadersReplacing the geometry pipeline with mesh shaders
Replacing the geometry pipeline with mesh shaders
 
I'm not an AMD expert, but...
I'm not an AMD expert, but...I'm not an AMD expert, but...
I'm not an AMD expert, but...
 
Status of Vulkan on Raspberry
Status of Vulkan on RaspberryStatus of Vulkan on Raspberry
Status of Vulkan on Raspberry
 

Recently uploaded

Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAndikSusilo4
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?XfilesPro
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 

Recently uploaded (20)

Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 

Mediump support in Mesa (XDC 2019)

  • 2. Overview • What is mediump? • What does Mesa currently do? • The plan • Reducing conversion operations • Changing types of variables • Folding conversions • Testing • Code • Questions?
  • 4. • Only in GLSL ES • Available since the first version of GLSL ES. • Used to tell the driver an operation in a shader can be done with lower precision. • Some hardware can take advantage of this to trade off precision for speed.
  • 5. • For example, an operation can be done with a 16- bit float: sign bit exponent bits fraction bits largest number approximately 3 × 10³⁸ approximately 7 decimal digits of accuracy 32-bit float sign bit exponent bits fraction bits largest number 65504 approximately 3 decimal digits of accuracy 16-bit float
  • 6. • GLSL ES has three available precisions: • lowp, mediump and highp • The spec specifies a minimum precision for each of these. • highp needs 16-bit fractional part. • It will probably end up being a single- precision float. • mediump needs 10-bit fractional part. • This can be represented as a half float. • lowp has enough precision to store 8-bit colour channels.
  • 7. • The precision does not affect the visible storage of a variable. • For example a mediump float will still be stored as 32-bit in a UBO. • Only operations are affected. • The precision requirements are only a minimum. • Therefore a valid implementation could be to just ignore the precision and do every operation at highp. • This is effectively what Mesa currently does.
  • 8. • The precision for a variable can be specified directly: uniform mediump vec3 rect_color; • Or it can be specified as a global default for each type: precision mediump float; uniform vec3 rect_color;
  • 9. • The compiler specifies global defaults for most types except floats in the fragment shader. • In GLSL ES 1.00 high precision support in fragment shaders is optional.
  • 10. • The precision of operands to an operation determine the precision of the operation. • Almost works like automatic float to double promotion in C. mediump float a, b; highp float c = a * b;
  • 11. • The precision of operands to an operation determine the precision of the operation. • Almost works like automatic float to double promotion in C. mediump float a, b; highp float c = a * b; This operation can be done in mediump All operands are mediump.
  • 12. • The precision of operands to an operation determine the precision of the operation. • Almost works like automatic float to double promotion in C. mediump float a, b; highp float c = a * b; This operation can be done in mediump All operands are mediump. precision of result doesn’t matter
  • 13. • Another example mediump float a, b; highp float c; mediump float r = c * (a * b);
  • 14. • Another example mediump float a, b; highp float c; mediump float r = c * (a * b); This operation can still be done in mediump
  • 15. • Another example mediump float a, b; highp float c; mediump float r = c * (a * b); This operation can still be done in mediump This outer operation must be done at highp
  • 16. • Corner case • Some things don’t have a precision, eg constants. mediump float diameter; float circ = diameter * 3.141592;
  • 17. • Corner case • Some things don’t have a precision, eg constants. mediump float diameter; float circ = diameter * 3.141592; Constants have no precision
  • 18. • Corner case • Some things don’t have a precision, eg constants. mediump float diameter; float circ = diameter * 3.141592; Constants have no precision Precision of multiplication is mediump anyway because one of the arguments has a precision
  • 19. • Extreme corner case • Sometimes none of the operands have a precision. uniform bool should_pi; mediump float result = float(should_pi) * 3.141592;
  • 20. • Extreme corner case • Sometimes none of the operands have a precision. uniform bool should_pi; mediump float result = float(should_pi) * 3.141592; Neither operand has a precision
  • 21. • Extreme corner case • Sometimes none of the operands have a precision. uniform bool should_pi; mediump float result = float(should_pi) * 3.141592; Neither operand has a precision Precision of operation can come from outer expression, even the lvalue of an assignment
  • 23. • Mesa already has code to parse the precision qualiers and store them in the IR tree. • These currently aren’t used for anything except to check for compile-time errors. • For example redeclaring a variable with a different precision. • In desktop GL, the precision is always set to NONE.
  • 24. • The precision usually doesn’t form part of the glsl_type. • Instead it is stored out-of-band as part of the ir_variable.
  • 25. enum { GLSL_PRECISION_NONE = 0, GLSL_PRECISION_HIGH, GLSL_PRECISION_MEDIUM, GLSL_PRECISION_LOW };
  • 26. class ir_variable : public ir_instruction { /* … */ public: struct ir_variable_data { /* … */ /** * Precision qualifier. * * In desktop GLSL we do not care about precision qualifiers at * all, in fact, the spec says that precision qualifiers are * ignored. * * To make things easy, we make it so that this field is always * GLSL_PRECISION_NONE on desktop shaders. This way all the * variables have the same precision value and the checks we add * in the compiler for this field will never break a desktop * shader compile. */ unsigned precision:2; /* … */ }; };
  • 27. • However this gets complicated for structs because members can have their own precision. uniform block { mediump vec3 just_a_color; highp mat4 important_matrix; } things; • In that case the precision does end up being part of the glsl_type.
  • 29. • The idea is to lower mediump operations to float16 types in NIR. • We want to lower the actual operations instead of the variables. • This needs to be done at a high level in order to implement the spec rules.
  • 30. • Work being done by Hyunjun Ko and myself and Igalia. • Working on behalf of Google. • Based on / inspired by patches by Topi Pohjolainen.
  • 31. • Aiming specifically to make this work on the Freedreno driver. • Most of the work is reusable for any driver though. • Currently this is done as a pass over the IR representation.
  • 32. uniform mediump float a, b; void main() { gl_FragColor.r = a / b; }
  • 33. uniform mediump float a, b; void main() { gl_FragColor.r = a / b; } These two variables are mediump
  • 34. uniform mediump float a, b; void main() { gl_FragColor.r = a / b; } These two variables are mediump So this division can be done at medium precision
  • 35. • We only want to lower the division operation without changing the type of the variables. • The lowering pass will add a conversion to float16 around the variable dereferences and then add a conversion back to float32 after the division. • This minimises the modifications to the IR.
  • 36. • IR tree before lowering pass (assign (x) (var_ref gl_FragColor) (swiz x (swiz xxxx (expression float / (var_ref a) (var_ref b)))))
  • 37. • IR tree before lowering pass (assign (x) (var_ref gl_FragColor) (swiz x (swiz xxxx (expression float / (var_ref a) (var_ref b))))) division operation
  • 38. • IR tree before lowering pass (assign (x) (var_ref gl_FragColor) (swiz x (swiz xxxx (expression float / (var_ref a) (var_ref b))))) division operation type is 32-bit float
  • 39. • Lowering pass finds sections of the tree involving only mediump/lowp operations. • Adds f2f16 conversion after variable derefs • Adds f2f32 conversion at root of lowered branch
  • 40. • IR tree after lowering pass (assign (x) (var_ref gl_FragColor) (expression float f162f (swiz x (swiz xxxx (expression float16_t / (expression float16_t f2f16 (var_ref a)) (expression float16_t f2f16 (var_ref b)))))))
  • 41. • IR tree after lowering pass (assign (x) (var_ref gl_FragColor) (expression float f162f (swiz x (swiz xxxx (expression float16_t / (expression float16_t f2f16 (var_ref a)) (expression float16_t f2f16 (var_ref b))))))) each var_ref is converted to float16
  • 42. • IR tree after lowering pass (assign (x) (var_ref gl_FragColor) (expression float f162f (swiz x (swiz xxxx (expression float16_t / (expression float16_t f2f16 (var_ref a)) (expression float16_t f2f16 (var_ref b))))))) division operation is done in float16
  • 43. • IR tree after lowering pass (assign (x) (var_ref gl_FragColor) (expression float f162f (swiz x (swiz xxxx (expression float16_t / (expression float16_t f2f16 (var_ref a)) (expression float16_t f2f16 (var_ref b))))))) Result is converted back to float32 before storing in var
  • 45. • This will end up generating a lot of conversion operations. • Worse: precision mediump float; uniform mediump float a; void main() { float scaled = a / 5.0; gl_FragColor.r = scaled + 0.5; }
  • 46. • This will end up generating a lot of conversion operations. • Worse: precision mediump float; uniform mediump float a; void main() { float scaled = a / 5.0; gl_FragColor.r = scaled + 0.5; } operation will be done in mediump then converted back to float32 to store in the variable
  • 47. • This will end up generating a lot of conversion operations. • Worse: precision mediump float; uniform mediump float a; void main() { float scaled = a / 5.0; gl_FragColor.r = scaled + 0.5; } then the result will be immediately converted back to float16 for this operation
  • 48. • Resulting NIR vec1 32 ssa_1 = deref_var &a (uniform float) vec1 32 ssa_2 = intrinsic load_deref (ssa_1) vec1 16 ssa_3 = f2f16 ssa_2 vec1 16 ssa_6 = fdiv ssa_3, ssa_20 vec1 32 ssa_7 = f2f32 ssa_6 vec1 16 ssa_8 = f2f16 ssa_7 vec1 32 ssa_9 = f2f32 ssa_8 vec1 16 ssa_10 = f2f16 ssa_9 vec1 16 ssa_13 = fadd ssa_10, ssa_22
  • 49. • Resulting NIR vec1 32 ssa_1 = deref_var &a (uniform float) vec1 32 ssa_2 = intrinsic load_deref (ssa_1) vec1 16 ssa_3 = f2f16 ssa_2 vec1 16 ssa_6 = fdiv ssa_3, ssa_20 vec1 32 ssa_7 = f2f32 ssa_6 vec1 16 ssa_8 = f2f16 ssa_7 vec1 32 ssa_9 = f2f32 ssa_8 vec1 16 ssa_10 = f2f16 ssa_9 vec1 16 ssa_13 = fadd ssa_10, ssa_22 Lots of redundant coversions!
  • 50. • There is a NIR optimisation to remove redundant conversions • Only enabled for GLES because converting f32→f16→f32 is not lossless
  • 52. • Normally we don’t want to change the type of variables • For example, this would break uniforms because they are visible to the app • Sometimes we can do it anyway though depending on the hardware
  • 53. • On Freedreno, we can change the type of the fragment outputs if they are mediump. • gl_FragColor is declared as mediump by default • The variable type is not user-visible so it won’t break the app. • This removes a conversion. • We have a specific pass for Freedreno to do this.
  • 54. vec1 32 ssa_1 = load_const (0x00000000 /* 0.000000 */) vec1 16 ssa_2 = intrinsic load_uniform (ssa_1) (0, 0, 0) vec1 32 ssa_4 = load_const (0x00000001 /* 0.000000 */) vec1 16 ssa_5 = intrinsic load_uniform (ssa_4) (0, 0, 0) vec1 16 ssa_7 = frcp ssa_5 vec1 16 ssa_8 = fmul ssa_2, ssa_7 vec1 32 ssa_9 = f2f32 ssa_8 vec4 32 ssa_10 = vec4 ssa_9, ssa_0.y, ssa_0.z, ssa_0.w intrinsic store_output (ssa_10, ssa_1) (0, 15, 0, 160)
  • 55. vec1 32 ssa_1 = load_const (0x00000000 /* 0.000000 */) vec1 16 ssa_2 = intrinsic load_uniform (ssa_1) (0, 0, 0) vec1 32 ssa_4 = load_const (0x00000001 /* 0.000000 */) vec1 16 ssa_5 = intrinsic load_uniform (ssa_4) (0, 0, 0) vec1 16 ssa_7 = frcp ssa_5 vec1 16 ssa_8 = fmul ssa_2, ssa_7 vec1 32 ssa_9 = f2f32 ssa_8 vec4 32 ssa_10 = vec4 ssa_9, ssa_0.y, ssa_0.z, ssa_0.w intrinsic store_output (ssa_10, ssa_1) (0, 15, 0, 144) removes this conversion
  • 56. vec1 32 ssa_1 = load_const (0x00000000 /* 0.000000 */) vec1 16 ssa_2 = intrinsic load_uniform (ssa_1) (0, 0, 0) vec1 32 ssa_4 = load_const (0x00000001 /* 0.000000 */) vec1 16 ssa_5 = intrinsic load_uniform (ssa_4) (0, 0, 0) vec1 16 ssa_7 = frcp ssa_5 vec1 16 ssa_8 = fmul ssa_2, ssa_7 vec4 16 ssa_10 = vec4 ssa_8, ssa_0.y, ssa_0.z, ssa_0.w intrinsic store_output (ssa_10, ssa_1) (0, 15, 0, 160) use 16-bit output directly
  • 58. • Consider this simple fragment shader uniform highp float a, b; void main() { gl_FragColor.r = a / b; }
  • 59. • Consider this simple fragment shader uniform highp float a, b; void main() { gl_FragColor.r = a / b; } operation is using highp
  • 60. • Consider this simple fragment shader uniform highp float a, b; void main() { gl_FragColor.r = a / b; } gl_FragColor will converted to a 16-bit output
  • 61. • This can generate an IR3 disassembly like this: mov.f32f32 r0.x, c0.y (rpt5)nop rcp r0.x, r0.x (ss)mul.f r0.x, c0.x, r0.x (rpt2)nop cov.f32f16 hr0.x, r0.x
  • 62. • This can generate an IR3 disassembly like this: mov.f32f32 r0.x, c0.y (rpt5)nop rcp r0.x, r0.x (ss)mul.f r0.x, c0.x, r0.x (rpt2)nop cov.f32f16 hr0.x, r0.x 32-bit float registers for the multiplication
  • 63. • This can generate an IR3 disassembly like this: mov.f32f32 r0.x, c0.y (rpt5)nop rcp r0.x, r0.x (ss)mul.f r0.x, c0.x, r0.x (rpt2)nop cov.f32f16 hr0.x, r0.x result is converted to half-float for output
  • 64. • This last conversion shouldn’t be necessary. • Adreno allows the destination register to have a different size from the source registers. • We can fold the conversion directly into the multiplication.
  • 65. • We have added a pass on the NIR that does this folding. • It requires changes the NIR validation to allow the dest to have a different size. • Only enabled for Freedreno.
  • 66. vec1 32 ssa_1 = load_const (0x00000000 /* 0.000000 */) vec1 32 ssa_2 = intrinsic load_uniform (ssa_1) (0, 0, 0) vec1 32 ssa_3 = load_const (0x00000001 /* 0.000000 */) vec1 32 ssa_4 = intrinsic load_uniform (ssa_3) (0, 0, 0) vec1 32 ssa_5 = frcp ssa_4 vec1 32 ssa_6 = fmul ssa_2, ssa_5 vec1 16 ssa_7 = f2f16 ssa_6 vec4 16 ssa_8 = vec4 ssa_7, ssa_0.y, ssa_0.z, ssa_0.w intrinsic store_output (ssa_8, ssa_1) (0, 15, 0, 144)
  • 67. vec1 32 ssa_1 = load_const (0x00000000 /* 0.000000 */) vec1 32 ssa_2 = intrinsic load_uniform (ssa_1) (0, 0, 0) vec1 32 ssa_3 = load_const (0x00000001 /* 0.000000 */) vec1 32 ssa_4 = intrinsic load_uniform (ssa_3) (0, 0, 0) vec1 32 ssa_5 = frcp ssa_4 vec1 32 ssa_6 = fmul ssa_2, ssa_5 vec1 16 ssa_7 = f2f16 ssa_6 vec4 16 ssa_8 = vec4 ssa_7, ssa_0.y, ssa_0.z, ssa_0.w intrinsic store_output (ssa_8, ssa_1) (0, 15, 0, 144) remove this conversion
  • 68. vec1 32 ssa_1 = load_const (0x00000000 /* 0.000000 */) vec1 32 ssa_2 = intrinsic load_uniform (ssa_1) (0, 0, 0) vec1 32 ssa_3 = load_const (0x00000001 /* 0.000000 */) vec1 32 ssa_4 = intrinsic load_uniform (ssa_3) (0, 0, 0) vec1 32 ssa_5 = frcp ssa_4 vec1 16 ssa_6 = fmul ssa_2, ssa_5 vec4 16 ssa_8 = vec4 ssa_6, ssa_0.y, ssa_0.z, ssa_0.w intrinsic store_output (ssa_8, ssa_1) (0, 15, 0, 144)
  • 69. vec1 32 ssa_1 = load_const (0x00000000 /* 0.000000 */) vec1 32 ssa_2 = intrinsic load_uniform (ssa_1) (0, 0, 0) vec1 32 ssa_3 = load_const (0x00000001 /* 0.000000 */) vec1 32 ssa_4 = intrinsic load_uniform (ssa_3) (0, 0, 0) vec1 32 ssa_5 = frcp ssa_4 vec1 16 ssa_6 = fmul ssa_2, ssa_5 vec4 16 ssa_8 = vec4 ssa_6, ssa_0.y, ssa_0.z, ssa_0.w intrinsic store_output (ssa_8, ssa_1) (0, 15, 0, 144) change destination type of multiplication
  • 70. vec1 32 ssa_1 = load_const (0x00000000 /* 0.000000 */) vec1 32 ssa_2 = intrinsic load_uniform (ssa_1) (0, 0, 0) vec1 32 ssa_3 = load_const (0x00000001 /* 0.000000 */) vec1 32 ssa_4 = intrinsic load_uniform (ssa_3) (0, 0, 0) vec1 32 ssa_5 = frcp ssa_4 vec1 16 ssa_6 = fmul ssa_2, ssa_5 vec4 16 ssa_8 = vec4 ssa_6, ssa_0.y, ssa_0.z, ssa_0.w intrinsic store_output (ssa_8, ssa_1) (0, 15, 0, 144) source types are still 32-bit
  • 72. • We are writing Piglit tests that use mediump • Most of them check that the result is less accurate than if it was done at highp • That way we catch regressions where we break the lowering • These tests couldn’t be merged into Piglit proper because not lowering would be valid behaviour.
  • 73. Code
  • 74. • The code is at gitlab.freedesktop.org/zzoon on the mediump branch • There are also merge requests (1043, 1044, 1045). • Piglit tests are at: https://github.com/Igalia/piglit/ • branch nroberts/wip/mediump-tests