SPU Assisted Rendering

/* SPU Assisted Rendering. */
Steven Tovey & Stephen McAuley
Graphics Programmers, Bizarre Creations Ltd.
steven.tovey@bizarrecreations.com
stephen.mcauley@bizarrecreations.com
http://www.bizarrecreations.com

/* Welcome! */
We have some copies of Blur to give away, so stick around and fill out your evaluation sheets!

/* Agenda */
Part I (w/ Steven Tovey):
What is SPU Assisted Rendering?
Case Studies
  Car Damage

Part II (w/ Stephen McAuley):
Fragment Shading
Parallelisation
Case Study
  Pre-pass Lighting on SPUs

Questions

/* Part I w/ Steven Tovey */
SPU Acceleration of Car Rendering in Blur

/* What is SPU AR? I */
Assisting RSX™ with the SPUs (der!)

Why do this?
Free up RSX™ to do other things.
Enable otherwise unfeasible techniques.
Optimise rendering.

/* Case Study: Cars I */
Original Xenon implementation:

2x VTF (volume & 2D) for damage.

Large amount of work in the vertex shader, making cars in Blur heavily vertex-bound.

All lighting in the pixel shader.

/* Case Study: Cars II */
Loose-fitting damage volume:

/* Case Study: Cars III */
Control points:

/* Case Study: Cars IV */
Morph targets:

/* Case Study: Cars V */
Scratch/dent textures:

/* Case Study: Cars VI */
Increase rendering speed of cars.

Maintain same quality.

/* Damage: Solution */
Work split between GPU/SPU.

/* Damage: Data I */
SPU-modified damage vertex data.

One-to-one mapping of vertices.

Crude approximation of volume preservation.

Dent/scratch blend levels.

/* Damage: Data II */
[Diagram: vertex attributes laid out across Stream0 and Stream1. Attributes shown: Position, SPU_Position, Normal, UV0, SPU_Normal, UV1, PosOffset, NormalOffset, ControlPoints, AO]

/* Damage: Data III */
MFC writes data atomically in 16-byte chunks...

If the vertex format is exactly 16 bytes, a vertex can be changed atomically from the SPU.

If you can live with the odd vertex being wrong for a frame, this could be a huge win!
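As a rough illustration of the 16-byte point, the sketch below packs a vertex into exactly one aligned 16-byte chunk and checks that at compile time. The field choices and the names `DamageVertex16` and `fits_one_atomic_chunk` are assumptions made for illustration; they are not the vertex layout used in Blur.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 16-byte vertex: every field here is an illustrative guess. */
typedef struct {
    alignas(16) uint16_t position[4]; /* x, y, z, w stored as 16-bit floats */
    uint32_t packed_normal;           /* normal packed into 32 bits */
    uint16_t dent_blend;              /* dent blend level */
    uint16_t scratch_blend;           /* scratch blend level */
} DamageVertex16;

/* The whole vertex must occupy one 16-byte chunk, no more and no less. */
_Static_assert(sizeof(DamageVertex16) == 16, "vertex must be exactly 16 bytes");
_Static_assert(alignof(DamageVertex16) == 16, "vertex must be 16-byte aligned");

/* Returns non-zero when the address starts on a 16-byte boundary. */
static int fits_one_atomic_chunk(const void *vertex)
{
    return ((uintptr_t)vertex & 15u) == 0;
}
```

Because the size equals the alignment, every element of an array of these vertices starts on its own 16-byte boundary, so no vertex straddles two chunks.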

/* Damage: Data IV */
[Diagram: SPU, RSX Local, Main, Write-only Vertices, Read-only Vertices]

/* Damage: Events */
Damage events from game-side code are queued.

Note: There is no link to player health; the damage is purely superficial.

[Diagram: Game Code, Impact (x6)]
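A minimal sketch of how game code might queue impacts for a later damage pass is shown below. The `Impact` fields, the capacity `MAX_IMPACTS` and both function names are hypothetical; the slides do not describe the actual event format.

```c
#include <stddef.h>

enum { MAX_IMPACTS = 8 }; /* hypothetical capacity */

/* Hypothetical impact event; the real fields are not given in the slides. */
typedef struct {
    float position[3];
    float radius;
    float strength;
} Impact;

typedef struct {
    Impact items[MAX_IMPACTS];
    size_t count;
} ImpactQueue;

/* Called from game code. Returns 0 when the queue is full and the event is dropped. */
static int impact_queue_push(ImpactQueue *q, Impact e)
{
    if (q->count == MAX_IMPACTS)
        return 0;
    q->items[q->count++] = e;
    return 1;
}

/* Copies the pending batch out for the damage job and empties the queue. */
static size_t impact_queue_flush(ImpactQueue *q, Impact *out)
{
    size_t n = q->count;
    for (size_t i = 0; i < n; ++i)
        out[i] = q->items[i];
    q->count = 0;
    return n;
}
```

Dropping events when the queue is full is tolerable here only because the damage is cosmetic.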

/* Damage: Data V */
[Diagram: Impact (x6), Constants, SPU, GPU, Write-only Vertices*, Read-only Vertices*]
* w.r.t. the SPU

/* Damage: Data VI */
[Diagram: SPU, GPU, Write-only Vertices*, Read-only Vertices*]
* w.r.t. the SPU

/* Damage: Control */
Kick off SPU tasks.

Fewer sync points should be the goal of any multi-core code:

[Diagram, final build: Flag, Vertex Work (x4), Other Work(1) (x2), Other Work(2), PPU Damage (x2)]
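One reading of the timeline is: kick the vertex work, carry on with other work, and only look at a completion flag when the result is needed. The sketch below shows that pattern with C11 atomics standing in for whatever mechanism the game actually used; `DamageJob` and its functions are hypothetical names.

```c
#include <stdatomic.h>

typedef struct {
    atomic_int done;      /* 0 = in flight, 1 = vertex work finished */
    int vertices_written; /* result published by the worker */
} DamageJob;

/* Main thread: mark the job as in flight before handing it to a worker. */
static void damage_job_kick(DamageJob *job)
{
    job->vertices_written = 0;
    atomic_store_explicit(&job->done, 0, memory_order_relaxed);
}

/* Worker: publish the result, then raise the flag with release ordering. */
static void damage_job_finish(DamageJob *job, int vertices_written)
{
    job->vertices_written = vertices_written;
    atomic_store_explicit(&job->done, 1, memory_order_release);
}

/* Main thread: non-blocking check, called after other work has been done. */
static int damage_job_poll(DamageJob *job, int *vertices_written)
{
    if (!atomic_load_explicit(&job->done, memory_order_acquire))
        return 0;
    *vertices_written = job->vertices_written;
    return 1;
}
```

The main thread never waits inside `damage_job_poll`; if the flag is not yet set it can do more unrelated work and check again later.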

/* Damage: SPU I */
Pretty easy to go from shaders to SPU intrinsics or asm.

We favour si style for simplicity and ease.

/* de-code into IEEE754-ish 32bit float (meh): */
qword sign_bit = si_and(result, sign_bit_mask);
sign_bit = si_shli(sign_bit, 0x10); /* move 16 bits into correct place. */
qword significand = si_and(result, mant_bit_mask);
significand = si_shli(significand, 0xd);
qword is_zero_mask = si_cgti(significand, 0x0); /* all bits set if non-zero. */
expo_bias = si_and(is_zero_mask, expo_bias);
qword exponent_bias = si_a(significand, expo_bias); /* move expo up range, 0x07800000=>0x3f800000. */
exponent_bias = si_or(exponent_bias, sign_bit);
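For readers without an SPU toolchain, here is a scalar C sketch of the same decode for a single 16-bit value. The slide does not show the constants, so the mask values (0x8000 for the sign, 0x7fff for exponent plus mantissa) and the bias 0x38000000 are assumptions inferred from the shifts and from the `0x07800000=>0x3f800000` comment. Like the original, it only handles zero and normalised values; denormals, infinities and NaNs are not treated specially.

```c
#include <stdint.h>
#include <string.h>

/* Decode a 16-bit float into the bit pattern of a 32-bit float.
   Constants are assumed; the slide leaves them undefined. */
static uint32_t half_to_float_bits(uint16_t h)
{
    uint32_t sign = ((uint32_t)h & 0x8000u) << 16; /* sign to bit 31 */
    uint32_t rest = ((uint32_t)h & 0x7fffu) << 13; /* exponent and mantissa */
    uint32_t bias = rest ? 0x38000000u : 0u;       /* rebias unless zero */
    return (rest + bias) | sign;
}

/* Reinterpret the bit pattern as a float without breaking aliasing rules. */
static float half_to_float(uint16_t h)
{
    uint32_t bits = half_to_float_bits(h);
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

For example, the 16-bit pattern 0x3C00 becomes 0x07800000 after the shift and 0x3F800000 after the bias is added, which is 1.0f.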

/* Damage: SPU II */
GPU version relied on bilinear filtering of volume texture to smooth damage.

Filtering on SPU is a bit of a pain.

Working out which events affect which vertices?
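To show what the GPU's texture unit was doing for free, the sketch below samples a small scalar volume with linear filtering along all three axes. It is a generic software filter, not the authors' SPU solution; `DamageVolume`, the x-fastest texel order and clamp-to-edge addressing are all assumptions.

```c
#include <stddef.h>

/* Hypothetical damage volume: nx * ny * nz scalars, x varies fastest. */
typedef struct {
    const float *texels;
    int nx, ny, nz;
} DamageVolume;

static int clampi(int v, int hi) { return v < 0 ? 0 : (v > hi ? hi : v); }

/* Fetch one texel with clamp-to-edge addressing. */
static float fetch_texel(const DamageVolume *vol, int x, int y, int z)
{
    x = clampi(x, vol->nx - 1);
    y = clampi(y, vol->ny - 1);
    z = clampi(z, vol->nz - 1);
    return vol->texels[((size_t)z * vol->ny + y) * vol->nx + x];
}

static float lerpf(float a, float b, float t) { return a + (b - a) * t; }

/* Coordinates are in texel units and assumed to be non-negative. */
static float sample_trilinear(const DamageVolume *vol, float x, float y, float z)
{
    int x0 = (int)x, y0 = (int)y, z0 = (int)z;
    float fx = x - (float)x0, fy = y - (float)y0, fz = z - (float)z0;

    /* Blend along x on the four edges of the surrounding cell. */
    float c00 = lerpf(fetch_texel(vol, x0, y0, z0),         fetch_texel(vol, x0 + 1, y0, z0),         fx);
    float c10 = lerpf(fetch_texel(vol, x0, y0 + 1, z0),     fetch_texel(vol, x0 + 1, y0 + 1, z0),     fx);
    float c01 = lerpf(fetch_texel(vol, x0, y0, z0 + 1),     fetch_texel(vol, x0 + 1, y0, z0 + 1),     fx);
    float c11 = lerpf(fetch_texel(vol, x0, y0 + 1, z0 + 1), fetch_texel(vol, x0 + 1, y0 + 1, z0 + 1), fx);

    /* Then along y, then along z. */
    return lerpf(lerpf(c00, c10, fy), lerpf(c01, c11, fy), fz);
}
```

Each sample touches eight texels and does seven interpolations, which gives a feel for why doing this per vertex in software is unattractive.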

/* Damage: SPU III */
1. Get data in volume texture-ish format.

2. Apply the transform to all vertices.

/* Damage: SPU IV */
Some interesting instructions in the ISA will help here.

/* Damage: Lessons I */
Data flow through the SPU program is paramount to performance.
Process in 16KB chunks.
Multi-buffer input and output.
If your system isn’t ‘mission critical’, align and lose the double buffer.
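The chunking and double-buffering advice can be sketched in portable C as below. `memcpy` stands in for asynchronous DMA transfers, so this shows only the control flow, not real overlap; `fetch`, `store`, `process_stream` and the doubling "work" are placeholders invented for the example.

```c
#include <stddef.h>
#include <string.h>

enum { CHUNK_BYTES = 16 * 1024, CHUNK_FLOATS = CHUNK_BYTES / sizeof(float) };

/* Stand-in for an asynchronous transfer from main memory into local store. */
static void fetch(float *local, const float *main_mem, size_t count)
{
    memcpy(local, main_mem, count * sizeof(float));
}

/* Stand-in for an asynchronous transfer from local store back to main memory. */
static void store(float *main_mem, const float *local, size_t count)
{
    memcpy(main_mem, local, count * sizeof(float));
}

static size_t chunk_len(size_t total, size_t start)
{
    size_t left = total - start;
    return left < (size_t)CHUNK_FLOATS ? left : (size_t)CHUNK_FLOATS;
}

/* Processes src in 16KB chunks, fetching chunk N+1 before working on chunk N. */
static void process_stream(float *dst, const float *src, size_t total)
{
    static float buf[2][CHUNK_FLOATS]; /* two local buffers */
    size_t start = 0;
    int cur = 0;

    if (total == 0)
        return;
    fetch(buf[cur], src, chunk_len(total, 0)); /* prime the first buffer */

    while (start < total) {
        size_t n = chunk_len(total, start);
        size_t next = start + n;

        if (next < total) /* start the next transfer before doing any work */
            fetch(buf[cur ^ 1], src + next, chunk_len(total, next));

        for (size_t i = 0; i < n; ++i) /* placeholder work: double each value */
            buf[cur][i] *= 2.0f;

        store(dst + start, buf[cur], n);
        start = next;
        cur ^= 1; /* swap buffers */
    }
}
```

With real asynchronous transfers, the code would also have to wait for a buffer's outgoing store to complete before fetching into it again.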

/* Damage: Lessons II */
Make use of SoA mode data layout, liberated from rigidity of GPU programming model!

xyzw  xxxx
xyzw  yyyy
xyzw  zzzz
xyzw  wwww
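A small sketch of the layout change: four xyzw vectors are transposed so that each component sits in its own array, after which one pass computes four results in lock-step. The type and function names are made up for illustration, and plain arrays stand in for SIMD registers.

```c
/* Array-of-structures: one xyzw vector per element. */
typedef struct { float x, y, z, w; } Vec4;

/* Structure-of-arrays: four vectors, one array per component. */
typedef struct {
    float x[4];
    float y[4];
    float z[4];
    float w[4];
} Vec4x4SoA;

/* Transpose four xyzw vectors into xxxx, yyyy, zzzz, wwww. */
static Vec4x4SoA aos_to_soa(const Vec4 in[4])
{
    Vec4x4SoA out;
    for (int i = 0; i < 4; ++i) {
        out.x[i] = in[i].x;
        out.y[i] = in[i].y;
        out.z[i] = in[i].z;
        out.w[i] = in[i].w;
    }
    return out;
}

/* Four dot products at once: lane i uses only element i of each array. */
static void dot4_soa(const Vec4x4SoA *a, const Vec4x4SoA *b, float out[4])
{
    for (int i = 0; i < 4; ++i)
        out[i] = a->x[i] * b->x[i] + a->y[i] * b->y[i]
               + a->z[i] * b->z[i] + a->w[i] * b->w[i];
}
```

In `dot4_soa` every lane performs the same multiply-adds with no shuffling between components, which is what makes the layout map well onto 4-wide vector instructions.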

/* Damage: Lessons III */
Add value to your SPU program for relatively small computational effort:

We added some of the per-vertex lighting calculations for brake lights, for example.

40 in total (accounting for double buffer).

Cars are lit with a mixture of things:
