SlideShare a Scribd company logo
Uncharted 2: HDR LightingJohn HableNaughty Dog
AgendaGamma/Linear-Space LightingFilmic TonemappingSSAORendering Architecture/How it all fits together.
Skeletons in the ClosetUsed to work for EA Black Box (Vancouver)‏
Ucap Project
I.e. Tiger Woods's Face
Also used to work for EA Los Angeles
LMNO (Spielberg Project)‏
Not PQRS (Boom Blox)‏
Now at Naughty Dog
Uncharted 2Part 1: Gamma!
GammaMostly solved in Film Industry after lots of kicking and screaming
Never heard about it in college
George Borshukov taught it to me back at EA
Matrix Sequels
Issue is getting more traction now
Eugene d’Eon’s chapter in GPU Gems 3DemoWhat is halfway between white and black
Suppose we show alternating light/dark lines.
If we squint, we get half the linear intensity.DemoTest image with 4 rectangles.
Real image of a computer screen shot from my camera.DemoAlternating lines of black and white (0 and 255).DemoSo if the left and right are alternating lines of 0 and 255, then what color is the rectangle with the blue dot?DemoCommon wrong answer: 127/128.
That’s the box on the top.128
DemoCorrect answer is 187.
WTF?128187
Color RampYou have seen this before.0127255187
Gamma Lines255F(x) = pow(x,2.2)‏1871270
What is GammaGamma of 0.45, 1, 2.2
Our monitors are the red oneWe all love mink...F(x) = xWe all love mink...F(x) = pow(x,0.45)        note: .45 ~= 1/2.2‏We all love mink...F(x) = pow(x,2.2)‏What is GammaGet to know these curves…What is GammaBack and forth between curves.2.22.2.45.45
How bad is it?Your monitor has a gamma of about 2.2
So if you output the left image to the framebuffer, it will actually look like the one right.2.2
Let’s build a Camera…A linear camera.ActualLightMonitorOutput1.02.2HardDrive
Let’s build a Camera…Any consumer camera.ActualLightMonitorOutput.452.2HardDrive
How to fix this?Your screen displays a gamma of 2.2
This is stored on your hard drive.
You never see this image, but it’s on your hard drive.2.2
What is GammaCurves again.What is GammaQ) Why not just store as linear?A) Our eyes perceive more data in the blacks.
How bad is it?Suppose we look at the 0-63 range.
What if displays were linear?GammaGamma of 2.2 gives us more dark colors.How bad is it?If displays were linear:
On a scale from 0-255
0 would equal 0
1 would equal our current value of 20
2 would equal our current value of 28
3 would equal our current value of 33
Darkest Values:Ramp AgainSee any banding in the whites? 3D GraphicsIf your game is not gamma correct, it looks like the image on the left..
Note: we’re talking about “correct”, not “good”3D GraphicsHere's what is going on.ScreenHardDriveLighting
3D GraphicsAccount for gamma when reading textures
color = pow( tex2D( Sampler, Uv ), 2.2 )‏
Do your lighting calculations
Account for gamma on color output
finalcolor = pow( color, 1/2.2 )‏3D GraphicsHere's what is going on.MonitorAdjustHardDriveLightingShaderCorrectGamma
3D GraphicsComparison again...3D GraphicsThe Wrong Shader?Spec = CalSpec();Diff = tex2D( Sampler, UV );Color = Diff * max( 0, dot( N, L ) ) + Spec;return Color;
3D GraphicsThe Right Shader?Spec = CalSpec();Diff = pow( tex2D( Sampler, UV ), 2.2 );Color = Diff * max( 0, dot( N, L ) ) + Spec;return pow( Color, 1/2.2);
3D GraphicsBut, there is hardware to do this for us.
Hardware does sampling for free
For Texture read:
D3DSAMP_SRGBTEXTURE
For RenderTarget write:
D3DRS_SRGBWRITEENABLE3D GraphicsWith those states we can remove the pow functions.Spec = CalSpec();Diff = tex2D( Sampler, UV );Color = Diff * max( 0, dot( N, L ) ) + Spec;return Color;
3D GraphicsWhat does the real world look like?
Notice the harsh line.3D GraphicsAnother ExampleFAQsQ1) But I like the soft falloff!A1) Don’t be so sure.
FAQsIt looks like the one on top before the monitor’s 2.2Harsh FalloffDevastating.Harsh FalloffDevastating.Harsh FalloffDevastating.Which Maps?Which maps should be Linear-Space vs Gamma-Space?
Gamma-Space
Use sRGB hardware or pow(2.2) on read
128 ~= 0.2
Linear-Space
Don’t use sRGB or pow(2.2) on read
128 = 0.5Which Maps?Diffuse Map
Definitely Gamma
Normal Map
Definitely LinearWhich Maps?Specular?
Uncharted 2 had them as Linear
Artists have trouble tweaking them to look right
Probably should be in GammaWhich Maps?Ambient Occlusion
Technically, it’s a mathematical value like a normal map.
But artists tweak them a lot and bake extra lighting into them.
Uncharted 2 had them as Linear
Probably should be GammaExercise for the ReaderGamma 2.2 != sRGB != Xenon PWL
sRGB: PC and PS3 gamma
Xenon PWL: Piecewise linear gammaXenon GotchasXbox 360 Gamma Curve is wonky
Way too bright at the low end.
HDR the Bungie Way, Chris Tchou
Post Processing in The Orange Box, Alex Vlachos
Output curve is extra contrasty
Henry LaBounta was going to talk about it.
Try hooking up your Xenon to a waveform monitor and display a test pattern.  Prepare to be mortified.
Both factors counteract each otherLinear-Space Lighting: Conclusion“Drake!  You must believe in linear-space lighting!”
Part 2: Filmic Tonemaping
Filmic TonemappingLet's talk about the magic of the human eye.Filmic TonemappingReal example from my apartment.
Full Auto settings.
Outside blown out.
Cello case looks empty.
My eye somehow adjusts...Filmic TonemappingAdaptiveness of human eye.Filmic TonemappingAdaptiveness of human eye.Filmic TonemappingAdaptiveness of human eye.Filmic TonemappingAdaptiveness of human eye.Filmic TonemappingAdaptiveness of human eye.Filmic TonemappingAdaptiveness of human eye.Filmic TonemappingAdaptiveness of human eye.Filmic TonemappingAdaptiveness of human eye.Filmic TonemappingOne moreFilmic TonemappingOne moreGammaLet’s take an HDR image
Note: 1 F-Stop is a power of two
So 4 F-stops is 2^4=16x difference in intensityGammaExposure: -4GammaExposure: -2GammaExposure: 0GammaExposure: 2GammaExposure: 4Now what?We actually need 8 stops
Good News: HDR Image
Bad News: Now what?
Re-tweaking exposure won't helpGammaThis is what Photography people call tonemapping.
Photomatix
This one is local
The one we’ll use is globalPoint of ConfusionTwo problems to solve.
Simulate the Iris
Dynamic Range in different shots
Tunnel vs. Daylight
Simulate the Retina
Different Range within a shot
Inside looking outsideTerminologyPhotography/Film
Within a Shot – HDR Tonemapping
Different Shots – Choosing the correct exposure?
Video Games
Within a Shot – HDR Tonemapping
Different Shots – HDR Tonemapping?TerminologyFor this presentation
Within a Shot – HDR Tonemapping
Different Shots
Automatic Exposure Adjustment
Dynamic IrisLinear CurveApartment of Habib Zargarpour
Creative Director, Microsoft Game Studios
Used to be Art Director on LMNO at EALinear CurveOutColor = pow(x,1/2.2)‏ReinhardMost common tonemapping operator is Reinhard
Simplest version is:
F(x) = x/(x+1)
Yes, I’m oversimplifiying for time.Fade!Let's fade from linear to Reinhard.Fade!Let's fade from linear to Reinhard.Fade!Let's fade from linear to Reinhard.Fade!Let's fade from linear to Reinhard.Fade!Let's fade from linear to Reinhard.Fade!Let's fade from linear to Reinhard.Why Does this happenStart with orange (255,88,21)‏Linear CurveLinear curve at the low end and high endBad GammaBetter blacks actually, but wrong color at the bottom and horrible hue-shifting.ReinhardDesaturated blacks, but nice top end.Color ChangesGee...if only we could get:
The crisper, more saturated blacks of improper gamma.
The nice soft highlights of Reinhard
And get the input colors to actually match like pure linear.Voila!Guess what?  There is a solution!Voila!Solution by Haarm-Peter Duiker
CG Supervisor at Digital Domain
Used to be a CG Supervisor on LMNO at EAFilmQ) Why do they still use film in movies?A) The look.
FilmKodak film examples.ShoulderToe
FilmUsing a film curve solves tons of problems.
Film vs. Digital purists.
Who would’ve thought: Kodak knows how to make good film.Filmic CurveCrisp blacks, saturated dark end, and nice highlights.Filmic TonemappingThis technique works with any color.ReinhardLinearFilmic
Filmic TonemappingThis technique works with any color.ReinhardLinearFilmic
Filmic TonemappingThis technique works with any color.ReinhardLinearFilmic
Filmic TonemappingThis technique works with any color.ReinhardLinearFilmic
Filmic TonemappingThis technique works with any color.ReinhardLinearFilmic
Filmic TonemappingThis technique works with any color.ReinhardLinearFilmic
Filmic TonemappingThis technique works with any color.ReinhardLinearFilmic
Filmic TonemappingThis technique works with any color.ReinhardLinearFilmic
Filmic TonemappingThis technique works with any color.ReinhardLinearFilmic
Filmic TonemappingIn terms of “Crisp Blacks” vs. “Milky Blacks”
Filmic > Linear > Reinhard
In terms of “Soft Highights” vs. “Clamped Highlights”
Filmic = Reinhard > LinearFilmic TonemappingLinear.Filmic TonemappingFade from linear to filmic.Filmic TonemappingFade from linear to filmic.Filmic TonemappingFade from linear to filmic.Filmic TonemappingFade from linear to filmic.Filmic TonemappingFade from linear to filmic.Filmic TonemappingFade from linear to filmic.Filmic vs ReinhardFade from linear to Reinhard.Filmic TonemappingFade from Reinhard to filmic.Filmic TonemappingFade from Reinhard to filmic.Filmic TonemappingFade from Reinhard to filmic.Filmic TonemappingFade from Reinhard to filmic.Filmic TonemappingFade from Reinhard to filmic.Filmic TonemappingFade from Reinhard to filmic.Filmic TonemappingLet's do a comparison.Filmic TonemappingThis is with a linear curve.Filmic TonemappingAnd here is filmic.Filmic TonemappingFirst, we'll go up.Filmic TonemappingExposure 0Filmic TonemappingExposure +1Filmic TonemappingExposure +2Filmic TonemappingExposure +3Filmic TonemappingExposure +4Filmic TonemappingOk, now that's bad.
What happens if we go down?
Notice the crisper blacks in the filmic version.Filmic TonemappingExposure: 0Filmic TonemappingExposure: -1Filmic TonemappingExposure: -2Filmic TonemappingExposure: -3Filmic TonemappingExposure: -4Filmic TonemappingColors get crisper and more saturated at the bottom end.Filmic Tonemappingfloat3 ld = 0.002;float linReference = 0.18;float logReference = 444;float logGamma = 0.45;outColor.rgb = (log10(0.4*outColor.rgb/linReference)/ld*logGamma + logReference)/1023.f;outColor.rgb  = clamp(outColor.rgb , 0, 1);float FilmLutWidth = 256;float Padding = .5/FilmLutWidth;outColor.r = tex2D(FilmLutSampler, float2( lerp(Padding,1-Padding,outColor.r), .5)).r;outColor.g = tex2D(FilmLutSampler, float2( lerp(Padding,1-Padding,outColor.g), .5)).r;outColor.b = tex2D(FilmLutSampler, float2( lerp(Padding,1-Padding,outColor.b), .5)).r;
Filmic TonemappingVariation in white section:254248224240252
More MagicThe catch:
Three texture lookups
Jim Hejl to the rescue	Principle Member of Technical Staff	GPU Group, Office of the CTO	AMD ResearchUsed to work for EAMore MagicAwesome approximation with only ALU ops.
Replaces the entire Lin/Log and Texture LUT
Includes the pow(x,1/2.2)‏x = max(0, LinearColor-0.004);GammaColor = (x*(6.2*x+0.5))/(x*(6.2*x+1.7)+0.06);
Filmic TonemappingNice curve
Toe arguably too strong
Most monitors are too contrasty, strong toe makes it worse
May want less shoulder
The more range, the more shoulder
If your scene lacks range, you want to tone down the shoulder a bitFilmic TonemappingA = Shoulder StrengthB = Linear StrengthC = Linear AngleD = Toe StrengthE = Toe NumeratorF = Toe DenominatorNote: E/F = Toe AngleLinearWhite = Linear White Point ValueF(x) = ((x*(A*x+C*B)+D*E)/(x*(A*x+B)+D*F)) - E/F;FinalColor = F(LinearColor)/F(LinearWhite)‏
Filmic TonemappingShoulder Strength = 0.22Linear Strength = 0.30Linear Angle = 0.10Toe Strength = 0.20Toe Numerator = 0.01Toe Denominator = 0.30Linear White Point Value = 11.2These numbers DO NOT have the pow(x,1/2.2) baked in
ConclusionsFilmic Tonemapping
IMO: The most important post effect
Changes your life completely
Once you have it, you can't live without it
If you implement it, you have to redo all your lighting
You can't just switch it on if your lighting is tweaked for no dynamic range.Filmic Tonemapping“At least I’m not getting choked in gamma space...”
Part 3: SSAO
SSAO TimeThat was fun.
Time for stage 3.
Let’s talk about why you need AO.Why AO?The shadow “grounds” the car.Why AO?Unless you’re already in the shadow!Why AO?Your shadow can't help you now!Why AO?Your shadow can't help you now!Why AO?Your shadow can't help you now!Why AO?Your shadow can't help you now!How important is it?The shadow from the sun grounds object.
But if you are already in shadow, you need something else...
It's the AO that grounds the car in those shots.
So what if we took the AO out?
Trick taught to me by Habib (the guy with the awesome condo earlier)‏Why AO?Your shadow can't help you now!Why AO?Your shadow can't help you now!Why AO?Your shadow can't help you now!Why AO?Your shadow can't help you now!Why AO?Your shadow can't help you now!Why AO?Your shadow can't help you now!Why AO?Your shadow can't help you now!Why AO?Your shadow can't help you now!How to use AO?Sun is Directional Light
Sky is Hemisphere Light (ish)‏How to use AO?Sun is Yellow (ish)‏
Sky is Blue (ish)‏How to use AO?Sun is Directional
Approximate Directional Shadow with Shadow MappingHow to use AO?Sky is Hemisphere (sorta)‏
Approximate Hemisphere Shadow with AOHow to use AO?Sun is yellow, sky is blue.How to use AO?
How to use AO?
How to use AO?
How to use AO?
How to use AO?
SunlightQ) Why not have it affect sunlight?A) It does the exact opposite of what you want it to do.
SSAO ExampleOffSSAO ExampleOnSSAO ExampleSSAO Darkens Shadows – Good
SSAO Darkens Sunlight – BAD
Reduces perceived contrast
Makes lighting look flatter
My Opinion: SSAO should NOT affect Direct LightThe AlgorithmOnly Input is Half-Res Depth
No normals
No special filtering (I.e. Bilateral)‏
Processed in 64x64 blocks
Adjacent Depth tiles included as wellThe AlgorithmCalculate Base AO
Dilate Horizontal
Dilate Vertical
Apply Low Pass FilterSSAO Passes0) Base Image
SSAO Passes1) Base SSAO
SSAO Passes2) Dilate Horizontal
SSAO Passes3) Dilate Vertical
SSAO Passes4) Low Pass Filter	(I.e. blur the $&%@ out of it)‏
SSAO PassesStep One: The Base AO
Uses half-res depth buffer
Calculate AO based on nearby depth samplesSSAO PassesPink point on a black surface.SSAO PassesCorrect way is to look at the hemisphere...SSAO Passes...and find the area that is visible to the point.
Our solution is much hackier.SSAO PassesWho needs sphere?  A box will do just as well.SSAO PassesInstead of area though, we will approximate volume.SSAO PassesWe can estimate volume without knowing the normal.
Half Volume = Fully UnoccludedSSAO PassesWe can estimate volume without knowing the normal.
Half Volume = Fully UnoccludedSSAO PassesWe can estimate volume without knowing the normal.
Half Volume = Fully UnoccludedSSAO PassesWe can estimate volume without knowing the normal.
Half Volume = Fully UnoccludedSSAO PassesStart with a pointSSAO PassesAnd sample pairs of points.SSAO PassesWhat if we have an occluder?SSAO PassesWhat if we have an occluder?SSAO PassesWe lose depth info.SSAO PassesWe could call it occluded.SSAO PassesBut, is that correct?  Maybe not.SSAO PassesLet's choose a cutoff, and mark the pair as invalid.SSAO PassesSample pattern?SSAO PassesSample pattern.  Discard in pairs.SSAO PassesNote the single light outline.SSAO PassesFind discontinuties...SSAO PassesFind discontinuties...SSAO PassesFind discontinuties...SSAO PassesFind discontinuties...SSAO Passes...and dilate at discontinuties.  Here is original.SSAO PassesAfter X dilate.SSAO PassesAfter Y dilate.  And the light outline is gone.SSAO PassesAnd do a gaussian blur.SSAO PassesFinalArtifactsWhite OutlineCrossing PatternDepth Discontinuities
ArtifactsBlur adds white outline
Crossing PatternArtifactsMore Crossing patternsArtifactsDepth Discontinuities.  Notice a little anti-halo.ArtifactsHard to see in real levels though.ArtifactsArtifacts are usually only noticeable to the trained eye
Benefits outweigh drawbacks
Have option to turn it off per surface
Most snow fades at distance
Some objects apply sqrt() to brighten it
Some objects just have it offMore ShotsOnMore ShotsOffMore ShotsHere's a few more shots.More ShotsHere's a few more shots.More ShotsHere's a few more shots.More ShotsHere's a few more shots.More ShotsHere's a few more shots.More ShotsHere's a few more shots.More ShotsHere's a few more shots.More ShotsHere's a few more shots.More ShotsHere's a few more shots.More ShotsHere's a few more shots.SSAO ConclusionGrounds characters and other objects
Deepens our Blacks
Artifacts are not too bad
Could use better filtering for halos
Runs on SPUs, and doesn’t need normal bufferPart 4: Architecture
ArchitectureTime for part 4.
Here is the base overview.
Skip some thingsRendering PassesFinal outputRendering PassesDepth and Normal, 2x MSAARendering PassesShadow depths for cascadesRendering PassesCascade calculation to screen-spaceRendering PassesMeanwhile on the SPUs...Rendering PassesSSAORendering PassesFullscreen lighting.  Diffuse and specular.
Calculate lighting in TilesRendering PassesBack on the GPU...Rendering PassesRender to RGBM, 2x MSAARendering PassesResolve RGBM 2x MSAA to FP16 1xRendering PassesTransparent objects and post.Full TimelineFull Rendering PipelineFull TimelineFull Render in 3 framesFrame 1Begin Frame 1.Frame 1TimelineSPUsGPUPPU
Frame 1Begin Frame 1.Frame 1All PPU stuff.Frame 1PPU kicks off SPU jobsFrame 2Does most of the rendering.Frame 2Vertex processing on SPUs...Frame 2...kicks of 2x DepthNormal pass.Frame 2Resolve 2x MSAA buffers to 1x and copy.Frame 2Spotlight shadow depth pass.Frame 2Sunlight shadow depth pass.Frame 2Kick Fullscreen Lighting SPU jobs.Frame 2SSAO Jobs.Frame 2Motion BlurFrame 2Resolve shadow to screen.  Copy Fullscreen lighting.Frame 2Add Fullscreen shadowed lights.Frame 2Standard Pass Vertex Processing on SPUs.Frame 2Standard pass.  Writes to 2x RGBM.Frame 2Resolve 2x RGBM to 1x FP16Frame 2Full-Res Alpha-Blended effects.Frame 2Half-Res Particles and Composite.Frame 3Now for the 3rd frame.Frame 3Start PostFX jobs at end of Frame 2.Frame 3Fill in PostFX jobs in Frame 3 with low priority.Frame 3Read from main memory, do distortion and HUD.Full TimelineAll together.SPU OptimizationQuick notes on how we optimize SPU code.PerformanceYes!  You can optimize SPU code by hand!
This was news to me when I started here.PerformanceLet’s talk about Laundry
Say you have 5 loads to do
1 washer, 1 dryerIn-OrderLoad 1 in Washer
Load 1 in Dryer
Load 2 in Washer
Load 2 in Dryer
Load 3 in Washer
Load 3 in Dryer
Load 4 in Washer
Load 4 in Dryer
Load 5 in Washer
Load 5 in DryerPipelinedLoad 1 in Washer
Load 2 in Washer, Load 1 in Dryer
Load 3 in Washer, Load 2 in Dryer
Load 4 in Washer, Load 3 in Dryer
Load 5 in Washer, Load 4 in Dryer
Load 5 in DryerSoftware PipeliningWe can do this in software
Called “Software Pipelining”
It’s how we optimize SPU code by hand
Pal Engstad (our Lead Graphics Programmer) will be putting a doc online explaining the full process.
Should be online: look for the post on the blog
Direct links:
www.naughtydog.com/docs/gdc2010/intro-spu-optimizations-part-1.pdf
www.naughtydog.com/docs/gdc2010/intro-spu-optimizations-part-2.pdfSPU C++ Intrinsicsfor( int y = 0; y < clampHeight; y++ )	{intlColorOffset = y * kAoBufferLineWidth;		VF32 currWordAo = pvColor[ lColorOffset / 4 ];prevWordAo[0] = prevWordAo[1] = prevWordAo[2] = prevWordAo[3] = currWordAo[0];		ao_n4 = prevWordAo;		ao_n3 = (VF32)si_shufb( (qword)prevWordAo, (qword)currWordAo, (qword)s_BCDa );		ao_n2 = (VF32)si_shufb( (qword)prevWordAo, (qword)currWordAo, (qword)s_CDab );		ao_n1 = (VF32)si_shufb( (qword)prevWordAo, (qword)currWordAo, (qword)s_Dabc );ao_curr = currWordAo;		for (int x = 0; x < kAoBufferLineWidth; x+=4) {			VF32 nextWordAo = pvColor[ (lColorOffset + x + 4)/ 4 ];			VF32 ao_p1 = (VF32)si_shufb( (qword)currWordAo, (qword)nextWordAo, (qword)s_BCDa );			VF32 ao_p2 = (VF32)si_shufb( (qword)currWordAo, (qword)nextWordAo, (qword)s_CDab );			VF32 ao_p3 = (VF32)si_shufb( (qword)currWordAo, (qword)nextWordAo, (qword)s_Dabc );			VF32 ao_p4 = nextWordAo;			VF32 blurAo = ao_n4*w4 + ao_n3*w3 + ao_n2*w2 + ao_n1*w1 + ao_curr*w0;			blurAo += ao_p1*w1 + ao_p2*w2 + ao_p3*w3 + ao_p4*w4;pvColor[ (lColorOffset + x)/ 4 ] = blurAo;prevWordAo = currWordAo;currWordAo = nextWordAo;			ao_n4 = prevWordAo;			ao_n3 = ao_p1;			ao_n2 = ao_p2;			ao_n1 = ao_p3;ao_curr = currWordAo;		}	}
SPU Assembly 1/2EVENODDloop:		{nop}								{o6} lqxcurrAo, pvColor, colorOffset		{nop}								{o4} shufb prevAo0, currAo, currAo, s_AAAA		{nop}								{o4} shufb ao_n30, prevAo0, currAo, s_BCDa		{nop}								{o4} shufb ao_n20, prevAo0, currAo, s_CDab		{nop}								{o4} shufb ao_n10, prevAo0, currAo, s_Dabc	{e2} selb ao_n4, ao_n4, prevAo0, endLineMask					{lnop}	{e2} selb ao_n3, ao_n3, ao_n30, endLineMask					{lnop}	{e2} selb ao_n2, ao_n2, ao_n20, endLineMask					{lnop}	{e2} selb ao_n1, ao_n1, ao_n10, endLineMask					{lnop}		{nop}								{o6} lqxnextAo, pvNextColor, colorOffset	{nop}								{o4} shufb ao_p1, currAo, nextAo, s_BCDa		{nop}								{o4} shufb ao_p2, currAo, nextAo, s_CDab		{nop}								{o4} shufb ao_p3, currAo, nextAo, s_Dabc	{e6} fa ao_n4, ao_n4, nextAo							{lnop}	{e6} fa ao_n3, ao_n3, ao_p3							{lnop}	{e6} fa ao_n2, ao_n2, ao_p2							{lnop}	{e6} fa ao_n1, ao_n1, ao_p1							{lnop}
SPU Assembly 2/2EVENODD	{e6} fm blurAo, ao_n4, w4							{lnop}	{e6} fma blurAo, ao_n3, w3, blurAo						{lnop}	{e6} fma blurAo, ao_n2, w2, blurAo						{lnop}	{e6} fma blurAo, ao_n1, w1, blurAo						{lnop}	{e6} fmablurAo, currAo, w0, blurAo						{lnop}		{nop}								{o6} stqxblurAo, pvColor, colorOffset		{nop}								{o4} shlqbii ao_n4, currAo, 0			{nop}								{o4} shlqbii ao_n3, ao_p1, 0			{nop}								{o4} shlqbii ao_n2, ao_p2, 0			{nop}								{o4} shlqbii ao_n1, ao_p3, 0		{e2} ceqendLineMask, colorOffset, endLineOffset				{lnop}	{e2} ceqendBlockMask, colorOffset, endOffset					{lnop}	{e2} selbendLineOffsetIncr, zero, colorLineBytes, endLineMask			{lnop}	{e2} selbcolorOffsetIncr, defOffsetIncr, endColorOffsetIncr, endLineMask	{lnop}	{e2} a endLineOffset, endLineOffset, endLineOffsetIncr				{lnop}	{e2} a colorOffset, colorOffset, colorOffsetIncr				{lnop}		{nop}							branch: {o?} brzendBlockMask, loop
SPU Assembly{e6} faao_n3, ao_n3, ao_p3{o6}lqxnextAo, pvNextColor, colorOffsetHere is a sample line of assembly code
SPUs issue one even and odd instruction per cycle
One line = One cycle
e6 = even, 6 cycle latency
o6 = odd, 6 cycle latencyIteration 1loop:nop											lnopnop 											lnop  nop 											lnopnop 						lnopnop 											lnop  nop 											lnopnop 						lnopnop 						lnopnop 											lnop  nop 											lnopnop 						lnopnop 											lnop  nop 											lnopnop 						lnopnop 											lnop  nop 											lnopnop 											lnop  nop 											lnopnop 						lnopnop branch: {o?:0} brzendBlockMask, loop
Iteration 1loop:nop											lnopnop 					{o6:0} lqx currAo, pvColor, colorOffsetnop 					{o6:0} lqx nextAo, pvNextColor, colorOffsetnop 						{o4:0} shlqbiiendLineMask_, endLineMask, 0nop 											lnop  nop 											lnopnop 						lnopnop 						{o4:0} shufb prevAo0, currAo, currAo, s_AAAA{e2:0} ceqendLineMask, colorOffset, endLineOffset				{o4:0} shufb ao_p1, currAo, nextAo, s_BCDa{e2:0} ceqendBlockMask, colorOffset_, endOffset				{o4:0} shufb ao_p2, currAo, nextAo, s_CDab{e2:0} selbendLineOffsetIncr, zero, colorLineBytes, endLineMask		{o4:0} shufb ao_p3, currAo, nextAo, s_Dabc{e2:0} selb ao_n4, currAo_, prevAo0, endLineMask_				{o4:0} shufb ao_n30, prevAo0, currAo, s_BCDanop 						{o4:0} shufb ao_n20, prevAo0, currAo, s_CDabnop								{o4:0} shufb ao_n10, prevAo0, currAo, s_Dabc{e2:0} selbcolorOffsetIncr, defOffsetIncr, endColorOffsetIncr, endLineMask lnop{e2:0} selb ao_n3, ao_p1_, ao_n30, endLineMask_			{o4:0} shufbcolorOffset_, colorOffset_, colorOffset, s_BCa0{e2:0} selb ao_n2, ao_p2_, ao_n20, endLineMask_lnop{e2:0} selb ao_n1, ao_p3_, ao_n10, endLineMask_				lnop{e2:0} a colorOffset, colorOffset, colorOffsetIncrlnop{e2:0} a endLineOffset, endLineOffset, endLineOffsetIncrbranch: {o?:0} brzendBlockMask, loop
Iteration 1loop:nop											lnopnop 					{o6:0} lqx currAo, pvColor, colorOffsetnop 					{o6:0} lqx nextAo, pvNextColor, colorOffsetnop 						{o4:0} shlqbiiendLineMask_, endLineMask, 0nop 											lnop  nop 											lnopnop 						lnopnop 						{o4:0} shufb prevAo0, currAo, currAo, s_AAAA{e2:0} ceqendLineMask, colorOffset, endLineOffset				{o4:0} shufb ao_p1, currAo, nextAo, s_BCDa{e2:0} ceqendBlockMask, colorOffset_, endOffset				{o4:0} shufb ao_p2, currAo, nextAo, s_CDab{e2:0} selbendLineOffsetIncr, zero, colorLineBytes, endLineMask		{o4:0} shufb ao_p3, currAo, nextAo, s_Dabc{e2:0} selb ao_n4, currAo_, prevAo0, endLineMask_				{o4:0} shufb ao_n30, prevAo0, currAo, s_BCDanop 						{o4:0} shufb ao_n20, prevAo0, currAo, s_CDabnop								{o4:0} shufb ao_n10, prevAo0, currAo, s_Dabc{e2:0} selbcolorOffsetIncr, defOffsetIncr, endColorOffsetIncr, endLineMask lnop{e2:0} selb ao_n3, ao_p1_, ao_n30, endLineMask_			{o4:0} shufbcolorOffset_, colorOffset_, colorOffset, s_BCa0{e2:0} selb ao_n2, ao_p2_, ao_n20, endLineMask_lnop{e2:0} selb ao_n1, ao_p3_, ao_n10, endLineMask_				lnop{e2:0} a colorOffset, colorOffset, colorOffsetIncrlnop{e2:0} a endLineOffset, endLineOffset, endLineOffsetIncrbranch: {o?:0} brzendBlockMask, loop
ProblemsTake this line:            {o6} lqxcurrAo, pvColor, colorOffset            {o4} shufb prevAo0, currAo, currAo, s_AAAALoad a value into currAo
Next instruction needs currAo
currAo won’t be ready yet
Stall for 5 cyclesIteration 1loop:nop											lnopnop 					{o6:0} lqx currAo, pvColor, colorOffsetnop 					{o6:0} lqx nextAo, pvNextColor, colorOffsetnop 						{o4:0} shlqbiiendLineMask_, endLineMask, 0nop 											lnop  nop 											lnopnop 						lnopnop 						{o4:0} shufb prevAo0, currAo, currAo, s_AAAA{e2:0} ceqendLineMask, colorOffset, endLineOffset				{o4:0} shufb ao_p1, currAo, nextAo, s_BCDa{e2:0} ceqendBlockMask, colorOffset_, endOffset				{o4:0} shufb ao_p2, currAo, nextAo, s_CDab{e2:0} selbendLineOffsetIncr, zero, colorLineBytes, endLineMask		{o4:0} shufb ao_p3, currAo, nextAo, s_Dabc{e2:0} selb ao_n4, currAo_, prevAo0, endLineMask_				{o4:0} shufb ao_n30, prevAo0, currAo, s_BCDanop 						{o4:0} shufb ao_n20, prevAo0, currAo, s_CDabnop								{o4:0} shufb ao_n10, prevAo0, currAo, s_Dabc{e2:0} selbcolorOffsetIncr, defOffsetIncr, endColorOffsetIncr, endLineMask lnop{e2:0} selb ao_n3, ao_p1_, ao_n30, endLineMask_			{o4:0} shufbcolorOffset_, colorOffset_, colorOffset, s_BCa0{e2:0} selb ao_n2, ao_p2_, ao_n20, endLineMask_lnop{e2:0} selb ao_n1, ao_p3_, ao_n10, endLineMask_				lnop{e2:0} a colorOffset, colorOffset, colorOffsetIncrlnop{e2:0} a endLineOffset, endLineOffset, endLineOffsetIncrbranch: {o?:0} brzendBlockMask, loop
Iteration 1loop:nop											lnopnop 					{o6:0} lqx currAo, pvColor, colorOffsetnop 					{o6:0} lqx nextAo, pvNextColor, colorOffsetnop 						{o4:0} shlqbiiendLineMask_, endLineMask, 0nop 											lnop  nop 											lnopnop 						lnopnop 						{o4:0} shufb prevAo0, currAo, currAo, s_AAAA{e2:0} ceqendLineMask, colorOffset, endLineOffset				{o4:0} shufb ao_p1, currAo, nextAo, s_BCDa{e2:0} ceqendBlockMask, colorOffset_, endOffset				{o4:0} shufb ao_p2, currAo, nextAo, s_CDab{e2:0} selbendLineOffsetIncr, zero, colorLineBytes, endLineMask		{o4:0} shufb ao_p3, currAo, nextAo, s_Dabc{e2:0} selb ao_n4, currAo_, prevAo0, endLineMask_				{o4:0} shufb ao_n30, prevAo0, currAo, s_BCDanop 						{o4:0} shufb ao_n20, prevAo0, currAo, s_CDabnop								{o4:0} shufb ao_n10, prevAo0, currAo, s_Dabc{e2:0} selbcolorOffsetIncr, defOffsetIncr, endColorOffsetIncr, endLineMask lnop{e2:0} selb ao_n3, ao_p1_, ao_n30, endLineMask_			{o4:0} shufbcolorOffset_, colorOffset_, colorOffset, s_BCa0{e2:0} selb ao_n2, ao_p2_, ao_n20, endLineMask_lnop{e2:0} selb ao_n1, ao_p3_, ao_n10, endLineMask_				lnop{e2:0} a colorOffset, colorOffset, colorOffsetIncrlnop{e2:0} a endLineOffset, endLineOffset, endLineOffsetIncrbranch: {o?:0} brzendBlockMask, loop
Iteration 2loop:nop											lnop{e6:1} fa ao_n4, ao_n4, nextAo	{o6:0} lqx currAo, pvColor, colorOffset{e6:1} fa ao_n3, ao_n3, ao_p3{o6:0} lqx nextAo, pvNextColor, colorOffsetnop{o4:0} shlqbiiendLineMask_, endLineMask, 0{e6:1} fa ao_n1_, ao_n1, ao_p1	lnop{e6:1} fa ao_n2_, ao_n2, ao_p2	lnopnoplnop{e6:1} fm blurAo, ao_n4, w4	{o4:0} shufb prevAo0, currAo, currAo, s_AAAA{e2:0} ceqendLineMask, colorOffset, endLineOffset				{o4:0} shufb ao_p1, currAo, nextAo, s_BCDa{e2:0} ceqendBlockMask, colorOffset_, endOffset				{o4:0} shufb ao_p2, currAo, nextAo, s_CDab{e2:0} selbendLineOffsetIncr, zero, colorLineBytes, endLineMask		{o4:0} shufb ao_p3, currAo, nextAo, s_Dabc{e2:0} selb ao_n4, currAo_, prevAo0, endLineMask_				{o4:0} shufb ao_n30, prevAo0, currAo, s_BCDanop{o4:0} shufb ao_n20, prevAo0, currAo, s_CDab{e6:1} fma blurAo, ao_n3, w3, blurAo	{o4:0} shufb ao_n10, prevAo0, currAo, s_Dabc{e2:0} selbcolorOffsetIncr, defOffsetIncr, endColorOffsetIncr, endLineMasklnop {e2:0} selb ao_n3, ao_p1_, ao_n30, endLineMask_			{o4:0} shufbcolorOffset_, colorOffset_, colorOffset, s_BCa0{e2:0} selb ao_n2, ao_p2_, ao_n20, endLineMask_lnop{e2:0} selb ao_n1, ao_p3_, ao_n10, endLineMask_				lnop{e2:0} a colorOffset, colorOffset, colorOffsetIncrlnop{e2:0} a endLineOffset, endLineOffset, endLineOffsetIncrbranch: {o?:0} brzendBlockMask, loop
Iteration 3loop:nop				 							lnop{e6:1} fa ao_n4, ao_n4, nextAo	{o6:0} lqx currAo, pvColor, colorOffset{e6:1} fa ao_n3, ao_n3, ao_p3{o6:0} lqx nextAo, pvNextColor, colorOffset{e6:2} fmablurAo_, ao_n2_, w2, blurAo{o4:0} shlqbiiendLineMask_, endLineMask, 0{e6:1} fa ao_n1_, ao_n1, ao_p1	lnop{e6:1} fa ao_n2_, ao_n2, ao_p2	lnopnop 						lnop{e6:1} fm blurAo, ao_n4, w4	{o4:0} shufb prevAo0, currAo, currAo, s_AAAA{e2:0} ceqendLineMask, colorOffset, endLineOffset				{o4:0} shufb ao_p1, currAo, nextAo, s_BCDa{e2:0} ceqendBlockMask, colorOffset_, endOffset				{o4:0} shufb ao_p2, currAo, nextAo, s_CDab{e2:0} selbendLineOffsetIncr, zero, colorLineBytes, endLineMask		{o4:0} shufb ao_p3, currAo, nextAo, s_Dabc{e2:0} selb ao_n4, currAo_, prevAo0, endLineMask_				{o4:0} shufb ao_n30, prevAo0, currAo, s_BCDa{e6:2} fma blurAo__, ao_n1__, w1, blurAo_{o4:0} shufb ao_n20, prevAo0, currAo, s_CDab{e6:1} fma blurAo, ao_n3, w3, blurAo	{o4:0} shufb ao_n10, prevAo0, currAo, s_Dabc{e2:0} selbcolorOffsetIncr, defOffsetIncr, endColorOffsetIncr, endLineMasklnop {e2:0} selb ao_n3, ao_p1_, ao_n30, endLineMask_			{o4:0} shufbcolorOffset_, colorOffset_, colorOffset, s_BCa0{e2:0} selb ao_n2, ao_p2_, ao_n20, endLineMask_lnop{e2:0} selb ao_n1, ao_p3_, ao_n10, endLineMask_				lnop{e2:0} a colorOffset, colorOffset, colorOffsetIncrlnop{e2:0} a endLineOffset, endLineOffset, endLineOffsetIncrbranch: {o?:0} brzendBlockMask, loop
Iteration 4loop:nop 											lnop{e6:1} fa ao_n4, ao_n4, nextAo	{o6:0} lqx currAo, pvColor, colorOffset{e6:1} fa ao_n3, ao_n3, ao_p3{o6:0} lqx nextAo, pvNextColor, colorOffset{e6:2} fmablurAo_, ao_n2_, w2, blurAo{o4:0} shlqbiiendLineMask_, endLineMask, 0{e6:1} fa ao_n1_, ao_n1, ao_p1	lnop{e6:1} fa ao_n2_, ao_n2, ao_p2	lnop{e6:3} fmablurAo___, currAo___, w0, blurAo__lnop{e6:1} fm blurAo, ao_n4, w4	{o4:0} shufb prevAo0, currAo, currAo, s_AAAA{e2:0} ceqendLineMask, colorOffset, endLineOffset				{o4:0} shufb ao_p1, currAo, nextAo, s_BCDa{e2:0} ceqendBlockMask, colorOffset_, endOffset				{o4:0} shufb ao_p2, currAo, nextAo, s_CDab{e2:0} selbendLineOffsetIncr, zero, colorLineBytes, endLineMask		{o4:0} shufb ao_p3, currAo, nextAo, s_Dabc{e2:0} selb ao_n4, currAo_, prevAo0, endLineMask_				{o4:0} shufb ao_n30, prevAo0, currAo, s_BCDa{e6:2} fma blurAo__, ao_n1__, w1, blurAo_{o4:0} shufb ao_n20, prevAo0, currAo, s_CDab{e6:1} fma blurAo, ao_n3, w3, blurAo	{o4:0} shufb ao_n10, prevAo0, currAo, s_Dabc{e2:0} selbcolorOffsetIncr, defOffsetIncr, endColorOffsetIncr, endLineMask{o6:3} stqxblurAo___, pvColor, colorOffset_{e2:0} selb ao_n3, ao_p1_, ao_n30, endLineMask_			{o4:0} shufbcolorOffset_, colorOffset_, colorOffset, s_BCa0{e2:0} selb ao_n2, ao_p2_, ao_n20, endLineMask_lnop{e2:0} selb ao_n1, ao_p3_, ao_n10, endLineMask_				lnop{e2:0} a colorOffset, colorOffset, colorOffsetIncrlnop{e2:0} a endLineOffset, endLineOffset, endLineOffsetIncrbranch: {o?:0} brzendBlockMask, loop
Copy Operationsloop:{e2:x} ai ao_n1__, ao_n1_, 0							{o4:x} shlqbii currAo_, currAo, 0{e6:1} fa ao_n4, ao_n4, nextAo	{o6:0} lqx currAo, pvColor, colorOffset{e6:1} fa ao_n3, ao_n3, ao_p3{o6:0} lqx nextAo, pvNextColor, colorOffset{e6:2} fmablurAo_, ao_n2_, w2, blurAo{o4:0} shlqbiiendLineMask_, endLineMask, 0{e6:1} fa ao_n1_, ao_n1, ao_p1	{o4:x} shlqbii ao_p1_, ao_p1, 0{e6:1} fa ao_n2_, ao_n2, ao_p2	{o4:x} shlqbii ao_p2_, ao_p2, 0{e6:3} fmablurAo___, currAo___, w0, blurAo__	{o4:x} shlqbii ao_p3_, ao_p3, 0{e6:1} fm blurAo, ao_n4, w4	{o4:0} shufb prevAo0, currAo, currAo, s_AAAA{e2:0} ceqendLineMask, colorOffset, endLineOffset				{o4:0} shufb ao_p1, currAo, nextAo, s_BCDa{e2:0} ceqendBlockMask, colorOffset_, endOffset				{o4:0} shufb ao_p2, currAo, nextAo, s_CDab{e2:0} selbendLineOffsetIncr, zero, colorLineBytes, endLineMask		{o4:0} shufb ao_p3, currAo, nextAo, s_Dabc{e2:0} selb ao_n4, currAo_, prevAo0, endLineMask_				{o4:0} shufb ao_n30, prevAo0, currAo, s_BCDa{e6:2} fma blurAo__, ao_n1__, w1, blurAo_{o4:0} shufb ao_n20, prevAo0, currAo, s_CDab{e6:1} fma blurAo, ao_n3, w3, blurAo	{o4:0} shufb ao_n10, prevAo0, currAo, s_Dabc{e2:0} selbcolorOffsetIncr, defOffsetIncr, endColorOffsetIncr, endLineMask{o6:3} stqxblurAo___, pvColor, colorOffset_{e2:0} selb ao_n3, ao_p1_, ao_n30, endLineMask_			{o4:0} shufbcolorOffset_, colorOffset_, colorOffset, s_BCa0{e2:0} selb ao_n2, ao_p2_, ao_n20, endLineMask_{o4:x} shlqbii currAo___, currAo__, 0{e2:0} selb ao_n1, ao_p3_, ao_n10, endLineMask_				{o4:x} shlqbii currAo__, currAo_, 0{e2:0} a colorOffset, colorOffset, colorOffsetIncrlnop{e2:0} a endLineOffset, endLineOffset, endLineOffsetIncrbranch: {o?:0} brzendBlockMask, loop
Optimizedloop:{e2:x} ai ao_n1__, ao_n1_, 0							{o4:x} shlqbii currAo_, currAo, 0{e6:1} fa ao_n4, ao_n4, nextAo							{o6:0} lqx currAo, pvColor, colorOffset{e6:1} fa ao_n3, ao_n3, ao_p3							{o6:0} lqx nextAo, pvNextColor, colorOffset{e6:2} fmablurAo_, ao_n2_, w2, blurAo						{o4:0} shlqbiiendLineMask_, endLineMask, 0{e6:1} fa ao_n1_, ao_n1, ao_p1							{o4:x} shlqbii ao_p1_, ao_p1, 0{e6:1} fa ao_n2_, ao_n2, ao_p2							{o4:x} shlqbii ao_p2_, ao_p2, 0{e6:3} fmablurAo___, currAo___, w0, blurAo__					{o4:x} shlqbii ao_p3_, ao_p3, 0{e6:1} fm blurAo, ao_n4, w4							{o4:0} shufb prevAo0, currAo, currAo, s_AAAA{e2:0} ceqendLineMask, colorOffset, endLineOffset				{o4:0} shufb ao_p1, currAo, nextAo, s_BCDa{e2:0} ceqendBlockMask, colorOffset_, endOffset				{o4:0} shufb ao_p2, currAo, nextAo, s_CDab{e2:0} selbendLineOffsetIncr, zero, colorLineBytes, endLineMask		{o4:0} shufb ao_p3, currAo, nextAo, s_Dabc{e2:0} selb ao_n4, currAo_, prevAo0, endLineMask_				{o4:0} shufb ao_n30, prevAo0, currAo, s_BCDa{e6:2} fma blurAo__, ao_n1__, w1, blurAo_					{o4:0} shufb ao_n20, prevAo0, currAo, s_CDab{e6:1} fma blurAo, ao_n3, w3, blurAo						{o4:0} shufb ao_n10, prevAo0, currAo, s_Dabc{e2:0} selbcolorOffsetIncr, defOffsetIncr, endColorOffsetIncr, endLineMask	{o6:3} stqxblurAo___, pvColor, colorOffset_{e2:0} selb ao_n3, ao_p1_, ao_n30, endLineMask_			{o4:0} shufbcolorOffset_, colorOffset_, colorOffset, s_BCa0{e2:0} selb ao_n2, ao_p2_, ao_n20, endLineMask_				{o4:x} shlqbii currAo___, currAo__, 0{e2:0} selb ao_n1, ao_p3_, ao_n10, endLineMask_				{o4:x} shlqbii currAo__, currAo_, 0{e2:0} a colorOffset, colorOffset, colorOffsetIncrlnop{e2:0} a endLineOffset, endLineOffset, endLineOffsetIncr			branch: {o?:0} brzendBlockMask, loop
OptimizedEVENODDloop:{e2:x} ai ao_n1__, ao_n1_, 0							{o4:x} shlqbii currAo_, currAo, 0{e6:1} fa ao_n4, ao_n4, nextAo							{o6:0} lqx currAo, pvColor, colorOffset{e6:1} fa ao_n3, ao_n3, ao_p3							{o6:0} lqx nextAo, pvNextColor, colorOffset{e6:2} fmablurAo_, ao_n2_, w2, blurAo						{o4:0} shlqbiiendLineMask_, endLineMask, 0{e6:1} fa ao_n1_, ao_n1, ao_p1							{o4:x} shlqbii ao_p1_, ao_p1, 0{e6:1} fa ao_n2_, ao_n2, ao_p2							{o4:x} shlqbii ao_p2_, ao_p2, 0{e6:3} fmablurAo___, currAo___, w0, blurAo__					{o4:x} shlqbii ao_p3_, ao_p3, 0{e6:1} fm blurAo, ao_n4, w4							{o4:0} shufb prevAo0, currAo, currAo, s_AAAA{e2:0} ceqendLineMask, colorOffset, endLineOffset				{o4:0} shufb ao_p1, currAo, nextAo, s_BCDa{e2:0} ceqendBlockMask, colorOffset_, endOffset				{o4:0} shufb ao_p2, currAo, nextAo, s_CDab{e2:0} selbendLineOffsetIncr, zero, colorLineBytes, endLineMask		{o4:0} shufb ao_p3, currAo, nextAo, s_Dabc{e2:0} selb ao_n4, currAo_, prevAo0, endLineMask_				{o4:0} shufb ao_n30, prevAo0, currAo, s_BCDa{e6:2} fma blurAo__, ao_n1__, w1, blurAo_					{o4:0} shufb ao_n20, prevAo0, currAo, s_CDab{e6:1} fma blurAo, ao_n3, w3, blurAo						{o4:0} shufb ao_n10, prevAo0, currAo, s_Dabc{e2:0} selbcolorOffsetIncr, defOffsetIncr, endColorOffsetIncr, endLineMask	{o6:3} stqxblurAo___, pvColor, colorOffset_{e2:0} selb ao_n3, ao_p1_, ao_n30, endLineMask_			{o4:0} shufbcolorOffset_, colorOffset_, colorOffset, s_BCa0{e2:0} selb ao_n2, ao_p2_, ao_n20, endLineMask_				{o4:x} shlqbii currAo___, currAo__, 0{e2:0} selb ao_n1, ao_p3_, ao_n10, endLineMask_				{o4:x} shlqbii currAo__, currAo_, 0{e2:0} a colorOffset, colorOffset, colorOffsetIncrlnop{e2:0} a endLineOffset, endLineOffset, endLineOffsetIncr			branch: {o?:0} brzendBlockMask, loop
OptimizedEVENODDloop:{e2:x} ai ao_n1__, ao_n1_, 0							{o4:x} shlqbii currAo_, currAo, 0{e6:1} fa ao_n4, ao_n4, nextAo							{o6:0} lqx currAo, pvColor, colorOffset{e6:1} fa ao_n3, ao_n3, ao_p3							{o6:0} lqx nextAo, pvNextColor, colorOffset{e6:2} fmablurAo_, ao_n2_, w2, blurAo						{o4:0} shlqbiiendLineMask_, endLineMask, 0{e6:1} fa ao_n1_, ao_n1, ao_p1							{o4:x} shlqbii ao_p1_, ao_p1, 0{e6:1} fa ao_n2_, ao_n2, ao_p2							{o4:x} shlqbii ao_p2_, ao_p2, 0{e6:3} fmablurAo___, currAo___, w0, blurAo__					{o4:x} shlqbii ao_p3_, ao_p3, 0{e6:1} fm blurAo, ao_n4, w4							{o4:0} shufb prevAo0, currAo, currAo, s_AAAA{e2:0} ceqendLineMask, colorOffset, endLineOffset				{o4:0} shufb ao_p1, currAo, nextAo, s_BCDa{e2:0} ceqendBlockMask, colorOffset_, endOffset				{o4:0} shufb ao_p2, currAo, nextAo, s_CDab{e2:0} selbendLineOffsetIncr, zero, colorLineBytes, endLineMask		{o4:0} shufb ao_p3, currAo, nextAo, s_Dabc{e2:0} selb ao_n4, currAo_, prevAo0, endLineMask_				{o4:0} shufb ao_n30, prevAo0, currAo, s_BCDa{e6:2} fma blurAo__, ao_n1__, w1, blurAo_					{o4:0} shufb ao_n20, prevAo0, currAo, s_CDab{e6:1} fma blurAo, ao_n3, w3, blurAo						{o4:0} shufb ao_n10, prevAo0, currAo, s_Dabc{e2:0} selbcolorOffsetIncr, defOffsetIncr, endColorOffsetIncr, endLineMask	{o6:3} stqxblurAo___, pvColor, colorOffset_{e2:0} selb ao_n3, ao_p1_, ao_n30, endLineMask_			{o4:0} shufbcolorOffset_, colorOffset_, colorOffset, s_BCa0{e2:0} selb ao_n2, ao_p2_, ao_n20, endLineMask_				{o4:x} shlqbii currAo___, currAo__, 0{e2:0} selb ao_n1, ao_p3_, ao_n10, endLineMask_				{o4:x} shlqbii currAo__, currAo_, 0{e2:0} a colorOffset, colorOffset, colorOffsetIncrlnop{e2:0} a endLineOffset, endLineOffset, endLineOffsetIncr			branch: {o?:0} brzendBlockMask, loop
OptimizedEVENODDloop:{e2:x} ai ao_n1__, ao_n1_, 0							{o4:x} shlqbii currAo_, currAo, 0{e6:1} fa ao_n4, ao_n4, nextAo							{o6:0} lqx currAo, pvColor, colorOffset{e6:1} fa ao_n3, ao_n3, ao_p3							{o6:0} lqx nextAo, pvNextColor, colorOffset{e6:2} fmablurAo_, ao_n2_, w2, blurAo						{o4:0} shlqbiiendLineMask_, endLineMask, 0{e6:1} fa ao_n1_, ao_n1, ao_p1							{o4:x} shlqbii ao_p1_, ao_p1, 0{e6:1} fa ao_n2_, ao_n2, ao_p2							{o4:x} shlqbii ao_p2_, ao_p2, 0{e6:3} fmablurAo___, currAo___, w0, blurAo__					{o4:x} shlqbii ao_p3_, ao_p3, 0{e6:1} fm blurAo, ao_n4, w4							{o4:0} shufb prevAo0, currAo, currAo, s_AAAA{e2:0} ceqendLineMask, colorOffset, endLineOffset				{o4:0} shufb ao_p1, currAo, nextAo, s_BCDa{e2:0} ceqendBlockMask, colorOffset_, endOffset				{o4:0} shufb ao_p2, currAo, nextAo, s_CDab{e2:0} selbendLineOffsetIncr, zero, colorLineBytes, endLineMask		{o4:0} shufb ao_p3, currAo, nextAo, s_Dabc{e2:0} selb ao_n4, currAo_, prevAo0, endLineMask_				{o4:0} shufb ao_n30, prevAo0, currAo, s_BCDa{e6:2} fma blurAo__, ao_n1__, w1, blurAo_					{o4:0} shufb ao_n20, prevAo0, currAo, s_CDab{e6:1} fma blurAo, ao_n3, w3, blurAo						{o4:0} shufb ao_n10, prevAo0, currAo, s_Dabc{e2:0} selbcolorOffsetIncr, defOffsetIncr, endColorOffsetIncr, endLineMask	{o6:3} stqxblurAo___, pvColor, colorOffset_{e2:0} selb ao_n3, ao_p1_, ao_n30, endLineMask_			{o4:0} shufbcolorOffset_, colorOffset_, colorOffset, s_BCa0{e2:0} selb ao_n2, ao_p2_, ao_n20, endLineMask_				{o4:x} shlqbii currAo___, currAo__, 0{e2:0} selb ao_n1, ao_p3_, ao_n10, endLineMask_				{o4:x} shlqbii currAo__, currAo_, 0{e2:0} a colorOffset, colorOffset, colorOffsetIncrlnop{e2:0} a endLineOffset, endLineOffset, endLineOffsetIncr			branch: {o?:0} brzendBlockMask, loop
SolutionGives about a 2x performance boost
Sometimes more or less
SSAO went from 2ms on all 6 SPUs to 1ms on all 6 SPUs
With practice, can do about 1-2 loops per day.
SSAO has 6 loops, and took a little over a week.
Same process for:
PostFX
Fullscreen Lighting
Ambient Cubemaps
Many moreFull TimelineHand-Optimized by ICE TeamFull TimelineHand-Optimized by Uncharted 2 TeamPerformanceFinal RenderPerformanceWithout Fullscreen LightingPerformanceFullscreen Lighting ResultsPerformanceFullscreen Lighting / SSAO PerformancePerformanceFullscreen Lighting / SSAO PerformanceFullscreen LightingSSAO
PerformanceFullscreen Lighting / SSAO PerformancePerformanceFullscreen Lighting / SSAO PerformancePerformanceFullscreen Lighting / SSAO PerformancePerformanceFullscreen Lighting / SSAO PerformanceFullscreen LightingSSAO
Architecture Conclusions“So, you are the scheduler that has been nipping at my heels...”
Final Final ThoughtsGamma is super-duper important.
Filmic Tonemapping changes your life.
SSAO is cool, and keep it out of your sunlight.
SPUs are awesome.
They don’t optimize themselves.
For a tight for-loop, you can do about 2x better than the compiler.Btw...Naughty Dog is hiring!
We’re also looking for
Senior Lighting Artist.
Graphics ProgrammerThat’s it!Questions…
Hable John Uncharted2 Hdr Lighting
Hable John Uncharted2 Hdr Lighting
Hable John Uncharted2 Hdr Lighting
Hable John Uncharted2 Hdr Lighting
Hable John Uncharted2 Hdr Lighting
Hable John Uncharted2 Hdr Lighting
Hable John Uncharted2 Hdr Lighting
Hable John Uncharted2 Hdr Lighting
Hable John Uncharted2 Hdr Lighting
Hable John Uncharted2 Hdr Lighting
Hable John Uncharted2 Hdr Lighting
Hable John Uncharted2 Hdr Lighting
Hable John Uncharted2 Hdr Lighting
Hable John Uncharted2 Hdr Lighting
Hable John Uncharted2 Hdr Lighting
Hable John Uncharted2 Hdr Lighting
Hable John Uncharted2 Hdr Lighting
Hable John Uncharted2 Hdr Lighting
Hable John Uncharted2 Hdr Lighting
Hable John Uncharted2 Hdr Lighting
Hable John Uncharted2 Hdr Lighting
Hable John Uncharted2 Hdr Lighting
Hable John Uncharted2 Hdr Lighting
Hable John Uncharted2 Hdr Lighting
Hable John Uncharted2 Hdr Lighting
Hable John Uncharted2 Hdr Lighting
Hable John Uncharted2 Hdr Lighting
Hable John Uncharted2 Hdr Lighting
Hable John Uncharted2 Hdr Lighting

More Related Content

What's hot

Taking Killzone Shadow Fall Image Quality Into The Next Generation
Taking Killzone Shadow Fall Image Quality Into The Next GenerationTaking Killzone Shadow Fall Image Quality Into The Next Generation
Taking Killzone Shadow Fall Image Quality Into The Next Generation
Guerrilla
 
A Certain Slant of Light - Past, Present and Future Challenges of Global Illu...
A Certain Slant of Light - Past, Present and Future Challenges of Global Illu...A Certain Slant of Light - Past, Present and Future Challenges of Global Illu...
A Certain Slant of Light - Past, Present and Future Challenges of Global Illu...
Electronic Arts / DICE
 
Lighting Shading by John Hable
Lighting Shading by John HableLighting Shading by John Hable
Lighting Shading by John Hable
Naughty Dog
 
스크린 스페이스 데칼에 대해 자세히 알아보자(워햄머 40,000: 스페이스 마린)
스크린 스페이스 데칼에 대해 자세히 알아보자(워햄머 40,000: 스페이스 마린)스크린 스페이스 데칼에 대해 자세히 알아보자(워햄머 40,000: 스페이스 마린)
스크린 스페이스 데칼에 대해 자세히 알아보자(워햄머 40,000: 스페이스 마린)
포프 김
 
4K Checkerboard in Battlefield 1 and Mass Effect Andromeda
4K Checkerboard in Battlefield 1 and Mass Effect Andromeda4K Checkerboard in Battlefield 1 and Mass Effect Andromeda
4K Checkerboard in Battlefield 1 and Mass Effect Andromeda
Electronic Arts / DICE
 
DirectX 11 Rendering in Battlefield 3
DirectX 11 Rendering in Battlefield 3DirectX 11 Rendering in Battlefield 3
DirectX 11 Rendering in Battlefield 3
Electronic Arts / DICE
 
Rendering AAA-Quality Characters of Project A1
Rendering AAA-Quality Characters of Project A1Rendering AAA-Quality Characters of Project A1
Rendering AAA-Quality Characters of Project A1
Ki Hyunwoo
 
Screen Space Decals in Warhammer 40,000: Space Marine
Screen Space Decals in Warhammer 40,000: Space MarineScreen Space Decals in Warhammer 40,000: Space Marine
Screen Space Decals in Warhammer 40,000: Space Marine
Pope Kim
 
Terrain Rendering in Frostbite using Procedural Shader Splatting (Siggraph 2007)
Terrain Rendering in Frostbite using Procedural Shader Splatting (Siggraph 2007)Terrain Rendering in Frostbite using Procedural Shader Splatting (Siggraph 2007)
Terrain Rendering in Frostbite using Procedural Shader Splatting (Siggraph 2007)
Johan Andersson
 
Stochastic Screen-Space Reflections
Stochastic Screen-Space ReflectionsStochastic Screen-Space Reflections
Stochastic Screen-Space Reflections
Electronic Arts / DICE
 
Screen Space Reflections in The Surge
Screen Space Reflections in The SurgeScreen Space Reflections in The Surge
Screen Space Reflections in The Surge
Michele Giacalone
 
An introduction to Realistic Ocean Rendering through FFT - Fabio Suriano - Co...
An introduction to Realistic Ocean Rendering through FFT - Fabio Suriano - Co...An introduction to Realistic Ocean Rendering through FFT - Fabio Suriano - Co...
An introduction to Realistic Ocean Rendering through FFT - Fabio Suriano - Co...
Codemotion
 
Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14
Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14
Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14
AMD Developer Central
 
FrameGraph: Extensible Rendering Architecture in Frostbite
FrameGraph: Extensible Rendering Architecture in FrostbiteFrameGraph: Extensible Rendering Architecture in Frostbite
FrameGraph: Extensible Rendering Architecture in Frostbite
Electronic Arts / DICE
 
Siggraph2016 - The Devil is in the Details: idTech 666
Siggraph2016 - The Devil is in the Details: idTech 666Siggraph2016 - The Devil is in the Details: idTech 666
Siggraph2016 - The Devil is in the Details: idTech 666
Tiago Sousa
 
Photogrammetry and Star Wars Battlefront
Photogrammetry and Star Wars BattlefrontPhotogrammetry and Star Wars Battlefront
Photogrammetry and Star Wars Battlefront
Electronic Arts / DICE
 
Physically Based Lighting in Unreal Engine 4
Physically Based Lighting in Unreal Engine 4Physically Based Lighting in Unreal Engine 4
Physically Based Lighting in Unreal Engine 4
Lukas Lang
 
A Scalable Real-Time Many-Shadowed-Light Rendering System
A Scalable Real-Time Many-Shadowed-Light Rendering SystemA Scalable Real-Time Many-Shadowed-Light Rendering System
A Scalable Real-Time Many-Shadowed-Light Rendering System
Bo Li
 
Destruction Masking in Frostbite 2 using Volume Distance Fields
Destruction Masking in Frostbite 2 using Volume Distance FieldsDestruction Masking in Frostbite 2 using Volume Distance Fields
Destruction Masking in Frostbite 2 using Volume Distance Fields
Electronic Arts / DICE
 
Star Ocean 4 - Flexible Shader Managment and Post-processing
Star Ocean 4 - Flexible Shader Managment and Post-processingStar Ocean 4 - Flexible Shader Managment and Post-processing
Star Ocean 4 - Flexible Shader Managment and Post-processing
umsl snfrzb
 

What's hot (20)

Taking Killzone Shadow Fall Image Quality Into The Next Generation
Taking Killzone Shadow Fall Image Quality Into The Next GenerationTaking Killzone Shadow Fall Image Quality Into The Next Generation
Taking Killzone Shadow Fall Image Quality Into The Next Generation
 
A Certain Slant of Light - Past, Present and Future Challenges of Global Illu...
A Certain Slant of Light - Past, Present and Future Challenges of Global Illu...A Certain Slant of Light - Past, Present and Future Challenges of Global Illu...
A Certain Slant of Light - Past, Present and Future Challenges of Global Illu...
 
Lighting Shading by John Hable
Lighting Shading by John HableLighting Shading by John Hable
Lighting Shading by John Hable
 
스크린 스페이스 데칼에 대해 자세히 알아보자(워햄머 40,000: 스페이스 마린)
스크린 스페이스 데칼에 대해 자세히 알아보자(워햄머 40,000: 스페이스 마린)스크린 스페이스 데칼에 대해 자세히 알아보자(워햄머 40,000: 스페이스 마린)
스크린 스페이스 데칼에 대해 자세히 알아보자(워햄머 40,000: 스페이스 마린)
 
4K Checkerboard in Battlefield 1 and Mass Effect Andromeda
4K Checkerboard in Battlefield 1 and Mass Effect Andromeda4K Checkerboard in Battlefield 1 and Mass Effect Andromeda
4K Checkerboard in Battlefield 1 and Mass Effect Andromeda
 
DirectX 11 Rendering in Battlefield 3
DirectX 11 Rendering in Battlefield 3DirectX 11 Rendering in Battlefield 3
DirectX 11 Rendering in Battlefield 3
 
Rendering AAA-Quality Characters of Project A1
Rendering AAA-Quality Characters of Project A1Rendering AAA-Quality Characters of Project A1
Rendering AAA-Quality Characters of Project A1
 
Screen Space Decals in Warhammer 40,000: Space Marine
Screen Space Decals in Warhammer 40,000: Space MarineScreen Space Decals in Warhammer 40,000: Space Marine
Screen Space Decals in Warhammer 40,000: Space Marine
 
Terrain Rendering in Frostbite using Procedural Shader Splatting (Siggraph 2007)
Terrain Rendering in Frostbite using Procedural Shader Splatting (Siggraph 2007)Terrain Rendering in Frostbite using Procedural Shader Splatting (Siggraph 2007)
Terrain Rendering in Frostbite using Procedural Shader Splatting (Siggraph 2007)
 
Stochastic Screen-Space Reflections
Stochastic Screen-Space ReflectionsStochastic Screen-Space Reflections
Stochastic Screen-Space Reflections
 
Screen Space Reflections in The Surge
Screen Space Reflections in The SurgeScreen Space Reflections in The Surge
Screen Space Reflections in The Surge
 
An introduction to Realistic Ocean Rendering through FFT - Fabio Suriano - Co...
An introduction to Realistic Ocean Rendering through FFT - Fabio Suriano - Co...An introduction to Realistic Ocean Rendering through FFT - Fabio Suriano - Co...
An introduction to Realistic Ocean Rendering through FFT - Fabio Suriano - Co...
 
Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14
Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14
Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14
 
FrameGraph: Extensible Rendering Architecture in Frostbite
FrameGraph: Extensible Rendering Architecture in FrostbiteFrameGraph: Extensible Rendering Architecture in Frostbite
FrameGraph: Extensible Rendering Architecture in Frostbite
 
Siggraph2016 - The Devil is in the Details: idTech 666
Siggraph2016 - The Devil is in the Details: idTech 666Siggraph2016 - The Devil is in the Details: idTech 666
Siggraph2016 - The Devil is in the Details: idTech 666
 
Photogrammetry and Star Wars Battlefront
Photogrammetry and Star Wars BattlefrontPhotogrammetry and Star Wars Battlefront
Photogrammetry and Star Wars Battlefront
 
Physically Based Lighting in Unreal Engine 4
Physically Based Lighting in Unreal Engine 4Physically Based Lighting in Unreal Engine 4
Physically Based Lighting in Unreal Engine 4
 
A Scalable Real-Time Many-Shadowed-Light Rendering System
A Scalable Real-Time Many-Shadowed-Light Rendering SystemA Scalable Real-Time Many-Shadowed-Light Rendering System
A Scalable Real-Time Many-Shadowed-Light Rendering System
 
Destruction Masking in Frostbite 2 using Volume Distance Fields
Destruction Masking in Frostbite 2 using Volume Distance FieldsDestruction Masking in Frostbite 2 using Volume Distance Fields
Destruction Masking in Frostbite 2 using Volume Distance Fields
 
Star Ocean 4 - Flexible Shader Managment and Post-processing
Star Ocean 4 - Flexible Shader Managment and Post-processingStar Ocean 4 - Flexible Shader Managment and Post-processing
Star Ocean 4 - Flexible Shader Managment and Post-processing
 

Similar to Hable John Uncharted2 Hdr Lighting

【Unite Tokyo 2019】Unityプログレッシブライトマッパー2019
【Unite Tokyo 2019】Unityプログレッシブライトマッパー2019【Unite Tokyo 2019】Unityプログレッシブライトマッパー2019
【Unite Tokyo 2019】Unityプログレッシブライトマッパー2019
UnityTechnologiesJapan002
 
Hdr Meets Black And White 2
Hdr Meets Black And White 2 Hdr Meets Black And White 2
Hdr Meets Black And White 2
Francesco Carucci
 
CS 354 Understanding Color
CS 354 Understanding ColorCS 354 Understanding Color
CS 354 Understanding Color
Mark Kilgard
 
Computer Graphics Part1
Computer Graphics Part1Computer Graphics Part1
Computer Graphics Part1
qpqpqp
 
Analyzing color imaging failure on consumer-grade cameras
Analyzing color imaging failure on consumer-grade camerasAnalyzing color imaging failure on consumer-grade cameras
Analyzing color imaging failure on consumer-grade cameras
SaiTedla1
 
“CMOS Image Sensors: A Guide to Building the Eyes of a Vision System,” a Pres...
“CMOS Image Sensors: A Guide to Building the Eyes of a Vision System,” a Pres...“CMOS Image Sensors: A Guide to Building the Eyes of a Vision System,” a Pres...
“CMOS Image Sensors: A Guide to Building the Eyes of a Vision System,” a Pres...
Edge AI and Vision Alliance
 
Shooting High Dynamic Range
Shooting High Dynamic RangeShooting High Dynamic Range
Shooting High Dynamic Range
Jessica Young
 
Real-time Shadowing Techniques: Shadow Volumes
Real-time Shadowing Techniques: Shadow VolumesReal-time Shadowing Techniques: Shadow Volumes
Real-time Shadowing Techniques: Shadow Volumes
Mark Kilgard
 
Hdr magic
Hdr magicHdr magic
Hdr magic
SunRidge Photo
 
Deferred shading
Deferred shadingDeferred shading
Deferred shading
Frank Chao
 
Scratch a pixel - Reflection
Scratch a pixel - ReflectionScratch a pixel - Reflection
Scratch a pixel - Reflection
Yiwei Gong
 
new_age_graphics_android_x86
new_age_graphics_android_x86new_age_graphics_android_x86
new_age_graphics_android_x86
Droidcon Berlin
 
Penner pre-integrated skin rendering (siggraph 2011 advances in real-time r...
Penner   pre-integrated skin rendering (siggraph 2011 advances in real-time r...Penner   pre-integrated skin rendering (siggraph 2011 advances in real-time r...
Penner pre-integrated skin rendering (siggraph 2011 advances in real-time r...
JP Lee
 
Image processing Presentation
Image processing PresentationImage processing Presentation
HPG 2018 - Game Ray Tracing: State-of-the-Art and Open Problems
HPG 2018 - Game Ray Tracing: State-of-the-Art and Open ProblemsHPG 2018 - Game Ray Tracing: State-of-the-Art and Open Problems
HPG 2018 - Game Ray Tracing: State-of-the-Art and Open Problems
Electronic Arts / DICE
 
Unity: Next Level Rendering Quality
Unity: Next Level Rendering QualityUnity: Next Level Rendering Quality
Unity: Next Level Rendering Quality
Unity Technologies
 
Rendering basics
Rendering basicsRendering basics
Rendering basics
icedmaster
 
Crysis 2-key-rendering-features
Crysis 2-key-rendering-featuresCrysis 2-key-rendering-features
Crysis 2-key-rendering-features
Raimundo Renato
 
Chapter01 (2)
Chapter01 (2)Chapter01 (2)
Chapter01 (2)
shabanam tamboli
 
A Practical and Robust Bump-mapping Technique for Today’s GPUs (slides)
A Practical and Robust Bump-mapping Technique for Today’s GPUs (slides)A Practical and Robust Bump-mapping Technique for Today’s GPUs (slides)
A Practical and Robust Bump-mapping Technique for Today’s GPUs (slides)
Mark Kilgard
 

Similar to Hable John Uncharted2 Hdr Lighting (20)

【Unite Tokyo 2019】Unityプログレッシブライトマッパー2019
【Unite Tokyo 2019】Unityプログレッシブライトマッパー2019【Unite Tokyo 2019】Unityプログレッシブライトマッパー2019
【Unite Tokyo 2019】Unityプログレッシブライトマッパー2019
 
Hdr Meets Black And White 2
Hdr Meets Black And White 2 Hdr Meets Black And White 2
Hdr Meets Black And White 2
 
CS 354 Understanding Color
CS 354 Understanding ColorCS 354 Understanding Color
CS 354 Understanding Color
 
Computer Graphics Part1
Computer Graphics Part1Computer Graphics Part1
Computer Graphics Part1
 
Analyzing color imaging failure on consumer-grade cameras
Analyzing color imaging failure on consumer-grade camerasAnalyzing color imaging failure on consumer-grade cameras
Analyzing color imaging failure on consumer-grade cameras
 
“CMOS Image Sensors: A Guide to Building the Eyes of a Vision System,” a Pres...
“CMOS Image Sensors: A Guide to Building the Eyes of a Vision System,” a Pres...“CMOS Image Sensors: A Guide to Building the Eyes of a Vision System,” a Pres...
“CMOS Image Sensors: A Guide to Building the Eyes of a Vision System,” a Pres...
 
Shooting High Dynamic Range
Shooting High Dynamic RangeShooting High Dynamic Range
Shooting High Dynamic Range
 
Real-time Shadowing Techniques: Shadow Volumes
Real-time Shadowing Techniques: Shadow VolumesReal-time Shadowing Techniques: Shadow Volumes
Real-time Shadowing Techniques: Shadow Volumes
 
Hdr magic
Hdr magicHdr magic
Hdr magic
 
Deferred shading
Deferred shadingDeferred shading
Deferred shading
 
Scratch a pixel - Reflection
Scratch a pixel - ReflectionScratch a pixel - Reflection
Scratch a pixel - Reflection
 
new_age_graphics_android_x86
new_age_graphics_android_x86new_age_graphics_android_x86
new_age_graphics_android_x86
 
Penner pre-integrated skin rendering (siggraph 2011 advances in real-time r...
Penner   pre-integrated skin rendering (siggraph 2011 advances in real-time r...Penner   pre-integrated skin rendering (siggraph 2011 advances in real-time r...
Penner pre-integrated skin rendering (siggraph 2011 advances in real-time r...
 
Image processing Presentation
Image processing PresentationImage processing Presentation
Image processing Presentation
 
HPG 2018 - Game Ray Tracing: State-of-the-Art and Open Problems
HPG 2018 - Game Ray Tracing: State-of-the-Art and Open ProblemsHPG 2018 - Game Ray Tracing: State-of-the-Art and Open Problems
HPG 2018 - Game Ray Tracing: State-of-the-Art and Open Problems
 
Unity: Next Level Rendering Quality
Unity: Next Level Rendering QualityUnity: Next Level Rendering Quality
Unity: Next Level Rendering Quality
 
Rendering basics
Rendering basicsRendering basics
Rendering basics
 
Crysis 2-key-rendering-features
Crysis 2-key-rendering-featuresCrysis 2-key-rendering-features
Crysis 2-key-rendering-features
 
Chapter01 (2)
Chapter01 (2)Chapter01 (2)
Chapter01 (2)
 
A Practical and Robust Bump-mapping Technique for Today’s GPUs (slides)
A Practical and Robust Bump-mapping Technique for Today’s GPUs (slides)A Practical and Robust Bump-mapping Technique for Today’s GPUs (slides)
A Practical and Robust Bump-mapping Technique for Today’s GPUs (slides)
 

More from ozlael ozlael

Unity & VR (Unity Roadshow 2016)
Unity & VR (Unity Roadshow 2016)Unity & VR (Unity Roadshow 2016)
Unity & VR (Unity Roadshow 2016)
ozlael ozlael
 
뭣이 중헌디? 성능 프로파일링도 모름서 - 유니티 성능 프로파일링 가이드 (IGC16)
뭣이 중헌디? 성능 프로파일링도 모름서 - 유니티 성능 프로파일링 가이드 (IGC16)뭣이 중헌디? 성능 프로파일링도 모름서 - 유니티 성능 프로파일링 가이드 (IGC16)
뭣이 중헌디? 성능 프로파일링도 모름서 - 유니티 성능 프로파일링 가이드 (IGC16)
ozlael ozlael
 
Optimizing mobile applications - Ian Dundore, Mark Harkness
Optimizing mobile applications - Ian Dundore, Mark HarknessOptimizing mobile applications - Ian Dundore, Mark Harkness
Optimizing mobile applications - Ian Dundore, Mark Harkness
ozlael ozlael
 
그래픽 최적화로 가...가버렷! (부제: 배치! 배칭을 보자!) , Batch! Let's take a look at Batching! -...
그래픽 최적화로 가...가버렷! (부제: 배치! 배칭을 보자!) , Batch! Let's take a look at Batching! -...그래픽 최적화로 가...가버렷! (부제: 배치! 배칭을 보자!) , Batch! Let's take a look at Batching! -...
그래픽 최적화로 가...가버렷! (부제: 배치! 배칭을 보자!) , Batch! Let's take a look at Batching! -...
ozlael ozlael
 
유니티 그래픽 최적화, 어디까지 해봤니 (Optimizing Unity Graphics) Unite Seoul Ver.
유니티 그래픽 최적화, 어디까지 해봤니 (Optimizing Unity Graphics) Unite Seoul Ver.유니티 그래픽 최적화, 어디까지 해봤니 (Optimizing Unity Graphics) Unite Seoul Ver.
유니티 그래픽 최적화, 어디까지 해봤니 (Optimizing Unity Graphics) Unite Seoul Ver.
ozlael ozlael
 
유니티 그래픽 최적화, 어디까지 해봤니 (Optimizing Unity Graphics) NDC15 Ver.
유니티 그래픽 최적화, 어디까지 해봤니 (Optimizing Unity Graphics) NDC15 Ver.유니티 그래픽 최적화, 어디까지 해봤니 (Optimizing Unity Graphics) NDC15 Ver.
유니티 그래픽 최적화, 어디까지 해봤니 (Optimizing Unity Graphics) NDC15 Ver.
ozlael ozlael
 
Infinity Blade and beyond
Infinity Blade and beyondInfinity Blade and beyond
Infinity Blade and beyond
ozlael ozlael
 
스티브잡스처럼 프레젠테이션하기
스티브잡스처럼 프레젠테이션하기스티브잡스처럼 프레젠테이션하기
스티브잡스처럼 프레젠테이션하기
ozlael ozlael
 
유니티의 라이팅이 안 이쁘다구요? (A to Z of Lighting)
유니티의 라이팅이 안 이쁘다구요? (A to Z of Lighting)유니티의 라이팅이 안 이쁘다구요? (A to Z of Lighting)
유니티의 라이팅이 안 이쁘다구요? (A to Z of Lighting)
ozlael ozlael
 
Introduce coco2dx with cookingstar
Introduce coco2dx with cookingstarIntroduce coco2dx with cookingstar
Introduce coco2dx with cookingstar
ozlael ozlael
 
Deferred rendering case study
Deferred rendering case studyDeferred rendering case study
Deferred rendering case study
ozlael ozlael
 
Kgc make stereo game on pc
Kgc make stereo game on pcKgc make stereo game on pc
Kgc make stereo game on pc
ozlael ozlael
 
mssao presentation
mssao presentationmssao presentation
mssao presentation
ozlael ozlael
 
Modern gpu optimize blog
Modern gpu optimize blogModern gpu optimize blog
Modern gpu optimize blog
ozlael ozlael
 
Modern gpu optimize
Modern gpu optimizeModern gpu optimize
Modern gpu optimize
ozlael ozlael
 
Bickerstaff benson making3d games on the playstation3
Bickerstaff benson making3d games on the playstation3Bickerstaff benson making3d games on the playstation3
Bickerstaff benson making3d games on the playstation3
ozlael ozlael
 
DOF Depth of Field
DOF Depth of FieldDOF Depth of Field
DOF Depth of Field
ozlael ozlael
 
Hable uncharted2(siggraph%202010%20 advanced%20realtime%20rendering%20course)
Hable uncharted2(siggraph%202010%20 advanced%20realtime%20rendering%20course)Hable uncharted2(siggraph%202010%20 advanced%20realtime%20rendering%20course)
Hable uncharted2(siggraph%202010%20 advanced%20realtime%20rendering%20course)
ozlael ozlael
 
Deferred rendering in_leadwerks_engine[1]
Deferred rendering in_leadwerks_engine[1]Deferred rendering in_leadwerks_engine[1]
Deferred rendering in_leadwerks_engine[1]
ozlael ozlael
 
Deferred shading
Deferred shadingDeferred shading
Deferred shading
ozlael ozlael
 

More from ozlael ozlael (20)

Unity & VR (Unity Roadshow 2016)
Unity & VR (Unity Roadshow 2016)Unity & VR (Unity Roadshow 2016)
Unity & VR (Unity Roadshow 2016)
 
뭣이 중헌디? 성능 프로파일링도 모름서 - 유니티 성능 프로파일링 가이드 (IGC16)
뭣이 중헌디? 성능 프로파일링도 모름서 - 유니티 성능 프로파일링 가이드 (IGC16)뭣이 중헌디? 성능 프로파일링도 모름서 - 유니티 성능 프로파일링 가이드 (IGC16)
뭣이 중헌디? 성능 프로파일링도 모름서 - 유니티 성능 프로파일링 가이드 (IGC16)
 
Optimizing mobile applications - Ian Dundore, Mark Harkness
Optimizing mobile applications - Ian Dundore, Mark HarknessOptimizing mobile applications - Ian Dundore, Mark Harkness
Optimizing mobile applications - Ian Dundore, Mark Harkness
 
그래픽 최적화로 가...가버렷! (부제: 배치! 배칭을 보자!) , Batch! Let's take a look at Batching! -...
그래픽 최적화로 가...가버렷! (부제: 배치! 배칭을 보자!) , Batch! Let's take a look at Batching! -...그래픽 최적화로 가...가버렷! (부제: 배치! 배칭을 보자!) , Batch! Let's take a look at Batching! -...
그래픽 최적화로 가...가버렷! (부제: 배치! 배칭을 보자!) , Batch! Let's take a look at Batching! -...
 
유니티 그래픽 최적화, 어디까지 해봤니 (Optimizing Unity Graphics) Unite Seoul Ver.
유니티 그래픽 최적화, 어디까지 해봤니 (Optimizing Unity Graphics) Unite Seoul Ver.유니티 그래픽 최적화, 어디까지 해봤니 (Optimizing Unity Graphics) Unite Seoul Ver.
유니티 그래픽 최적화, 어디까지 해봤니 (Optimizing Unity Graphics) Unite Seoul Ver.
 
유니티 그래픽 최적화, 어디까지 해봤니 (Optimizing Unity Graphics) NDC15 Ver.
유니티 그래픽 최적화, 어디까지 해봤니 (Optimizing Unity Graphics) NDC15 Ver.유니티 그래픽 최적화, 어디까지 해봤니 (Optimizing Unity Graphics) NDC15 Ver.
유니티 그래픽 최적화, 어디까지 해봤니 (Optimizing Unity Graphics) NDC15 Ver.
 
Infinity Blade and beyond
Infinity Blade and beyondInfinity Blade and beyond
Infinity Blade and beyond
 
스티브잡스처럼 프레젠테이션하기
스티브잡스처럼 프레젠테이션하기스티브잡스처럼 프레젠테이션하기
스티브잡스처럼 프레젠테이션하기
 
유니티의 라이팅이 안 이쁘다구요? (A to Z of Lighting)
유니티의 라이팅이 안 이쁘다구요? (A to Z of Lighting)유니티의 라이팅이 안 이쁘다구요? (A to Z of Lighting)
유니티의 라이팅이 안 이쁘다구요? (A to Z of Lighting)
 
Introduce coco2dx with cookingstar
Introduce coco2dx with cookingstarIntroduce coco2dx with cookingstar
Introduce coco2dx with cookingstar
 
Deferred rendering case study
Deferred rendering case studyDeferred rendering case study
Deferred rendering case study
 
Kgc make stereo game on pc
Kgc make stereo game on pcKgc make stereo game on pc
Kgc make stereo game on pc
 
mssao presentation
mssao presentationmssao presentation
mssao presentation
 
Modern gpu optimize blog
Modern gpu optimize blogModern gpu optimize blog
Modern gpu optimize blog
 
Modern gpu optimize
Modern gpu optimizeModern gpu optimize
Modern gpu optimize
 
Bickerstaff benson making3d games on the playstation3
Bickerstaff benson making3d games on the playstation3Bickerstaff benson making3d games on the playstation3
Bickerstaff benson making3d games on the playstation3
 
DOF Depth of Field
DOF Depth of FieldDOF Depth of Field
DOF Depth of Field
 
Hable uncharted2(siggraph%202010%20 advanced%20realtime%20rendering%20course)
Hable uncharted2(siggraph%202010%20 advanced%20realtime%20rendering%20course)Hable uncharted2(siggraph%202010%20 advanced%20realtime%20rendering%20course)
Hable uncharted2(siggraph%202010%20 advanced%20realtime%20rendering%20course)
 
Deferred rendering in_leadwerks_engine[1]
Deferred rendering in_leadwerks_engine[1]Deferred rendering in_leadwerks_engine[1]
Deferred rendering in_leadwerks_engine[1]
 
Deferred shading
Deferred shadingDeferred shading
Deferred shading
 

Hable John Uncharted2 Hdr Lighting

  • 1. Uncharted 2: HDR LightingJohn HableNaughty Dog
  • 3. Skeletons in the ClosetUsed to work for EA Black Box (Vancouver)‏
  • 6. Also used to work for EA Los Angeles
  • 8. Not PQRS (Boom Blox)‏
  • 11. GammaMostly solved in Film Industry after lots of kicking and screaming
  • 12. Never heard about it in college
  • 13. George Borshukov taught it to me back at EA
  • 15. Issue is getting more traction now
  • 16. Eugene d’Eon’s chapter in GPU Gems 3DemoWhat is halfway between white and black
  • 17. Suppose we show alternating light/dark lines.
  • 18. If we squint, we get half the linear intensity.DemoTest image with 4 rectangles.
  • 19. Real image of a computer screen shot from my camera.DemoAlternating lines of black and white (0 and 255).DemoSo if the left and right are alternating lines of 0 and 255, then what color is the rectangle with the blue dot?DemoCommon wrong answer: 127/128.
  • 20. That’s the box on the top.128
  • 23. Color RampYou have seen this before.0127255187
  • 24. Gamma Lines255F(x) = pow(x,2.2)‏1871270
  • 25. What is GammaGamma of 0.45, 1, 2.2
  • 26. Our monitors are the red oneWe all love mink...F(x) = xWe all love mink...F(x) = pow(x,0.45) note: .45 ~= 1/2.2‏We all love mink...F(x) = pow(x,2.2)‏What is GammaGet to know these curves…What is GammaBack and forth between curves.2.22.2.45.45
  • 27. How bad is it?Your monitor has a gamma of about 2.2
  • 28. So if you output the left image to the framebuffer, it will actually look like the one right.2.2
  • 29. Let’s build a Camera…A linear camera.ActualLightMonitorOutput1.02.2HardDrive
  • 30. Let’s build a Camera…Any consumer camera.ActualLightMonitorOutput.452.2HardDrive
  • 31. How to fix this?Your screen displays a gamma of 2.2
  • 32. This is stored on your hard drive.
  • 33. You never see this image, but it’s on your hard drive.2.2
  • 34. What is GammaCurves again.What is GammaQ) Why not just store as linear?A) Our eyes perceive more data in the blacks.
  • 35. How bad is it?Suppose we look at the 0-63 range.
  • 36. What if displays were linear?GammaGamma of 2.2 gives us more dark colors.How bad is it?If displays were linear:
  • 37. On a scale from 0-255
  • 39. 1 would equal our current value of 20
  • 40. 2 would equal our current value of 28
  • 41. 3 would equal our current value of 33
  • 42. Darkest Values:Ramp AgainSee any banding in the whites? 3D GraphicsIf your game is not gamma correct, it looks like the image on the left..
  • 43. Note: we’re talking about “correct”, not “good”3D GraphicsHere's what is going on.ScreenHardDriveLighting
  • 44. 3D GraphicsAccount for gamma when reading textures
  • 45. color = pow( tex2D( Sampler, Uv ), 2.2 )‏
  • 46. Do your lighting calculations
  • 47. Account for gamma on color output
  • 48. finalcolor = pow( color, 1/2.2 )‏3D GraphicsHere's what is going on.MonitorAdjustHardDriveLightingShaderCorrectGamma
  • 49. 3D GraphicsComparison again...3D GraphicsThe Wrong Shader?Spec = CalSpec();Diff = tex2D( Sampler, UV );Color = Diff * max( 0, dot( N, L ) ) + Spec;return Color;
  • 50. 3D GraphicsThe Right Shader?Spec = CalSpec();Diff = pow( tex2D( Sampler, UV ), 2.2 );Color = Diff * max( 0, dot( N, L ) ) + Spec;return pow( Color, 1/2.2);
  • 51. 3D GraphicsBut, there is hardware to do this for us.
  • 56. D3DRS_SRGBWRITEENABLE3D GraphicsWith those states we can remove the pow functions.Spec = CalSpec();Diff = tex2D( Sampler, UV );Color = Diff * max( 0, dot( N, L ) ) + Spec;return Color;
  • 57. 3D GraphicsWhat does the real world look like?
  • 58. Notice the harsh line.3D GraphicsAnother ExampleFAQsQ1) But I like the soft falloff!A1) Don’t be so sure.
  • 59. FAQsIt looks like the one on top before the monitor’s 2.2Harsh FalloffDevastating.Harsh FalloffDevastating.Harsh FalloffDevastating.Which Maps?Which maps should be Linear-Space vs Gamma-Space?
  • 61. Use sRGB hardware or pow(2.2) on read
  • 64. Don’t use sRGB or pow(2.2) on read
  • 65. 128 = 0.5Which Maps?Diffuse Map
  • 69. Uncharted 2 had them as Linear
  • 70. Artists have trouble tweaking them to look right
  • 71. Probably should be in GammaWhich Maps?Ambient Occlusion
  • 72. Technically, it’s a mathematical value like a normal map.
  • 73. But artists tweak them a lot and bake extra lighting into them.
  • 74. Uncharted 2 had them as Linear
  • 75. Probably should be GammaExercise for the ReaderGamma 2.2 != sRGB != Xenon PWL
  • 76. sRGB: PC and PS3 gamma
  • 77. Xenon PWL: Piecewise linear gammaXenon GotchasXbox 360 Gamma Curve is wonky
  • 78. Way too bright at the low end.
  • 79. HDR the Bungie Way, Chris Tchou
  • 80. Post Processing in The Orange Box, Alex Vlachos
  • 81. Output curve is extra contrasty
  • 82. Henry LaBounta was going to talk about it.
  • 83. Try hooking up your Xenon to a waveform monitor and display a test pattern. Prepare to be mortified.
  • 84. Both factors counteract each otherLinear-Space Lighting: Conclusion“Drake! You must believe in linear-space lighting!”
  • 85. Part 2: Filmic Tonemaping
  • 86. Filmic TonemappingLet's talk about the magic of the human eye.Filmic TonemappingReal example from my apartment.
  • 90. My eye somehow adjusts...Filmic TonemappingAdaptiveness of human eye.Filmic TonemappingAdaptiveness of human eye.Filmic TonemappingAdaptiveness of human eye.Filmic TonemappingAdaptiveness of human eye.Filmic TonemappingAdaptiveness of human eye.Filmic TonemappingAdaptiveness of human eye.Filmic TonemappingAdaptiveness of human eye.Filmic TonemappingAdaptiveness of human eye.Filmic TonemappingOne moreFilmic TonemappingOne moreGammaLet’s take an HDR image
  • 91. Note: 1 F-Stop is a power of two
  • 92. So 4 F-stops is 2^4=16x difference in intensityGammaExposure: -4GammaExposure: -2GammaExposure: 0GammaExposure: 2GammaExposure: 4Now what?We actually need 8 stops
  • 95. Re-tweaking exposure won't helpGammaThis is what Photography people call tonemapping.
  • 97. This one is local
  • 98. The one we’ll use is globalPoint of ConfusionTwo problems to solve.
  • 100. Dynamic Range in different shots
  • 105. Within a Shot – HDR Tonemapping
  • 106. Different Shots – Choosing the correct exposure?
  • 108. Within a Shot – HDR Tonemapping
  • 109. Different Shots – HDR Tonemapping?TerminologyFor this presentation
  • 110. Within a Shot – HDR Tonemapping
  • 113. Dynamic IrisLinear CurveApartment of Habib Zargarpour
  • 115. Used to be Art Director on LMNO at EALinear CurveOutColor = pow(x,1/2.2)‏ReinhardMost common tonemapping operator is Reinhard
  • 118. Yes, I’m oversimplifiying for time.Fade!Let's fade from linear to Reinhard.Fade!Let's fade from linear to Reinhard.Fade!Let's fade from linear to Reinhard.Fade!Let's fade from linear to Reinhard.Fade!Let's fade from linear to Reinhard.Fade!Let's fade from linear to Reinhard.Why Does this happenStart with orange (255,88,21)‏Linear CurveLinear curve at the low end and high endBad GammaBetter blacks actually, but wrong color at the bottom and horrible hue-shifting.ReinhardDesaturated blacks, but nice top end.Color ChangesGee...if only we could get:
  • 119. The crisper, more saturated blacks of improper gamma.
  • 120. The nice soft highlights of Reinhard
  • 121. And get the input colors to actually match like pure linear.Voila!Guess what? There is a solution!Voila!Solution by Haarm-Peter Duiker
  • 122. CG Supervisor at Digital Domain
  • 123. Used to be a CG Supervisor on LMNO at EAFilmQ) Why do they still use film in movies?A) The look.
  • 125. FilmUsing a film curve solves tons of problems.
  • 126. Film vs. Digital purists.
  • 127. Who would’ve thought: Kodak knows how to make good film.Filmic CurveCrisp blacks, saturated dark end, and nice highlights.Filmic TonemappingThis technique works with any color.ReinhardLinearFilmic
  • 128. Filmic TonemappingThis technique works with any color.ReinhardLinearFilmic
  • 129. Filmic TonemappingThis technique works with any color.ReinhardLinearFilmic
  • 130. Filmic TonemappingThis technique works with any color.ReinhardLinearFilmic
  • 131. Filmic TonemappingThis technique works with any color.ReinhardLinearFilmic
  • 132. Filmic TonemappingThis technique works with any color.ReinhardLinearFilmic
  • 133. Filmic TonemappingThis technique works with any color.ReinhardLinearFilmic
  • 134. Filmic TonemappingThis technique works with any color.ReinhardLinearFilmic
  • 135. Filmic TonemappingThis technique works with any color.ReinhardLinearFilmic
  • 136. Filmic TonemappingIn terms of “Crisp Blacks” vs. “Milky Blacks”
  • 137. Filmic > Linear > Reinhard
  • 138. In terms of “Soft Highights” vs. “Clamped Highlights”
  • 139. Filmic = Reinhard > LinearFilmic TonemappingLinear.Filmic TonemappingFade from linear to filmic.Filmic TonemappingFade from linear to filmic.Filmic TonemappingFade from linear to filmic.Filmic TonemappingFade from linear to filmic.Filmic TonemappingFade from linear to filmic.Filmic TonemappingFade from linear to filmic.Filmic vs ReinhardFade from linear to Reinhard.Filmic TonemappingFade from Reinhard to filmic.Filmic TonemappingFade from Reinhard to filmic.Filmic TonemappingFade from Reinhard to filmic.Filmic TonemappingFade from Reinhard to filmic.Filmic TonemappingFade from Reinhard to filmic.Filmic TonemappingFade from Reinhard to filmic.Filmic TonemappingLet's do a comparison.Filmic TonemappingThis is with a linear curve.Filmic TonemappingAnd here is filmic.Filmic TonemappingFirst, we'll go up.Filmic TonemappingExposure 0Filmic TonemappingExposure +1Filmic TonemappingExposure +2Filmic TonemappingExposure +3Filmic TonemappingExposure +4Filmic TonemappingOk, now that's bad.
  • 140. What happens if we go down?
  • 141. Notice the crisper blacks in the filmic version.Filmic TonemappingExposure: 0Filmic TonemappingExposure: -1Filmic TonemappingExposure: -2Filmic TonemappingExposure: -3Filmic TonemappingExposure: -4Filmic TonemappingColors get crisper and more saturated at the bottom end.Filmic Tonemappingfloat3 ld = 0.002;float linReference = 0.18;float logReference = 444;float logGamma = 0.45;outColor.rgb = (log10(0.4*outColor.rgb/linReference)/ld*logGamma + logReference)/1023.f;outColor.rgb = clamp(outColor.rgb , 0, 1);float FilmLutWidth = 256;float Padding = .5/FilmLutWidth;outColor.r = tex2D(FilmLutSampler, float2( lerp(Padding,1-Padding,outColor.r), .5)).r;outColor.g = tex2D(FilmLutSampler, float2( lerp(Padding,1-Padding,outColor.g), .5)).r;outColor.b = tex2D(FilmLutSampler, float2( lerp(Padding,1-Padding,outColor.b), .5)).r;
  • 142. Filmic TonemappingVariation in white section:254248224240252
  • 145. Jim Hejl to the rescue Principle Member of Technical Staff GPU Group, Office of the CTO AMD ResearchUsed to work for EAMore MagicAwesome approximation with only ALU ops.
  • 146. Replaces the entire Lin/Log and Texture LUT
  • 147. Includes the pow(x,1/2.2)‏x = max(0, LinearColor-0.004);GammaColor = (x*(6.2*x+0.5))/(x*(6.2*x+1.7)+0.06);
  • 150. Most monitors are too contrasty, strong toe makes it worse
  • 151. May want less shoulder
  • 152. The more range, the more shoulder
  • 153. If your scene lacks range, you want to tone down the shoulder a bitFilmic TonemappingA = Shoulder StrengthB = Linear StrengthC = Linear AngleD = Toe StrengthE = Toe NumeratorF = Toe DenominatorNote: E/F = Toe AngleLinearWhite = Linear White Point ValueF(x) = ((x*(A*x+C*B)+D*E)/(x*(A*x+B)+D*F)) - E/F;FinalColor = F(LinearColor)/F(LinearWhite)‏
  • 154. Filmic TonemappingShoulder Strength = 0.22Linear Strength = 0.30Linear Angle = 0.10Toe Strength = 0.20Toe Numerator = 0.01Toe Denominator = 0.30Linear White Point Value = 11.2These numbers DO NOT have the pow(x,1/2.2) baked in
  • 156. IMO: The most important post effect
  • 157. Changes your life completely
  • 158. Once you have it, you can't live without it
  • 159. If you implement it, you have to redo all your lighting
  • 160. You can't just switch it on if your lighting is tweaked for no dynamic range.Filmic Tonemapping“At least I’m not getting choked in gamma space...”
  • 164. Let’s talk about why you need AO.Why AO?The shadow “grounds” the car.Why AO?Unless you’re already in the shadow!Why AO?Your shadow can't help you now!Why AO?Your shadow can't help you now!Why AO?Your shadow can't help you now!Why AO?Your shadow can't help you now!How important is it?The shadow from the sun grounds object.
  • 165. But if you are already in shadow, you need something else...
  • 166. It's the AO that grounds the car in those shots.
  • 167. So what if we took the AO out?
  • 168. Trick taught to me by Habib (the guy with the awesome condo earlier)‏Why AO?Your shadow can't help you now!Why AO?Your shadow can't help you now!Why AO?Your shadow can't help you now!Why AO?Your shadow can't help you now!Why AO?Your shadow can't help you now!Why AO?Your shadow can't help you now!Why AO?Your shadow can't help you now!Why AO?Your shadow can't help you now!How to use AO?Sun is Directional Light
  • 169. Sky is Hemisphere Light (ish)‏How to use AO?Sun is Yellow (ish)‏
  • 170. Sky is Blue (ish)‏How to use AO?Sun is Directional
  • 171. Approximate Directional Shadow with Shadow MappingHow to use AO?Sky is Hemisphere (sorta)‏
  • 172. Approximate Hemisphere Shadow with AOHow to use AO?Sun is yellow, sky is blue.How to use AO?
  • 173. How to use AO?
  • 174. How to use AO?
  • 175. How to use AO?
  • 176. How to use AO?
  • 177. SunlightQ) Why not have it affect sunlight?A) It does the exact opposite of what you want it to do.
  • 178. SSAO ExampleOffSSAO ExampleOnSSAO ExampleSSAO Darkens Shadows – Good
  • 182. My Opinion: SSAO should NOT affect Direct LightThe AlgorithmOnly Input is Half-Res Depth
  • 184. No special filtering (I.e. Bilateral)‏
  • 186. Adjacent Depth tiles included as wellThe AlgorithmCalculate Base AO
  • 189. Apply Low Pass FilterSSAO Passes0) Base Image
  • 191. SSAO Passes2) Dilate Horizontal
  • 193. SSAO Passes4) Low Pass Filter (I.e. blur the $&%@ out of it)‏
  • 194. SSAO PassesStep One: The Base AO
  • 196. Calculate AO based on nearby depth samplesSSAO PassesPink point on a black surface.SSAO PassesCorrect way is to look at the hemisphere...SSAO Passes...and find the area that is visible to the point.
  • 197. Our solution is much hackier.SSAO PassesWho needs sphere? A box will do just as well.SSAO PassesInstead of area though, we will approximate volume.SSAO PassesWe can estimate volume without knowing the normal.
  • 198. Half Volume = Fully UnoccludedSSAO PassesWe can estimate volume without knowing the normal.
  • 199. Half Volume = Fully UnoccludedSSAO PassesWe can estimate volume without knowing the normal.
  • 200. Half Volume = Fully UnoccludedSSAO PassesWe can estimate volume without knowing the normal.
  • 201. Half Volume = Fully UnoccludedSSAO PassesStart with a pointSSAO PassesAnd sample pairs of points.SSAO PassesWhat if we have an occluder?SSAO PassesWhat if we have an occluder?SSAO PassesWe lose depth info.SSAO PassesWe could call it occluded.SSAO PassesBut, is that correct? Maybe not.SSAO PassesLet's choose a cutoff, and mark the pair as invalid.SSAO PassesSample pattern?SSAO PassesSample pattern. Discard in pairs.SSAO PassesNote the single light outline.SSAO PassesFind discontinuties...SSAO PassesFind discontinuties...SSAO PassesFind discontinuties...SSAO PassesFind discontinuties...SSAO Passes...and dilate at discontinuties. Here is original.SSAO PassesAfter X dilate.SSAO PassesAfter Y dilate. And the light outline is gone.SSAO PassesAnd do a gaussian blur.SSAO PassesFinalArtifactsWhite OutlineCrossing PatternDepth Discontinuities
  • 203. Crossing PatternArtifactsMore Crossing patternsArtifactsDepth Discontinuities. Notice a little anti-halo.ArtifactsHard to see in real levels though.ArtifactsArtifacts are usually only noticeable to the trained eye
  • 205. Have option to turn it off per surface
  • 206. Most snow fades at distance
  • 207. Some objects apply sqrt() to brighten it
  • 208. Some objects just have it offMore ShotsOnMore ShotsOffMore ShotsHere's a few more shots.More ShotsHere's a few more shots.More ShotsHere's a few more shots.More ShotsHere's a few more shots.More ShotsHere's a few more shots.More ShotsHere's a few more shots.More ShotsHere's a few more shots.More ShotsHere's a few more shots.More ShotsHere's a few more shots.More ShotsHere's a few more shots.SSAO ConclusionGrounds characters and other objects
  • 210. Artifacts are not too bad
  • 211. Could use better filtering for halos
  • 212. Runs on SPUs, and doesn’t need normal bufferPart 4: Architecture
  • 214. Here is the base overview.
  • 215. Skip some thingsRendering PassesFinal outputRendering PassesDepth and Normal, 2x MSAARendering PassesShadow depths for cascadesRendering PassesCascade calculation to screen-spaceRendering PassesMeanwhile on the SPUs...Rendering PassesSSAORendering PassesFullscreen lighting. Diffuse and specular.
  • 216. Calculate lighting in TilesRendering PassesBack on the GPU...Rendering PassesRender to RGBM, 2x MSAARendering PassesResolve RGBM 2x MSAA to FP16 1xRendering PassesTransparent objects and post.Full TimelineFull Rendering PipelineFull TimelineFull Render in 3 framesFrame 1Begin Frame 1.Frame 1TimelineSPUsGPUPPU
  • 217. Frame 1Begin Frame 1.Frame 1All PPU stuff.Frame 1PPU kicks off SPU jobsFrame 2Does most of the rendering.Frame 2Vertex processing on SPUs...Frame 2...kicks of 2x DepthNormal pass.Frame 2Resolve 2x MSAA buffers to 1x and copy.Frame 2Spotlight shadow depth pass.Frame 2Sunlight shadow depth pass.Frame 2Kick Fullscreen Lighting SPU jobs.Frame 2SSAO Jobs.Frame 2Motion BlurFrame 2Resolve shadow to screen. Copy Fullscreen lighting.Frame 2Add Fullscreen shadowed lights.Frame 2Standard Pass Vertex Processing on SPUs.Frame 2Standard pass. Writes to 2x RGBM.Frame 2Resolve 2x RGBM to 1x FP16Frame 2Full-Res Alpha-Blended effects.Frame 2Half-Res Particles and Composite.Frame 3Now for the 3rd frame.Frame 3Start PostFX jobs at end of Frame 2.Frame 3Fill in PostFX jobs in Frame 3 with low priority.Frame 3Read from main memory, do distortion and HUD.Full TimelineAll together.SPU OptimizationQuick notes on how we optimize SPU code.PerformanceYes! You can optimize SPU code by hand!
  • 218. This was news to me when I started here.PerformanceLet’s talk about Laundry
  • 219. Say you have 5 loads to do
  • 220. 1 washer, 1 dryerIn-OrderLoad 1 in Washer
  • 221. Load 1 in Dryer
  • 222. Load 2 in Washer
  • 223. Load 2 in Dryer
  • 224. Load 3 in Washer
  • 225. Load 3 in Dryer
  • 226. Load 4 in Washer
  • 227. Load 4 in Dryer
  • 228. Load 5 in Washer
  • 229. Load 5 in DryerPipelinedLoad 1 in Washer
  • 230. Load 2 in Washer, Load 1 in Dryer
  • 231. Load 3 in Washer, Load 2 in Dryer
  • 232. Load 4 in Washer, Load 3 in Dryer
  • 233. Load 5 in Washer, Load 4 in Dryer
  • 234. Load 5 in DryerSoftware PipeliningWe can do this in software
  • 236. It’s how we optimize SPU code by hand
  • 237. Pal Engstad (our Lead Graphics Programmer) will be putting a doc online explaining the full process.
  • 238. Should be online: look for the post on the blog
  • 241. www.naughtydog.com/docs/gdc2010/intro-spu-optimizations-part-2.pdfSPU C++ Intrinsicsfor( int y = 0; y < clampHeight; y++ ) {intlColorOffset = y * kAoBufferLineWidth; VF32 currWordAo = pvColor[ lColorOffset / 4 ];prevWordAo[0] = prevWordAo[1] = prevWordAo[2] = prevWordAo[3] = currWordAo[0]; ao_n4 = prevWordAo; ao_n3 = (VF32)si_shufb( (qword)prevWordAo, (qword)currWordAo, (qword)s_BCDa ); ao_n2 = (VF32)si_shufb( (qword)prevWordAo, (qword)currWordAo, (qword)s_CDab ); ao_n1 = (VF32)si_shufb( (qword)prevWordAo, (qword)currWordAo, (qword)s_Dabc );ao_curr = currWordAo; for (int x = 0; x < kAoBufferLineWidth; x+=4) { VF32 nextWordAo = pvColor[ (lColorOffset + x + 4)/ 4 ]; VF32 ao_p1 = (VF32)si_shufb( (qword)currWordAo, (qword)nextWordAo, (qword)s_BCDa ); VF32 ao_p2 = (VF32)si_shufb( (qword)currWordAo, (qword)nextWordAo, (qword)s_CDab ); VF32 ao_p3 = (VF32)si_shufb( (qword)currWordAo, (qword)nextWordAo, (qword)s_Dabc ); VF32 ao_p4 = nextWordAo; VF32 blurAo = ao_n4*w4 + ao_n3*w3 + ao_n2*w2 + ao_n1*w1 + ao_curr*w0; blurAo += ao_p1*w1 + ao_p2*w2 + ao_p3*w3 + ao_p4*w4;pvColor[ (lColorOffset + x)/ 4 ] = blurAo;prevWordAo = currWordAo;currWordAo = nextWordAo; ao_n4 = prevWordAo; ao_n3 = ao_p1; ao_n2 = ao_p2; ao_n1 = ao_p3;ao_curr = currWordAo; } }
  • 242. SPU Assembly 1/2EVENODDloop: {nop} {o6} lqxcurrAo, pvColor, colorOffset {nop} {o4} shufb prevAo0, currAo, currAo, s_AAAA {nop} {o4} shufb ao_n30, prevAo0, currAo, s_BCDa {nop} {o4} shufb ao_n20, prevAo0, currAo, s_CDab {nop} {o4} shufb ao_n10, prevAo0, currAo, s_Dabc {e2} selb ao_n4, ao_n4, prevAo0, endLineMask {lnop} {e2} selb ao_n3, ao_n3, ao_n30, endLineMask {lnop} {e2} selb ao_n2, ao_n2, ao_n20, endLineMask {lnop} {e2} selb ao_n1, ao_n1, ao_n10, endLineMask {lnop} {nop} {o6} lqxnextAo, pvNextColor, colorOffset {nop} {o4} shufb ao_p1, currAo, nextAo, s_BCDa {nop} {o4} shufb ao_p2, currAo, nextAo, s_CDab {nop} {o4} shufb ao_p3, currAo, nextAo, s_Dabc {e6} fa ao_n4, ao_n4, nextAo {lnop} {e6} fa ao_n3, ao_n3, ao_p3 {lnop} {e6} fa ao_n2, ao_n2, ao_p2 {lnop} {e6} fa ao_n1, ao_n1, ao_p1 {lnop}
  • 243. SPU Assembly 2/2EVENODD {e6} fm blurAo, ao_n4, w4 {lnop} {e6} fma blurAo, ao_n3, w3, blurAo {lnop} {e6} fma blurAo, ao_n2, w2, blurAo {lnop} {e6} fma blurAo, ao_n1, w1, blurAo {lnop} {e6} fmablurAo, currAo, w0, blurAo {lnop} {nop} {o6} stqxblurAo, pvColor, colorOffset {nop} {o4} shlqbii ao_n4, currAo, 0 {nop} {o4} shlqbii ao_n3, ao_p1, 0 {nop} {o4} shlqbii ao_n2, ao_p2, 0 {nop} {o4} shlqbii ao_n1, ao_p3, 0 {e2} ceqendLineMask, colorOffset, endLineOffset {lnop} {e2} ceqendBlockMask, colorOffset, endOffset {lnop} {e2} selbendLineOffsetIncr, zero, colorLineBytes, endLineMask {lnop} {e2} selbcolorOffsetIncr, defOffsetIncr, endColorOffsetIncr, endLineMask {lnop} {e2} a endLineOffset, endLineOffset, endLineOffsetIncr {lnop} {e2} a colorOffset, colorOffset, colorOffsetIncr {lnop} {nop} branch: {o?} brzendBlockMask, loop
  • 244. SPU Assembly{e6} faao_n3, ao_n3, ao_p3{o6}lqxnextAo, pvNextColor, colorOffsetHere is a sample line of assembly code
  • 245. SPUs issue one even and odd instruction per cycle
  • 246. One line = One cycle
  • 247. e6 = even, 6 cycle latency
  • 248. o6 = odd, 6 cycle latencyIteration 1loop:nop lnopnop lnop nop lnopnop lnopnop lnop nop lnopnop lnopnop lnopnop lnop nop lnopnop lnopnop lnop nop lnopnop lnopnop lnop nop lnopnop lnop nop lnopnop lnopnop branch: {o?:0} brzendBlockMask, loop
  • 249. Iteration 1loop:nop lnopnop {o6:0} lqx currAo, pvColor, colorOffsetnop {o6:0} lqx nextAo, pvNextColor, colorOffsetnop {o4:0} shlqbiiendLineMask_, endLineMask, 0nop lnop nop lnopnop lnopnop {o4:0} shufb prevAo0, currAo, currAo, s_AAAA{e2:0} ceqendLineMask, colorOffset, endLineOffset {o4:0} shufb ao_p1, currAo, nextAo, s_BCDa{e2:0} ceqendBlockMask, colorOffset_, endOffset {o4:0} shufb ao_p2, currAo, nextAo, s_CDab{e2:0} selbendLineOffsetIncr, zero, colorLineBytes, endLineMask {o4:0} shufb ao_p3, currAo, nextAo, s_Dabc{e2:0} selb ao_n4, currAo_, prevAo0, endLineMask_ {o4:0} shufb ao_n30, prevAo0, currAo, s_BCDanop {o4:0} shufb ao_n20, prevAo0, currAo, s_CDabnop {o4:0} shufb ao_n10, prevAo0, currAo, s_Dabc{e2:0} selbcolorOffsetIncr, defOffsetIncr, endColorOffsetIncr, endLineMask lnop{e2:0} selb ao_n3, ao_p1_, ao_n30, endLineMask_ {o4:0} shufbcolorOffset_, colorOffset_, colorOffset, s_BCa0{e2:0} selb ao_n2, ao_p2_, ao_n20, endLineMask_lnop{e2:0} selb ao_n1, ao_p3_, ao_n10, endLineMask_ lnop{e2:0} a colorOffset, colorOffset, colorOffsetIncrlnop{e2:0} a endLineOffset, endLineOffset, endLineOffsetIncrbranch: {o?:0} brzendBlockMask, loop
  • 250. Iteration 1loop:nop lnopnop {o6:0} lqx currAo, pvColor, colorOffsetnop {o6:0} lqx nextAo, pvNextColor, colorOffsetnop {o4:0} shlqbiiendLineMask_, endLineMask, 0nop lnop nop lnopnop lnopnop {o4:0} shufb prevAo0, currAo, currAo, s_AAAA{e2:0} ceqendLineMask, colorOffset, endLineOffset {o4:0} shufb ao_p1, currAo, nextAo, s_BCDa{e2:0} ceqendBlockMask, colorOffset_, endOffset {o4:0} shufb ao_p2, currAo, nextAo, s_CDab{e2:0} selbendLineOffsetIncr, zero, colorLineBytes, endLineMask {o4:0} shufb ao_p3, currAo, nextAo, s_Dabc{e2:0} selb ao_n4, currAo_, prevAo0, endLineMask_ {o4:0} shufb ao_n30, prevAo0, currAo, s_BCDanop {o4:0} shufb ao_n20, prevAo0, currAo, s_CDabnop {o4:0} shufb ao_n10, prevAo0, currAo, s_Dabc{e2:0} selbcolorOffsetIncr, defOffsetIncr, endColorOffsetIncr, endLineMask lnop{e2:0} selb ao_n3, ao_p1_, ao_n30, endLineMask_ {o4:0} shufbcolorOffset_, colorOffset_, colorOffset, s_BCa0{e2:0} selb ao_n2, ao_p2_, ao_n20, endLineMask_lnop{e2:0} selb ao_n1, ao_p3_, ao_n10, endLineMask_ lnop{e2:0} a colorOffset, colorOffset, colorOffsetIncrlnop{e2:0} a endLineOffset, endLineOffset, endLineOffsetIncrbranch: {o?:0} brzendBlockMask, loop
  • 251. ProblemsTake this line: {o6} lqxcurrAo, pvColor, colorOffset {o4} shufb prevAo0, currAo, currAo, s_AAAALoad a value into currAo
  • 253. currAo won’t be ready yet
  • 254. Stall for 5 cyclesIteration 1loop:nop lnopnop {o6:0} lqx currAo, pvColor, colorOffsetnop {o6:0} lqx nextAo, pvNextColor, colorOffsetnop {o4:0} shlqbiiendLineMask_, endLineMask, 0nop lnop nop lnopnop lnopnop {o4:0} shufb prevAo0, currAo, currAo, s_AAAA{e2:0} ceqendLineMask, colorOffset, endLineOffset {o4:0} shufb ao_p1, currAo, nextAo, s_BCDa{e2:0} ceqendBlockMask, colorOffset_, endOffset {o4:0} shufb ao_p2, currAo, nextAo, s_CDab{e2:0} selbendLineOffsetIncr, zero, colorLineBytes, endLineMask {o4:0} shufb ao_p3, currAo, nextAo, s_Dabc{e2:0} selb ao_n4, currAo_, prevAo0, endLineMask_ {o4:0} shufb ao_n30, prevAo0, currAo, s_BCDanop {o4:0} shufb ao_n20, prevAo0, currAo, s_CDabnop {o4:0} shufb ao_n10, prevAo0, currAo, s_Dabc{e2:0} selbcolorOffsetIncr, defOffsetIncr, endColorOffsetIncr, endLineMask lnop{e2:0} selb ao_n3, ao_p1_, ao_n30, endLineMask_ {o4:0} shufbcolorOffset_, colorOffset_, colorOffset, s_BCa0{e2:0} selb ao_n2, ao_p2_, ao_n20, endLineMask_lnop{e2:0} selb ao_n1, ao_p3_, ao_n10, endLineMask_ lnop{e2:0} a colorOffset, colorOffset, colorOffsetIncrlnop{e2:0} a endLineOffset, endLineOffset, endLineOffsetIncrbranch: {o?:0} brzendBlockMask, loop
  • 255. Iteration 1loop:nop lnopnop {o6:0} lqx currAo, pvColor, colorOffsetnop {o6:0} lqx nextAo, pvNextColor, colorOffsetnop {o4:0} shlqbiiendLineMask_, endLineMask, 0nop lnop nop lnopnop lnopnop {o4:0} shufb prevAo0, currAo, currAo, s_AAAA{e2:0} ceqendLineMask, colorOffset, endLineOffset {o4:0} shufb ao_p1, currAo, nextAo, s_BCDa{e2:0} ceqendBlockMask, colorOffset_, endOffset {o4:0} shufb ao_p2, currAo, nextAo, s_CDab{e2:0} selbendLineOffsetIncr, zero, colorLineBytes, endLineMask {o4:0} shufb ao_p3, currAo, nextAo, s_Dabc{e2:0} selb ao_n4, currAo_, prevAo0, endLineMask_ {o4:0} shufb ao_n30, prevAo0, currAo, s_BCDanop {o4:0} shufb ao_n20, prevAo0, currAo, s_CDabnop {o4:0} shufb ao_n10, prevAo0, currAo, s_Dabc{e2:0} selbcolorOffsetIncr, defOffsetIncr, endColorOffsetIncr, endLineMask lnop{e2:0} selb ao_n3, ao_p1_, ao_n30, endLineMask_ {o4:0} shufbcolorOffset_, colorOffset_, colorOffset, s_BCa0{e2:0} selb ao_n2, ao_p2_, ao_n20, endLineMask_lnop{e2:0} selb ao_n1, ao_p3_, ao_n10, endLineMask_ lnop{e2:0} a colorOffset, colorOffset, colorOffsetIncrlnop{e2:0} a endLineOffset, endLineOffset, endLineOffsetIncrbranch: {o?:0} brzendBlockMask, loop
  • 256. Iteration 2loop:nop lnop{e6:1} fa ao_n4, ao_n4, nextAo {o6:0} lqx currAo, pvColor, colorOffset{e6:1} fa ao_n3, ao_n3, ao_p3{o6:0} lqx nextAo, pvNextColor, colorOffsetnop{o4:0} shlqbiiendLineMask_, endLineMask, 0{e6:1} fa ao_n1_, ao_n1, ao_p1 lnop{e6:1} fa ao_n2_, ao_n2, ao_p2 lnopnoplnop{e6:1} fm blurAo, ao_n4, w4 {o4:0} shufb prevAo0, currAo, currAo, s_AAAA{e2:0} ceqendLineMask, colorOffset, endLineOffset {o4:0} shufb ao_p1, currAo, nextAo, s_BCDa{e2:0} ceqendBlockMask, colorOffset_, endOffset {o4:0} shufb ao_p2, currAo, nextAo, s_CDab{e2:0} selbendLineOffsetIncr, zero, colorLineBytes, endLineMask {o4:0} shufb ao_p3, currAo, nextAo, s_Dabc{e2:0} selb ao_n4, currAo_, prevAo0, endLineMask_ {o4:0} shufb ao_n30, prevAo0, currAo, s_BCDanop{o4:0} shufb ao_n20, prevAo0, currAo, s_CDab{e6:1} fma blurAo, ao_n3, w3, blurAo {o4:0} shufb ao_n10, prevAo0, currAo, s_Dabc{e2:0} selbcolorOffsetIncr, defOffsetIncr, endColorOffsetIncr, endLineMasklnop {e2:0} selb ao_n3, ao_p1_, ao_n30, endLineMask_ {o4:0} shufbcolorOffset_, colorOffset_, colorOffset, s_BCa0{e2:0} selb ao_n2, ao_p2_, ao_n20, endLineMask_lnop{e2:0} selb ao_n1, ao_p3_, ao_n10, endLineMask_ lnop{e2:0} a colorOffset, colorOffset, colorOffsetIncrlnop{e2:0} a endLineOffset, endLineOffset, endLineOffsetIncrbranch: {o?:0} brzendBlockMask, loop
  • 257. Iteration 3loop:nop lnop{e6:1} fa ao_n4, ao_n4, nextAo {o6:0} lqx currAo, pvColor, colorOffset{e6:1} fa ao_n3, ao_n3, ao_p3{o6:0} lqx nextAo, pvNextColor, colorOffset{e6:2} fmablurAo_, ao_n2_, w2, blurAo{o4:0} shlqbiiendLineMask_, endLineMask, 0{e6:1} fa ao_n1_, ao_n1, ao_p1 lnop{e6:1} fa ao_n2_, ao_n2, ao_p2 lnopnop lnop{e6:1} fm blurAo, ao_n4, w4 {o4:0} shufb prevAo0, currAo, currAo, s_AAAA{e2:0} ceqendLineMask, colorOffset, endLineOffset {o4:0} shufb ao_p1, currAo, nextAo, s_BCDa{e2:0} ceqendBlockMask, colorOffset_, endOffset {o4:0} shufb ao_p2, currAo, nextAo, s_CDab{e2:0} selbendLineOffsetIncr, zero, colorLineBytes, endLineMask {o4:0} shufb ao_p3, currAo, nextAo, s_Dabc{e2:0} selb ao_n4, currAo_, prevAo0, endLineMask_ {o4:0} shufb ao_n30, prevAo0, currAo, s_BCDa{e6:2} fma blurAo__, ao_n1__, w1, blurAo_{o4:0} shufb ao_n20, prevAo0, currAo, s_CDab{e6:1} fma blurAo, ao_n3, w3, blurAo {o4:0} shufb ao_n10, prevAo0, currAo, s_Dabc{e2:0} selbcolorOffsetIncr, defOffsetIncr, endColorOffsetIncr, endLineMasklnop {e2:0} selb ao_n3, ao_p1_, ao_n30, endLineMask_ {o4:0} shufbcolorOffset_, colorOffset_, colorOffset, s_BCa0{e2:0} selb ao_n2, ao_p2_, ao_n20, endLineMask_lnop{e2:0} selb ao_n1, ao_p3_, ao_n10, endLineMask_ lnop{e2:0} a colorOffset, colorOffset, colorOffsetIncrlnop{e2:0} a endLineOffset, endLineOffset, endLineOffsetIncrbranch: {o?:0} brzendBlockMask, loop
  • 258. Iteration 4loop:nop lnop{e6:1} fa ao_n4, ao_n4, nextAo {o6:0} lqx currAo, pvColor, colorOffset{e6:1} fa ao_n3, ao_n3, ao_p3{o6:0} lqx nextAo, pvNextColor, colorOffset{e6:2} fmablurAo_, ao_n2_, w2, blurAo{o4:0} shlqbiiendLineMask_, endLineMask, 0{e6:1} fa ao_n1_, ao_n1, ao_p1 lnop{e6:1} fa ao_n2_, ao_n2, ao_p2 lnop{e6:3} fmablurAo___, currAo___, w0, blurAo__lnop{e6:1} fm blurAo, ao_n4, w4 {o4:0} shufb prevAo0, currAo, currAo, s_AAAA{e2:0} ceqendLineMask, colorOffset, endLineOffset {o4:0} shufb ao_p1, currAo, nextAo, s_BCDa{e2:0} ceqendBlockMask, colorOffset_, endOffset {o4:0} shufb ao_p2, currAo, nextAo, s_CDab{e2:0} selbendLineOffsetIncr, zero, colorLineBytes, endLineMask {o4:0} shufb ao_p3, currAo, nextAo, s_Dabc{e2:0} selb ao_n4, currAo_, prevAo0, endLineMask_ {o4:0} shufb ao_n30, prevAo0, currAo, s_BCDa{e6:2} fma blurAo__, ao_n1__, w1, blurAo_{o4:0} shufb ao_n20, prevAo0, currAo, s_CDab{e6:1} fma blurAo, ao_n3, w3, blurAo {o4:0} shufb ao_n10, prevAo0, currAo, s_Dabc{e2:0} selbcolorOffsetIncr, defOffsetIncr, endColorOffsetIncr, endLineMask{o6:3} stqxblurAo___, pvColor, colorOffset_{e2:0} selb ao_n3, ao_p1_, ao_n30, endLineMask_ {o4:0} shufbcolorOffset_, colorOffset_, colorOffset, s_BCa0{e2:0} selb ao_n2, ao_p2_, ao_n20, endLineMask_lnop{e2:0} selb ao_n1, ao_p3_, ao_n10, endLineMask_ lnop{e2:0} a colorOffset, colorOffset, colorOffsetIncrlnop{e2:0} a endLineOffset, endLineOffset, endLineOffsetIncrbranch: {o?:0} brzendBlockMask, loop
  • 259. Copy Operationsloop:{e2:x} ai ao_n1__, ao_n1_, 0 {o4:x} shlqbii currAo_, currAo, 0{e6:1} fa ao_n4, ao_n4, nextAo {o6:0} lqx currAo, pvColor, colorOffset{e6:1} fa ao_n3, ao_n3, ao_p3{o6:0} lqx nextAo, pvNextColor, colorOffset{e6:2} fmablurAo_, ao_n2_, w2, blurAo{o4:0} shlqbiiendLineMask_, endLineMask, 0{e6:1} fa ao_n1_, ao_n1, ao_p1 {o4:x} shlqbii ao_p1_, ao_p1, 0{e6:1} fa ao_n2_, ao_n2, ao_p2 {o4:x} shlqbii ao_p2_, ao_p2, 0{e6:3} fmablurAo___, currAo___, w0, blurAo__ {o4:x} shlqbii ao_p3_, ao_p3, 0{e6:1} fm blurAo, ao_n4, w4 {o4:0} shufb prevAo0, currAo, currAo, s_AAAA{e2:0} ceqendLineMask, colorOffset, endLineOffset {o4:0} shufb ao_p1, currAo, nextAo, s_BCDa{e2:0} ceqendBlockMask, colorOffset_, endOffset {o4:0} shufb ao_p2, currAo, nextAo, s_CDab{e2:0} selbendLineOffsetIncr, zero, colorLineBytes, endLineMask {o4:0} shufb ao_p3, currAo, nextAo, s_Dabc{e2:0} selb ao_n4, currAo_, prevAo0, endLineMask_ {o4:0} shufb ao_n30, prevAo0, currAo, s_BCDa{e6:2} fma blurAo__, ao_n1__, w1, blurAo_{o4:0} shufb ao_n20, prevAo0, currAo, s_CDab{e6:1} fma blurAo, ao_n3, w3, blurAo {o4:0} shufb ao_n10, prevAo0, currAo, s_Dabc{e2:0} selbcolorOffsetIncr, defOffsetIncr, endColorOffsetIncr, endLineMask{o6:3} stqxblurAo___, pvColor, colorOffset_{e2:0} selb ao_n3, ao_p1_, ao_n30, endLineMask_ {o4:0} shufbcolorOffset_, colorOffset_, colorOffset, s_BCa0{e2:0} selb ao_n2, ao_p2_, ao_n20, endLineMask_{o4:x} shlqbii currAo___, currAo__, 0{e2:0} selb ao_n1, ao_p3_, ao_n10, endLineMask_ {o4:x} shlqbii currAo__, currAo_, 0{e2:0} a colorOffset, colorOffset, colorOffsetIncrlnop{e2:0} a endLineOffset, endLineOffset, endLineOffsetIncrbranch: {o?:0} brzendBlockMask, loop
  • 260. Optimizedloop:{e2:x} ai ao_n1__, ao_n1_, 0 {o4:x} shlqbii currAo_, currAo, 0{e6:1} fa ao_n4, ao_n4, nextAo {o6:0} lqx currAo, pvColor, colorOffset{e6:1} fa ao_n3, ao_n3, ao_p3 {o6:0} lqx nextAo, pvNextColor, colorOffset{e6:2} fmablurAo_, ao_n2_, w2, blurAo {o4:0} shlqbiiendLineMask_, endLineMask, 0{e6:1} fa ao_n1_, ao_n1, ao_p1 {o4:x} shlqbii ao_p1_, ao_p1, 0{e6:1} fa ao_n2_, ao_n2, ao_p2 {o4:x} shlqbii ao_p2_, ao_p2, 0{e6:3} fmablurAo___, currAo___, w0, blurAo__ {o4:x} shlqbii ao_p3_, ao_p3, 0{e6:1} fm blurAo, ao_n4, w4 {o4:0} shufb prevAo0, currAo, currAo, s_AAAA{e2:0} ceqendLineMask, colorOffset, endLineOffset {o4:0} shufb ao_p1, currAo, nextAo, s_BCDa{e2:0} ceqendBlockMask, colorOffset_, endOffset {o4:0} shufb ao_p2, currAo, nextAo, s_CDab{e2:0} selbendLineOffsetIncr, zero, colorLineBytes, endLineMask {o4:0} shufb ao_p3, currAo, nextAo, s_Dabc{e2:0} selb ao_n4, currAo_, prevAo0, endLineMask_ {o4:0} shufb ao_n30, prevAo0, currAo, s_BCDa{e6:2} fma blurAo__, ao_n1__, w1, blurAo_ {o4:0} shufb ao_n20, prevAo0, currAo, s_CDab{e6:1} fma blurAo, ao_n3, w3, blurAo {o4:0} shufb ao_n10, prevAo0, currAo, s_Dabc{e2:0} selbcolorOffsetIncr, defOffsetIncr, endColorOffsetIncr, endLineMask {o6:3} stqxblurAo___, pvColor, colorOffset_{e2:0} selb ao_n3, ao_p1_, ao_n30, endLineMask_ {o4:0} shufbcolorOffset_, colorOffset_, colorOffset, s_BCa0{e2:0} selb ao_n2, ao_p2_, ao_n20, endLineMask_ {o4:x} shlqbii currAo___, currAo__, 0{e2:0} selb ao_n1, ao_p3_, ao_n10, endLineMask_ {o4:x} shlqbii currAo__, currAo_, 0{e2:0} a colorOffset, colorOffset, colorOffsetIncrlnop{e2:0} a endLineOffset, endLineOffset, endLineOffsetIncr branch: {o?:0} brzendBlockMask, loop
  • 261. OptimizedEVENODDloop:{e2:x} ai ao_n1__, ao_n1_, 0 {o4:x} shlqbii currAo_, currAo, 0{e6:1} fa ao_n4, ao_n4, nextAo {o6:0} lqx currAo, pvColor, colorOffset{e6:1} fa ao_n3, ao_n3, ao_p3 {o6:0} lqx nextAo, pvNextColor, colorOffset{e6:2} fmablurAo_, ao_n2_, w2, blurAo {o4:0} shlqbiiendLineMask_, endLineMask, 0{e6:1} fa ao_n1_, ao_n1, ao_p1 {o4:x} shlqbii ao_p1_, ao_p1, 0{e6:1} fa ao_n2_, ao_n2, ao_p2 {o4:x} shlqbii ao_p2_, ao_p2, 0{e6:3} fmablurAo___, currAo___, w0, blurAo__ {o4:x} shlqbii ao_p3_, ao_p3, 0{e6:1} fm blurAo, ao_n4, w4 {o4:0} shufb prevAo0, currAo, currAo, s_AAAA{e2:0} ceqendLineMask, colorOffset, endLineOffset {o4:0} shufb ao_p1, currAo, nextAo, s_BCDa{e2:0} ceqendBlockMask, colorOffset_, endOffset {o4:0} shufb ao_p2, currAo, nextAo, s_CDab{e2:0} selbendLineOffsetIncr, zero, colorLineBytes, endLineMask {o4:0} shufb ao_p3, currAo, nextAo, s_Dabc{e2:0} selb ao_n4, currAo_, prevAo0, endLineMask_ {o4:0} shufb ao_n30, prevAo0, currAo, s_BCDa{e6:2} fma blurAo__, ao_n1__, w1, blurAo_ {o4:0} shufb ao_n20, prevAo0, currAo, s_CDab{e6:1} fma blurAo, ao_n3, w3, blurAo {o4:0} shufb ao_n10, prevAo0, currAo, s_Dabc{e2:0} selbcolorOffsetIncr, defOffsetIncr, endColorOffsetIncr, endLineMask {o6:3} stqxblurAo___, pvColor, colorOffset_{e2:0} selb ao_n3, ao_p1_, ao_n30, endLineMask_ {o4:0} shufbcolorOffset_, colorOffset_, colorOffset, s_BCa0{e2:0} selb ao_n2, ao_p2_, ao_n20, endLineMask_ {o4:x} shlqbii currAo___, currAo__, 0{e2:0} selb ao_n1, ao_p3_, ao_n10, endLineMask_ {o4:x} shlqbii currAo__, currAo_, 0{e2:0} a colorOffset, colorOffset, colorOffsetIncrlnop{e2:0} a endLineOffset, endLineOffset, endLineOffsetIncr branch: {o?:0} brzendBlockMask, loop
  • 262. OptimizedEVENODDloop:{e2:x} ai ao_n1__, ao_n1_, 0 {o4:x} shlqbii currAo_, currAo, 0{e6:1} fa ao_n4, ao_n4, nextAo {o6:0} lqx currAo, pvColor, colorOffset{e6:1} fa ao_n3, ao_n3, ao_p3 {o6:0} lqx nextAo, pvNextColor, colorOffset{e6:2} fmablurAo_, ao_n2_, w2, blurAo {o4:0} shlqbiiendLineMask_, endLineMask, 0{e6:1} fa ao_n1_, ao_n1, ao_p1 {o4:x} shlqbii ao_p1_, ao_p1, 0{e6:1} fa ao_n2_, ao_n2, ao_p2 {o4:x} shlqbii ao_p2_, ao_p2, 0{e6:3} fmablurAo___, currAo___, w0, blurAo__ {o4:x} shlqbii ao_p3_, ao_p3, 0{e6:1} fm blurAo, ao_n4, w4 {o4:0} shufb prevAo0, currAo, currAo, s_AAAA{e2:0} ceqendLineMask, colorOffset, endLineOffset {o4:0} shufb ao_p1, currAo, nextAo, s_BCDa{e2:0} ceqendBlockMask, colorOffset_, endOffset {o4:0} shufb ao_p2, currAo, nextAo, s_CDab{e2:0} selbendLineOffsetIncr, zero, colorLineBytes, endLineMask {o4:0} shufb ao_p3, currAo, nextAo, s_Dabc{e2:0} selb ao_n4, currAo_, prevAo0, endLineMask_ {o4:0} shufb ao_n30, prevAo0, currAo, s_BCDa{e6:2} fma blurAo__, ao_n1__, w1, blurAo_ {o4:0} shufb ao_n20, prevAo0, currAo, s_CDab{e6:1} fma blurAo, ao_n3, w3, blurAo {o4:0} shufb ao_n10, prevAo0, currAo, s_Dabc{e2:0} selbcolorOffsetIncr, defOffsetIncr, endColorOffsetIncr, endLineMask {o6:3} stqxblurAo___, pvColor, colorOffset_{e2:0} selb ao_n3, ao_p1_, ao_n30, endLineMask_ {o4:0} shufbcolorOffset_, colorOffset_, colorOffset, s_BCa0{e2:0} selb ao_n2, ao_p2_, ao_n20, endLineMask_ {o4:x} shlqbii currAo___, currAo__, 0{e2:0} selb ao_n1, ao_p3_, ao_n10, endLineMask_ {o4:x} shlqbii currAo__, currAo_, 0{e2:0} a colorOffset, colorOffset, colorOffsetIncrlnop{e2:0} a endLineOffset, endLineOffset, endLineOffsetIncr branch: {o?:0} brzendBlockMask, loop
  • 263. OptimizedEVENODDloop:{e2:x} ai ao_n1__, ao_n1_, 0 {o4:x} shlqbii currAo_, currAo, 0{e6:1} fa ao_n4, ao_n4, nextAo {o6:0} lqx currAo, pvColor, colorOffset{e6:1} fa ao_n3, ao_n3, ao_p3 {o6:0} lqx nextAo, pvNextColor, colorOffset{e6:2} fmablurAo_, ao_n2_, w2, blurAo {o4:0} shlqbiiendLineMask_, endLineMask, 0{e6:1} fa ao_n1_, ao_n1, ao_p1 {o4:x} shlqbii ao_p1_, ao_p1, 0{e6:1} fa ao_n2_, ao_n2, ao_p2 {o4:x} shlqbii ao_p2_, ao_p2, 0{e6:3} fmablurAo___, currAo___, w0, blurAo__ {o4:x} shlqbii ao_p3_, ao_p3, 0{e6:1} fm blurAo, ao_n4, w4 {o4:0} shufb prevAo0, currAo, currAo, s_AAAA{e2:0} ceqendLineMask, colorOffset, endLineOffset {o4:0} shufb ao_p1, currAo, nextAo, s_BCDa{e2:0} ceqendBlockMask, colorOffset_, endOffset {o4:0} shufb ao_p2, currAo, nextAo, s_CDab{e2:0} selbendLineOffsetIncr, zero, colorLineBytes, endLineMask {o4:0} shufb ao_p3, currAo, nextAo, s_Dabc{e2:0} selb ao_n4, currAo_, prevAo0, endLineMask_ {o4:0} shufb ao_n30, prevAo0, currAo, s_BCDa{e6:2} fma blurAo__, ao_n1__, w1, blurAo_ {o4:0} shufb ao_n20, prevAo0, currAo, s_CDab{e6:1} fma blurAo, ao_n3, w3, blurAo {o4:0} shufb ao_n10, prevAo0, currAo, s_Dabc{e2:0} selbcolorOffsetIncr, defOffsetIncr, endColorOffsetIncr, endLineMask {o6:3} stqxblurAo___, pvColor, colorOffset_{e2:0} selb ao_n3, ao_p1_, ao_n30, endLineMask_ {o4:0} shufbcolorOffset_, colorOffset_, colorOffset, s_BCa0{e2:0} selb ao_n2, ao_p2_, ao_n20, endLineMask_ {o4:x} shlqbii currAo___, currAo__, 0{e2:0} selb ao_n1, ao_p3_, ao_n10, endLineMask_ {o4:x} shlqbii currAo__, currAo_, 0{e2:0} a colorOffset, colorOffset, colorOffsetIncrlnop{e2:0} a endLineOffset, endLineOffset, endLineOffsetIncr branch: {o?:0} brzendBlockMask, loop
  • 264. SolutionGives about a 2x performance boost
  • 266. SSAO went from 2ms on all 6 SPUs to 1ms on all 6 SPUs
  • 267. With practice, can do about 1-2 loops per day.
  • 268. SSAO has 6 loops, and took a little over a week.
  • 270. PostFX
  • 273. Many moreFull TimelineHand-Optimized by ICE TeamFull TimelineHand-Optimized by Uncharted 2 TeamPerformanceFinal RenderPerformanceWithout Fullscreen LightingPerformanceFullscreen Lighting ResultsPerformanceFullscreen Lighting / SSAO PerformancePerformanceFullscreen Lighting / SSAO PerformanceFullscreen LightingSSAO
  • 274. PerformanceFullscreen Lighting / SSAO PerformancePerformanceFullscreen Lighting / SSAO PerformancePerformanceFullscreen Lighting / SSAO PerformancePerformanceFullscreen Lighting / SSAO PerformanceFullscreen LightingSSAO
  • 275. Architecture Conclusions“So, you are the scheduler that has been nipping at my heels...”
  • 276. Final Final ThoughtsGamma is super-duper important.
  • 278. SSAO is cool, and keep it out of your sunlight.
  • 280. They don’t optimize themselves.
  • 281. For a tight for-loop, you can do about 2x better than the compiler.Btw...Naughty Dog is hiring!