Culling the Battlefield  Daniel Collin (DICE)Monday, March 7, 2011
OverviewMonday, March 7, 2011
Overview    ›BackgroundMonday, March 7, 2011
Overview    ›Background    ›Why rewrite it?Monday, March 7, 2011
Overview    ›Background    ›Why rewrite it?    ›RequirementsMonday, March 7, 2011
Overview    ›Background    ›Why rewrite it?    ›Requirements    ›Target hardwareMonday, March 7, 2011
Overview    ›Background    ›Why rewrite it?    ›Requirements    ›Target hardware    ›DetailsMonday, March 7, 2011
Overview    ›Background    ›Why rewrite it?    ›Requirements    ›Target hardware    ›Details    ›Software OcclusionMonday,...
Overview    ›Background    ›Why rewrite it?    ›Requirements    ›Target hardware    ›Details    ›Software Occlusion    ›Co...
Background of the old cullingMonday, March 7, 2011
Background of the old culling    ›Hierarchical Sphere TreesMonday, March 7, 2011
Background of the old culling    ›Hierarchical Sphere Trees    ›StaticCullTreeMonday, March 7, 2011
Background of the old culling    ›Hierarchical Sphere Trees    ›StaticCullTree    ›DynamicCullTreeMonday, March 7, 2011
Background of the old cullingMonday, March 7, 2011
Why rewrite it?Monday, March 7, 2011
Why rewrite it?    ›DynamicCullTree scalingMonday, March 7, 2011
Why rewrite it?    ›DynamicCullTree scaling    ›Sub-levelsMonday, March 7, 2011
Why rewrite it?    ›DynamicCullTree scaling    ›Sub-levels    ›Pipeline dependenciesMonday, March 7, 2011
Why rewrite it?    ›DynamicCullTree scaling    ›Sub-levels    ›Pipeline dependencies    ›Hard to scaleMonday, March 7, 2011
Why rewrite it?    ›DynamicCullTree scaling    ›Sub-levels    ›Pipeline dependencies    ›Hard to scale    ›One job per fru...
Job graph (Old Culling)      Job 0      Job 1      Job 2      Job 3BitmasksMonday, March 7, 2011
Job graph (Old Culling)      Job 0             DynamicCullJob (Shadow 1, 2, View frustum)      Job 1      Job 2      Job 3...
Job graph (Old Culling)      Job 0                   DynamicCullJob (Shadow 1, 2, View frustum)                        Sha...
Job graph (Old Culling)      Job 0                   DynamicCullJob (Shadow 1, 2, View frustum)                        Sha...
Job graph (Old Culling)      Job 0                   DynamicCullJob (Shadow 1, 2, View frustum)                        Sha...
Job graph (Old Culling)      Job 0                   DynamicCullJob (Shadow 1, 2, View frustum)                        Sha...
Job graph (Old Culling)      Job 0                   DynamicCullJob (Shadow 1, 2, View frustum)                        Sha...
Job graph (Old Culling)      Job 0                   DynamicCullJob (Shadow 1, 2, View frustum)                        Sha...
Job graph (Old Culling)      Job 0                   DynamicCullJob (Shadow 1, 2, View frustum)                        Sha...
Job graph (Old Culling)      Job 0                   DynamicCullJob (Shadow 1, 2, View frustum)                        Sha...
Requirements for new systemMonday, March 7, 2011
Requirements for new system›Better scalingMonday, March 7, 2011
Requirements for new system›Better scaling›DestructionMonday, March 7, 2011
Requirements for new system›Better scaling›Destruction›Real-time editingMonday, March 7, 2011
Requirements for new system›Better scaling›Destruction›Real-time editing›Simpler codeMonday, March 7, 2011
Requirements for new system›Better scaling›Destruction›Real-time editing›Simpler code›Unification of sub-systemsMonday, Mar...
Target hardwareMonday, March 7, 2011
What doesn’t work well on these systems?Monday, March 7, 2011
What doesn’t work well on these systems?›Non-local dataMonday, March 7, 2011
What doesn’t work well on these systems?›Non-local data›BranchesMonday, March 7, 2011
What doesn’t work well on these systems?›Non-local data›Branches›Switching between register types (LHS)Monday, March 7, 2011
What doesn’t work well on these systems?›Non-local data›Branches›Switching between register types (LHS)›Tree based structu...
What doesn’t work well on these systems?›Non-local data›Branches›Switching between register types (LHS)›Tree based structu...
What does work well on these systems?Monday, March 7, 2011
What does work well on these systems?›Local dataMonday, March 7, 2011
What does work well on these systems?›Local data›(SIMD) Computing powerMonday, March 7, 2011
What does work well on these systems?›Local data›(SIMD) Computing power›ParallelismMonday, March 7, 2011
The new cullingMonday, March 7, 2011
The new culling›Our worlds usually has max ~15000 objectsMonday, March 7, 2011
The new culling›Our worlds usually has max ~15000 objects›First try was to just use parallel brute forceMonday, March 7, 2...
The new culling›Our worlds usually has max ~15000 objects›First try was to just use parallel brute force›3x times faster t...
The new culling›Our worlds usually has max ~15000 objects›First try was to just use parallel brute force›3x times faster t...
The new culling›Our worlds usually has max ~15000 objects›First try was to just use parallel brute force›3x times faster t...
The new cullingMonday, March 7, 2011
The new culling›Linear arrays scale greatMonday, March 7, 2011
Upcoming SlideShare
Loading in...5
×

Culling the Battlefield: Data Oriented Design in Practice

76,524

Published on

This talk will highlight the evolution of the object culling system used in the Frostbite engine over the years and why we decide to rewrite a system for BATTLEFIELD 3 that had worked well for 4 shipping titles. The new culling system is developed using a data oriented design that favors simple data layouts which enables very efficient computation using pipelined vector instructions. Concrete examples of how code is developed with this approach and the implications and benefits compared to traditional tree-based systems will be given.

Published in: Entertainment & Humor
0 Comments
17 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
76,524
On Slideshare
0
From Embeds
0
Number of Embeds
66
Actions
Shares
0
Downloads
253
Comments
0
Likes
17
Embeds 0
No embeds

No notes for slide

Culling the Battlefield: Data Oriented Design in Practice

  1. 1. Culling the Battlefield Daniel Collin (DICE)Monday, March 7, 2011
  2. 2. OverviewMonday, March 7, 2011
  3. 3. Overview ›BackgroundMonday, March 7, 2011
  4. 4. Overview ›Background ›Why rewrite it?Monday, March 7, 2011
  5. 5. Overview ›Background ›Why rewrite it? ›RequirementsMonday, March 7, 2011
  6. 6. Overview ›Background ›Why rewrite it? ›Requirements ›Target hardwareMonday, March 7, 2011
  7. 7. Overview ›Background ›Why rewrite it? ›Requirements ›Target hardware ›DetailsMonday, March 7, 2011
  8. 8. Overview ›Background ›Why rewrite it? ›Requirements ›Target hardware ›Details ›Software OcclusionMonday, March 7, 2011
  9. 9. Overview ›Background ›Why rewrite it? ›Requirements ›Target hardware ›Details ›Software Occlusion ›ConclusionMonday, March 7, 2011
  10. 10. Background of the old cullingMonday, March 7, 2011
  11. 11. Background of the old culling ›Hierarchical Sphere TreesMonday, March 7, 2011
  12. 12. Background of the old culling ›Hierarchical Sphere Trees ›StaticCullTreeMonday, March 7, 2011
  13. 13. Background of the old culling ›Hierarchical Sphere Trees ›StaticCullTree ›DynamicCullTreeMonday, March 7, 2011
  14. 14. Background of the old cullingMonday, March 7, 2011
  15. 15. Why rewrite it?Monday, March 7, 2011
  16. 16. Why rewrite it? ›DynamicCullTree scalingMonday, March 7, 2011
  17. 17. Why rewrite it? ›DynamicCullTree scaling ›Sub-levelsMonday, March 7, 2011
  18. 18. Why rewrite it? ›DynamicCullTree scaling ›Sub-levels ›Pipeline dependenciesMonday, March 7, 2011
  19. 19. Why rewrite it? ›DynamicCullTree scaling ›Sub-levels ›Pipeline dependencies ›Hard to scaleMonday, March 7, 2011
  20. 20. Why rewrite it? ›DynamicCullTree scaling ›Sub-levels ›Pipeline dependencies ›Hard to scale ›One job per frustumMonday, March 7, 2011
  21. 21. Job graph (Old Culling) Job 0 Job 1 Job 2 Job 3BitmasksMonday, March 7, 2011
  22. 22. Job graph (Old Culling) Job 0 DynamicCullJob (Shadow 1, 2, View frustum) Job 1 Job 2 Job 3BitmasksMonday, March 7, 2011
  23. 23. Job graph (Old Culling) Job 0 DynamicCullJob (Shadow 1, 2, View frustum) Shadow 1 Job 1 frustum Job 2 Job 3BitmasksMonday, March 7, 2011
  24. 24. Job graph (Old Culling) Job 0 DynamicCullJob (Shadow 1, 2, View frustum) Shadow 1 Job 1 frustum Job 2 Job 3BitmasksMonday, March 7, 2011
  25. 25. Job graph (Old Culling) Job 0 DynamicCullJob (Shadow 1, 2, View frustum) Shadow 1 Job 1 frustum Shadow 2 Job 2 frustum Job 3BitmasksMonday, March 7, 2011
  26. 26. Job graph (Old Culling) Job 0 DynamicCullJob (Shadow 1, 2, View frustum) Shadow 1 Job 1 frustum Shadow 2 Job 2 frustum Job 3BitmasksMonday, March 7, 2011
  27. 27. Job graph (Old Culling) Job 0 DynamicCullJob (Shadow 1, 2, View frustum) Shadow 1 Job 1 frustum Shadow 2 Job 2 frustum Job 3 View frustumBitmasksMonday, March 7, 2011
  28. 28. Job graph (Old Culling) Job 0 DynamicCullJob (Shadow 1, 2, View frustum) Shadow 1 Job 1 frustum Shadow 2 Job 2 frustum Job 3 View frustumBitmasksMonday, March 7, 2011
  29. 29. Job graph (Old Culling) Job 0 DynamicCullJob (Shadow 1, 2, View frustum) Shadow 1 Job 1 frustum Merge Job Shadow 2 Job 2 frustum Merge Job Job 3 View frustum Merge JobBitmasksMonday, March 7, 2011
  30. 30. Job graph (Old Culling) Job 0 DynamicCullJob (Shadow 1, 2, View frustum) Shadow 1 Job 1 frustum Merge Job Shadow 2 Job 2 frustum Merge Job Job 3 View frustum Merge JobBitmasksMonday, March 7, 2011
  31. 31. Requirements for new systemMonday, March 7, 2011
  32. 32. Requirements for new system›Better scalingMonday, March 7, 2011
  33. 33. Requirements for new system›Better scaling›DestructionMonday, March 7, 2011
  34. 34. Requirements for new system›Better scaling›Destruction›Real-time editingMonday, March 7, 2011
  35. 35. Requirements for new system›Better scaling›Destruction›Real-time editing›Simpler codeMonday, March 7, 2011
  36. 36. Requirements for new system›Better scaling›Destruction›Real-time editing›Simpler code›Unification of sub-systemsMonday, March 7, 2011
  37. 37. Target hardwareMonday, March 7, 2011
  38. 38. What doesn’t work well on these systems?Monday, March 7, 2011
  39. 39. What doesn’t work well on these systems?›Non-local dataMonday, March 7, 2011
  40. 40. What doesn’t work well on these systems?›Non-local data›BranchesMonday, March 7, 2011
  41. 41. What doesn’t work well on these systems?›Non-local data›Branches›Switching between register types (LHS)Monday, March 7, 2011
  42. 42. What doesn’t work well on these systems?›Non-local data›Branches›Switching between register types (LHS)›Tree based structures are usually branch heavyMonday, March 7, 2011
  43. 43. What doesn’t work well on these systems?›Non-local data›Branches›Switching between register types (LHS)›Tree based structures are usually branch heavy›Data is the most important thing to addressMonday, March 7, 2011
  44. 44. What does work well on these systems?Monday, March 7, 2011
  45. 45. What does work well on these systems?›Local dataMonday, March 7, 2011
  46. 46. What does work well on these systems?›Local data›(SIMD) Computing powerMonday, March 7, 2011
  47. 47. What does work well on these systems?›Local data›(SIMD) Computing power›ParallelismMonday, March 7, 2011
  48. 48. The new cullingMonday, March 7, 2011
  49. 49. The new culling›Our worlds usually has max ~15000 objectsMonday, March 7, 2011
  50. 50. The new culling›Our worlds usually has max ~15000 objects›First try was to just use parallel brute forceMonday, March 7, 2011
  51. 51. The new culling›Our worlds usually has max ~15000 objects›First try was to just use parallel brute force›3x times faster than the old cullingMonday, March 7, 2011
  52. 52. The new culling›Our worlds usually has max ~15000 objects›First try was to just use parallel brute force›3x times faster than the old culling›1/5 code sizeMonday, March 7, 2011
  53. 53. The new culling›Our worlds usually has max ~15000 objects›First try was to just use parallel brute force›3x times faster than the old culling›1/5 code size›Easier to optimize even furtherMonday, March 7, 2011
  54. 54. The new cullingMonday, March 7, 2011
  55. 55. The new culling›Linear arrays scale greatMonday, March 7, 2011
  56. 56. The new culling›Linear arrays scale great›Predictable dataMonday, March 7, 2011
  57. 57. The new culling›Linear arrays scale great›Predictable data›Few branchesMonday, March 7, 2011
  58. 58. The new culling›Linear arrays scale great›Predictable data›Few branches›Uses the computing powerMonday, March 7, 2011
  59. 59. The new cullingMonday, March 7, 2011
  60. 60. The new cullingMonday, March 7, 2011
  61. 61. The new cullingMonday, March 7, 2011
  62. 62. Performance numbers (no occlusion) 15000 SpheresMonday, March 7, 2011
  63. 63. Performance numbers (no occlusion) 15000 Spheres Platform 1 Job 4 Jobs Xbox 360 1.55 ms (2.10 ms / 4) = 0.52 x86 (Core i7, 2.66 GHz) 1.0 ms (1.30 ms / 4) = 0.32 Playstation 3 0.85 ms ((0.95 ms / 4) = 0.23 Playstation 3 (SPA) 0.63 ms (0.75 ms / 4) = 0.18Monday, March 7, 2011
  64. 64. Performance numbers (no occlusion) 15000 Spheres Platform 1 Job 4 Jobs Xbox 360 1.55 ms (2.10 ms / 4) = 0.52 x86 (Core i7, 2.66 GHz) 1.0 ms (1.30 ms / 4) = 0.32 Playstation 3 0.85 ms ((0.95 ms / 4) = 0.23 Playstation 3 (SPA) 0.63 ms (0.75 ms / 4) = 0.18Monday, March 7, 2011
  65. 65. Performance numbers (no occlusion) 15000 Spheres Platform 1 Job 4 Jobs Xbox 360 1.55 ms (2.10 ms / 4) = 0.52 x86 (Core i7, 2.66 GHz) 1.0 ms (1.30 ms / 4) = 0.32 Playstation 3 0.85 ms ((0.95 ms / 4) = 0.23 Playstation 3 (SPA) 0.63 ms (0.75 ms / 4) = 0.18Monday, March 7, 2011
  66. 66. Performance numbers (no occlusion) 15000 Spheres Platform 1 Job 4 Jobs Xbox 360 1.55 ms (2.10 ms / 4) = 0.52 x86 (Core i7, 2.66 GHz) 1.0 ms (1.30 ms / 4) = 0.32 Playstation 3 0.85 ms ((0.95 ms / 4) = 0.23 Playstation 3 (SPA) 0.63 ms (0.75 ms / 4) = 0.18Monday, March 7, 2011
  67. 67. Performance numbers (no occlusion) 15000 Spheres Platform 1 Job 4 Jobs Xbox 360 1.55 ms (2.10 ms / 4) = 0.52 x86 (Core i7, 2.66 GHz) 1.0 ms (1.30 ms / 4) = 0.32 Playstation 3 0.85 ms ((0.95 ms / 4) = 0.23 Playstation 3 (SPA) 0.63 ms (0.75 ms / 4) = 0.18Monday, March 7, 2011
  68. 68. Details of the new cullingMonday, March 7, 2011
  69. 69. Details of the new culling›Improve performance with a simple gridMonday, March 7, 2011
  70. 70. Details of the new culling›Improve performance with a simple grid›Really an AABB assigned to a “cell” with spheresMonday, March 7, 2011
  71. 71. Details of the new culling›Improve performance with a simple grid›Really an AABB assigned to a “cell” with spheres›Separate grids forMonday, March 7, 2011
  72. 72. Details of the new culling›Improve performance with a simple grid›Really an AABB assigned to a “cell” with spheres›Separate grids for›Monday, March 7, 2011
  73. 73. Details of the new culling›Improve performance with a simple grid›Really an AABB assigned to a “cell” with spheres›Separate grids for›› Rendering: StaticMonday, March 7, 2011
  74. 74. Details of the new culling›Improve performance with a simple grid›Really an AABB assigned to a “cell” with spheres›Separate grids for›› Rendering: Static› Rendering: DynamicMonday, March 7, 2011
  75. 75. Details of the new culling›Improve performance with a simple grid›Really an AABB assigned to a “cell” with spheres›Separate grids for›› Rendering: Static› Rendering: Dynamic› Physics: StaticMonday, March 7, 2011
  76. 76. Details of the new culling›Improve performance with a simple grid›Really an AABB assigned to a “cell” with spheres›Separate grids for›› Rendering: Static› Rendering: Dynamic› Physics: Static› Physics: DynamicMonday, March 7, 2011
  77. 77. Data layout EntityGridCell BlockBlock* Pointer Pointer Pointer positions x, y, z, r x, y, z, r … u8 Count Count Count entityInfo handle, Handle, … u32 Total Count transformData … … … struct TransformData { half rotation[4]; half minAabb[3]; half pad[1]; half maxAabb[3]; half scale[3]; };Monday, March 7, 2011
  78. 78. Adding objects • Pre-allocated array that we can grab data from AtomicAdd(…) to “alloc” new block 4k 4k … EntityGridCell Block* Pointer Pointer Pointer u8 Count Count Count u32 Total CountMonday, March 7, 2011
  79. 79. Adding objects Cell 0 Cell 1 Cell 2 Cell 3Monday, March 7, 2011
  80. 80. Adding objects Cell 0 Cell 1 Cell 4 Cell 2 Cell 3Monday, March 7, 2011
  81. 81. Adding objects Cell 0 Cell 1 Cell 4 Cell 2 Cell 3Monday, March 7, 2011
  82. 82. Removing objectsMonday, March 7, 2011
  83. 83. Removing objects›Use the “swap trick”Monday, March 7, 2011
  84. 84. Removing objects›Use the “swap trick”›Data doesn’t need to be sortedMonday, March 7, 2011
  85. 85. Removing objects›Use the “swap trick”›Data doesn’t need to be sorted›Just swap with the last entry and decrease the countMonday, March 7, 2011
  86. 86. Rendering culling • Let’s look at what the rendering expects struct EntityRenderCullInfo { Handle entity; // handle to the entity u16 visibleViews; // bits of which frustums that was visible u16 classId; // type of mesh float screenArea; // at which screen area entity should be culled };Monday, March 7, 2011
  87. 87. Culling code while (1) { uint blockIter = interlockedIncrement(currentBlockIndex) - 1;   if (blockIter >= blockCount) break;   u32 masks[EntityGridCell::Block::MaxCount] = {}, frustumMask = 1;   block = gridCell->blocks[blockIter];   foreach (frustum in frustums, frustumMask <<= 1) { for (i = 0; i < gridCell->blockCounts[blockIter]; ++i) { u32 inside = intersect(frustum, block->postition[i]); masks[i] |= frustumMask & inside; } }   for (i = 0; i < gridCell->blockCounts[blockIter]; ++i) { // filter list here (if masks[i] is zero it should be skipped) // ... } }Monday, March 7, 2011
  88. 88. Culling code while (1) { uint blockIter = interlockedIncrement(currentBlockIndex) - 1;   if (blockIter >= blockCount) break;   u32 masks[EntityGridCell::Block::MaxCount] = {}, frustumMask = 1;   block = gridCell->blocks[blockIter];   foreach (frustum in frustums, frustumMask <<= 1) { for (i = 0; i < gridCell->blockCounts[blockIter]; ++i) { u32 inside = intersect(frustum, block->postition[i]); masks[i] |= frustumMask & inside; } }   for (i = 0; i < gridCell->blockCounts[blockIter]; ++i) { // filter list here (if masks[i] is zero it should be skipped) // ... } }Monday, March 7, 2011
  89. 89. Culling code while (1) { uint blockIter = interlockedIncrement(currentBlockIndex) - 1;   if (blockIter >= blockCount) break;   u32 masks[EntityGridCell::Block::MaxCount] = {}, frustumMask = 1;   block = gridCell->blocks[blockIter];   foreach (frustum in frustums, frustumMask <<= 1) { for (i = 0; i < gridCell->blockCounts[blockIter]; ++i) { u32 inside = intersect(frustum, block->postition[i]); masks[i] |= frustumMask & inside; } }   for (i = 0; i < gridCell->blockCounts[blockIter]; ++i) { // filter list here (if masks[i] is zero it should be skipped) // ... } }Monday, March 7, 2011
  90. 90. Culling code while (1) { uint blockIter = interlockedIncrement(currentBlockIndex) - 1;   if (blockIter >= blockCount) break;   u32 masks[EntityGridCell::Block::MaxCount] = {}, frustumMask = 1;   block = gridCell->blocks[blockIter];   foreach (frustum in frustums, frustumMask <<= 1) { for (i = 0; i < gridCell->blockCounts[blockIter]; ++i) { u32 inside = intersect(frustum, block->postition[i]); masks[i] |= frustumMask & inside; } }   for (i = 0; i < gridCell->blockCounts[blockIter]; ++i) { // filter list here (if masks[i] is zero it should be skipped) // ... } }Monday, March 7, 2011
  91. 91. Culling code while (1) { uint blockIter = interlockedIncrement(currentBlockIndex) - 1;   if (blockIter >= blockCount) break;   u32 masks[EntityGridCell::Block::MaxCount] = {}, frustumMask = 1;   block = gridCell->blocks[blockIter];   foreach (frustum in frustums, frustumMask <<= 1) { for (i = 0; i < gridCell->blockCounts[blockIter]; ++i) { u32 inside = intersect(frustum, block->postition[i]); masks[i] |= frustumMask & inside; } }   for (i = 0; i < gridCell->blockCounts[blockIter]; ++i) { // filter list here (if masks[i] is zero it should be skipped) // ... } }Monday, March 7, 2011
  92. 92. Intersection Code bool intersect(const Plane* frustumPlanes, Vec4 pos) { float radius = pos.w; if (distance(frustumPlanes[Frustum::Far], pos) > radius) return false; if (distance(frustumPlanes[Frustum::Near], pos) > radius) return false; if (distance(frustumPlanes[Frustum::Right], pos) > radius) return false; if (distance(frustumPlanes[Frustum::Left], pos) > radius) return false; if (distance(frustumPlanes[Frustum::Upper], pos) > radius) return false; if (distance(frustumPlanes[Frustum::Lower], pos) > radius) return false;   return true; }Monday, March 7, 2011
  93. 93. Monday, March 7, 2011
  94. 94. See “Typical C++ Bullshit” by @mike_actonMonday, March 7, 2011
  95. 95. Intersection Code bool intersect(const Plane* frustumPlanes, Vec4 pos) { float radius = pos.w; if (distance(frustumPlanes[Frustum::Far], pos) > radius) return false; if (distance(frustumPlanes[Frustum::Near], pos) > radius) return false; if (distance(frustumPlanes[Frustum::Right], pos) > radius) return false; if (distance(frustumPlanes[Frustum::Left], pos) > radius) return false; if (distance(frustumPlanes[Frustum::Upper], pos) > radius) return false; if (distance(frustumPlanes[Frustum::Lower], pos) > radius) return false; return true; }Monday, March 7, 2011
  96. 96. Intersection Code LHS! bool intersect(const Plane* frustumPlanes, Vec4 pos) { float radius = pos.w; if (distance(frustumPlanes[Frustum::Far], pos) > radius) return false; if (distance(frustumPlanes[Frustum::Near], pos) > radius) return false; if (distance(frustumPlanes[Frustum::Right], pos) > radius) return false; if (distance(frustumPlanes[Frustum::Left], pos) > radius) return false; if (distance(frustumPlanes[Frustum::Upper], pos) > radius) return false; if (distance(frustumPlanes[Frustum::Lower], pos) > radius) return false; return true; }Monday, March 7, 2011
  97. 97. Intersection Code LHS! bool intersect(const Plane* frustumPlanes, Vec4 pos) { float radius = pos.w; if (distance(frustumPlanes[Frustum::Far], pos) > radius) return false; if (distance(frustumPlanes[Frustum::Near], pos) > radius) LHS! return false; if (distance(frustumPlanes[Frustum::Right], pos) > radius) return false; if (distance(frustumPlanes[Frustum::Left], pos) > radius) return false; if (distance(frustumPlanes[Frustum::Upper], pos) > radius) return false; if (distance(frustumPlanes[Frustum::Lower], pos) > radius) return false; return true; }Monday, March 7, 2011
  98. 98. Intersection Code LHS! bool intersect(const Plane* frustumPlanes, Vec4 pos) { float radius = pos.w; if (distance(frustumPlanes[Frustum::Far], pos) > radius) return false; if (distance(frustumPlanes[Frustum::Near], pos) > radius) LHS! return false; if (distance(frustumPlanes[Frustum::Right], pos) > radius) return false; if (distance(frustumPlanes[Frustum::Left], pos) > radius) return false; if (distance(frustumPlanes[Frustum::Upper], pos) > radius) LHS! return false; if (distance(frustumPlanes[Frustum::Lower], pos) > radius) return false; return true; }Monday, March 7, 2011
  99. 99. Intersection Code LHS! bool intersect(const Plane* frustumPlanes, Vec4 pos) Float branch! { float radius = pos.w; if (distance(frustumPlanes[Frustum::Far], pos) > radius) return false; if (distance(frustumPlanes[Frustum::Near], pos) > radius) LHS! return false; if (distance(frustumPlanes[Frustum::Right], pos) > radius) return false; if (distance(frustumPlanes[Frustum::Left], pos) > radius) return false; if (distance(frustumPlanes[Frustum::Upper], pos) > radius) LHS! return false; if (distance(frustumPlanes[Frustum::Lower], pos) > radius) return false; return true; }Monday, March 7, 2011
  100. 100. Intersection Code LHS! bool intersect(const Plane* frustumPlanes, Vec4 pos) Float branch! { float radius = pos.w; if (distance(frustumPlanes[Frustum::Far], pos) > radius) return false; if (distance(frustumPlanes[Frustum::Near], pos) > radius) LHS! return false; if (distance(frustumPlanes[Frustum::Right], pos) >Float branch! radius) return false; if (distance(frustumPlanes[Frustum::Left], pos) > radius) return false; if (distance(frustumPlanes[Frustum::Upper], pos) > radius) LHS! return false; if (distance(frustumPlanes[Frustum::Lower], pos) > radius) return false; return true; }Monday, March 7, 2011
  101. 101. Intersection Code LHS! bool intersect(const Plane* frustumPlanes, Vec4 pos) Float branch! { float radius = pos.w; if (distance(frustumPlanes[Frustum::Far], pos) > radius) return false; if (distance(frustumPlanes[Frustum::Near], pos) > radius) LHS! return false; if (distance(frustumPlanes[Frustum::Right], pos) >Float branch! radius) return false; if (distance(frustumPlanes[Frustum::Left], pos) > radius) return false; if (distance(frustumPlanes[Frustum::Upper], pos) > radius) LHS! return false; if (distance(frustumPlanes[Frustum::Lower], pos) > radius) return false; Float branch! return true; }Monday, March 7, 2011
  102. 102. Intersection Code LHS! bool intersect(const Plane* frustumPlanes, Vec4 pos) Float branch! { float radius = pos.w; if (distance(frustumPlanes[Frustum::Far], pos) > radius) return false; if (distance(frustumPlanes[Frustum::Near], pos) > radius) LHS! return false; if (distance(frustumPlanes[Frustum::Right], pos) >Float branch! radius) return false; if (distance(frustumPlanes[Frustum::Left], pos) > radius) return false; if (distance(frustumPlanes[Frustum::Upper], pos) > radius) LHS! return false; if (distance(frustumPlanes[Frustum::Lower], pos) > radius) return false; Float branch! return true; } LHS!Monday, March 7, 2011
  103. 103. Intersection Code LHS! bool intersect(const Plane* frustumPlanes, Vec4 pos) Float branch! { float radius = pos.w; if (distance(frustumPlanes[Frustum::Far], pos) > radius) return false; if (distance(frustumPlanes[Frustum::Near], pos) > radius) LHS! return false; if (distance(frustumPlanes[Frustum::Right], pos) >Float branch! radius) return false; if (distance(frustumPlanes[Frustum::Left], pos) > radius) return false; if (distance(frustumPlanes[Frustum::Upper], pos) > radius) LHS! return false; if (distance(frustumPlanes[Frustum::Lower], pos) > radius) return false; Float branch! return true; } LHS! Float branch!Monday, March 7, 2011
  104. 104. Intersection Code LHS! bool intersect(const Plane* frustumPlanes, Vec4 pos) Float branch! { float radius = pos.w; if (distance(frustumPlanes[Frustum::Far], pos) > radius) return So what do the consoles false; think? if (distance(frustumPlanes[Frustum::Near], pos) > radius) LHS! return false; if (distance(frustumPlanes[Frustum::Right], pos) >Float branch! :( radius) return false; if (distance(frustumPlanes[Frustum::Left], pos) > radius) return false; if (distance(frustumPlanes[Frustum::Upper], pos) > radius) LHS! return false; if (distance(frustumPlanes[Frustum::Lower], pos) > radius) return false; Float branch! return true; } LHS! Float branch!Monday, March 7, 2011
  105. 105. Intersection Code bool intersect(const Plane* frustumPlanes, Vec4 pos) { float radius = pos.w; if (distance(frustumPlanes[Frustum::Far], pos) > radius) return false; if (distance(frustumPlanes[Frustum::Near], pos) > radius) return false; if (distance(frustumPlanes[Frustum::Right], pos) > radius) return false; if (distance(frustumPlanes[Frustum::Left], pos) > radius) return false; if (distance(frustumPlanes[Frustum::Upper], pos) > radius) return false; if (distance(frustumPlanes[Frustum::Lower], pos) > radius) return false; return true; }Monday, March 7, 2011
  106. 106. Intersection CodeMonday, March 7, 2011
  107. 107. Intersection Code›How can we improve this?Monday, March 7, 2011
  108. 108. Intersection Code›How can we improve this?›Dot products are not very SIMD friendlyMonday, March 7, 2011
  109. 109. Intersection Code›How can we improve this?›Dot products are not very SIMD friendly›Usually need to shuffle data around to get resultMonday, March 7, 2011
  110. 110. Intersection Code›How can we improve this?›Dot products are not very SIMD friendly›Usually need to shuffle data around to get result›(x0 * x1 + y0 * y1 + z0 * z1 + w0 * w1)Monday, March 7, 2011
  111. 111. Intersection Code›How can we improve this?›Dot products are not very SIMD friendly›Usually need to shuffle data around to get result›(x0 * x1 + y0 * y1 + z0 * z1 + w0 * w1)Monday, March 7, 2011
  112. 112. Intersection CodeMonday, March 7, 2011
  113. 113. Intersection Code›Rearrange the data from AoS to SoAMonday, March 7, 2011
  114. 114. Intersection Code›Rearrange the data from AoS to SoA Vec 0 X0 Y0 Z0 W0 Vec 1 X1 Y1 Z1 W1 Vec 2 X2 Y2 Z2 W2 Vec 3 X3 Y3 Z3 W3Monday, March 7, 2011
  115. 115. Intersection Code›Rearrange the data from AoS to SoA Vec 0 X0 Y0 Z0 W0 Vec 1 X1 Y1 Z1 W1 Vec 2 X2 Y2 Z2 W2 Vec 3 X3 Y3 Z3 W3Monday, March 7, 2011
  116. 116. Intersection Code›Rearrange the data from AoS to SoA Vec 0 X0 Y0 Z0 W0 VecX X0 X1 X2 X3 Vec 1 X1 Y1 Z1 W1 VecY Y0 Y1 Y2 Y3 Vec 2 X2 Y2 Z2 W2 VecZ Z0 Z1 Z2 Z3 Vec 3 X3 Y3 Z3 W3 VecW W0 W1 W2 W3Monday, March 7, 2011
  117. 117. Intersection Code›Rearrange the data from AoS to SoA Vec 0 X0 Y0 Z0 W0 VecX X0 X1 X2 X3 Vec 1 X1 Y1 Z1 W1 VecY Y0 Y1 Y2 Y3 Vec 2 X2 Y2 Z2 W2 VecZ Z0 Z1 Z2 Z3 Vec 3 X3 Y3 Z3 W3 VecW W0 W1 W2 W3›Now we only need 3 instructions for 4 dots!Monday, March 7, 2011
  118. 118. Rearrange the frustum planesMonday, March 7, 2011
  119. 119. Rearrange the frustum planes Plane 0 X0 Y0 Z0 W0 Plane 1 X1 Y1 Z1 W1 Plane 2 X2 Y2 Z2 W2 Plane 3 X3 Y3 Z3 W3 Plane 4 X4 Y4 Z4 W4 Plane 5 X5 Y5 Z5 W5Monday, March 7, 2011
  120. 120. Rearrange the frustum planes Plane 0 X0 Y0 Z0 W0 Plane 1 X1 Y1 Z1 W1 Plane 2 X2 Y2 Z2 W2 Plane 3 X3 Y3 Z3 W3 Plane 4 X4 Y4 Z4 W4 Plane 5 X5 Y5 Z5 W5Monday, March 7, 2011
  121. 121. Rearrange the frustum planes X0 X1 X2 X3 Plane 0 X0 Y0 Z0 W0 Y0 Y1 Y2 Y3 Plane 1 X1 Y1 Z1 W1 Z0 Z1 Z2 Z3 Plane 2 X2 Y2 Z2 W2 W0 W1 W2 W3 Plane 3 X3 Y3 Z3 W3 Plane 4 X4 Y4 Z4 W4 X4 X5 X4 X5 Plane 5 X5 Y5 Z5 W5 Y4 Y5 Y4 Y5 Z4 Z5 Z4 Z5 W4 W5 W4 W5Monday, March 7, 2011
  122. 122. New intersection codeMonday, March 7, 2011
  123. 123. New intersection code›Two frustum vs Sphere intersections per loopMonday, March 7, 2011
  124. 124. New intersection code›Two frustum vs Sphere intersections per loop›4 * 3 dot products with 9 instructionsMonday, March 7, 2011
  125. 125. New intersection code›Two frustum vs Sphere intersections per loop›4 * 3 dot products with 9 instructions›Loop over all frustums and merge the resultMonday, March 7, 2011
  126. 126. New intersection code›Two frustum vs Sphere intersections per loop›4 * 3 dot products with 9 instructions›Loop over all frustums and merge the resultMonday, March 7, 2011
  127. 127. New intersection code (1/4) Vec posA_xxxx = vecShuffle<VecMask::_xxxx>(posA); Vec posA_yyyy = vecShuffle<VecMask::_yyyy>(posA); Vec posA_zzzz = vecShuffle<VecMask::_zzzz>(posA); Vec posA_rrrr = vecShuffle<VecMask::_wwww>(posA); // 4 dot products dotA_0123 = vecMulAdd(posA_zzzz, pl_z0z1z2z3, pl_w0w1w2w3); dotA_0123 = vecMulAdd(posA_yyyy, pl_y0y1y2y3, dotA_0123); dotA_0123 = vecMulAdd(posA_xxxx, pl_x0x1x2x3, dotA_0123);Monday, March 7, 2011
  128. 128. New intersection code (1/4) Vec posA_xxxx = vecShuffle<VecMask::_xxxx>(posA); Vec posA_yyyy = vecShuffle<VecMask::_yyyy>(posA); Vec posA_zzzz = vecShuffle<VecMask::_zzzz>(posA); Vec posA_rrrr = vecShuffle<VecMask::_wwww>(posA); // 4 dot products dotA_0123 = vecMulAdd(posA_zzzz, pl_z0z1z2z3, pl_w0w1w2w3); dotA_0123 = vecMulAdd(posA_yyyy, pl_y0y1y2y3, dotA_0123); dotA_0123 = vecMulAdd(posA_xxxx, pl_x0x1x2x3, dotA_0123);Monday, March 7, 2011
  129. 129. New intersection code (1/4) Vec posA_xxxx = vecShuffle<VecMask::_xxxx>(posA); Vec posA_yyyy = vecShuffle<VecMask::_yyyy>(posA); Vec posA_zzzz = vecShuffle<VecMask::_zzzz>(posA); Vec posA_rrrr = vecShuffle<VecMask::_wwww>(posA); // 4 dot products dotA_0123 = vecMulAdd(posA_zzzz, pl_z0z1z2z3, pl_w0w1w2w3); dotA_0123 = vecMulAdd(posA_yyyy, pl_y0y1y2y3, dotA_0123); dotA_0123 = vecMulAdd(posA_xxxx, pl_x0x1x2x3, dotA_0123);Monday, March 7, 2011
  130. 130. New intersection code (2/4) Vec posB_xxxx = vecShuffle<VecMask::_xxxx>(posB); Vec posB_yyyy = vecShuffle<VecMask::_yyyy>(posB); Vec posB_zzzz = vecShuffle<VecMask::_zzzz>(posB); Vec posB_rrrr = vecShuffle<VecMask::_wwww>(posB); // 4 dot products dotB_0123 = vecMulAdd(posB_zzzz, pl_z0z1z2z3, pl_w0w1w2w3); dotB_0123 = vecMulAdd(posB_yyyy, pl_y0y1y2y3, dotB_0123); dotB_0123 = vecMulAdd(posB_xxxx, pl_x0x1x2x3, dotB_0123Monday, March 7, 2011
  131. 131. New intersection code (2/4) Vec posB_xxxx = vecShuffle<VecMask::_xxxx>(posB); Vec posB_yyyy = vecShuffle<VecMask::_yyyy>(posB); Vec posB_zzzz = vecShuffle<VecMask::_zzzz>(posB); Vec posB_rrrr = vecShuffle<VecMask::_wwww>(posB); // 4 dot products dotB_0123 = vecMulAdd(posB_zzzz, pl_z0z1z2z3, pl_w0w1w2w3); dotB_0123 = vecMulAdd(posB_yyyy, pl_y0y1y2y3, dotB_0123); dotB_0123 = vecMulAdd(posB_xxxx, pl_x0x1x2x3, dotB_0123Monday, March 7, 2011
  132. 132. New intersection code (2/4) Vec posB_xxxx = vecShuffle<VecMask::_xxxx>(posB); Vec posB_yyyy = vecShuffle<VecMask::_yyyy>(posB); Vec posB_zzzz = vecShuffle<VecMask::_zzzz>(posB); Vec posB_rrrr = vecShuffle<VecMask::_wwww>(posB); // 4 dot products dotB_0123 = vecMulAdd(posB_zzzz, pl_z0z1z2z3, pl_w0w1w2w3); dotB_0123 = vecMulAdd(posB_yyyy, pl_y0y1y2y3, dotB_0123); dotB_0123 = vecMulAdd(posB_xxxx, pl_x0x1x2x3, dotB_0123Monday, March 7, 2011
  133. 133. New intersection code (3/4) Vec posAB_xxxx = vecInsert<VecMask::_0011>(posA_xxxx, posB_xxxx); Vec posAB_yyyy = vecInsert<VecMask::_0011>(posA_yyyy, posB_yyyy); Vec posAB_zzzz = vecInsert<VecMask::_0011>(posA_zzzz, posB_zzzz); Vec posAB_rrrr = vecInsert<VecMask::_0011>(posA_rrrr, posB_rrrr); // 4 dot products dotA45B45 = vecMulAdd(posAB_zzzz, pl_z4z5z4z5, pl_w4w5w4w5); dotA45B45 = vecMulAdd(posAB_yyyy, pl_y4y5y4y5, dotA45B45); dotA45B45 = vecMulAdd(posAB_xxxx, pl_x4x5x4x5, dotA45B45);Monday, March 7, 2011
  134. 134. New intersection code (3/4) Vec posAB_xxxx = vecInsert<VecMask::_0011>(posA_xxxx, posB_xxxx); Vec posAB_yyyy = vecInsert<VecMask::_0011>(posA_yyyy, posB_yyyy); Vec posAB_zzzz = vecInsert<VecMask::_0011>(posA_zzzz, posB_zzzz); Vec posAB_rrrr = vecInsert<VecMask::_0011>(posA_rrrr, posB_rrrr); // 4 dot products dotA45B45 = vecMulAdd(posAB_zzzz, pl_z4z5z4z5, pl_w4w5w4w5); dotA45B45 = vecMulAdd(posAB_yyyy, pl_y4y5y4y5, dotA45B45); dotA45B45 = vecMulAdd(posAB_xxxx, pl_x4x5x4x5, dotA45B45);Monday, March 7, 2011
  135. 135. New intersection code (3/4) Vec posAB_xxxx = vecInsert<VecMask::_0011>(posA_xxxx, posB_xxxx); Vec posAB_yyyy = vecInsert<VecMask::_0011>(posA_yyyy, posB_yyyy); Vec posAB_zzzz = vecInsert<VecMask::_0011>(posA_zzzz, posB_zzzz); Vec posAB_rrrr = vecInsert<VecMask::_0011>(posA_rrrr, posB_rrrr); // 4 dot products dotA45B45 = vecMulAdd(posAB_zzzz, pl_z4z5z4z5, pl_w4w5w4w5); dotA45B45 = vecMulAdd(posAB_yyyy, pl_y4y5y4y5, dotA45B45); dotA45B45 = vecMulAdd(posAB_xxxx, pl_x4x5x4x5, dotA45B45);Monday, March 7, 2011
  136. 136. New intersection code (4/4) // Compare against radius dotA_0123 = vecCmpGTMask(dotA_0123, posA_rrrr); dotB_0123 = vecCmpGTMask(dotB_0123, posB_rrrr); dotA45B45 = vecCmpGTMask(dotA45B45, posAB_rrrr); Vec dotA45 = vecInsert<VecMask::_0011>(dotA45B45, zero); Vec dotB45 = vecInsert<VecMask::_0011>(zero, dotA45B45); // collect the results Vec resA = vecOrx(dotA_0123); Vec resB = vecOrx(dotB_0123); resA = vecOr(resA, vecOrx(dotA45)); resB = vecOr(resB, vecOrx(dotB45)); // resA = inside or outside of frustum for point A, resB for point B Vec rA = vecNotMask(resA); Vec rB = vecNotMask(resB); masksCurrent[0] |= frustumMask & rA; masksCurrent[1] |= frustumMask & rB;Monday, March 7, 2011
  137. 137. New intersection code (4/4) // Compare against radius dotA_0123 = vecCmpGTMask(dotA_0123, posA_rrrr); dotB_0123 = vecCmpGTMask(dotB_0123, posB_rrrr); dotA45B45 = vecCmpGTMask(dotA45B45, posAB_rrrr); Vec dotA45 = vecInsert<VecMask::_0011>(dotA45B45, zero); Vec dotB45 = vecInsert<VecMask::_0011>(zero, dotA45B45); // collect the results Vec resA = vecOrx(dotA_0123); Vec resB = vecOrx(dotB_0123); resA = vecOr(resA, vecOrx(dotA45)); resB = vecOr(resB, vecOrx(dotB45)); // resA = inside or outside of frustum for point A, resB for point B Vec rA = vecNotMask(resA); Vec rB = vecNotMask(resB); masksCurrent[0] |= frustumMask & rA; masksCurrent[1] |= frustumMask & rB;Monday, March 7, 2011
  138. 138. New intersection code (4/4) // Compare against radius dotA_0123 = vecCmpGTMask(dotA_0123, posA_rrrr); dotB_0123 = vecCmpGTMask(dotB_0123, posB_rrrr); dotA45B45 = vecCmpGTMask(dotA45B45, posAB_rrrr); Vec dotA45 = vecInsert<VecMask::_0011>(dotA45B45, zero); Vec dotB45 = vecInsert<VecMask::_0011>(zero, dotA45B45); // collect the results Vec resA = vecOrx(dotA_0123); Vec resB = vecOrx(dotB_0123); resA = vecOr(resA, vecOrx(dotA45)); resB = vecOr(resB, vecOrx(dotB45)); // resA = inside or outside of frustum for point A, resB for point B Vec rA = vecNotMask(resA); Vec rB = vecNotMask(resB); masksCurrent[0] |= frustumMask & rA; masksCurrent[1] |= frustumMask & rB;Monday, March 7, 2011
  139. 139. New intersection code (4/4) // Compare against radius dotA_0123 = vecCmpGTMask(dotA_0123, posA_rrrr); dotB_0123 = vecCmpGTMask(dotB_0123, posB_rrrr); dotA45B45 = vecCmpGTMask(dotA45B45, posAB_rrrr); Vec dotA45 = vecInsert<VecMask::_0011>(dotA45B45, zero); Vec dotB45 = vecInsert<VecMask::_0011>(zero, dotA45B45); // collect the results Vec resA = vecOrx(dotA_0123); Vec resB = vecOrx(dotB_0123); resA = vecOr(resA, vecOrx(dotA45)); resB = vecOr(resB, vecOrx(dotB45)); // resA = inside or outside of frustum for point A, resB for point B Vec rA = vecNotMask(resA); Vec rB = vecNotMask(resB); masksCurrent[0] |= frustumMask & rA; masksCurrent[1] |= frustumMask & rB;Monday, March 7, 2011
  140. 140. SPU Pipelining Assembler (SPA)Monday, March 7, 2011
  141. 141. SPU Pipelining Assembler (SPA)›Like VCL (for PS2) but for PS3 SPUMonday, March 7, 2011
  142. 142. SPU Pipelining Assembler (SPA)›Like VCL (for PS2) but for PS3 SPU›Can give you that extra boost if neededMonday, March 7, 2011
  143. 143. SPU Pipelining Assembler (SPA)›Like VCL (for PS2) but for PS3 SPU›Can give you that extra boost if needed›Does software pipelining for youMonday, March 7, 2011
  144. 144. SPU Pipelining Assembler (SPA)›Like VCL (for PS2) but for PS3 SPU›Can give you that extra boost if needed›Does software pipelining for you›Gives about 35% speed boost in the cullingMonday, March 7, 2011
  145. 145. SPU Pipelining Assembler (SPA)›Like VCL (for PS2) but for PS3 SPU›Can give you that extra boost if needed›Does software pipelining for you›Gives about 35% speed boost in the culling›Not really that different from using intrinsicsMonday, March 7, 2011
  146. 146. SPU Pipelining Assembler (SPA)›Like VCL (for PS2) but for PS3 SPU›Can give you that extra boost if needed›Does software pipelining for you›Gives about 35% speed boost in the culling›Not really that different from using intrinsics›And coding assembler is fun :)Monday, March 7, 2011
  147. 147. SPU Pipelining Assembler (SPA)›Like VCL (for PS2) but for PS3 SPU›Can give you that extra boost if needed›Does software pipelining for you›Gives about 35% speed boost in the culling›Not really that different from using intrinsics›And coding assembler is fun :)Monday, March 7, 2011
  148. 148. SPA Inner loop (partly) lqd posA, -0x20(currentPos) lqd posB, -0x10(currentPos) shufb posA_xxxx, posA, posA, Mask_xxxx shufb posA_yyyy, posA, posA, Mask_yyyy shufb posA_zzzz, posA, posA, Mask_zzzz shufb posA_rrrr, posA, posA, Mask_wwww // 4 dot products fma dotA_0123, posA_zzzz, pl_z0z1z2z3, pl_w0w1w2w3 fma dotA_0123, posA_yyyy, pl_y0y1y2y3, dotA_0123 fma dotA_0123, posA_xxxx, pl_x0x1x2x3, dotA_0123 shufb posB_xxxx, posB, posB, Mask_xxxx shufb posB_yyyy, posB, posB, Mask_yyyy shufb posB_zzzz, posB, posB, Mask_zzzz shufb posB_rrrr, posB, posB, Mask_wwww // 4 dot products fma dotB_0123, posB_zzzz, pl_z0z1z2z3, pl_w0w1w2w3 fma dotB_0123, posB_yyyy, pl_y0y1y2y3, dotB_0123 fma dotB_0123, posB_xxxx, pl_x0x1x2x3, dotB_0123Monday, March 7, 2011
  149. 149. SPA Inner loop # Loop stats - frustumCull::loop # (ims enabled, sms disabled, optimisation level 2) # resmii : 24 (*) (resource constrained) # recmii : 2 (recurrence constrained) # resource usage: # even pipe : 24 inst. (100% use) (*) # FX[15] SP[9] # odd pipe : 24 inst. (100% use) (*) # SH[17] LS[6] BR[1] # misc: # linear schedule = 57 cycles (for information only) # software pipelining: # best pipelined schedule = 24 cycles (pipelined, 3 iterations in parallel) # software pipelining adjustments: # not generating non-pipelined loop since trip count >=3 (3) # estimated loop performance: # =24*n+59 cyclesMonday, March 7, 2011
  150. 150. _local_c0de000000000002: SPA Inner loop fma $46,$42,$30,$29; /* +1 */ shufb $47,$44,$37,$33 /* +2 */ fcgt $57,$20,$24; /* +2 */ orx $48,$15 /* +2 */ selb $55,$37,$44,$33; /* +2 */ shufb $56,$21,$16,$33 /* +1 */ fma $52,$16,$28,$41; /* +1 */ orx $49,$57 /* +2 */ ai $4,$4,32; orx $54,$47 /* +2 */ fma $51,$19,$26,$45; /* +1 */ orx $53,$55 /* +2 */ fma $50,$56,$27,$46; /* +1 */ shufb $24,$23,$23,$34 /* +1 */ ai $2,$2,32; /* +2 */ lqd $13,-32($4) or $69,$48,$54; /* +2 */ lqd $23,-16($4) fma $20,$18,$26,$52; /* +1 */ lqd $12,-32($2) /* +2 */ nor $60,$69,$69; /* +2 */ lqd $43,-16($2) /* +2 */ or $62,$49,$53; /* +2 */ shufb $59,$22,$17,$33 /* +1 */ and $39,$60,$35; /* +2 */ shufb $11,$14,$24,$33 /* +1 */ nor $61,$62,$60; /* +2 */ shufb $22,$13,$13,$36 fcgt $15,$51,$14; /* +1 */ shufb $17,$23,$23,$36 and $58,$61,$35; /* +2 */ shufb $19,$13,$13,$3 fma $10,$59,$25,$50; /* +1 */ shufb $18,$23,$23,$3 fma $9,$22,$32,$31; shufb $16,$23,$23,$38 or $8,$58,$43; /* +2 */ shufb $21,$13,$13,$38 or $40,$39,$12; /* +2 */ shufb $14,$13,$13,$34 ai $7,$7,-1; /* +2 */ shufb $42,$19,$18,$33 fma $41,$17,$32,$31; stqd $8,-16($2) /* +2 */ fcgt $44,$10,$11; /* +1 */ stqd $40,-32($2) /* +2 */ fma $45,$21,$28,$9; brnz $7,_local_c0de000000000002 /* +2 */ nop ; hbrrMonday, March 7, 2011
  151. 151. Additional cullingMonday, March 7, 2011
  152. 152. Additional culling›Frustum vs AABBMonday, March 7, 2011
  153. 153. Additional culling›Frustum vs AABB›Project AABB to screen spaceMonday, March 7, 2011
  154. 154. Additional culling›Frustum vs AABB›Project AABB to screen space›Software OcclusionMonday, March 7, 2011
  155. 155. Additional culling›Frustum vs AABB›Project AABB to screen space›Software OcclusionMonday, March 7, 2011
  156. 156. Project AABB to screen spaceMonday, March 7, 2011
  157. 157. Project AABB to screen space›Calculate the area of the AABB in screen spaceMonday, March 7, 2011
  158. 158. Project AABB to screen space›Calculate the area of the AABB in screen space›If area is smaller than setting just skip itMonday, March 7, 2011
  159. 159. Project AABB to screen space›Calculate the area of the AABB in screen space›If area is smaller than setting just skip it›Due to FOV taking distance doesn’t workMonday, March 7, 2011
  160. 160. Project AABB to screen space›Calculate the area of the AABB in screen space›If area is smaller than setting just skip it›Due to FOV taking distance doesn’t workMonday, March 7, 2011
  161. 161. Software OcclusionMonday, March 7, 2011
  162. 162. Software Occlusion›Used in Frostbite for 3 yearsMonday, March 7, 2011
  163. 163. Software Occlusion›Used in Frostbite for 3 years›Cross platformMonday, March 7, 2011
  164. 164. Software Occlusion›Used in Frostbite for 3 years›Cross platform›Artist made occludersMonday, March 7, 2011
  165. 165. Software Occlusion›Used in Frostbite for 3 years›Cross platform›Artist made occluders›TerrainMonday, March 7, 2011
  166. 166. Software Occlusion›Used in Frostbite for 3 years›Cross platform›Artist made occluders›TerrainMonday, March 7, 2011
  167. 167. Why Software Occlusion?Monday, March 7, 2011
  168. 168. Why Software Occlusion?›Want to remove CPU time not just GPUMonday, March 7, 2011
  169. 169. Why Software Occlusion?›Want to remove CPU time not just GPU›Cull as early as possibleMonday, March 7, 2011
  170. 170. Why Software Occlusion?›Want to remove CPU time not just GPU›Cull as early as possible›GPU queries troublesome as lagging behind CPUMonday, March 7, 2011
  171. 171. Why Software Occlusion?›Want to remove CPU time not just GPU›Cull as early as possible›GPU queries troublesome as lagging behind CPU›Must support destructionMonday, March 7, 2011
  172. 172. Why Software Occlusion?›Want to remove CPU time not just GPU›Cull as early as possible›GPU queries troublesome as lagging behind CPU›Must support destruction›Easy for artists to controlMonday, March 7, 2011
  173. 173. Why Software Occlusion?›Want to remove CPU time not just GPU›Cull as early as possible›GPU queries troublesome as lagging behind CPU›Must support destruction›Easy for artists to controlMonday, March 7, 2011
  174. 174. Software OcclusionMonday, March 7, 2011
  175. 175. Software Occlusion›So how does it work?Monday, March 7, 2011
  176. 176. Software Occlusion›So how does it work?›Render PS1 style geometry to a zbuffer using software renderingMonday, March 7, 2011
  177. 177. Software Occlusion›So how does it work?›Render PS1 style geometry to a zbuffer using software rendering›The zbuffer is 256 x 114 floatMonday, March 7, 2011
  178. 178. Software Occlusion›So how does it work?›Render PS1 style geometry to a zbuffer using software rendering›The zbuffer is 256 x 114 floatMonday, March 7, 2011
  179. 179. Software OcclusionMonday, March 7, 2011
  180. 180. Software Occlusion›Occluder triangle setupMonday, March 7, 2011
  181. 181. Software Occlusion›Occluder triangle setup›Terrain triangle setupMonday, March 7, 2011
  182. 182. Software Occlusion›Occluder triangle setup›Terrain triangle setup›Rasterize trianglesMonday, March 7, 2011
  183. 183. Software Occlusion›Occluder triangle setup›Terrain triangle setup›Rasterize triangles›CullingMonday, March 7, 2011
  184. 184. Software Occlusion›Occluder triangle setup›Terrain triangle setup›Rasterize triangles›CullingMonday, March 7, 2011
  185. 185. Software Occlusion (Occluders)Monday, March 7, 2011
  186. 186. Software Occlusion (In-game)Monday, March 7, 2011
  187. 187. Software Occlusion (In-game)Monday, March 7, 2011
  188. 188. Software Occlusion (In-game)Monday, March 7, 2011
  189. 189. Culling jobs Culling Jobs Job 0 Job 1 Job 2 Job 3 Job 4Monday, March 7, 2011
  190. 190. Culling jobs Culling Jobs Job 0 Occluder Triangles Job 1 Occluder Triangles Job 2 Occluder Triangles Job 3 Occluder Triangles Job 4 Terrain TrianglesMonday, March 7, 2011
  191. 191. Culling jobs Culling Jobs Job 0 Occluder Triangles Rasterize Triangles Job 1 Occluder Triangles Rasterize Triangles Job 2 Occluder Triangles Rasterize Triangles Job 3 Occluder Triangles Rasterize Triangles Job 4 Terrain Triangles Rasterize TrianglesMonday, March 7, 2011
  192. 192. Culling jobs Culling Jobs Job 0 Occluder Triangles Rasterize Triangles Culling Z-buffer Test Job 1 Occluder Triangles Rasterize Triangles Culling Z-buffer Test Job 2 Occluder Triangles Rasterize Triangles Culling Z-buffer Test Job 3 Occluder Triangles Rasterize Triangles Culling Z-buffer Test Job 4 Terrain Triangles Rasterize Triangles Culling Z-buffer TestMonday, March 7, 2011
  193. 193. Occluder triangles Job 0 Job 1 Job 2 Job 3 OutputMonday, March 7, 2011
  194. 194. Occluder triangles Job 0 Job 1 Job 2 Job 3 OutputMonday, March 7, 2011
  195. 195. Occluder triangles Job 0 Inside, Project triangles Job 1 Job 2 Job 3 OutputMonday, March 7, 2011
  196. 196. Occluder triangles Job 0 Inside, Project triangles Job 1 Job 2 Job 3 OutputMonday, March 7, 2011
  197. 197. Occluder triangles Job 0 Inside, Project triangles Job 1 Inside, Project triangles Job 2 Job 3 OutputMonday, March 7, 2011
  198. 198. Occluder triangles Job 0 Inside, Project triangles Job 1 Inside, Project triangles Job 2 Job 3 OutputMonday, March 7, 2011
  199. 199. Occluder triangles Job 0 Inside, Project triangles Job 1 Inside, Project triangles Job 2 Intersecting, Clip, Project Job 3 OutputMonday, March 7, 2011
  200. 200. Occluder triangles Job 0 Inside, Project triangles Job 1 Inside, Project triangles Job 2 Intersecting, Clip, Project Job 3 OutputMonday, March 7, 2011
  201. 201. Occluder triangles Job 0 Inside, Project triangles Job 1 Inside, Project triangles Job 2 Intersecting, Clip, Project Job 3 Outside, Skip OutputMonday, March 7, 2011
  202. 202. Occluder triangles Rasterize Triangles Input 16 Triangles 16 Triangles 16 Triangles 16 Triangles 256 x 114 zbuffer 256 x 114 zbuffer 256 x 114 zbuffer Job 0 Job 1 Merge step <Todo: Fix Colors>Monday, March 7, 2011
  203. 203. Occluder triangles Rasterize Triangles Input 16 Triangles 16 Triangles 16 Triangles 16 Triangles 256 x 114 zbuffer 256 x 114 zbuffer 256 x 114 zbuffer Job 0 Job 1 Merge step <Todo: Fix Colors>Monday, March 7, 2011
  204. 204. Occluder triangles Rasterize Triangles Input 16 Triangles 16 Triangles 16 Triangles 16 Triangles 256 x 114 zbuffer 256 x 114 zbuffer 256 x 114 zbuffer Job 0 Job 1 Merge step <Todo: Fix Colors>Monday, March 7, 2011
  205. 205. Occluder triangles Rasterize Triangles Input 16 Triangles 16 Triangles 16 Triangles 16 Triangles 256 x 114 zbuffer 256 x 114 zbuffer 256 x 114 zbuffer Job 0 Job 1 Merge step <Todo: Fix Colors>Monday, March 7, 2011
  206. 206. Occluder triangles Rasterize Triangles Input 16 Triangles 16 Triangles 16 Triangles 16 Triangles 256 x 114 zbuffer 256 x 114 zbuffer 256 x 114 zbuffer Job 0 Job 1 Merge step <Todo: Fix Colors>Monday, March 7, 2011
  207. 207. Occluder triangles Rasterize Triangles Input 16 Triangles 16 Triangles 16 Triangles 16 Triangles 256 x 114 zbuffer 256 x 114 zbuffer 256 x 114 zbuffer Job 0 Job 1 Merge step <Todo: Fix Colors>Monday, March 7, 2011
  208. 208. Occluder triangles Rasterize Triangles Input 16 Triangles 16 Triangles 16 Triangles 16 Triangles 256 x 114 zbuffer 256 x 114 zbuffer 256 x 114 zbuffer Job 0 Job 1 Merge step <Todo: Fix Colors>Monday, March 7, 2011
  209. 209. Occluder triangles Rasterize Triangles Input 16 Triangles 16 Triangles 16 Triangles 16 Triangles 256 x 114 zbuffer 256 x 114 zbuffer 256 x 114 zbuffer Job 0 Job 1 Merge step <Todo: Fix Colors>Monday, March 7, 2011
  210. 210. Occluder triangles Rasterize Triangles Input 16 Triangles 16 Triangles 16 Triangles 16 Triangles 256 x 114 zbuffer 256 x 114 zbuffer 256 x 114 zbuffer Job 0 Job 1 Merge step <Todo: Fix Colors>Monday, March 7, 2011
  211. 211. Occluder triangles Rasterize Triangles Input 16 Triangles 16 Triangles 16 Triangles 16 Triangles 256 x 114 zbuffer 256 x 114 zbuffer 256 x 114 zbuffer Job 0 Job 1 Merge step <Todo: Fix Colors>Monday, March 7, 2011
  212. 212. Occluder triangles Rasterize Triangles Input 16 Triangles 16 Triangles 16 Triangles 16 Triangles 256 x 114 zbuffer 256 x 114 zbuffer 256 x 114 zbuffer Job 0 Job 1 Merge step <Todo: Fix Colors>Monday, March 7, 2011
  213. 213. z-buffer testingMonday, March 7, 2011
  214. 214. z-buffer testing›Calculate screen space AABB for objectMonday, March 7, 2011
  215. 215. z-buffer testing›Calculate screen space AABB for object›Get single distance valueMonday, March 7, 2011
  216. 216. z-buffer testing›Calculate screen space AABB for object›Get single distance value›Test the square against the z-bufferMonday, March 7, 2011
  217. 217. z-buffer testing›Calculate screen space AABB for object›Get single distance value›Test the square against the z-bufferMonday, March 7, 2011
  218. 218. z-buffer testing›Calculate screen space AABB for object›Get single distance value›Test the square against the z-bufferMonday, March 7, 2011
  219. 219. z-buffer testing›Calculate screen space AABB for object›Get single distance value›Test the square against the z-bufferMonday, March 7, 2011
  220. 220. z-buffer testing›Calculate screen space AABB for object›Get single distance value›Test the square against the z-bufferMonday, March 7, 2011
  221. 221. z-buffer testing›Calculate screen space AABB for object›Get single distance value›Test the square against the z-bufferMonday, March 7, 2011
  222. 222. z-buffer testing›Calculate screen space AABB for object›Get single distance value›Test the square against the z-bufferMonday, March 7, 2011
  223. 223. ConclusionMonday, March 7, 2011
  224. 224. Conclusion›Accurate and high performance culling is essentialMonday, March 7, 2011
  225. 225. Conclusion›Accurate and high performance culling is essential›Reduces pressure on low-level systems/renderingMonday, March 7, 2011
  226. 226. Conclusion›Accurate and high performance culling is essential›Reduces pressure on low-level systems/rendering›It’s all about dataMonday, March 7, 2011
  227. 227. Conclusion›Accurate and high performance culling is essential›Reduces pressure on low-level systems/rendering›It’s all about data›Simple data often means simple codeMonday, March 7, 2011
  228. 228. Conclusion›Accurate and high performance culling is essential›Reduces pressure on low-level systems/rendering›It’s all about data›Simple data often means simple code›Understanding your target hardwareMonday, March 7, 2011
  229. 229. Conclusion›Accurate and high performance culling is essential›Reduces pressure on low-level systems/rendering›It’s all about data›Simple data often means simple code›Understanding your target hardwareMonday, March 7, 2011
  230. 230. Thanks to›Andreas Fredriksson (@deplinenoise)›Christina Coffin (@christinacoffin)›Johan Andersson (@repi)›Stephen Hill (@self_shadow)›Steven Tovey (@nonchaotic)›Halldor Fannar›Evelyn DonisMonday, March 7, 2011
  231. 231. Questions? Email: daniel@collin.com Blog: zenic.org Twitter: @daniel_collinBattlefield 3 & Frostbite 2 talks at GDC’11:Mon 1:45 DX11 Rendering in Battlefield 3 Johan AnderssonWed 10:30 SPU-based Deferred Shading in Battlefield 3 Christina Coffin for PlayStation 3Wed 3:00 Culling the Battlefield: Data Oriented Design Daniel Collin in PracticeThu 1:30 Lighting You Up in Battlefield 3 Kenny MagnussonFri 4:05 Approximating Translucency for a Fast, Colin Barré-Brisebois Cheap & Convincing Subsurface Scattering Look For more DICE talks: http://publications.dice.seMonday, March 7, 2011
  1. A particular slide catching your eye?

    Clipping is a handy way to collect important slides you want to go back to later.

×