Paris Game/AI Conference 2011

Slides from the Paris Game/AI Conference 2011 talk by Neil Henning - covering the

Paris Game/AI Conference 2011

  1. Preparing AI for Parallelism
     Lessons from NASCAR The Game 2011
     Neil Henning – Technology Lead
     neil@codeplay.com
     Paris Game AI Conference 2011
  2. Introduction
  3. Introduction
     I am sure some of you are wondering...
     Why a guy from [logo] is doing a talk about [logo] which was developed by [logo]
  4. Introduction
     Team from Codeplay worked on the game for 15 months
  5. Introduction
     • NASCAR isn’t just about driving straight, then turning left
     • 43 cars on screen at the same time
     • Cars race in tight packs on the circuit
     • Overtaking is all about navigating through these packs
     • Cannot simply make the AI use LODs, as the cars are nearly always in view
  9. Agenda
  10. Agenda
      How to prepare AI for parallelism
      • …by investigating NASCAR The Game 2011's AI
  11. Agenda
      • During the investigation I will answer the questions:
      • Why prepare your AI for parallelism?
      • What changes should be made?
      • What common issues are there?
      • How did these changes help when optimizing NASCAR?
      • How did we make use of the PS3's unique hardware?
      • What performance improvement was achieved?
  17. Why prepare your AI for parallelism?
  18. Why prepare your AI for parallelism?
      • Without parallelism, tighter limits on number of bots
      • Say we have four bots
      • In serial, they can easily fit in a frame
      [Diagram label: frame length]
  21. Why prepare your AI for parallelism?
      Without parallelism, tighter limits on number of bots
      • Want to increase bots by 3x?
      • Have to either optimize or parallelize (or both)
      [Diagram label: frame length]
  23. Why prepare your AI for parallelism?
      Without parallelism, tighter limits on number of bots
      • Split work between threads
      • Only possible with parallelism
      [Diagram label: frame length]
  25. Why prepare your AI for parallelism?
      • Multicore is the future (has been for some time)
      • This generation of consoles is multicore
      • Sony's new PS Vita is quad core
      • Even the iPad uses dual-core processors now!
      • Being able to split work amongst cores is key
      • Might not be required yet, but could be essential later
  31. Why prepare your AI for parallelism?
      • Helps during crunch time
      • Optimization being sought throughout engine
      • Either optimize engine or cut features
      • Have AI prepared to become parallel
      • Optimization folks will love you!
  36. What changes should be made?
  37. What changes should be made?
      • Split work into manageable chunks
      • In NASCAR, had 18 components for each car
      [Diagram labels: Driving Controllers, Stay Behind, Obstacle Detection, Stay Beside]
  39. What changes should be made?
      • Components are in groups
      • All components in a group can be run in parallel
  41. What changes should be made?
      • 43 cars = 43 AIs
      • Each car’s groups can be run in parallel too
      [Diagram: cars numbered 0, 1, 2, … 42]
  43. What changes should be made?
      • Read/Write phases
      • Two phases for your AI
      • Read phase can read world/other car state
      • Write phase can modify own car state
  47. What changes should be made?
      • Use temporary data to store values read from the environment
      • In read phase, store needed reads into temporary data
      • In write phase, read from the temporary data
      • AI is one frame behind world events
      • Effect on AI is minimal
  52. What changes should be made?
      • In NASCAR a read/write phase was used
      • Write phase uses data from the previous frame's read phase
      [Diagram labels: Write Phase, Read Phase]
      • Minimal set of components in read/write phase group
      • Only components that required world/other car state
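The read/write split described above can be sketched in standard C++. This is a minimal illustration, not the game's code: `Car`, `readPhase`, `writePhase`, `forEachCarParallel` and `updateAI` are hypothetical names, and the speed rule is a placeholder. The read phase may look at any car but writes only its own temporary data; the write phase touches only its own car, so neither phase needs a lock. Joining the worker threads acts as the barrier between phases.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical car state; the real game's data layout is not shown in the slides.
struct Car {
    float position = 0.0f;
    float speed = 0.0f;
    float gapAhead = 1e9f; // temporary data filled in by the read phase
};

// Read phase: may read any car, but writes only this car's temporary data.
void readPhase(std::vector<Car>& cars, std::size_t self) {
    float best = 1e9f;
    for (std::size_t i = 0; i < cars.size(); ++i) {
        float gap = cars[i].position - cars[self].position;
        if (i != self && gap > 0.0f && gap < best) best = gap;
    }
    cars[self].gapAhead = best;
}

// Write phase: reads only the temporary data and modifies only this car.
void writePhase(Car& car, float dt) {
    car.speed = (car.gapAhead < 10.0f) ? 40.0f : 50.0f; // placeholder rule
    car.position += car.speed * dt;
}

// Runs fn(i) for every car index, interleaving the cars across worker threads.
template <typename Fn>
void forEachCarParallel(std::size_t carCount, unsigned threadCount, Fn fn) {
    assert(threadCount > 0);
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < threadCount; ++t) {
        workers.emplace_back([=] {
            for (std::size_t i = t; i < carCount; i += threadCount) fn(i);
        });
    }
    for (std::thread& w : workers) w.join(); // barrier between phases
}

// One AI frame: the write phase consumes what the previous frame's read phase stored.
void updateAI(std::vector<Car>& cars, float dt, unsigned threadCount) {
    forEachCarParallel(cars.size(), threadCount,
                       [&](std::size_t i) { writePhase(cars[i], dt); });
    forEachCarParallel(cars.size(), threadCount,
                       [&](std::size_t i) { readPhase(cars, i); });
}
```

Calling `updateAI` once per frame gives the one-frame lag mentioned in the slides: each car acts on the gap measured at the end of the previous frame.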
  55. What changes should be made?
      • Remove large stack locals
      • Having two or more threads means lots of duplicate locals
          void func()
          {
              char localBuffer[1024];
              // … do something with localBuffer
          }
      • If func is called from many threads, the buffer's memory use is multiplied by the number of threads!
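One way to act on this advice, sketched under assumptions: the slides do not say what replaced the stack buffer, so the `Scratch` type and the changed signature of `func` below are hypothetical. The idea is that the caller owns the working memory, so the number of copies is decided in one place rather than by however many threads happen to call the function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical scratch area, owned by each worker and reused across calls.
struct Scratch {
    char buffer[1024];
};

// Before: `char localBuffer[1024];` lived on the stack of every call.
// After: the caller decides where the memory lives and how many copies exist.
std::size_t func(Scratch& scratch, const char* input) {
    assert(input != nullptr);
    std::size_t length = std::strlen(input);
    if (length >= sizeof(scratch.buffer)) {
        length = sizeof(scratch.buffer) - 1; // truncate to fit, keep room for '\0'
    }
    std::memcpy(scratch.buffer, input, length);
    scratch.buffer[length] = '\0';
    return length;
}
```

A worker would create one `Scratch` up front and pass it to every call it makes, instead of paying for a fresh 1KB local on each call.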
  57. What changes should be made?
      • Document code – describe relationship between data
          struct Foo
          {
              Bar* bar;
          };
      many : one? one : one? one : many?
  58. What changes should be made?
      • Document code – describe relationship between data
          struct Foo
          {
              Bar* bar;
          };
      • Knowing how data is shared is critical for threading
      • Documenting the relationship saves time and effort later
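As an illustration of the kind of documentation meant here, the same `Foo`/`Bar` pair could carry comments like the ones below. The specific relationship (many : one), the ownership rule, the `value` field and the `readShared` helper are all assumptions made up for this sketch; the slides leave the answer open.

```cpp
#include <cassert>

struct Bar {
    int value = 0;
};

// Relationship: many Foo : one Bar (several Foos may point at the same Bar).
// Sharing:      Bar is shared, so treat it as read-only while threads are running.
// Ownership:    Foo does not own bar and must never delete it.
struct Foo {
    Bar* bar = nullptr;
};

// Safe from any thread as long as nobody writes to the shared Bar concurrently.
int readShared(const Foo& foo) {
    assert(foo.bar != nullptr);
    return foo.bar->value;
}
```

With the relationship written down, whoever parallelises the code later can tell at a glance which pointers lead to shared data.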
  60. What common issues are there?
  61. What common issues are there?
      • Virtual functions – can have a high runtime cost
      • ~500-1200 cycles on PowerPC if virtual lookup misses cache
      • Can equate to a large amount of time doing no work
  64. What common issues are there?
      • In NASCAR, components had virtual update method
      • Based on previous game (Supercar Challenge)
      • 16 cars in previous, now 43 cars
      • 5 component types in previous, now 18 component types
      • Now read/write phase too
      • 80 virtual calls to update became 1333 virtual calls!
  70. What common issues are there?
      • In NASCAR, components had virtual update method
      • In real terms, 3ms of virtual function lookup per frame
      • First optimization was to have typed buckets of components
      • 1333 virtual calls went to 31 virtual calls
      • Platform agnostic (PS3, 360 and Wii all sped up)
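The call counts quoted above are consistent with 16 × 5 = 80 and 43 × 31 = 1333, which suggests 31 update calls per car once the read/write phase is included, and one call per bucket afterwards. A minimal sketch of typed buckets follows; the component structs, `BucketBase`, `Bucket` and `updateBuckets` are hypothetical names, since the slides do not show the actual implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical component types; the slides name the components but not their code.
struct StayBehind {
    float gap = 0.0f;
    void update() { gap += 1.0f; }
};
struct StayBeside {
    float offset = 0.0f;
    void update() { offset += 0.5f; }
};

// One virtual call per bucket rather than one per component instance.
struct BucketBase {
    virtual ~BucketBase() = default;
    virtual void updateAll() = 0;
};

template <typename Component>
struct Bucket final : BucketBase {
    std::vector<Component> items; // contiguous, all the same concrete type
    void updateAll() override {
        for (Component& c : items) c.update(); // direct, non-virtual call
    }
};

// Returns the number of virtual calls made, to make the saving visible.
std::size_t updateBuckets(std::vector<std::unique_ptr<BucketBase>>& buckets) {
    std::size_t virtualCalls = 0;
    for (std::unique_ptr<BucketBase>& bucket : buckets) {
        assert(bucket != nullptr);
        bucket->updateAll();
        ++virtualCalls;
    }
    return virtualCalls;
}
```

With 43 cars in each bucket, the number of virtual calls depends only on how many buckets exist, not on how many cars are racing.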
  75. What common issues are there?
      • Virtual functions are not just a code abstraction
          struct Foo { virtual void func(); };
          struct Bar : public Foo { virtual void func(); };
          Foo * foo;
          foo->func();
          // don’t know size of *foo! Could be sizeof(Foo) or sizeof(Bar)
      • Virtual functions hide data too
      • Not knowing the size of data kills SPU/Compute development
  77. What common issues are there?
      • Naïve multithreading – locks galore
      • Avoid/reduce/remove locks if possible
      • Locks can be a solution, but be very careful how you use them
          void func()
          {
              lock->lock();
              // … do something
              lock->unlock();
          }
      • Read/write phases allow removal of most (if not all) locks
  80. What common issues are there?
      • Physics subsystem caused issues with NASCAR
      • AI required knowledge of obstacles
      • The physics system's raycast was used to find problematic obstacles
      • Each call to raycast used a mutex, every thread would halt!
      • Had to refactor code to remove need for locking
  85. What common issues are there?
      • Know your data – how is it accessed? Where is it shared?
          struct RaceCar { Brain * brain; };
          struct Brain { RaceCar * raceCar; Obstacle ** obstacles; };
          struct Obstacle { BrainInterface * interface; };
          struct BrainInterface { RaceCar * raceCar; Brain * brain; };
      • Very easy for systems grown over time to have convoluted struct layouts
  86. How did these changes help when optimizing NASCAR?
  87. How did these changes help when optimizing NASCAR?
      [Diagram: threads separated by barriers]
      • Read/Write phase was key to performance on Xbox 360
      • Allowed work to be split across all 6 threads
      • Each thread was given 1/6th of the cars to process
      • Takes 2ms per frame across all CPU resources on 360
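Giving each thread 1/6th of 43 cars cannot be exact, so some split rule is needed. The slides do not say which one was used; the helper below is a hypothetical sketch (`Range` and `chunkFor` are made-up names) that hands out contiguous ranges whose sizes differ by at most one car.

```cpp
#include <cassert>
#include <cstddef>

// Half-open range of car indices: [begin, end).
struct Range {
    std::size_t begin;
    std::size_t end;
};

// Splits `count` items into `workers` contiguous ranges whose sizes differ by
// at most one; the first `count % workers` workers take one extra item.
Range chunkFor(std::size_t count, std::size_t workers, std::size_t worker) {
    assert(workers > 0 && worker < workers);
    std::size_t base = count / workers;
    std::size_t extra = count % workers;
    std::size_t begin = worker * base + (worker < extra ? worker : extra);
    std::size_t size = base + (worker < extra ? 1 : 0);
    return Range{begin, begin + size};
}
```

For 43 cars and 6 threads this gives one thread 8 cars and the other five 7 cars each.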
  91. How did these changes help when optimizing NASCAR?
      • Tried the same approach on PS3
      • Only 2 threads on PS3, but have 6 sub processors (the SPUs)
      • Both threads on PS3 were completely full
      • Any multithreading speedup has to be on the SPUs
      • Each SPU has 256KB local storage (for code & data)
      • Code was ~2MB and data was ~8MB – far too large!
      • Unfeasible to mimic 360 approach
  95. How did these changes help when optimizing NASCAR?
      On PS3 most costly components were targeted
  96. How did these changes help when optimizing NASCAR?
      • PS3 version relied on components being run in parallel
      • And all components in a group being able to be run in parallel
      • Costly groups were made to use the SPUs
      • Knowing relationship between data was key
      • Well documented code made life so much easier!
  101. How did we make use of the PS3's unique hardware?
  102. How did we make use of the PS3's unique hardware?
       • Codeplay was asked by Eutechnyx to optimize the AI
       • Very tight deadlines, 1 month to reduce time taken in AI
       • No main thread time left – have to use the SPUs
       • Our Offload compiler technology was crucial
  106. How did we make use of the PS3's unique hardware?
       • For those unfamiliar with coding for the SPU…
       • They are amazingly fast, if you code correctly for them
       • Normally requires total rewrite of existing codebase
       • Painful to access global variables
       • Virtual functions are a complete write-off
  111. How did we make use of the PS3's unique hardware?
       • SPU development typically takes many months
       • Common to have 4-5 SPU programmers for ~10 months
       • Not feasible for late-in-cycle development
       • Offload aims to mitigate the issues with getting code onto SPU
       • Can offload code to SPU much quicker (typically a few man days)
       • Much easier to move existing code bases to SPU
  117. How did we make use of the PS3's unique hardware?
           __blockingoffload()
           {
               // do some work on SPU, PPU waits for completion!
           };
           offloadThread_t handle = __offload()
           {
               // do some work on SPU!
           };
           // can do some work on PPU before waiting for SPU
           offloadThreadJoin(handle);
       • Small language extension moves work from PPU to SPU
       • Any work within an offload block is performed on the SPU
       • All PPU code is duplicated for the SPU
  120. How did we make use of the PS3's unique hardware?
           int aGlobalVariable;
           __blockingoffload()
           {
               int aLocalVariable = aGlobalVariable;
           };
       • Offload allows access to global variables
       • Just use them as normal!
  122. How did we make use of the PS3's unique hardware?
           struct Foo { virtual void bar() {} };
           __blockingoffload[Foo::bar this]()
           {
               Foo foo;
               foo.bar();
           };
       • Offload allows virtual function calls too
       • Just have to specify which virtual functions may be called
  124. How did we make use of the PS3's unique hardware?
       [Diagram labels: Stay Behind Other Car, Driving Controllers, Obstacle Detection, Stay Beside Other Car]
       • First, profiled the AI during a typical race
       • Four components taking most of the frame time
  126. How did we make use of the PS3's unique hardware?
       [Diagram labels: Stay Behind Other Car, Driving Controllers, Obstacle Detection, Stay Beside Other Car]
       • Used four slightly different strategies when multithreading
  127. How did we make use of the PS3's unique hardware?
       [Diagram label: Obstacle Detection]
       • Obstacle Detection was the only component in its group
       • Very inefficient code for the SPU, but moved 1/3 onto 4 SPUs
  128. How did we make use of the PS3's unique hardware?
       [Diagram labels: Stay Behind Other Car, Stay Beside Other Car]
       • Looked at Stay Behind/Beside Other Car together
       • In the same group, so they can be run in parallel
  129. How did we make use of the PS3's unique hardware?
       [Diagram labels: Stay Behind Other Car, Stay Beside Other Car]
       • Moved Stay Behind component to SPU
       • Stay Beside component would continue to be run on PPU
  130. How did we make use of the PS3's unique hardware?
       [Diagram labels: Stay Behind Other Car, Stay Beside Other Car]
       • As long as the SPU work took less time than the PPU work, no cost!
       • Effectively ‘hid’ the cost of calculating Stay Behind component
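The overlap trick can be illustrated in portable C++, with `std::async` standing in for the Offload block and an ordinary thread standing in for the SPU. This is only an analogy under those assumptions: `stayBehind`, `stayBeside`, `GroupResult` and `updateGroup` are hypothetical, and the summing loops are placeholders for real component work.

```cpp
#include <cassert>
#include <future>
#include <vector>

// Placeholder work for the two components in the same group.
float stayBehind(const std::vector<float>& gaps) {
    float sum = 0.0f;
    for (float g : gaps) sum += g;
    return sum;
}
float stayBeside(const std::vector<float>& offsets) {
    float sum = 0.0f;
    for (float o : offsets) sum += o;
    return sum;
}

struct GroupResult {
    float behind;
    float beside;
};

GroupResult updateGroup(const std::vector<float>& gaps,
                        const std::vector<float>& offsets) {
    // Start Stay Behind on another thread (the SPU in the talk).
    std::future<float> behind =
        std::async(std::launch::async, [&gaps] { return stayBehind(gaps); });
    assert(behind.valid());
    // Meanwhile run Stay Beside on the calling thread (the PPU in the talk).
    float beside = stayBeside(offsets);
    // Join: costs nothing extra if the asynchronous work has already finished.
    return GroupResult{behind.get(), beside};
}
```

The wall-clock cost of the group is the longer of the two components rather than their sum, which is how the Stay Behind time could drop out of the frame budget.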
  131. How did we make use of the PS3's unique hardware?
       [Diagram label: Driving Controllers]
       • Lastly, driving controllers took 1/3 of AI cost alone
       • Split the cars across 4 SPUs, and ran in parallel
  132. 132. How did we make use of the PS3's unique hardware?<br /><ul><li> In total ~170 source code changed
  133. 133. Changes were purely optimization</li></ul>AIObstacle** obstacles;<br />unsigned intnumObstacles;<br />offloadThread_thandle = __offload(obstacles, numObstacles)<br />{<br /> for(unsigned int i = 0; i < numObstacles; i++)<br /> {<br />AIObstacle* obstacle = obstacles[i];<br />// use obstacle for calculations<br />}<br />};<br />Neil Henning<br />neil@codeplay.com<br />Paris Game AI Conference 2011<br />
  134. 134. How did we make use of the PS3's unique hardware?<br /><ul><li> In total ~170 source code changed
  135. 135. Changes were purely optimization</li></ul>// array of AIObstacle* ’s on main memory<br />AIObstacle** obstacles;<br />unsigned intnumObstacles;<br />offloadThread_thandle = __offload(obstacles, numObstacles)<br />{<br /> for(unsigned int i = 0; i < numObstacles; i++)<br /> {<br />// AIObstacle* points to main memory<br />AIObstacle* obstacle = obstacles[i];<br />// use obstacle for calculations<br />}<br />};<br />Neil Henning<br />neil@codeplay.com<br />Paris Game AI Conference 2011<br />
  136. 136. How did we make use of the PS3's unique hardware?<br /><ul><li> In total ~170 source code changed
  137. 137. Changes were purely optimization</li></ul>// array of AIObstacle* ’s on main memory<br />AIObstacle** obstacles;<br />unsigned intnumObstacles;<br />offloadThread_thandle = __offload(obstacles, numObstacles)<br />{<br />CachedPointer<AIObstacle*><br />innerObstacles(obstacles, numObstacles);<br /> for(unsigned int i = 0; i < numObstacles; i++)<br /> {<br />// AIObstacle* points to main memory<br />CachedPointer<AIObstacle> obstacle(innerObstacles[i]);<br />// use obstacle for calculations<br />}<br />};<br />Neil Henning<br />neil@codeplay.com<br />Paris Game AI Conference 2011<br />
  138. What performance improvement was achieved?
  139. What performance improvement was achieved?
       [Diagram label: Obstacle Detection]
       • Obstacle detection went from 2ms -> 1.1ms
       • ~100 lines of source code changed
       • 2½ weeks development time
  145. What performance improvement was achieved?
       [Diagram labels: Stay Behind Other Car, Stay Beside Other Car]
       • Stay Behind went from 1.1ms -> 0ms (hidden behind other)
       • ~50 lines of source code changed
       • 1 week development time
  151. What performance improvement was achieved?
       [Diagram label: Driving Controllers]
       • Driving Controllers went from 4ms -> 0.6ms
       • ~20 lines of source code changed
       • 8 hours development time
  157. What performance improvement was achieved?
       • Performance speaks for itself!
       • 50% speed improvement on PS3
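As a rough cross-check, the per-component timings quoted in the slides can be summed. This covers only the three offloaded components, and the last line rests on an assumption: it estimates the total AI cost as three times the Driving Controllers' 4ms, based on the earlier statement that they took 1/3 of the AI cost.

```latex
\begin{align*}
t_{\text{before}} &= 2 + 1.1 + 4 = 7.1\ \text{ms} \\
t_{\text{after}}  &= 1.1 + 0 + 0.6 = 1.7\ \text{ms} \\
\Delta t          &= 7.1 - 1.7 = 5.4\ \text{ms} \\
\frac{\Delta t}{t_{\text{AI}}} &\approx \frac{5.4}{3 \times 4} = 0.45
\end{align*}
```

Under that estimate the saving is about 45% of the AI time, which is in line with the roughly 50% figure on the slide.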
  159. Takeaway
  160. Takeaway
       • It is possible to parallelise late in development
       • But need code ready to be parallelised
       • Small changes in coding style lead to hugely better results
       • Better to plan systems from beginning with multicore in mind
  164. Questions?
       Can also catch me on Twitter @sheredom
