Get Multicore Differentiation and Great Integrated Graphics Performance


While developing Total War*: THREE KINGDOMS, Creative Assembly* collaborated with Intel to get the most out of modern, multicore processors. Join us as team members share their tips and tricks for achieving great performance on the latest integrated GPUs.


Campaign is turn-based.
Battle is RTS.
The Game Tick thread is running at ~10 fps.
The Main Thread and Render Thread are running in sync.
Game Tick builds the game state and passes it to the main thread.
Worker Threads.
We calculate intersection with all the frustums at once.
Intersection results are stored in a bitfield.
We calculate the area of the projected bounding box and cull small models.
Then we select the LOD level based on the approximated average triangle size on screen.
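One plausible reading of this heuristic, sketched below under assumptions of mine (a bounding-sphere stand-in for the projected box, a simple pinhole projection, and illustrative thresholds): cull if the on-screen area is tiny, otherwise pick the coarsest LOD whose average triangle would still be small on screen.

```cpp
#include <cassert>
#include <cstddef>

struct LodLevel { std::size_t triangle_count; };

// Projected area of a bounding sphere in pixels (pinhole model):
// screen_radius ≈ world_radius * focal_px / distance.
float projected_area_px(float world_radius, float distance, float focal_px) {
    float r = world_radius * focal_px / distance;
    return 3.14159265f * r * r;
}

// Returns -1 to cull, otherwise an index into `lods` (lods[0] = most detailed).
int select_lod(float area_px, const LodLevel* lods, int lod_count,
               float min_area_px, float target_tri_px) {
    if (area_px < min_area_px) return -1;          // too small on screen: cull
    for (int i = lod_count - 1; i >= 1; --i) {     // try coarsest first
        float avg_tri_px = area_px / (float)lods[i].triangle_count;
        if (avg_tri_px <= target_tri_px) return i; // triangles still small enough
    }
    return 0;                                      // close-up: full detail
}

// Illustrative LOD chain (10k / 1k / 100 triangles), illustrative thresholds.
int demo_select(float area_px) {
    static const LodLevel lods[3] = {{10000}, {1000}, {100}};
    return select_lod(area_px, lods, 3, 100.0f, 4.0f);
}
```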
Next step is building instance data for the individual parts.
Based on the mesh and instance data and the material, we figure out which parts of the pipeline the mesh needs to be rendered to.
Each pipeline stage uses the same instance data.
Every single mesh has a number of instance lists, one instance list per pipeline stage. The instance is added to all the relevant instance lists.
The instance lists are duplicated to every worker thread to ensure lockless access.
And double buffered, because while the render thread is rendering the current frame, the main thread (and workers) is adding instances to the next frame's instance lists.
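The per-thread, double-buffered lists can be sketched as below. This is an assumed shape, not engine code: each worker writes only to its own vector in the write buffer (hence no locks), the render thread reads last frame's buffer, and a flip at end of frame swaps the two.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Instance { unsigned id; };

class InstanceLists {
public:
    explicit InstanceLists(std::size_t workers)
        : buffers_{std::vector<std::vector<Instance>>(workers),
                   std::vector<std::vector<Instance>>(workers)} {}

    // Called by worker `w` while building the *next* frame: lock-free
    // because each worker owns exactly one vector in the write buffer.
    void add(std::size_t w, Instance inst) {
        buffers_[write_].at(w).push_back(inst);
    }

    // Render thread: gather last frame's per-worker lists into one.
    std::vector<Instance> gather_read() const {
        std::vector<Instance> out;
        for (const auto& list : buffers_[1 - write_])
            out.insert(out.end(), list.begin(), list.end());
        return out;
    }

    // End of frame: swap buffers and clear the new write side.
    void flip() {
        write_ = 1 - write_;
        for (auto& list : buffers_[write_]) list.clear();
    }

private:
    std::vector<std::vector<Instance>> buffers_[2];
    int write_ = 0;
};
```

The same pattern (per-thread copies plus double buffering) also covers the per-thread mesh lists described on the following slides.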
We have one mesh list per mesh type. Every mesh with at least a single instance in the frame will get added to exactly one mesh list matching its mesh type.
Just like the instance lists, we have one mesh list for every single worker thread.
Double buffered too.
We start by processing the mesh lists. We run one task per mesh type.
Per-thread mesh lists are combined into a single list of meshes.
Then we process all the meshes in the list one by one.
Each mesh has instance lists per pipeline stage.
We start with the first pipeline stage and combine the instances there into a single list.
Then we process the instances.
Then we process all subsequent stages sequentially.
Instance data is now uploaded to the GPU and prepared for batching.
The actual rendering follows the different pipeline stages.
Each stage renders a selection of mesh types.
The meshes are prepared by the render thread workers.
We have to start by waiting for the preparation tasks to be finished.
The tasks process all meshes/instances for all pipeline stages. We can use an atomic counter per pipeline stage.
No need to further split the tasks; we just increased the granularity of waiting for other sub-tasks.
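A minimal sketch of the "atomic counter per pipeline stage" idea, under assumed names: each preparation task bumps the counter for a stage when it finishes that stage's work, and the render thread waits only until the specific stage it needs next has been reported done by all tasks, instead of waiting for every task to finish everything.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

struct StageCounter {
    std::atomic<int> done{0};  // how many tasks finished this stage
    int expected = 0;          // how many tasks contribute to it

    bool ready() const {
        return done.load(std::memory_order_acquire) >= expected;
    }
    void mark_done() {         // called by a worker task
        done.fetch_add(1, std::memory_order_release);
    }
};

// Render thread helper: wait for one stage only, not for all tasks.
void wait_for_stage(const StageCounter& c) {
    while (!c.ready()) std::this_thread::yield();
}
```

A real scheduler would park the thread or steal other work instead of yielding in a loop; the point here is only the per-stage granularity of the wait.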
Usually everything is ready after the short initial wait.
The basic building blocks of the system are emitters.
Emitters emit a number of particles over time.
Particles are often just sorted per emitter.
Problems start happening when the emitters overlap.
We sort all the particles together.
Emission takes place on the CPU.
Sorting is responsible for moving dead particles to one end of the GPU buffer.
This makes uploading new particles trivial.
Running out of particle space stomps over the particles farthest from the camera.
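A CPU-side sketch of why that sort makes uploading trivial (assumed data layout, not the actual GPU implementation): sorting "alive first, then back-to-front" pushes every dead particle to the back of the buffer, so newly emitted particles can be copied into one contiguous free region at the end.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Particle { bool alive; float depth; };   // depth = distance from camera

void sort_particles(std::vector<Particle>& ps) {
    std::sort(ps.begin(), ps.end(), [](const Particle& a, const Particle& b) {
        if (a.alive != b.alive) return a.alive; // alive before dead
        return a.depth > b.depth;               // back-to-front for blending
    });
}

// First dead slot = start of the free region new particles are copied into.
std::size_t free_region_start(const std::vector<Particle>& ps) {
    std::size_t i = 0;
    while (i < ps.size() && ps[i].alive) ++i;
    return i;
}
```

If the buffer is full, overwriting from the back of this ordering is what "stomps over the particles farthest from the camera", which are the least visible ones.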
(For Total War: THREE KINGDOMS) we were confident that order-independent transparency is the solution.
First try was WBOIT (weighted blended order-independent transparency).
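The core of WBOIT (the technique normally runs in shaders; this is a scalar CPU sketch of the per-pixel math, with a placeholder constant weight where real implementations use a depth-based weight function): each transparent fragment adds weighted premultiplied color to an accumulation term and multiplies a revealage term by (1 − alpha), so the result does not depend on fragment order.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Frag { float r, g, b, a; };

// Resolve for the red channel only, to keep the sketch short.
float composite_red(std::vector<Frag> frags, float bg_red) {
    float accum_r = 0.f, accum_a = 0.f, revealage = 1.f;
    const float w = 1.0f;                         // placeholder weight
    for (const Frag& f : frags) {
        accum_r += f.r * f.a * w;                 // weighted premultiplied color
        accum_a += f.a * w;
        revealage *= (1.f - f.a);                 // how much background survives
    }
    float avg_r = accum_a > 1e-5f ? accum_r / accum_a : 0.f;
    return avg_r * (1.f - revealage) + bg_red * revealage;
}
```

Sums and products commute, which is exactly why the composite is order-independent; the price is that the weighted average is only an approximation of true sorted blending.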
Second try was MBOIT (moment-based order-independent transparency).
TW:3K was released with MBOIT.