Squeezing Every Drop Of Performance Out Of The iPhone

20,023 views
17,872 views

Published on

GDC Austin 2009 slides
https://www.cmpevents.com/GDAU09/a.asp?option=C&V=11&SessID=9869

This session will describe the iPhone performance optimization lessons learned through many hours of tuning. We'll start with an overview of the performance analysis tools available on the iPhone SDK to help you narrow down your performance bottlenecks. Then we'll cover the best way to set up your render loops, rendering best practices, how to deal with the limited memory, or even how to drop down to assembly to use the forgotten vector floating point unit.

Published in: Technology, News & Politics
2 Comments
14 Likes
Statistics
Notes
No Downloads
Views
Total views
20,023
On SlideShare
0
From Embeds
0
Number of Embeds
342
Actions
Shares
0
Downloads
265
Comments
2
Likes
14
Embeds 0
No embeds

No notes for slide
  • In the last 11 years I’ve made games for almost every major platform out there
  • In the last 11 years I’ve made games for almost every major platform out there
  • In the last 11 years I’ve made games for almost every major platform out there
  • In the last 11 years I’ve made games for almost every major platform out there
  • In the last 11 years I’ve made games for almost every major platform out there
  • In the last 11 years I’ve made games for almost every major platform out there
  • During that time, working on engine technology and trying to achieve maximum performance
    Performance always very important
  • Almost a year ago I started Snappy Touch
  • Almost a year ago I started Snappy Touch
  • Almost a year ago I started Snappy Touch
  • When you’re on the engine team of a AAA game, you can spend a lot of time on performance.
    On the iPhone, I didn’t think I had the time
    In a few days I had the prototype ported over with one small problem
  • When you’re on the engine team of a AAA game, you can spend a lot of time on performance.
    On the iPhone, I didn’t think I had the time
    In a few days I had the prototype ported over with one small problem
  • When you’re on the engine team of a AAA game, you can spend a lot of time on performance.
    On the iPhone, I didn’t think I had the time
    In a few days I had the prototype ported over with one small problem
  • When you’re on the engine team of a AAA game, you can spend a lot of time on performance.
    On the iPhone, I didn’t think I had the time
    In a few days I had the prototype ported over with one small problem
  • Lessons learned getting frame rate from 4 fps to ~30 fps
  • Lessons learned getting frame rate from 4 fps to ~30 fps
  • Lessons learned getting frame rate from 4 fps to ~30 fps
  • Lessons learned getting frame rate from 4 fps to ~30 fps
  • Lessons learned getting frame rate from 4 fps to ~30 fps
  • Best way to start.
    Frame rate over time plus CPU sampler
  • Digging deeper
  • Also good for tracking memory leaks (wish for more automated way though)
  • This is where the biggest bottlenecks were for Flower Garden at first
  • This is where the biggest bottlenecks were for Flower Garden at first
  • This is where the biggest bottlenecks were for Flower Garden at first
  • Still single core
  • Still single core
  • So plan accordingly!
  • So plan accordingly!
  • Main difference between games and apps is the constant amount of “stuff” happening on screen.
    Event-driven architecture is not a very good match for games
  • Main difference between games and apps is the constant amount of “stuff” happening on screen.
    Event-driven architecture is not a very good match for games
  • Main difference between games and apps is the constant amount of “stuff” happening on screen.
    Event-driven architecture is not a very good match for games
  • Main difference between games and apps is the constant amount of “stuff” happening on screen.
    Event-driven architecture is not a very good match for games
  • Main difference between games and apps is the constant amount of “stuff” happening on screen.
    Event-driven architecture is not a very good match for games
  • Main difference between games and apps is the constant amount of “stuff” happening on screen.
    Event-driven architecture is not a very good match for games
  • Main difference between games and apps is the constant amount of “stuff” happening on screen.
    Event-driven architecture is not a very good match for games
  • Main difference between games and apps is the constant amount of “stuff” happening on screen.
    Event-driven architecture is not a very good match for games
  • Main difference between games and apps is the constant amount of “stuff” happening on screen.
    Event-driven architecture is not a very good match for games
  • Main difference between games and apps is the constant amount of “stuff” happening on screen.
    Event-driven architecture is not a very good match for games
  • OK for games that don’t require a high/steady frame rate
  • OK for games that don’t require a high/steady frame rate
  • OK for games that don’t require a high/steady frame rate
  • The app becomes even less responsive
  • The app becomes even less responsive
  • The app becomes even less responsive
  • The app becomes even less responsive
  • The app becomes even less responsive
  • The app becomes even less responsive
  • The app becomes even less responsive
  • The app becomes even less responsive
  • That’s what I’m using in Flower Garden
  • That’s what I’m using in Flower Garden
  • No multiple cores, but we have a GPU
    How to parallelize the two?
  • No multiple cores, but we have a GPU
    How to parallelize the two?
  • No multiple cores, but we have a GPU
    How to parallelize the two?
  • No multiple cores, but we have a GPU
    How to parallelize the two?
  • Need to set correct scaling/biasing matrix on the other end
    Textures jumped around when made uvs a byte
  • Need to set correct scaling/biasing matrix on the other end
    Textures jumped around when made uvs a byte
  • Need to set correct scaling/biasing matrix on the other end
    Textures jumped around when made uvs a byte
  • You get the best performance without the degenerate verts of a strip
  • You get the best performance without the degenerate verts of a strip
  • Use multitexturing (2 texture combiners)
    Point sprites
  • Use multitexturing (2 texture combiners)
    Point sprites
  • This makes it clear that the iPhone is NOT a game console (unfortunately)
  • This makes it clear that the iPhone is NOT a game console (unfortunately)
  • This makes it clear that the iPhone is NOT a game console (unfortunately)
  • Apple wants you to play nice and work with any amount of memory
    Game developers want to count on the available memory and use it all
    Clash!
  • Apple wants you to play nice and work with any amount of memory
    Game developers want to count on the available memory and use it all
    Clash!
  • Apple wants you to play nice and work with any amount of memory
    Game developers want to count on the available memory and use it all
    Clash!
  • Apple wants you to play nice and work with any amount of memory
    Game developers want to count on the available memory and use it all
    Clash!
  • If you wait too long and nobody does anything, process is killed (no warning)
    It’s a giant game of chicken!
  • If you wait too long and nobody does anything, process is killed (no warning)
    It’s a giant game of chicken!
  • Be extremely careful with that! Potential for crashes in some phones if not enough memory.
  • Be extremely careful with that! Potential for crashes in some phones if not enough memory.
  • Be extremely careful with that! Potential for crashes in some phones if not enough memory.
  • Be extremely careful with that! Potential for crashes in some phones if not enough memory.
  • Be extremely careful with that! Potential for crashes in some phones if not enough memory.
  • Be extremely careful with that! Potential for crashes in some phones if not enough memory.
  • Squeezing Every Drop Of Performance Out Of The iPhone

    1. Squeezing Every Drop Of Performance Out Of The iPhone Noel Llopis Snappy Touch http://twitter.com/snappytouch noel@snappytouch.com http://gamesfromwithin.com
    2. Running at 40 FPS on first attempt...
    3. Running at 40 FPS on first attempt... ... on the simulator!
    4. Running at 40 FPS on first attempt... ... on the simulator! 4 FPS!
    5. • Performance analysis tools
    6. • Performance analysis tools • CPU
    7. • Performance analysis tools • CPU • Game loop architecture
    8. • Performance analysis tools • CPU • Game loop architecture • Rendering
    9. • Performance analysis tools • CPU • Game loop architecture • Rendering • Memory
    10. Performance Analysis Tools
    11. Exploratory
    12. Exploratory • Turning features on and off in real time
    13. Exploratory • Turning features on and off in real time • Works very well to get very rough estimates
    14. Exploratory • Turning features on and off in real time • Works very well to get very rough estimates • Simulation vs. geometry creation vs. rendering
    15. Instruments
    16. Instruments
    17. Instruments
    18. Instruments
    19. Instruments • Versatile
    20. Instruments • Versatile • Can display frame rate, object allocations, memory usage, or even OpenGL calls.
    21. Instruments • Versatile • Can display frame rate, object allocations, memory usage, or even OpenGL calls. • Rut it on the real device, not the simulator!
    22. Instruments
    23. Instruments
    24. Instruments • TODO: Screenshot digging deeper
    25. Instruments
    26. Instruments • TODO: Screenshot with memory leaks
    27. Instruments
    28. Instruments • First pass with Instruments confirmed that Flower Garden was simulation bound.
    29. Instruments • First pass with Instruments confirmed that Flower Garden was simulation bound. • Especially the matrix multiplies and geometry creation sections.
    30. Shark
    31. Shark • CPU usage
    32. Shark • CPU usage • More in depth than Instruments
    33. Shark • CPU usage • More in depth than Instruments • Can even report good info on cache misses
    34. Shark
    35. CPU
    36. CPU
    37. CPU • 32-bit RISC ARM 11
    38. CPU • 32-bit RISC ARM 11 • 400-535Mhz
    39. CPU • 32-bit RISC ARM 11 • 400-535Mhz • iPhone 2G/3G and iPod Touch 1st and 2nd gen
    40. CPU (iPhone 3GS)
    41. CPU (iPhone 3GS) • Cortex-A8 600MHz
    42. CPU (iPhone 3GS) • Cortex-A8 600MHz • More advanced architecture
    43. Thumb Mode
    44. Thumb Mode
    45. Thumb Mode • CPU has a special thumb mode.
    46. Thumb Mode • CPU has a special thumb mode. • Less memory, maybe better performance.
    47. Thumb Mode • CPU has a special thumb mode. • Less memory, maybe better performance. • No floating point support.
    48. Thumb Mode • CPU has a special thumb mode. • Less memory, maybe better performance. • No floating point support. • It’s on by default!
    49. Thumb Mode • CPU has a special thumb mode. • Less memory, maybe better performance. • No floating point support. • It’s on by default! • Potentially HUGE wins turning it off.
    50. Thumb Mode • CPU has a special thumb mode. • Less memory, maybe better performance. • No floating point support. • It’s on by default! • Potentially HUGE wins turning it off.
    51. Thumb Mode
    52. Thumb Mode • Turning off Thumb mode increased performance in Flower Garden by over 2x
    53. Thumb Mode • Turning off Thumb mode increased performance in Flower Garden by over 2x • Heavy usage of floating point operations though
    54. Thumb Mode • Turning off Thumb mode increased performance in Flower Garden by over 2x • Heavy usage of floating point operations though • Most games will probably benefit from turning it off (especially 3D games)
    55. Integer Divide
    56. Integer Divide There is no integer divide
    57. Vector FP Unit
    58. Vector FP Unit • The main CPU has no floating point support.
    59. Vector FP Unit • The main CPU has no floating point support. • Compiled C/C++/ObjC code uses the vector floating point unit for any floating point operations.
    60. Vector FP Unit • The main CPU has no floating point support. • Compiled C/C++/ObjC code uses the vector floating point unit for any floating point operations. • Can program the VFP in assembly for max performance.
    61. Vector FP Unit
    62. Vector FP Unit • Big win when many floating point operations can be vectorized.
    63. Vector FP Unit • Big win when many floating point operations can be vectorized. • Matrix multiplies, skinning, etc
    64. Vector FP Unit • Big win when many floating point operations can be vectorized. • Matrix multiplies, skinning, etc • vfpmath project in Google Code
    65. Vector FP Unit void Matrix4Mul(const float* src_mat_1, const float* src_mat_2, float* dst_mat) { asm volatile (VFP_SWITCH_TO_ARM VFP_VECTOR_LENGTH(3) "fldmias %2, {s8-s23} nt" "fldmias %1!, {s0-s3} nt" "fmuls s24, s8, s0 nt" "fmacs s24, s12, s1 nt" //...
    66. Game Loop Architecture
    67. Game loop
    68. Game loop Gather input
    69. Game loop Gather input Update state
    70. Game loop Gather input Update state Render frame
    71. Game loop Gather input Repeat 60 times per second Update state Render frame
    72. NSTimer
    73. NSTimer • Easiest way to drive the main loop
    74. NSTimer • Easiest way to drive the main loop • All the samples from Apple
    75. NSTimer • Easiest way to drive the main loop • All the samples from Apple m_gameLoopTimer = [NSTimer scheduledTimerWithTimeInterval:animationInterval target:self selector:@selector(doFrame) userInfo:nil repeats:YES];
    76. NSTimer
    77. NSTimer • Not very accurate. Frames can vary by as much as 5-10 ms!
    78. NSTimer • Not very accurate. Frames can vary by as much as 5-10 ms! Event
    79. NSTimer • Not very accurate. Frames can vary by as much as 5-10 ms! Event Event
    80. NSTimer • Not very accurate. Frames can vary by as much as 5-10 ms! NSTimer Event Event
    81. NSTimer • Not very accurate. Event Frames can vary by as much as 5-10 ms! NSTimer Event Event
    82. NSTimer NSTimer • Not very accurate. Event Frames can vary by as much as 5-10 ms! NSTimer Event Event
    83. NSTimer Event NSTimer • Not very accurate. Event Frames can vary by as much as 5-10 ms! NSTimer Event Event
    84. NSTimer
    85. NSTimer • Call it at a higher frequency so main loop is called right away.
    86. NSTimer • Call it at a higher frequency so main loop is called right away. Event
    87. NSTimer • Call it at a higher frequency so main loop is called right away. NSTimer Event
    88. NSTimer • Call it at a higher frequency so main loop is called right away. NSTimer NSTimer Event
    89. NSTimer • Call it at a higher frequency so main loop is called right away. NSTimer NSTimer NSTimer Event
    90. NSTimer • Call it at a higher frequency so main loop NSTimer is called right away. NSTimer NSTimer NSTimer Event
    91. NSTimer • Call it at a higher Event frequency so main loop NSTimer is called right away. NSTimer NSTimer NSTimer Event
    92. NSTimer • Call it at a higher Event frequency so main loop NSTimer is called right away. NSTimer • Unfortunately that NSTimer floods the message NSTimer queue and causes delay Event on touch events
    93. Threads
    94. Threads • Put the game on a separate thread from the UI one.
    95. Threads • Put the game on a separate thread from the UI one. • Runs as fast as possible without delays
    96. Threads
    97. Threads • Careful what you do when parsing input events from main thread
    98. Threads • Careful what you do when parsing input events from main thread • Can be more complex to debug
    99. Threads • Careful what you do when parsing input events from main thread • Can be more complex to debug • Running as fast as possible might not always be desirable (draining battery life)
    100. Threads
    101. Threads • Can also split game update and rendering in two different threads.
    102. Threads • Can also split game update and rendering in two different threads. • Syncing the two is much more difficult
    103. Threads • Can also split game update and rendering in two different threads. • Syncing the two is much more difficult • Can only use one thread for each OpenGL context
    104. Threads • Can also split game update and rendering in two different threads. • Syncing the two is much more difficult • Can only use one thread for each OpenGL context • Theoretically best results, but maybe not much difference
    105. Thread Driving Loop
    106. Thread Driving Loop m_mainLoop = [[NSThread alloc] initWithTarget:self selector:@selector(mainLoopTimer) object:nil]; [m_mainLoop start];
    107. Thread Driving Loop m_mainLoop = [[NSThread alloc] initWithTarget:self selector:@selector(mainLoopTimer) object:nil]; [m_mainLoop start]; - (void)mainLoopTimer { while (![[NSThread currentThread] isCancelled]) { [self performSelectorOnMainThread:@selector(doFrame) withObject:nil waitUntilDone:YES]; usleep(2000); } }
    108. Thread driving loop
    109. Thread driving loop • Can be best of both worlds
    110. Thread driving loop • Can be best of both worlds • Consistent call to main loop
    111. Thread driving loop • Can be best of both worlds • Consistent call to main loop • No flooding of events (cuts in line)
    112. CADisplayLink
    113. CADisplayLink • Triggered by display refresh
    114. CADisplayLink • Triggered by display refresh • Can choose to get a call every X updates
    115. CADisplayLink • Triggered by display refresh • Can choose to get a call every X updates • Consistent update
    116. CADisplayLink • Triggered by display refresh • Can choose to get a call every X updates • Consistent update • Perfect for games
    117. CADisplayLink • Triggered by display refresh • Can choose to get a call every X updates • Consistent update • Perfect for games m_displayLink  = [CADisplayLink displayLinkWithTarget: selfselector:@selector(mainLoop:)]; m_displayLink.frameInterval = 2; [m_displayLink addToRunLoop :[NSRunLoopcurrentRunLoop] forMode:NSDefaultRunLoopMode];
    118. CADisplayLink
    119. CADisplayLink • Only available on SDK 3.1
    120. CADisplayLink • Only available on SDK 3.1 • Definitely the way to go for future games
    121. Rendering
    122. iPhone 2G iPhone 3G iPod Touch 1st, 2nd, and some 3rd gen
    123. iPhone 2G iPhone 3G iPod Touch 1st, 2nd, and some 3rd gen iPhone 3GS High-end 3rd gen iPod Touch
    124. Game loop
    125. Game loop
    126. Game loop
    127. Game loop
    128. Game loop
    129. Game loop
    130. Game loop • Avoid locking some resource while the GPU is rendering.
    131. Game loop • Avoid locking some resource while the GPU is rendering. • Might get a small speed up by first presenting the frame, then updating game state, and finally rendering.
    132. Game loop • Avoid locking some resource while the GPU is rendering. • Might get a small speed up by first presenting the frame, then updating game state, and finally rendering. • Almost no difference in Flower Garden
    133. Rendering
    134. Rendering • A lot of the usual advice for GPUs:
    135. Rendering • A lot of the usual advice for GPUs: • Minimize draw primitive calls
    136. Rendering • A lot of the usual advice for GPUs: • Minimize draw primitive calls • Minimize state changes
    137. Rendering • A lot of the usual advice for GPUs: • Minimize draw primitive calls • Minimize state changes • ...
    138. Vertices
    139. Vertices • Rendering vertices can become a bottleneck
    140. Vertices • Rendering vertices can become a bottleneck • No true Vertex Buffer Objects on the MBX, so CPU involved in every draw call
    141. Vertices • Rendering vertices can become a bottleneck • No true Vertex Buffer Objects on the MBX, so CPU involved in every draw call • Fixed on the 3GS though
    142. Vertices
    143. Vertices • Make vertices as small as possible
    144. Vertices • Make vertices as small as possible struct PetalVertex { #ifdef PETAL_SMALL_VERTEX int16_t x, y, z; int16_t u, v; int16_t u1, v1; #else float x, y, z; float u, v; float u1, v1; #endif };
    145. Vertices • Make vertices as small as possible struct PetalVertex { #ifdef PETAL_SMALL_VERTEX int16_t x, y, z; int16_t u, v; int16_t u1, v1; #else float x, y, z; float u, v; float u1, v1; #endif };
    146. Vertices
    147. Vertices • Align vertices on at least 4-byte boundaries
    148. Vertices • Align vertices on at least 4-byte boundaries • Experiment with larger alignments
    149. Vertices
    150. Vertices • Use indexed lists...
    151. Vertices • Use indexed lists... • ... ordered as if they were strips.
    152. Mixing OpenGL and UIKit
    153. Mixing OpenGL and UIKit • You can mix and match OpenGL and UIKit
    154. Mixing OpenGL and UIKit • You can mix and match OpenGL and UIKit • But you have to be very careful
    155. Mixing OpenGL and UIKit • You can mix and match OpenGL and UIKit • But you have to be very careful • Minimize UIKit elements on top of EAGLView
    156. Mixing OpenGL and UIKit • You can mix and match OpenGL and UIKit • But you have to be very careful • Minimize UIKit elements on top of EAGLView • Keep them opaque
    157. Mixing OpenGL and UIKit • You can mix and match OpenGL and UIKit • But you have to be very careful • Minimize UIKit elements on top of EAGLView • Keep them opaque • OK to have smaller EAGLView
    158. Use iPhone Features
    159. Use iPhone Features
    160. Use iPhone Features
    161. Memory
    162. !=
    163. !=
    164. Memory rant
    165. Memory rant • No memory guarantees
    166. Memory rant • No memory guarantees • No idea of how much memory will be available
    167. Memory rant • No memory guarantees • No idea of how much memory will be available • Not even how much memory at startup!
    168. Memory rant • No memory guarantees • No idea of how much memory will be available • Not even how much memory at startup! • Sometimes I’ve seen the game start with as low as 3MB free memory!!!
    169. Memory
    170. Memory • System notifies apps when memory is running low
    171. Memory • System notifies apps when memory is running low • Apps are supposed to free as much memory as they can
    172. Memory • System notifies apps when memory is running low • Apps are supposed to free as much memory as they can • Horrible situation for games!
    173. Memory
    174. Memory
    175. Memory
    176. Memory WARNING!!
    177. Memory WARNING!!
    178. Memory
    179. Memory WARNING!!
    180. Memory Wait... WARNING!!
    181. Memory Wait... ... wait for it ... WARNING!!
    182. Memory Wait... ... wait for it ... WARNING!!
    183. Memory WARNING!!
    184. Memory WARNING!!
    185. Memory WARNING!!
    186. Memory trick
    187. Memory trick • Do you prefer to allocate your memory all at once at initialization time?
    188. Memory trick • Do you prefer to allocate your memory all at once at initialization time? • Allocate memory slowly.
    189. Memory trick • Do you prefer to allocate your memory all at once at initialization time? • Allocate memory slowly. • When you get a warning, wait a little.
    190. Memory trick • Do you prefer to allocate your memory all at once at initialization time? • Allocate memory slowly. • When you get a warning, wait a little. • If free memory doesn’t increase, release some.
    191. Memory trick • Do you prefer to allocate your memory all at once at initialization time? • Allocate memory slowly. • When you get a warning, wait a little. • If free memory doesn’t increase, release some. • Repeat until enough memory
    192. Memory trick • Do you prefer to allocate your memory all at once at initialization time? • Allocate memory slowly. • When you get a warning, wait a little. • If free memory doesn’t increase, release some. • Repeat until enough memory
    193. Thank you! Twitter: @snappytouch Email: noel@snappytouch.com Blog: http://gamesfromwithin.com

    ×