
4Developers 2015: Gamedev-grade debugging - Leszek Godlewski

Leszek Godlewski

Language: English

Video games are complex and non-deterministic systems. So complex, in fact, that some days the everyday breakpoint just doesn't cut it when you're looking for that next bug. Drawing from the experience of deploying three large titles to four platforms, this talk will discuss the different approaches and borderline magical tricks to debugging different parts of a game: noise filtering when our breakpoint is hit way too often, memory stomping, time-dependent bugs, rendering artifacts… Story of a game programmer's life.

  1. Gamedev-grade debugging. Leszek Godlewski, The Astronauts. Source: http://igetyourfail.blogspot.com/2009/01/reaching-out-tale-of-failed-skinning.html
  2. Who is this guy? ● Engine Programmer, The Astronauts (Nov 2014 – present) – PS4 port of The Vanishing of Ethan Carter ● Programmer, Nordic Games (early 2014 – Nov 2014) ● Freelance Programmer (Sep 2013 – early 2014) ● Generalist Programmer, The Farm 51 (Mar 2010 – Aug 2013)
  3. Agenda ● How is gamedev different? ● Bug species ● Case studies ● Conclusions
  4. How is gamedev different? [Diagram: the game loop. Start → Exit? → Yes: End; No: Update → Draw → back to Exit?]
  5. 33 milliseconds ● How much time you have to get shit done™ – 30 Hz → 33⅓ ms per frame – 60 Hz → 16⅔ ms per frame [Diagram: game components. Editor: Level tools, Asset tools; Engine: Physics, Rendering, Audio, Network, Platform, Input; Network back-end; Game: UI, Logic, AI]
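The game loop and frame budget from slides 4 and 5 can be sketched as below. This is a minimal illustration, not engine code: `Game`, `run_game_loop()` and the three-frame exit are hypothetical stand-ins for the real Update and Draw stages, and the loop merely counts frames that exceed the budget for the chosen refresh rate.

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>

// Time available per frame at a given refresh rate, in milliseconds.
constexpr double frame_budget_ms(double hz) { return 1000.0 / hz; }

// Hypothetical game object standing in for the engine's Update and Draw stages.
struct Game {
    int frames_left = 3;  // demo only: request exit after a few frames
    bool should_exit() const { return frames_left <= 0; }
    void update(double /*dt_ms*/) { --frames_left; }
    void draw() {}
};

// Start -> Exit? -> (No) Update -> Draw -> Exit? ... (Yes) -> End.
// Returns the number of frames that exceeded the per-frame budget.
int run_game_loop(Game& game, double hz) {
    using clock = std::chrono::steady_clock;
    const double budget = frame_budget_ms(hz);
    int over_budget = 0;
    auto last = clock::now();
    while (!game.should_exit()) {
        const auto start = clock::now();
        const double dt = std::chrono::duration<double, std::milli>(start - last).count();
        last = start;
        game.update(dt);
        game.draw();
        const double spent = std::chrono::duration<double, std::milli>(clock::now() - start).count();
        if (spent > budget) {
            ++over_budget;
            std::printf("Frame over budget: %.2f ms > %.2f ms\n", spent, budget);
        }
    }
    return over_budget;
}
```

Everything the game does in a frame, from physics to AI to rendering submission, has to fit inside that one `budget` value.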
  6. Interdisciplinary working environment ● Designers – Game, Level, Quest, Audio… ● Artists – Environment, Character, 2D, UI, Concept… ● Programmers – Gameplay, Engine, Tools, UI, Audio… ● Writers ● Composers ● Actors ● Producers ● PR & Marketing Specialists ● … All of them in tightly woven teams
  7. Severe, fixed hardware constraints ● Main reason for extensive use of native code
  8. Different trade-offs [Chart comparing Enterprise/B2B/webdev with Gamedev on Robustness, Cost, Performance and Fun/Coolness]
  9. Indeterminism & complexity ● Leads to poor testability – Parts make no sense in isolation – What exactly is correct? – Performance regressions? Source: https://github.com/memononen/recastnavigation
  10. Aversion to general software engineering ● Modelling ● Object-Oriented Programming ● Design patterns ● C++ STL ● Templates in general ● …
  11. Agenda ● How is gamedev different? ● Bug species ● Case studies ● Conclusions
  12. Bug species. Source: http://benigoat.tumblr.com/post/100306422911/press-b-to-crouch
  13. General programming bugs ● Memory access violations ● Memory stomping/buffer overflows ● Infinite loops ● Uninitialized variables ● Reference cycles ● Floating point precision errors ● Out-Of-Memory/memory fragmentation ● Memory leaks ● Threading errors
  14. Bad maths ● Incorrect transform order – Matrix multiplication is not commutative – AB ≠ BA ● Incorrect transform space Source: http://leadwerks.com/wiki/index.php?title=TFormQuat
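A small sketch of the transform-order point, assuming 2D affine transforms stored as 3×3 matrices and applied to column vectors (`Mat3`, `mul()`, `rotation()`, `translation()` and `apply()` are illustrative helpers, not a particular engine's API). Rotating by 90° and translating by (2, 0) sends the point (1, 0) to different places depending on the multiplication order.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// 2D affine transform in homogeneous coordinates, column-vector convention.
using Mat3 = std::array<std::array<double, 3>, 3>;

Mat3 mul(const Mat3& a, const Mat3& b) {
    Mat3 r{};
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            for (int k = 0; k < 3; ++k)
                r[i][j] += a[i][k] * b[k][j];
    return r;
}

Mat3 translation(double tx, double ty) {
    return {{{1, 0, tx}, {0, 1, ty}, {0, 0, 1}}};
}

Mat3 rotation(double radians) {
    const double c = std::cos(radians), s = std::sin(radians);
    return {{{c, -s, 0}, {s, c, 0}, {0, 0, 1}}};
}

// Transforms the point (x, y): p' = M * [x, y, 1]^T.
std::array<double, 2> apply(const Mat3& m, double x, double y) {
    return {m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2]};
}
```

With column vectors the rightmost matrix is applied first, so `mul(T, R)` rotates and then translates, landing (1, 0) at (2, 1), while `mul(R, T)` translates and then rotates, landing it at (0, 3). Engines that use row vectors reverse this convention, which is itself a common source of the bug.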
  15. Temporal bugs ● Incorrect update order – for (int i = 0; i < entities.size(); ++i) entities[i].update(); ● Incorrect interpolation/blending – Bad alpha term – Bad blending mode (additive/modulate) ● Deferred effects – After n frames – After n times an action happens – n may be random, indeterministic
  16. Graphical glitches ● Incorrect render state ● Shader code bugs ● Precision Source: http://igetyourfail.blogspot.com/2009/01/visit-lake-fail-this-weekend.html
  17. Content bugs ● Incorrect scripts ● Buggy assets Source: http://www.polycount.com/forum/showpost.php?p=1263124&postcount=10466
  18. Worst part? ● Most cases are two or more of the aforementioned, intertwined
  19. Agenda ● How is gamedev different? ● Bug species ● Case studies ● Conclusions
  20. Case studies ● Most material captured by
  21. Video settings not updating
  22. Incorrect weapon after demon mode foreshadowing
  23. Post-death sprint camera anim
  24. Corpses teleported on death
  25. Corpses teleported on death ● In normal gameplay, pawns have simplified movement – Sweep the actor's collision primitive through the world – Slide along slopes, stop against walls Source: http://udn.epicgames.com/Three/PhysicalAnimation.html
  26. Corpses teleported on death ● Upon death, pawns switch to physics-based movement (ragdoll) Source: http://udn.epicgames.com/Three/PhysicalAnimation.html
  27. Corpses teleported on death (cont.) ● Physics bodies have separate state from the game actor – Actor does not drive physics bodies, unless requested – If the actor is driven by the physics simulation, its location is synchronized to that of the hips bone body Source: http://udn.epicgames.com/Three/PhysicalAnimation.html
  28. Corpses teleported on death (cont.) ● Idea: breakpoint in FarMove()? – A single choke point, because the world octree is updated there – Function gets called a gazillion times per frame – Terrible noise ● Breakpoint condition? – Teleport from arbitrary point A to arbitrary point B – Distance? ● Breakpoint sequence? – Break on death instead – When that breakpoint is hit, break in FarMove()
  29. Corpses teleported on death (cont.) ● Cause: physics body driving the actor with out-of-date state ● Fix: request physics body state synchronization to animation before switching to ragdoll
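Debuggers express this as conditional or dependent breakpoints; where those are unavailable or too slow, the same breakpoint sequence can be emulated in code. The sketch below assumes hypothetical hooks: `arm()` would be called from the death handler and `on_move()` from the hot movement function (FarMove() in this case), with an arbitrarily chosen distance threshold. In a real build `trap` would invoke a platform breakpoint such as `__debugbreak()` (MSVC) or `raise(SIGTRAP)` (POSIX); here it defaults to a no-op so the code can run unattended.

```cpp
#include <cassert>
#include <cmath>
#include <functional>

struct Vec3 { double x, y, z; };

double distance(const Vec3& a, const Vec3& b) {
    const double dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return std::sqrt(dx * dx + dy * dy + dz * dz);
}

// Code-level "breakpoint sequence": the noisy breakpoint stays inert until an earlier event arms it.
struct BreakSequence {
    bool armed = false;
    double teleport_threshold = 100.0;   // hypothetical distance that counts as a teleport
    std::function<void()> trap = [] {};  // real build: __debugbreak() or raise(SIGTRAP)
    int hits = 0;

    // First breakpoint in the sequence: the pawn dies.
    void arm() { armed = true; }

    // Second breakpoint: the hot movement function, filtered by state and by distance.
    void on_move(const Vec3& from, const Vec3& to) {
        if (!armed || distance(from, to) < teleport_threshold)
            return;  // ignore the gazillion ordinary moves
        ++hits;
        trap();
    }
};
```

Combining the sequence (armed only after death) with the condition (large jumps only) cuts the noise down to the handful of calls worth inspecting.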
  30. Weapons floating away from the player
  31. Weapons floating away from the player
  32. Weapons floating away from the player ● Extremely rare, only encountered on consoles – Reproduction rate roughly 1 in 50 attempts – And never on developer machines ● Player pawn in a special state for the rollercoaster ride – Many things could go wrong ● For lack of a repro, sprinkled the code with debug logs
  33. Weapons floating away from the player (cont.) ● Cause: incorrect update order – for (int i = 0; i < entities.size(); ++i) entities[i].update(); – Player pawn forced to update after the rollercoaster car – Possible for weapons to be updated before player pawns ● Fix: enforce weapon update after player pawns
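A toy model of the update-order bug, assuming each object simply copies the position of whatever it is attached to (`World`, `move_car`, `move_player`, `move_weapon` and `tick()` are hypothetical). When the weapon updates before its owner, it reads the previous frame's player position and trails behind; enforcing the car → pawn → weapon order keeps the chain consistent.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Toy model of the rollercoaster case: 1D positions along the track.
struct World {
    double car = 0, player = 0, weapon = 0;
};

using Update = std::function<void(World&)>;

const Update move_car    = [](World& w) { w.car += 1.0; };        // car advances along the track
const Update move_player = [](World& w) { w.player = w.car; };    // pawn rides the car
const Update move_weapon = [](World& w) { w.weapon = w.player; }; // weapon follows the pawn

// One frame: run the updates in the given order, like the entities[i].update() loop.
void tick(World& w, const std::vector<Update>& order) {
    for (const Update& u : order)
        u(w);
}
```

After a few frames in the bad order the weapon is always one frame's worth of track behind the player; at rollercoaster speeds that gap is plainly visible.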
  34. Characters with “rapiers”
  35. Characters with “rapiers” ● UE3 has “content cooking” as part of the game build pipeline – Redistributable builds are “cooked” builds ● Artifact appears only in cooked builds
  36. Characters with “rapiers” (cont.) ● Logs contained assertions for “out-of-bounds vertices” ● Mesh vertex compression scheme – 32-bit float → 16-bit short int (~50% savings) – Find bounding sphere for all vertices – Normalize all vertices to said sphere radius – Map [-1; 1] floats to [-32768; 32767] 16-bit integers ● Assert condition – for (int i = 0; i < 3; ++i) assert(v[i] >= -1.f && v[i] <= 1.f && "Out-of-bounds vertex!");
  37. Characters with “rapiers” (cont.) ● v[i] was NaN – Interesting property of NaN: every comparison except != evaluates to false – Even with itself ● float f = nanf(""); bool b = (f == f); // b is false ● How did it get there?! ● Tracked the NaN all the way down to the raw engine asset!
  38. Characters with “rapiers” (cont.) ● Cause: ??? ● Fix: re-export the mesh from 3D software – Magic!
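The compression scheme and the NaN trap can be sketched as follows. Assumptions: the bounding sphere is centred at the origin (a simplification), [-1; 1] is mapped linearly onto [-32768; 32767] with rounding to nearest, and the names `quantize()` and `compress()` are illustrative. Writing the range check as `!(v >= -1.f && v <= 1.f)` matters: because every comparison with NaN is false, this form rejects NaN, whereas `v < -1.f || v > 1.f` would let it through.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <optional>
#include <vector>

using Vertex = std::array<float, 3>;
using PackedVertex = std::array<std::int16_t, 3>;

// Maps a normalized coordinate in [-1; 1] linearly onto [-32768; 32767].
// Returns nullopt for anything out of range, NaN included (both comparisons fail for NaN).
std::optional<std::int16_t> quantize(float v) {
    if (!(v >= -1.f && v <= 1.f))
        return std::nullopt;
    const long q = std::lround((static_cast<double>(v) + 1.0) * 0.5 * 65535.0) - 32768;
    return static_cast<std::int16_t>(q);
}

// Normalizes all vertices to the bounding sphere radius (sphere assumed centred at the origin)
// and quantizes them. Returns false on the "Out-of-bounds vertex!" case, e.g. a NaN component.
bool compress(const std::vector<Vertex>& in, std::vector<PackedVertex>& out, float& radius) {
    radius = 0.f;
    for (const Vertex& v : in)  // note: std::max silently skips a NaN length here
        radius = std::max(radius, std::sqrt(v[0] * v[0] + v[1] * v[1] + v[2] * v[2]));
    if (!(radius > 0.f))
        return false;
    out.clear();
    for (const Vertex& v : in) {
        PackedVertex q{};
        for (int i = 0; i < 3; ++i) {
            const auto c = quantize(v[i] / radius);
            if (!c)
                return false;
            q[i] = *c;
        }
        out.push_back(q);
    }
    return true;
}
```

Decompression would reverse the mapping and multiply by the stored radius; the NaN never gets that far, which is exactly why the assertion fired.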
  39. Meta-case: undeniable assertion
  40. Undeniable assertion ● Happened while debugging “rapiers” ● Texture compression library without sources ● Flood of non-critical assertions – For almost every texture – Could not ignore in bulk – Terrible noise ● Solution suggestion taken from [SINILO12]
  41. Undeniable assertion (cont.) ● Enter disassembly
  42. Undeniable assertion (cont.) ● Locate the call instruction for the assert message function
  43. Undeniable assertion (cont.) ● Enter memory view and look up the address – 0xE8 is the CALL opcode – Followed by a 4-byte (relative) address operand
  44. Undeniable assertion (cont.) ● NOP it out! – 0x90 is the NOP opcode – Overwrite all 5 bytes of the CALL (opcode + operand)
  45. Undeniable assertion (cont.)
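The patch from slides 43 and 44 can be illustrated on a copy of the code bytes rather than on live process memory. This sketch assumes x86 encoding (0xE8 followed by a little-endian 32-bit displacement relative to the next instruction); `call_target()` and `nop_out_call()` are hypothetical helpers. All five bytes are replaced, since NOPing only the opcode would leave the operand bytes to be decoded as stray instructions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::uint8_t kCallRel32 = 0xE8;  // x86 CALL with a 32-bit relative operand
constexpr std::uint8_t kNop = 0x90;        // x86 single-byte NOP
constexpr std::size_t kCallLength = 5;     // 1 opcode byte + 4 operand bytes

// Absolute target of the CALL at `offset`, given the address the code buffer is loaded at.
// The operand is a little-endian displacement relative to the end of the CALL instruction.
std::int64_t call_target(const std::vector<std::uint8_t>& code, std::size_t offset,
                         std::int64_t base_address) {
    const std::uint32_t raw = std::uint32_t(code[offset + 1]) | std::uint32_t(code[offset + 2]) << 8 |
                              std::uint32_t(code[offset + 3]) << 16 | std::uint32_t(code[offset + 4]) << 24;
    const std::int32_t rel = static_cast<std::int32_t>(raw);
    return base_address + static_cast<std::int64_t>(offset + kCallLength) + rel;
}

// Replaces the whole 5-byte CALL with NOPs. Returns false if `offset` does not hold a CALL rel32.
bool nop_out_call(std::vector<std::uint8_t>& code, std::size_t offset) {
    if (offset + kCallLength > code.size() || code[offset] != kCallRel32)
        return false;
    for (std::size_t i = 0; i < kCallLength; ++i)
        code[offset + i] = kNop;
    return true;
}
```

Checking the computed target against the assert function's address before patching is a cheap way to make sure the right CALL is being removed.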
  46. Incorrect player movement
  47. Incorrect player movement
  48. Incorrect player movement ● Recreating player movement from one engine in another (Pain Engine → Unreal Engine 3) ● Different physics engines (Havok vs PhysX) ● Many nuances – Air control – Jump and fall heights – Slope & stair climbing & sliding down
  49. Incorrect player movement (cont.) ● Main nuance: capsule vs cylinder
  50. Incorrect player movement (cont.) ● Switching our pawn collision to capsule-based was not an option ● Emulate by sampling the ground under the cylinder instead ● No clever way to debug, just make it “bug out” and break in debugger
  51. Incorrect player movement (cont.) ● Situation when getting stuck ● Cause: vanilla UE3 code sent a player locked between non-walkable surfaces into the “falling” state ● Fix: keep the player in the “walking” state
  52. Incorrect player movement (cont.) ● Situation when moving without player intent ● Added visualization of sampling, turned on collision display ● Cause: undersampling ● Fix: increase radial sampling resolution
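The cylinder ground sampling can be sketched as a set of 2D points under the pawn, each of which would then be traced downwards against the world. Assumptions: samples are laid out as a centre point plus concentric rings, and `ground_samples()`, `rings` and `per_ring` are hypothetical names; the actual emulation code is not shown in the talk. Raising `rings` or `per_ring` is the kind of radial sampling resolution increase the fix refers to.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Point2 { double x, y; };

// Sample points on the cylinder's base: the centre plus `rings` concentric rings
// of `per_ring` evenly spaced points each, out to the cylinder radius.
std::vector<Point2> ground_samples(Point2 centre, double radius, int rings, int per_ring) {
    const double pi = std::acos(-1.0);
    std::vector<Point2> pts{centre};
    for (int r = 1; r <= rings; ++r) {
        const double ring_radius = radius * r / rings;
        for (int k = 0; k < per_ring; ++k) {
            const double a = 2.0 * pi * k / per_ring;
            pts.push_back({centre.x + ring_radius * std::cos(a), centre.y + ring_radius * std::sin(a)});
        }
    }
    return pts;
}
```

Drawing these points in-game, as done on the slide, makes undersampling obvious: narrow ledges simply fall between the samples.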
  53. Blinking full-screen damage effects
  54. Blinking full-screen damage effects ● Post-process effects are organized in one-way chains
  55. Blinking full-screen damage effects (cont.) ● No debugger available to observe the PP chain ● Rolled my own overlay that walked and dumped the chain contents:
      MaterialEffect 'Vignette' Param 'Strength' 0.83 [IIIIIIII ]
      MaterialEffect 'FilmGrain' Param 'Strength' 0.00 [ ]
      UberPostProcessEffect 'None'
        SceneHighLights (X=0.80,Y=0.80,Z=0.80)
        SceneMidTones (X=0.80,Y=0.80,Z=0.80)
        …
      MaterialEffect 'Blood' Param 'Strength' 1.00 [IIIIIIIIII]
  56. Blinking full-screen damage effects (cont.) ● Cause: entire PP chain override – Breakpoint in chain setting revealed the level script as the source – Overeager level designer ticking one checkbox too many when setting up thunderstorm effects ● Fix: disable chain overriding altogether – No use case for it in our game anyway
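A minimal version of such an overlay: walk the one-way chain from its head and format one line per effect. The `PostProcessEffect` struct (one scalar parameter and a `next` pointer) is a hypothetical simplification of UE3's effect objects, and the 10-character bar width is an assumption inferred from the dump above.

```cpp
#include <cassert>
#include <cmath>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical, simplified post-process chain node; chains are one-way.
struct PostProcessEffect {
    std::string type, name, param;
    float value;
    const PostProcessEffect* next;
};

// Renders a 0..1 value as a 10-character bar, e.g. 0.83 -> "[IIIIIIII  ]".
std::string bar(float v) {
    const int n = static_cast<int>(std::floor(std::fmin(std::fmax(v, 0.f), 1.f) * 10.f));
    return "[" + std::string(n, 'I') + std::string(10 - n, ' ') + "]";
}

// Walks the chain from its head and dumps one line per effect, as an on-screen overlay would.
std::vector<std::string> dump_chain(const PostProcessEffect* head) {
    std::vector<std::string> lines;
    for (const PostProcessEffect* e = head; e != nullptr; e = e->next) {
        char value[16];
        std::snprintf(value, sizeof value, "%.2f", e->value);
        lines.push_back(e->type + " '" + e->name + "' Param '" + e->param + "' " + value + " " + bar(e->value));
    }
    return lines;
}
```

Refreshed every frame, a dump like this shows a blinking effect as a value or a whole chain flipping between frames, which a paused debugger cannot show.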
  57. Incorrect animation states
  58. Incorrect animation states
  59. Incorrect animation states
  60. Incorrect animation states ● Animation in UE3 is done by evaluating a tree – Branches are weight-blended (either replacement or additive blend) – Sequences (raw animations) for whole-skeleton poses – Skeletal controls for fine-tuning of individual bones Source: http://udn.epicgames.com/Three/AnimTreeEditorUserGuide.html
  61. Incorrect animation states (cont.) ● Prominent case for domain-specific debuggers ● No tools for that in UE3, rolled my own visualizer – Walks the animation tree and dumps active branches – Allows inspection of states, but not transitions – Conventional debugging still required, but greatly narrowed down
  62. Incorrect animation states (cont.) ● Animation bug “checklist” ● Inspect the animation state in slow motion – Is the correct blending mode used? ● Inspect the AI and cutscene state – Capable of full animation overrides ● Inspect the assets (animation sequences) – Is the root bone correctly oriented? – Is the root bone motion correct? – Are inverse kinematics targets present and correctly placed? – Is the mesh skeleton complete and correct?
  63. Incorrect animation states (cont.) ● Incorrect blend of reload animation – Cause: bad root bone orientation in animation sequence ● Left hand off the weapon – Cause: left hand inverse kinematics was off – Fix: revise IK state control code ● Left hand incorrectly oriented – Cause: bad IK target marker orientation on weapon mesh
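The tree visualizer can be sketched as a depth-first walk that skips zero-weight branches. `AnimNode` is a hypothetical, simplified node (name, blend weight, children), not UE3's AnimTree classes; like the tool described above, it shows the current state but nothing about transitions.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical, simplified animation tree node: children are blended by weight.
struct AnimNode {
    std::string name;
    float weight;  // blend weight of this branch within its parent
    std::vector<AnimNode> children;
};

// Depth-first walk that dumps only active branches (non-zero weight), indented by depth.
void dump_active(const AnimNode& node, int depth, std::vector<std::string>& out) {
    if (node.weight <= 0.f)
        return;  // inactive branch: skip the whole subtree
    char w[16];
    std::snprintf(w, sizeof w, "%.2f", node.weight);
    out.push_back(std::string(2 * depth, ' ') + node.name + " (" + w + ")");
    for (const AnimNode& child : node.children)
        dump_active(child, depth + 1, out);
}
```

Pruning inactive subtrees is what keeps the dump readable: a full tree can have dozens of nodes, while only a few are contributing to the pose at any moment.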
  64. Viewport stretched when portals are in view
  65. Viewport stretched when portals are in view ● Graphics debugging is: – Tracing & recording graphics API calls – Replaying the trace – Reviewing the renderer state and resources ● Trace may be somewhat unreadable at first…
  66. Viewport stretched when portals are… (cont.) ● Traces may be annotated for clarity – Direct3D: ID3DUserDefinedAnnotation – OpenGL: GL_KHR_debug (more info: [GODLEWSKI01])
  67. Viewport stretched when portals are… (cont.) ● Quick renderer state inspection revealed that viewport dimensions were off – 1024x1024, 1:1 aspect ratio instead of 1280x720, 16:9 – Looks like shadow map resolution? ● Found the latest glViewport() call – Shadow map code indeed ● Why wasn't the viewport updated for main scene rendering?
  68. Viewport stretched when portals are… (cont.) ● Renderer state changes are expensive – New state needs to be validated – Modern graphics APIs are asynchronous – State reading may require synchronization → stalls ● Cache the current renderer state to avoid redundant calls – Cache ↔ state divergence → bugs!
  69. Viewport stretched when portals are… (cont.) ● Cause: cache ↔ state divergence – Difference between Direct3D and OpenGL: viewport dimensions are part of the render target state in Direct3D, but global state in OpenGL ● Fix: tie viewport dimensions to render target in the cache
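A sketch of a redundant-state cache in which the viewport is tied to the render target, as in the fix. Everything here is hypothetical: `Backend` merely counts calls where a real OpenGL renderer would bind a framebuffer and issue glViewport(), and render targets are identified by plain integers.

```cpp
#include <cassert>
#include <map>

struct Viewport {
    int x, y, w, h;
    bool operator==(const Viewport& o) const { return x == o.x && y == o.y && w == o.w && h == o.h; }
};

// Hypothetical stand-in for the driver: records calls instead of touching a GPU.
struct Backend {
    int viewport_calls = 0, target_calls = 0;
    Viewport current{};
    void set_viewport(const Viewport& v) { ++viewport_calls; current = v; }
    void bind_target(int /*target*/) { ++target_calls; }
};

// Redundant-state filter. The viewport is cached per render target, so switching targets
// re-applies that target's viewport instead of trusting a stale global value.
class StateCache {
public:
    explicit StateCache(Backend& b) : backend_(b) {}

    void bind_target(int target) {
        if (target == bound_)
            return;  // redundant bind filtered out
        backend_.bind_target(target);
        bound_ = target;
        const auto it = viewports_.find(target);
        if (it != viewports_.end())
            backend_.set_viewport(it->second);  // keep the global GL viewport in sync
    }

    void set_viewport(const Viewport& v) {
        const auto it = viewports_.find(bound_);
        if (it != viewports_.end() && it->second == v)
            return;  // redundant call filtered out
        viewports_[bound_] = v;
        backend_.set_viewport(v);
    }

private:
    Backend& backend_;
    int bound_ = -1;
    std::map<int, Viewport> viewports_;
};
```

With a single global cached viewport instead of the per-target map, returning to the scene target after rendering the shadow map would skip the glViewport() call and leave the 1024x1024 shadow map viewport in place, which is the divergence described on the slide.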
  70. Black artifacts
  71. Black artifacts
  72. Black artifacts
  73. Black artifacts
  74. Black artifacts
  75. Black artifacts ● First thing to do is to inspect the state ● Nothing suspicious found, turned to shaders ● On OpenGL 4.2+, shaders could be debugged in NSight… ● This was OpenGL 2.1, so had to resort to early returns from shader with debug colours – Shader equivalent of debug logs, a.k.a. “Your Mum's Debugger” ● “Shotgun debugging” with is*() functions – isnan(), isinf() ● isnan() returned true!
  76. Black artifacts (cont.) ● Cause: undefined behaviour in NVIDIA's pow() implementation – “Results are undefined if x < 0. Results are undefined if x = 0 and y <= 0.” [GLSL120] – Undefined means the implementation is free to do whatever ● NVIDIA returns QNaN the Barbarian (displayed as black, poisoning all involved calculations) ● Other vendors usually return 0 ● Fix: for all pow() calls, clamp either: – Arguments to their proper ranges – Output to [0; ∞)
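The fix can be mirrored on the CPU in C++ for illustration (the shader itself would be GLSL). `safe_pow()` clamps the base to a small positive epsilon, an assumed value of 1e-6, which avoids both undefined cases quoted from the specification; `clamp_result()` shows the output-clamping alternative. A NaN base still propagates through `safe_pow()`, because `std::max` returns its first argument when the comparison fails.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Argument clamping: keep the base strictly positive, so neither x < 0 nor
// (x == 0 and y <= 0) can occur. The epsilon is a hypothetical choice.
float safe_pow(float x, float y) {
    constexpr float kEpsilon = 1e-6f;
    return std::pow(std::max(x, kEpsilon), y);
}

// Output clamping: force the result into [0; inf). NaN fails the comparison and becomes 0.
float clamp_result(float r) {
    return (r >= 0.f) ? r : 0.f;
}
```

C++ behaves like NVIDIA here: `std::pow(-2.0f, 0.5f)` yields NaN, so the same "shotgun" `std::isnan()` checks work for CPU-side reproductions of shader maths.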
  77. Mysterious crash
  78. Mysterious crash ● Game in content lock (feature freeze) for a while ● PlayStation 3 port nearly done ● Crash ~3-5 frames after entering a specific room ● First report included a perfectly normal callstack but no obvious reason ● QA reassigned to another task, could not pursue further ● Concluded it must've been an OOM crash
  79. Mysterious crash (cont.) ● Bug comes back, albeit with a wildly different callstack ● Asked QA to reproduce multiple times, including on other platforms – No crashes on X360 & Windows! ● Totally different callstack each time ● Confusion! – OOM? Even in 512 MB developer mode (256 MB in retail units)? – Bad content? – Console OS bug? – Audio thread? – ???
  80. Mysterious crash (cont.) ● Reviewed a larger sample of callstacks ● Most ended in dlmalloc's integrity checks – Assertions triggered upon allocations and frees ● Memory stomping…? Could it be…?
  81. Mysterious crash (cont.) ● Started researching memory debugging – No tools provided by Sony ● Tried using debug allocators (dmalloc et al.) – Most use the concept of memory fences – Difficult to hook up to UE3 malloc [Diagram: regular allocation vs fenced allocation]
  82. Mysterious crash (cont.) ● Found and integrated a community-developed tool, Heap Inspector [VANDERBEEK14] – Memory analyzer – Focused on monitoring consumption and usage patterns – Records callstacks for allocations and frees ● Several reproduction attempts revealed a correlation between – Crash address – Construction of a specific class ● Gotcha!
  83. Mysterious crash (cont.)
      // class declaration
      class Crasher extends ActorComponent;
      var int DummyArray[1024];
      // in ammo consumption code
      Crash = new class'Crasher';
      Comp = new class'ActorComponent' (Crash); // a Crasher used as the template for a plain ActorComponent
  84. Mysterious crash (cont.) ● Cause: buffer overflow vulnerability in UnrealScript VM – No manifestation on X360 & Windows due to larger allocation alignment (16 bytes there, vs 8 bytes on PS3) ● Fix: make copy-construction fail when template is a subclassed object ● I wish I had Valgrind! [GODLEWSKI02]
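The memory-fence idea from slide 81 can be sketched as an allocator wrapper that surrounds each block with guard bytes and checks them on free. This illustrates the concept only; it is not dmalloc's or Heap Inspector's API, and the fence size (16 bytes), header size and fill pattern (0xFD) are arbitrary assumptions. A real debug allocator would also record the allocating callstack, which is what eventually exposed the culprit here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

constexpr std::size_t kHeader = 16;       // stores the user size in front of the leading fence
constexpr std::size_t kFence = 16;        // guard bytes on each side of the user block
constexpr unsigned char kPattern = 0xFD;  // value expected in an intact fence

// Layout: [header][leading fence][user block][trailing fence]
void* fenced_malloc(std::size_t size) {
    auto* raw = static_cast<unsigned char*>(std::malloc(kHeader + kFence + size + kFence));
    if (raw == nullptr)
        return nullptr;
    std::memcpy(raw, &size, sizeof size);                          // remember the user size
    std::memset(raw + kHeader, kPattern, kFence);                  // leading fence
    std::memset(raw + kHeader + kFence + size, kPattern, kFence);  // trailing fence
    return raw + kHeader + kFence;
}

// True if both fences still hold the pattern; a stomped fence means an overflow or underflow.
bool fences_intact(const void* p) {
    const auto* user = static_cast<const unsigned char*>(p);
    const unsigned char* raw = user - kFence - kHeader;
    std::size_t size;
    std::memcpy(&size, raw, sizeof size);
    const unsigned char* lead = user - kFence;
    const unsigned char* trail = user + size;
    for (std::size_t i = 0; i < kFence; ++i)
        if (lead[i] != kPattern || trail[i] != kPattern)
            return false;
    return true;
}

// Verifies the fences, releases the block and reports whether it was intact.
bool fenced_free(void* p) {
    if (p == nullptr)
        return true;
    const bool ok = fences_intact(p);
    std::free(static_cast<unsigned char*>(p) - kFence - kHeader);
    return ok;
}
```

Because the stomp lands in the fence rather than in a neighbouring allocation, it is caught at free time next to the culprit instead of surfacing later in an unrelated callstack.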
  85. Agenda ● How is gamedev different? ● Bug species ● Case studies ● Conclusions
  86. Takeaway ● Time is of the essence! ● Always on a tight schedule ● Constantly in motion – Temporal visualization is key – Custom, domain-specific tools ● Complex and indeterministic – Difficult to automate testing – Wide knowledge required ● Prone to bugs outside the code – Custom, domain-specific tools, again
  87. Takeaway (cont.) ● Rendering is a whole separate beast – Absolutely custom tools in isolation from the rest of the game – Still far from ideal usability ● Good to know your machine down to the metal ● Good memory debugging tools make a world of difference ● You are never safe, not even in managed languages!
  88. Questions? ● Email: leszek.godlewski@theastronauts.com ● Twitter: @TheIneQuation ● Web: www.inequation.org
  89. Thank you!
  90. References
      ● SINILO12 – Sinilo, M. “Coding in a debugger”
      ● GODLEWSKI01 – Godlewski, L. “OpenGL (ES) debugging”
      ● GLSL120 – Kessenich, J. “The OpenGL® Shading Language”, Language Version: 1.20, Document Revision: 8, p. 57
      ● VANDERBEEK14 – van der Beek, J. “Heap Inspector”
      ● GODLEWSKI02 – Godlewski, L. “Advanced Linux Game Programming”
