Successfully reported this slideshow.

Advanced Linux Game Programming

11

Share

Upcoming SlideShare
OpenGL (ES) debugging
OpenGL (ES) debugging
Loading in …3
×
1 of 98
1 of 98

Advanced Linux Game Programming

11

Share

Download to read offline

Presenter notes: https://www.dropbox.com/s/lq6oxuw1s3bhoun/Advanced%20Linux%20Game%20Programming%20%E2%80%93%20Presenter%20Notes.pdf

Ever since the advent of SteamOS, interest in game development for Linux has seen an increase. This lecture aims to address some more advanced issues encountered by programmers on this platform, beyond the very basic Linux setup, and drawing from over a year and two and a half games of experience in the subject. The areas discussed will be:

• Executable build improvements
• Crash handling and reporting
• Memory debugging
• OpenGL instrumentation and debugging
• Various caveats, tips and tricks.

Presenter notes: https://www.dropbox.com/s/lq6oxuw1s3bhoun/Advanced%20Linux%20Game%20Programming%20%E2%80%93%20Presenter%20Notes.pdf

Ever since the advent of SteamOS, interest in game development for Linux has seen an increase. This lecture aims to address some more advanced issues encountered by programmers on this platform, beyond the very basic Linux setup, and drawing from over a year and two and a half games of experience in the subject. The areas discussed will be:

• Executable build improvements
• Crash handling and reporting
• Memory debugging
• OpenGL instrumentation and debugging
• Various caveats, tips and tricks.

More Related Content

Related Books

Free with a 14 day trial from Scribd

See all

Related Audiobooks

Free with a 14 day trial from Scribd

See all

Advanced Linux Game Programming

  1. 1. Advanced Linux Game Programming Leszek Godlewski Programmer, Nordic Games
  2. 2. Nordic Games GmbH • Started in 2011 as a sister company to Nordic Games Publishing (We Sing) • Base IP acquired from JoWooD and DreamCatcher (SpellForce, The Guild, Aquanox, Painkiller) • Initially focusing on smaller, niche games • Acquired THQ IPs in 2013 (Darksiders, Titan Quest, Red Faction, MX vs. ATV) • Now shifting towards being a production company with internal devs • Since fall 2013: internal studio in Munich, Germany (Grimlore Games) 2
  3. 3. Leszek Godlewski Programmer, Nordic Games • Ports ● Painkiller Hell & Damnation (The Farm 51) ● Deadfall Adventures (The Farm 51) ● Darksiders (Nordic Games) • Formerly generalist programmer on PKHD & DA at TF51 3
  4. 4. Objective of this talk Your game engine on Linux, before porting: 4 Missing!
  5. 5. Objective of this talk (cont.) Your first “working” Linux port: 5 Oops. Bat-Signal!
  6. 6. Objective of this talk (cont.) Where I want to try helping you get to: 6
  7. 7. In other words, from this: 7
  8. 8. To this: 8
  9. 9. And that's mostly debugging All sorts of debuggers! 9 apitrace
  10. 10. Demo code available is.gd/GDCE14Linux 10
  11. 11. Intended takeaway ● Build system improvements ● Signal handlers ● Memory debugging with Valgrind ● OpenGL debugging techniques 11
  12. 12. Intended takeaway Agenda ● Build system improvements ● Signal handlers ● Memory debugging with Valgrind ● OpenGL debugging techniques 12
  13. 13. Build systems What I had initially with UE3: ● Copy/paste of the Mac OS X toolchain ● It worked, but... ● Slow ● Huge binaries because of debug symbols ● Problematic linking of circular dependencies 13
  14. 14. Build systems (cont.) ● 32-bit binaries required for feature/hardware parity with Windows ● Original solution: a chroot jail with an entire 32-bit Ubuntu system just for building 14
  15. 15. Cross-compiling for 32/64-bit ● gcc -m32/-m64 is not enough! ● Only sets target code generation ● Not headers & libraries (CRT, OpenMP, libgcc etc.) ● Fixed by installing gcc-multilib ● Dependency package for non-default architectures (i.e. i386 on an amd64 system and vice versa) 15
  16. 16. Clang (ad nauseam) ● Clang is faster ● gcc: 3m47s ● Clang: 3m05s ● More benchmarks at Phoronix [LARABEL13] ● Clang has different diagnostics than gcc 16
  17. 17. Clang (cont.) ● Preprocessor macro compatibility ● Declares __GNUC__ etc. ● Command line compatibility ● Easily switch back & forth between Clang & gcc 17
  18. 18. Clang – caveats ● C++ object files may be incompatible with gcc & fail to link (need full rebuilds) ● Clang is not as mature as gcc ● Occasionally has generated faulty code for me (YMMV) 18
  19. 19. Clang – caveats (cont.) ● Slight inconsistencies in C++ standard strictness ● Templates ● Anonymous structs/unions ● May need to add this-> in some places ● May need to name some anonymous types 19
  20. 20. So: Clang or gcc? Both: ● Clang – quick iterations during development ● gcc – final shipping binaries 20
  21. 21. Linking – GNU ld ● Default linker on Linux ● Ancient ● Single-threaded ● Requires specification of libraries in the order of reverse dependency... ● We are not doomed to use it! 21
  22. 22. Linking – GNU gold ● Multi-threaded linker for ELF binaries ● ld: 18s ● gold: 5s ● Developed at Google, now officially part of GNU binutils 22
  23. 23. Linking – GNU gold (cont.) ● Drop-in replacement for ld ● May need an additional parameter or toolchain setup ● clang++ -B/usr/lib/gold-ld ... ● g++ -fuse-ld=gold ... ● Still needs libs in the order of reverse dependency... 23
  24. 24. Linking – reverse dependency ● Major headache/game-breaker with circular dependencies ● ”Proper” fix: re-specify the same libraries over and over again ● gcc app.o -lA -lB -lA 24
  25. 25. Linking – reverse dep. (cont.) 25 app A B
  26. 26. Linking – reverse dep. (cont.) 26 app A B
  27. 27. Linking – reverse dep. (cont.) 27 app A B
  28. 28. Linking – reverse dep. (cont.) 28 app A B A Just the missing symbols
  29. 29. Linking – library groups ● Declare library groups instead ● Wrap library list with --start-group, --end- group ● Shorthand: -(, -) ● g++ foo.obj -Wl,-( -lA -lB -Wl,-) ● Results in exhaustive search for symbols 29
  30. 30. Linking – library groups (cont.) ● Actually used for non-library objects (TUs) ● Caveat: the exhaustive search! ● Manual warns of possible performance hit ● Not observed here, but keep that in mind! 30
  31. 31. Running the binary in debugger inequation@spearhead:~/projects/largebinary$ gdb -– silent largebinary Reading symbols from /home/inequation/projects/larg ebinary/largebinary... [zzz... several minutes later...] done. (gdb) 31
  32. 32. Caching the gdb-index ● Large codebases generate heavy debug symbols (hundreds of MBs) ● GDB does symbol indexing at every single startup � ● Massive waste of time! 32
  33. 33. Caching the gdb-index (cont.) ● Solution: fold indexing into the build process ● Old linkers: as described in [GNU01] ● New linkers (i.e. gold): --gdb-index ● May need to forward from compiler driver: -Wl,--gdb-index 33
  34. 34. Agenda ● Build system improvements ● Signal handlers ● Memory debugging with Valgrind ● OpenGL debugging techniques 34
  35. 35. Signal handlers ● Unix signals are async notifications ● Sources can be: ● the process itself ● another process ● user ● kernel 35
  36. 36. Signal handlers (cont.) ● A lot like interrupts ● Jump to handler upon first non-atomic op 36 Normal program flow Normal flow resumes Signal handler execution SIGNAL
  37. 37. Signal handlers (cont.) ● System installs a default handler ● Usually terminates and/or dumps core ● Core ≈ minidump in Windows parlance, but entire mapped address range is dumped (truncated to RLIMIT_CORE bytes) ● See signal(7) for default actions 37
  38. 38. Signal handlers (cont.) ● Can (should!) specify custom handlers ● Get/set handlers via sigaction(2) ● void handler(int, siginfo_t *, void *); ● Needs SA_SIGINFO flag in sigaction() call ● Extensively covered in [BENYOSSEF08] 38
  39. 39. Interesting siginfo_t fields ● si_code – reason for sending the signal ● Examples: signal source, FP over/underflow, memory permissions, unmapped address ● si_addr – memory location (if relevant) ● SIGILL, SIGFPE, SIGSEGV, SIGBUS and SIGTRAP 39
  40. 40. Interesting signals ● Worth catching ● SIGSEGV, SIGILL, SIGHUP, SIGQUIT, SIGTRAP, SIGIOT, SIGBUS, SIGFPE, SIGTERM, SIGINT ● Worth ignoring ● signal(signum, SIG_IGN); ● SIGCHLD, SIGPIPE 40
  41. 41. Signal handling caveats ● Prone to race conditions ● Signals may be nested 41 Normal flow Signal handler SIGNAL Signal handler SIGNAL SIGNAL
  42. 42. Signal handling caveats (cont.) ● Prone to race conditions ● Can't share locks with the main program 42 Lock mutex Lock mutex Deadlock ☹ Signal handler SIGNAL Normal flow
  43. 43. Signal handling caveats (cont.) ● Prone to race conditions ● Can't call async-unsafe/non-reentrant functions ● See signal(7) for a list of safe ones ● Notable functions not on the list: ● printf() and friends (formatted output) ● malloc() and free() 43
  44. 44. Signal handling caveats (cont.) ● Not safe to allocate or free heap memory 44 STOMPED? STOMPED? STOMPED? STOMPED? STOMPED? STOMPED? STOMPED? STOMPED? STOMPED? STOMPED? STOMPED? Source: [LEA01]
  45. 45. Signal handling caveats (cont.) ● Custom handlers do not dump core ● At handler installation time: ● Raise RLIMIT_CORE to desired core size ● Inside handler, after custom logging: ● Restore default handler using signal(2) or sigaction(2) ● raise(signum); 45
  46. 46. Safe stack walking ● glibc provides backtrace(3) and friends ● Symbols are read from the dynamic symbol table ● Pass -rdynamic at compile-time to populate 46
  47. 47. Safe stack walking (cont.) ● backtrace_symbols() internally calls malloc() ● Not safe... ☹ ● Still, can get away with it most of the time (YMMV) 47
  48. 48. Example “proper” solution ● Fork a watchdog process in main() ● Communicate over a FIFO pipe ● In signal handler: ● Collect & send information down the pipe ● backtrace_symbols_fd() down the pipe ● Demo code: is.gd/GDCE14Linux 48
  49. 49. Agenda ● Build system improvements ● Signal handlers ● Memory debugging with Valgrind ● OpenGL debugging techniques 49
  50. 50. Is this even related to porting? ● Yes! Portability bugs easily overlooked ● Hardcoded struct sizes/offsets ● OpenGL buffers ● Incorrect binary packing/unpacking ● “How did we/they manage to ship that?!” 50
  51. 51. What is Valgrind? ● Framework for dynamic, runtime analysis ● Dynamic recompilation ● machine code IR tool machine code→ → → ● Performance typically at 25-20% of unmodified code ● Worse if heavily threaded – execution is serialized 51
  52. 52. What is Valgrind? (cont.) ● Many tools in it: ● Memory error detectors (Memcheck, SGcheck) ● Cache profilers (Cachegrind, Callgrind) ● Thread error detectors (Helgrind, DRD) ● Heap profilers (Massif, DHAT) 52
  53. 53. Memcheck basics ● Basic usage extremely simple ● …as long as you use the vanilla libc malloc() ● valgrind ./app ● Will probably report a ton of errors on the first run! ● Again: “How did they manage to ship that?!” 53
  54. 54. Memcheck basics (cont.) ● Many false positives, esp. in 3rd parties ● Xlib, NVIDIA driver ● Can suppress them via suppress files ● Call Valgrind with --gen-suppressions=yes to generate suppression definitions ● Be careful with that! Can let OpenGL bugs slip! 54
  55. 55. Contrived example #include <stdlib.h> int main(int argc, char *argv[]) { int foo, *ptr1 = &foo; int *ptr2 = malloc(sizeof(int)); if (*ptr1) ptr2[1] = 0xabad1dea; else ptr2[1] = 0x15bad700; ptr2[0] = ptr2[2]; return *ptr1; } 55 Demo code: is.gd/GDCE14Linux
  56. 56. Valgrind output for such ==8925== Conditional jump or move depends on uninitialised value(s) ==8925== Invalid write of size 4 ==8925== Invalid read of size 4 ==8925== Syscall param exit_group(status) contains uninitialised byte(s) ==8925== LEAK SUMMARY: ==8925== definitely lost: 4 bytes in 1 blocks 56
  57. 57. What about custom allocators? ● Custom memory pool & allocation algo ● Valgrind only “sees” mmap()/munmap() of multiples of entire memory pages ● All access within those pages – now valid! ● How to track errors? 57
  58. 58. Client requests ● Allow annotation of custom allocators ● ~20 C macros defined in valgrind.h ● Common and per-tool requests exist ● Can be cut out with -DNVALGRIND ● Detailed description in [VALGRIND01] 58
  59. 59. Example: Instrumenting dlmalloc ● 2.8.4 instrumentation from [CRYSTAL01] ● Demo code: is.gd/GDCE14Linux ● Compile the sample with -DDLMALLOC ● Similar results to libc malloc() 59
  60. 60. Other uses of client requests ● Pointer validation ● Is address mapped? Is it defined? ● Mid-session leak checks ● Level transitions 60
  61. 61. Other uses of client req. (cont.) ● Poisoning memory regions ● Ensuring signal handlers don't touch the heap ● Ensuring geometry buffers aren't read on CPU 61Source: [LEA01]
  62. 62. Debugging inside Valgrind ● A gdbserver for “remote” debugging ● SIGTRAP (breakpoint) on every error ● Unlimited memory watchpoints! ● Data breakpoints in Visual Studio parlance ● Cf. 4 single-word hardware debug registers on x86 62
  63. 63. Debugging inside Valgrind (cont.) ● Terminal A: ● valgrind --vgdb=yes --vgdb-error=0 ./MyGame ● Terminal B: ● gdb ./MyGame ● target remote | vgdb 63
  64. 64. Agenda ● Build system improvements ● Signal handlers ● Memory debugging with Valgrind ● OpenGL debugging techniques 64
  65. 65. Ye Olde Way ● Call glGetError() after each OpenGL call ● Get 1 of 8 (sic!) error codes ● Look up the call in the manual ● See what this particular error means in this particular context… 65
  66. 66. Ye Olde Way (cont.) ● …Then check what was actually the case ● 6 possible reasons for GL_INVALID_VALUE in glTexImage*() alone! See [OPENGL01] ● Usually: attach a debugger, replay the scenario… ● This sucks! 66
  67. 67. Ye Olde Way (cont.) ● …Then check what was actually the case ● 6 possible reasons for GL_INVALID_VALUE in glTexImage*() alone! See [OPENGL01] ● Usually: attach a debugger, replay the scenario… ● This sucks! used to suck ☺ 67
  68. 68. Debug callback ● Never call glGetError() again! ● Much more detailed information ● Incl. performance tips from the driver ● Good to check what different drivers say ● May not work without a debug OpenGL context (GLX_CONTEXT_DEBUG_BIT_ARB) 68
  69. 69. Debug callback (cont.) ● Provided by either of (ABI-compatible): GL_KHR_debug [OPENGL02], GL_ARB_debug_output [OPENGL03] 69 OpenGL OpenGL ES NVIDIA (official) AMD (official) Intel (Mesa) AMD (Mesa) ARB _debug _output √ × √ √ √ √ KHR _debug √ √ √ √ × ×
  70. 70. Debug callback (cont.) void callback(GLenum source, GLenum type, GLuint id, GLenum severity, GLsizei length, const GLchar* message, const void* userParam); 70 Filter by source, type, severity or individual messages
  71. 71. Debug callback (cont.) ● Verbosity can be controlled (filtering) ● glDebugMessageControl[ARB]() ● [OPENGL02][OPENGL03] ● Turn to 11 (GL_DONT_CARE) for valuable perf information! ● Memory type for buffers, unused mip levels… 71
  72. 72. API call tracing ● Record a trace of the run of the application ● Replay and review the trace ● Look up OpenGL state at a particular call ● Inspect state variables, resources and objects: textures, shaders, buffers... ● apitrace or VOGL 72
  73. 73. Well, this is not helpful... 73
  74. 74. Much better! 74
  75. 75. KHR_debug EXT _debug _marker EXT _debug _label GREMEDY _string _marker GREMEDY _frame _terminator One-off messages √ √ × √ × Call grouping √ √ × × × Object labels √ × √ × × Frame terminators × × × × √ Support Good Limited Limited Limited Limited Annotating the call stream 75
  76. 76. Annotating the call stream (cont.) ● All aforementioned extensions supported by apitrace regardless of driver ● Recommended: GL_KHR_debug 76
  77. 77. Annotating the call stream (cont.) ● Call grouping ● glPushDebugGroup()/glPopDebugGroup() ● One-off messages ● glDebugMessageInsert[ARB]() ● glStringMarkerGREMEDY() 77
  78. 78. Object labelling ● glObjectLabel(), glGetObjectLabel() ● Buffer, shader, program, vertex array, query, program pipeline, transform feedback, sampler, texture, render buffer, frame buffer, display list ● glObjectPtrLabel(), glGetObjectPtrLabel() ● Sync objects 78
  79. 79. Annotation caveats ● Multi-threaded grouping may break hierarchy ● glDebugMessageInsert() calls the debug callback, polluting error streams ● Workaround: drop if source == GL_DEBUG_SOURCE_APPLICATION 79
  80. 80. Example 1: PIX events emulation #define D3DPERF_BeginEvent(colour, name) if (GLEW_KHR_debug && threadOwnsDevice()) glPushDebugGroup(GL_DEBUG_SOURCE_APPLICATION, (GLuint)colour, -1, name) #define D3DPERF_EndEvent() if (GLEW_KHR_debug && threadOwnsDevice()) glPopDebugGroup() 80
  81. 81. Example 2: Game tech demo ● University assignment from 2009 ☺ ● Annotated OpenGL 1.4 ● Demo code: is.gd/GDCE14Linux 81
  82. 82. Takeaway ● gcc-multilib is the prerequisite for 32/64- bit cross-compilation ● Switching back and forth between Clang and gcc is easy and useful ● Link times can be greatly improved by using gold 82
  83. 83. Takeaway (cont.) ● Caching the gdb-index improves debugging experience ● Crash handling is easy to do, tricky to get right 83
  84. 84. Takeaway (cont.) ● Valgrind is an enormous aid in memory debugging ● Even when employing custom allocators ● OpenGL debugging experience can be vastly improved using some extensions 84
  85. 85. Questions? @ lgodlewski@nordicgames.at t @TheIneQuation K inequation.org 85
  86. 86. Thank you! Further Nordic Games information: K www.nordicgames.at Development information: K www.grimloregames.com 86
  87. 87. References ● LARABEL13 – Larabel, M. “Clang 3.4 Performance Very Strong Against GCC 4.9” [link] ● GNU01 – “Index Files Speed Up GDB” [link] ● GNU02 – “Options for Debugging Your Program or GCC” [link] ● BENYOSSEF08 – Ben-Yossef, G. “Crash N' Burn: Writing Linux application fault handlers” [link ] ● LEA01 – Lea, D. “A Memory Allocator” [link] ● VALGRIND01 – “The Client Request mechanism” [link] ● CRYSTAL01 – “Crystal Space 3D SDK” [link] ● OPENGL01 – “glTexImage2D” [link] ● OPENGL02 – “ARB_debug_output” [link] ● OPENGL03 – “KHR_debug” [link] ● XDG01 – “XDG Base Directory Specification” [link] ● <page>(<section>), e.g. sigaction(2) – “Linux Programmer's Manual”; to view, type man <section> <page> into a terminal or a web search engine 87
  88. 88. Special thanks Fabian Giesen Katarzyna Griksa Damian Kobiałka Jetro Lauha Eric Lengyel Krzysztof Narkowicz Reinhard Pollice Bartłomiej Wroński Kacper Ząber 88
  89. 89. Bonus slides! ● OpenGL resource leak checking ● Intel i965 driver vs stack ● Locating user data according to FreeDesktop.org guidelines ● Thread priorities in Linux ● Additional/new debug features 89 FREE!!!FREE!!!
  90. 90. OpenGL resource leak checking Courtesy of Eric Lengyel & Fabian Giesen static void check_for_leaks() { GLuint max_id = 10000; // better idea would be to keep track of assigned names. GLuint id; // if brute force doesn't work, you're not applying it hard enough for ( id = 1 ; id <= max_id ; id++ ) { #define CHECK( type ) if ( glIs##type( id ) ) fprintf( stderr, "GLX: leaked " #type " handle 0x%xn", (unsigned int) id ) CHECK( Texture ); CHECK( Buffer ); CHECK( Framebuffer ); CHECK( Renderbuffer ); CHECK( VertexArray ); CHECK( Shader ); CHECK( Program ); CHECK( ProgramPipeline ); #undef CHECK } } 90
  91. 91. Intel i965 vs stack ● Been chasing a segfault on a call instruction down _mesa_Clear() (glClear()) ● Region of code copy/pasted from D3D renderer ● Address mapped, so not an invalid jump... ● Only 16 function frames – surely this can't be a stack overflow? 91
  92. 92. Intel i965 vs stack (cont.) ● Oh no, wait: ● Check ESP against /proc/[pid]/maps ● Yup, encroaching on unmapped address space ● Moral: cut your render some stack slack (160+ kB), or Mesa will blow it up with locals (e.g. in clear shader generation) 92
  93. 93. Locating user data ● There is a spec for that – see [XDG01] ● Savegames, screenshots, options etc.: ● $XDG_CONFIG_HOME or ~/.config/<app> ● Caches of all kinds: ● $XDG_CACHE_HOME or ~/.cache/<app> ● Per-user persistent data (e.g. DLC): ● $XDG_DATA_HOME or ~/.local/share/<app> 93
  94. 94. Locating user data (cont.) ● <app> subdirectory currently unregulated ● De-facto standard: simplified or “Unix name” ● Lowercase, “safe” ASCII characters, e.g. blender ● When asked, XDG people suggest rev-DNS ● com.company.appname 94
  95. 95. Thread priorities in Linux ● Priority elevation requires root permissions � ● No user will ever grant you root (scary!) ● Reason: DoS protection in servers (probably) ● Priority can be tweaked with nice() ● Think “how nice the process is to others” ● Being nice to everyone will starve your process ● Niceness can be negative (but only with root) 95
  96. 96. Thread priorities in Linux (cont.) ● Why not setpriority(2)? ● Also sets scheduling algorithm here be dragons→ ● Priority values have different meaning per scheduler ● Still needs root ● What about capabilities(7)? ● This might actually work if your users trust you ● Demo code: is.gd/GDCE14Linux 96
  97. 97. Thread priorities in Linux (cont.) ● Don't all threads in a process share niceness? ● They should, according to POSIX, but they don't! ● One of the few cases where Linux is non- compliant 97
  98. 98. Additional/new debug features ● Additional debug info: -g3 ● Including #defines (macros) ● Better debugger performance [GNU02]: ● -fdebug-types-section: improved layout ● -gpubnames: new format for index 98

Editor's Notes

  • Hi everyone, my name is Leszek Godlewski and I&amp;apos;m here today to talk about some of the more advanced aspects of Linux game programming.
    Or, porting your game engine to Linux, to be more precise.
    These notes will only supplement the slides, they are not a transcript.
  • I&amp;apos;m a programmer with Nordic Games. A bit unusual role with the company since it&amp;apos;s mostly a publisher, but I&amp;apos;m the resident Linux guy and generalist programmer where necessary.
    I&amp;apos;ve got three ports under my belt: Painkiller Hell &amp; Damnation and Deadfall Adventures for The Farm 51, and Darksiders for Nordic.
    I was also a generalist programmer at TF51 on the original teams for those two games.
  • Who can guess what these are, apart from the obvious ones? :)
    Answers, clockwise from top left: GDB, Valgrind, apitrace (doh), Nsight (doh!), GPU PerfStudio, VOGL (doh...).
  • The focus of this talk lies in these 4 areas. I&amp;apos;d like to share my findings about optimizing native code builds, signal (crash) handling, jump on to memory debugging stuff using the Valgrind tool, and finish off with OpenGL debugging techniques. There are some bonus slides if there is still time, too.
  • Actually, let&amp;apos;s adopt this as the agenda. :) The upcoming element will be in bold.
  • This is based on the two UE3 titles, but also applied to Darksiders to some extent (there was no Mac toolchain, had to make one from scratch). It displayed the same problems.
  • As much as I would love 32-bit to die, I had to support it to achieve full parity with Windows.
    A chroot is a bit like a VM – a separate, sandboxed view of the entire system environment (i.e. file system), but running on the same kernel.
    Host system is Debian amd64.
  • Fortunately, I could get rid of that chroot jail thanks to gcc-multilib.
  • “Did you mean x?” and colours for the win.
  • C object files are unaffected.
    Clang seems to have slightly different C++ symbol name generation rules (pre-mangling).
  • Officially in binutils → ships as an optional part of default toolchain on all modern distros.
  • And not only circular – sometimes it&amp;apos;s tricky to even get regular dependencies in the right order.
    Circular actually happened to me with the PhysX 2.8.4 loader. That was even worse because even though it seemed to have linked, the dynamic linker at runtime still failed to resolve everything...
  • Here&amp;apos;s how this reverse dependency thing works: you first declare the dependent object, then its dependency. Starting the construction of a building from the roof, if you will.
    First, an “app” object file is linked. It has unsatisfied imports for symbols from library A.
  • Library A&amp;apos;s exports satisfy the imports of app, but they import symbols from B.
  • Library B&amp;apos;s exports match library A&amp;apos;s imports, but B now has unsatisfied imports of A&amp;apos;s symbols.
  • By re-specifying A, the linker satisfies the A imports in library B.
  • This is your typical GDB session start at the beginning of porting.
  • Old linkers need some shell script magic – run GDB in batch mode etc.
    Here&amp;apos;s what I had been using before discovering --gdb-index:
    @echo Generating gdb-index in $(OUTPUT_PATH)/gdb-index/$(notdir $@).gdb-index
    @mkdir -p $(OUTPUT_PATH)/gdb-index
    @$(GDB) -batch -ex &amp;quot;save gdb-index $(OUTPUT_PATH)/gdb-index&amp;quot; $@
    @echo Merging gdb-index into $@
    @$(OBJCOPY) --add-section .gdb_index=$(OUTPUT_PATH)/gdb-index/$(notdir $@).gdb-index --set-section-flags .gdb_index=readonly $@ $@
  • A program can possibly resume normal execution after a signal.
  • Signal handler can do different things by default: terminate, dump core or stop (suspend).
    I&amp;apos;ll talk about RLIMIT_CORE a bit later.
    It&amp;apos;s actually preferable to perform a clean exit (instead of aborting) in response to SIGTERM and SIGQUIT.
  • These fields are a good reason for using the extended information handlers (with siginfo_t), as opposed to the traditional signal(2) calls with void sighandler(int) handlers.
    si_code is an enumeration, not a bit field. Includes reasons general (sent by user, kernel, async I/O operation etc.) and specific to the signal (e.g. whether SIGILL was caused by invalid opcode, operand or addressing mode etc.).
  • Not all signals are interesting.
    SIGTRAP is a programmatical breakpoint (you can also raise(SIGTRAP) from code to have the debugger stop). It&amp;apos;s up to you to decide how to handle that.
    SIGCHLD is only of note if you plan on spawning sub-processes. Similarly for SIGPIPE, if you&amp;apos;re interested in notifications for the other end breaking a FIFO pipe.
  • In reality, this is slightly more complex than that because of masking and deferring of signals. See sigaction(2) for details. Worth noting that by default, further signalling of the same signal type will be deferred until the return of the handler, unless you specify SA_NODEFER or explicitly unmask that signal.
    Ben-Yossef recommends using a spinlock, but warns of deadlocking if contending against a thread with higher priority. In that case it&amp;apos;s better to just try locking and sleep with pselect(), but that&amp;apos;s a overkill in most cases.
  • What if your code is interrupted right after grabbing the lock?
    Unless your mutex is re-entrant and the lock happened on the signalled thread, of course.
  • Speaking of re-entrant…
  • Allocator&amp;apos;s data structures may have been damaged in the fault.
  • When using sigaction(2) to install your handler, you will need a stash of struct sigaction in static memory and populate it when installing your handler.
    You might not be able to bump RLIMIT_CORE due to permissions. You may also want to limit it, as core dumps are often huge because they include a dump of all the mapped memory tranges. However, smaller core size → truncation → not all structures/ranges can be dumped → dump may actually be useless (stack not in the dumped memory range). Important to get a stack trace in the handler anyway.
    The raise(3) call may not work unless you unmask the signal. This is demonstrated in the code example.
    Also worth noting that apps are not really in control of core dump filename (/proc/sys/kernel/core_pattern or sysctl).
  • The backtrace() function is easy to use, just pass it a static array of void pointers and its size.
  • Personally, I haven&amp;apos;t had any problems with just calling backtrace_symbols() and calling it a day, but I imagine it may be a bad idea in memory-constrained situations or when nested signalling happens.
  • Pipe communication protocol up to the developer.
  • One such case was not setting the right OpenGL pixel packing/unpacking rules when uploading textures, resulting in reading beyond the buffer.
  • Memcheck doesn&amp;apos;t watch the stack or static variables, only the heap. SGcheck – the reverse.
  • Valgrind will also watch out for allocations made by the driver.
  • This very bad code makes several mistakes:
    Branch on uninitialized variable
    Overwrite memory past malloced area
    Read past malloced area
    Make a syscall with uninitialized memory as a parameter (return code)
    Leak memory allocated to ptr2.
    Actual demo code in repo also has an example for use-after-free.
  • Memcheck easily identifies all the bugs. Actual output includes call stacks, both for when the errors occur and for the allocation/freeing of the relevant memory chunk.
    Running with --leak-check=full also pinpoints the source of the leak.
  • Valgrind has no way of knowing about individual allocations and frees from the pool. Thus it won&amp;apos;t protect control structures etc.
    Worth noting that mmap() calls for mapping large address ranges will cause Valgrind to issue a warning (not sure about the threshold value, but a 512MB heap does trigger it).
  • Client requests have negligible performance impact and are no-ops if Valgrind is not present.
  • This instrumentation could be improved by using the MEMPOOL family of client requests instead, but it&amp;apos;s still quite verbose and useful.
  • Pointer validation could be used in garbage collectors, for instance.
  • I&amp;apos;m sure you&amp;apos;ll find a ton of other uses for making sure some data stays untouched.
  • I will deliberately not speak of IHV tools because they are complex enough to warrant separate talks. I&amp;apos;ll leave that to the IHV&amp;apos;s dev relations teams. :)
  • No longer does, because we no longer need to do that.
  • The extension spec says that it&amp;apos;s only suggested that driver implementers make it available only in debug contexts. It might just work in a regular one with your driver, but better stay on the safe side.
  • State of Mesa at the time of writing this, on my ThinkPad T500 with Debian amd64 testing.
  • You don&amp;apos;t get just the message, but also some metadata and a user pointer.
    IDs are unregulated, so drivers may use them as they see fit. They will definitely vary from vendor to vendor.
  • You can also filter the messages programatically with ifs.
    Intel is pretty verbose by default; NVIDIA needs setting all channels to GL_DONT_CARE to spew out more performance hints.
  • It&amp;apos;s a flat list. Good luck finding a faulty call in a multi-pass renderer in it.
  • Does not a tree view improve things?
  • In multi-threaded apps, look at the thread ID from which you&amp;apos;re trying to push or pop a debug group.
    Callback filtering can be achieved via glDebugMessageControl[ARB]() or programmatically. Just remember to always set source to GL_DEBUG_SOURCE_APPLICATION in your glDebugMessageInsert[ARB]() calls.
  • There&amp;apos;s a little trick here – since there are no set rules about message IDs, I&amp;apos;m casting the D3DCOLOR (32-bit pixel) to GLuint and using that as an ID.
    This was extremely cool to do, because I could instantly take advantage of the existing instrumentation on the whole engine.
  • The reason this demo was written for such an old version of OpenGL is that it needed to run on an old mobile Intel GPU (i915 I think).
    Still works nicely with KHR_debug, though.
  • Shout out to people who helped me (in various ways) to prepare this talk!
  • This originally came together as a half-joke on Twitter between those two guys.
    Mind you that this would need adjustments for ES (probably going over the entire 32-bit range minus 0), as names are not guaranteed to be consecutive there, whereas desktop GL always returns the first free name.
    “If brute force doesn&amp;apos;t work, you&amp;apos;re not applying it hard enough.”
  • These performance improvement features require fairly recent version of GCC and GDB (probably 4.9 and 7.0, respectively).
  • ×