Advanced Linux Game Programming

Advanced Linux
Game Programming
Leszek Godlewski
Programmer, Nordic Games
Nordic Games GmbH
• Started in 2011 as a sister company to Nordic Games Publishing (We Sing)
• Base IP acquired from JoWooD and DreamCatcher (SpellForce, The Guild,
Aquanox, Painkiller)
• Initially focusing on smaller, niche games
• Acquired THQ IPs in 2013 (Darksiders, Titan Quest, Red Faction, MX vs. ATV)
• Now shifting towards being a production company with internal devs
• Since fall 2013: internal studio in Munich, Germany (Grimlore Games)
2
Leszek Godlewski
Programmer, Nordic Games
• Ports
● Painkiller Hell & Damnation (The Farm 51)
● Deadfall Adventures (The Farm 51)
● Darksiders (Nordic Games)
• Formerly generalist programmer on PKHD & DA at TF51
3
Objective of this talk
Your game engine on Linux, before porting:
4
Missing!
Objective of this talk (cont.)
Your first “working” Linux port:
5
Oops. Bat-Signal!
Objective of this talk (cont.)
Where I want to try helping you get to:
6
In other words, from this:
7
To this:
8
And that's mostly debugging
All sorts of debuggers!
9
apitrace
Demo code available
is.gd/GDCE14Linux
10
Intended takeaway
● Build system improvements
● Signal handlers
● Memory debugging with Valgrind
● OpenGL debugging techniques
11
Agenda
● Build system improvements
● Signal handlers
● Memory debugging with Valgrind
● OpenGL debugging techniques
12
Build systems
What I had initially with UE3:
● Copy/paste of the Mac OS X toolchain
● It worked, but...
● Slow
● Huge binaries because of debug symbols
● Problematic linking of circular dependencies
13
Build systems (cont.)
● 32-bit binaries required for
feature/hardware parity with Windows
● Original solution: a chroot jail with an
entire 32-bit Ubuntu system just for
building
14
Cross-compiling for 32/64-bit
● gcc -m32/-m64 is not enough!
● Only sets target code generation
● Not headers & libraries (CRT, OpenMP, libgcc etc.)
● Fixed by installing gcc-multilib
● Dependency package for non-default architectures (i.e.
i386 on an amd64 system and vice versa)
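One way to confirm that -m32/-m64 together with gcc-multilib really produce the intended target is to build a tiny probe with each flag and compare the reported pointer width. A minimal sketch; the helper names target_bits() and print_target() are made up for illustration:

```c
#include <limits.h>
#include <stdio.h>

/* Hypothetical helper: width of a pointer, in bits, for the current build target. */
static int target_bits(void)
{
    return (int)(sizeof(void *) * CHAR_BIT);
}

/* Prints the pointer width of whatever target this file was compiled for. */
static void print_target(void)
{
    printf("pointer width: %d bits\n", target_bits());
}
```

Calling print_target() from main() in a binary built with gcc -m32 and again with gcc -m64 should show the two different widths; if the -m32 build fails on missing headers or libraries, the multilib packages are the first thing to check.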
15
Clang (ad nauseam)
● Clang is faster
● gcc: 3m47s
● Clang: 3m05s
● More benchmarks at Phoronix [LARABEL13]
● Clang has different diagnostics than gcc
16
Clang (cont.)
● Preprocessor macro compatibility
● Declares __GNUC__ etc.
● Command line compatibility
● Easily switch back & forth between Clang & gcc
17
Clang – caveats
● C++ object files may be incompatible with
gcc & fail to link (need full rebuilds)
● Clang is not as mature as gcc
● Occasionally has generated faulty code for me
(YMMV)
18
Clang – caveats (cont.)
● Slight inconsistencies in C++ standard
strictness
● Templates
● Anonymous structs/unions
● May need to add this-> in some places
● May need to name some anonymous types
19
So: Clang or gcc?
Both:
● Clang – quick iterations during
development
● gcc – final shipping binaries
20
Linking – GNU ld
● Default linker on Linux
● Ancient
● Single-threaded
● Requires specification of libraries in the order
of reverse dependency...
● We are not doomed to use it!
21
Linking – GNU gold
● Multi-threaded linker for ELF binaries
● ld: 18s
● gold: 5s
● Developed at Google, now officially part of
GNU binutils
22
Linking – GNU gold (cont.)
● Drop-in replacement for ld
● May need an additional parameter or toolchain
setup
● clang++ -B/usr/lib/gold-ld ...
● g++ -fuse-ld=gold ...
● Still needs libs in the order of reverse
dependency...
23
Linking – reverse dependency
● Major headache/game-breaker with
circular dependencies
● ”Proper” fix: re-specify the same libraries
over and over again
● gcc app.o -lA -lB -lA
24
Linking – reverse dep. (cont.)
[Figure sequence: app depends on libraries A and B. The link line is built up step by step as "app", "app A", "app A B" and finally "app A B A", where the second A supplies just the missing symbols.]
28
Linking – library groups
● Declare library groups instead
● Wrap library list with --start-group, --end-group
● Shorthand: -(, -)
● g++ foo.obj -Wl,-\( -lA -lB -Wl,-\)
● Parentheses must be escaped or quoted in the shell
● Results in exhaustive search for symbols
29
Linking – library groups (cont.)
● Actually used for non-library objects (TUs)
● Caveat: the exhaustive search!
● Manual warns of possible performance hit
● Not observed here, but keep that in mind!
30
Running the binary in debugger
inequation@spearhead:~/projects/largebinary$ gdb --silent largebinary
Reading symbols from /home/inequation/projects/largebinary/largebinary...
[zzz... several minutes later...]
done.
(gdb)
31
Caching the gdb-index
● Large codebases generate heavy debug
symbols (hundreds of MBs)
● GDB does symbol indexing at every
single startup ☹
● Massive waste of time!
32
Caching the gdb-index (cont.)
● Solution: fold indexing into the build
process
● Old linkers: as described in [GNU01]
● New linkers (i.e. gold): --gdb-index
● May need to forward from compiler driver:
-Wl,--gdb-index
33
Agenda
● Build system improvements
● Signal handlers
● Memory debugging with Valgrind
● OpenGL debugging techniques
34
Signal handlers
● Unix signals are async notifications
● Sources can be:
● the process itself
● another process
● user
● kernel
35
Signal handlers (cont.)
● A lot like interrupts
● Jump to handler upon first non-atomic op
36
[Figure: timeline. Normal program flow is interrupted by a SIGNAL, the signal handler executes, then normal flow resumes.]
Signal handlers (cont.)
● System installs a default handler
● Usually terminates and/or dumps core
● Core ≈ minidump in Windows parlance, but entire
mapped address range is dumped (truncated to
RLIMIT_CORE bytes)
● See signal(7) for default actions
37
Signal handlers (cont.)
● Can (should!) specify custom handlers
● Get/set handlers via sigaction(2)
● void handler(int, siginfo_t *, void *);
● Needs SA_SIGINFO flag in sigaction() call
● Extensively covered in [BENYOSSEF08]
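A minimal sketch of installing a three-argument handler through sigaction(2) with SA_SIGINFO. The names on_signal(), install_handler() and last_signal are illustrative only and do not come from the demo code:

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Written by the handler; sig_atomic_t is the type meant for this kind of sharing. */
static volatile sig_atomic_t last_signal = 0;

/* Three-argument handler form, selected by the SA_SIGINFO flag. */
static void on_signal(int signum, siginfo_t *info, void *context)
{
    (void)info;    /* si_code, si_addr etc. are found here */
    (void)context; /* points to a ucontext_t with the interrupted state */
    last_signal = signum;
}

/* Hypothetical helper: install on_signal() for one signal; returns 0 on success. */
static int install_handler(int signum)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = on_signal; /* note: sa_sigaction, not sa_handler */
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(signum, &sa, NULL);
}
```

After install_handler(SIGUSR1) succeeds, raise(SIGUSR1) runs the handler and leaves the signal number in last_signal.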
38
Interesting siginfo_t fields
● si_code – reason for sending the signal
● Examples: signal source, FP over/underflow,
memory permissions, unmapped address
● si_addr – memory location (if relevant)
● SIGILL, SIGFPE, SIGSEGV, SIGBUS and
SIGTRAP
39
Interesting signals
● Worth catching
● SIGSEGV, SIGILL, SIGHUP, SIGQUIT, SIGTRAP,
SIGIOT, SIGBUS, SIGFPE, SIGTERM, SIGINT
● Worth ignoring
● signal(signum, SIG_IGN);
● SIGCHLD, SIGPIPE
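As an illustration of why SIGPIPE is worth ignoring, the sketch below writes to a pipe whose read end is already closed. With the signal ignored, write() fails with EPIPE instead of terminating the process. The function name write_to_broken_pipe() is made up for this example:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/*
 * Hypothetical demo: write to a pipe that nobody reads from.
 * Returns the errno reported by write(), 0 if the write unexpectedly
 * succeeded, or -1 if the pipe could not be created.
 */
static int write_to_broken_pipe(void)
{
    int fds[2];
    int err = 0;

    signal(SIGPIPE, SIG_IGN);      /* default action would terminate the process */
    if (pipe(fds) != 0)
        return -1;
    close(fds[0]);                 /* close the read end: the pipe is now "broken" */
    if (write(fds[1], "x", 1) < 0)
        err = errno;               /* EPIPE instead of a fatal SIGPIPE */
    close(fds[1]);
    return err;
}
```

The same applies to sockets whose peer has gone away, which is the usual way a game runs into SIGPIPE.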
40
Signal handling caveats
● Prone to race conditions
● Signals may be nested
41
[Figure: timeline. A SIGNAL interrupts normal flow and starts the signal handler; another SIGNAL arrives while it is running and starts a nested signal handler.]
Signal handling caveats (cont.)
● Prone to race conditions
● Can't share locks with the main program
42
[Figure: timeline. Normal flow locks a mutex; a SIGNAL arrives; the signal handler tries to lock the same mutex. Deadlock ☹]
Signal handling caveats (cont.)
● Prone to race conditions
● Can't call async-unsafe/non-reentrant
functions
● See signal(7) for a list of safe ones
● Notable functions not on the list:
● printf() and friends (formatted output)
● malloc() and free()
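Since printf() is off the table inside a handler, values such as si_addr have to be formatted by hand and emitted with write(2), which is on the safe list. A minimal sketch under that assumption; format_hex() and safe_print_hex() are hypothetical helper names:

```c
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/*
 * Hypothetical helper: format value as 0x-prefixed lowercase hex into buf
 * without stdio or heap use. buf must hold at least 2 + 2 * sizeof(value)
 * bytes. Returns the number of bytes stored (no NUL terminator is added).
 */
static size_t format_hex(uintptr_t value, char *buf)
{
    static const char digits[] = "0123456789abcdef";
    char tmp[2 * sizeof(value)];
    size_t n = 0, len = 0;

    do {                       /* collect digits, least significant first */
        tmp[n++] = digits[value & 0xf];
        value >>= 4;
    } while (value != 0);

    buf[len++] = '0';
    buf[len++] = 'x';
    while (n > 0)              /* emit them most significant first */
        buf[len++] = tmp[--n];
    return len;
}

/* Prints value and a newline to fd using only write(2). */
static void safe_print_hex(int fd, uintptr_t value)
{
    char buf[2 + 2 * sizeof(value) + 1];
    size_t len = format_hex(value, buf);
    ssize_t rc;

    buf[len++] = '\n';
    rc = write(fd, buf, len);
    (void)rc;                  /* nothing sensible to do on failure in a handler */
}
```

Inside a handler this could be used as safe_print_hex(STDERR_FILENO, (uintptr_t)info->si_addr).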
43
Signal handling caveats (cont.)
● Not safe to allocate or free heap memory
44
[Figure: allocator memory layout diagram from [LEA01], annotated “STOMPED?” throughout]
Source: [LEA01]
Signal handling caveats (cont.)
● Custom handlers do not dump core
● At handler installation time:
● Raise RLIMIT_CORE to desired core size
● Inside handler, after custom logging:
● Restore default handler using signal(2) or
sigaction(2)
● raise(signum);
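The steps above can be sketched as follows. This is an illustration under assumptions, not the demo code: the function names and the choice of signals are made up, and the logging is reduced to a fixed message.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <sys/resource.h>
#include <unistd.h>

/* Crash handler: log with async-signal-safe calls only, then defer to the default action. */
static void crash_handler(int signum, siginfo_t *info, void *context)
{
    static const char msg[] = "fatal signal caught, dumping core\n";
    ssize_t rc = write(STDERR_FILENO, msg, sizeof(msg) - 1);

    (void)rc;
    (void)info;
    (void)context;
    signal(signum, SIG_DFL); /* restore the default handler... */
    raise(signum);           /* ...and re-raise so the default action dumps core */
}

/* Hypothetical helper: raise the core size limit and hook common crash signals. */
static int install_crash_handler(void)
{
    static const int signals[] = { SIGSEGV, SIGILL, SIGBUS, SIGFPE, SIGABRT };
    struct rlimit lim;
    struct sigaction sa;
    size_t i;

    /* Raise the soft RLIMIT_CORE as far as the hard limit allows. */
    if (getrlimit(RLIMIT_CORE, &lim) == 0) {
        lim.rlim_cur = lim.rlim_max;
        setrlimit(RLIMIT_CORE, &lim);
    }

    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = crash_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    for (i = 0; i < sizeof(signals) / sizeof(signals[0]); ++i)
        if (sigaction(signals[i], &sa, NULL) != 0)
            return -1;
    return 0;
}
```

The signal is blocked while its handler runs, so the re-raised signal is delivered with the default action as soon as the handler returns.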
45
Safe stack walking
● glibc provides backtrace(3) and friends
● Symbols are read from the dynamic
symbol table
● Pass -rdynamic at link time to populate it
46
Safe stack walking (cont.)
● backtrace_symbols() internally calls
malloc()
● Not safe... ☹
● Still, can get away with it most of the time
(YMMV)
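A minimal sketch of the malloc-free variant, using backtrace(3) together with backtrace_symbols_fd(3), which writes the symbol strings straight to a file descriptor. The helper name dump_backtrace() and the frame limit are arbitrary choices for illustration:

```c
#include <execinfo.h>
#include <unistd.h>

#define MAX_FRAMES 64 /* arbitrary cap chosen for this sketch */

/* Hypothetical helper: dump the current call stack to fd; returns the frame count. */
static int dump_backtrace(int fd)
{
    void *frames[MAX_FRAMES];
    int count = backtrace(frames, MAX_FRAMES);

    /* Unlike backtrace_symbols(), this variant does not allocate the result on the heap. */
    backtrace_symbols_fd(frames, count, fd);
    return count;
}
```

A common precaution is to call backtrace() once during startup, since its first call may load libgcc dynamically, which is not something to leave for a crash handler. Function names only show up in the output when the binary was linked with -rdynamic.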
47
Example “proper” solution
● Fork a watchdog process in main()
● Communicate over a FIFO pipe
● In signal handler:
● Collect & send information down the pipe
● backtrace_symbols_fd() down the pipe
● Demo code: is.gd/GDCE14Linux
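The watchdog idea can be sketched roughly as below. Assumptions made for brevity: an anonymous pipe(2) stands in for the FIFO, only the signal number is sent, and all names (start_watchdog(), report_crash(), report_fd) are hypothetical rather than taken from the demo code.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int report_fd = -1; /* write end of the pipe, used from the handler */

/* Game side, called from the signal handler: only write(2) is used. */
static void report_crash(int signum)
{
    unsigned char byte = (unsigned char)signum;
    ssize_t rc = write(report_fd, &byte, 1);

    (void)rc;
    /* a fuller handler would follow up with backtrace_symbols_fd(..., report_fd) */
}

/* Watchdog side: a separate, healthy process, so stdio and malloc() are fine here. */
static void watchdog_loop(int read_fd)
{
    unsigned char byte;
    int last = 0;

    while (read(read_fd, &byte, 1) == 1) {
        last = byte;
        fprintf(stderr, "watchdog: game received signal %d\n", last);
    }
    _exit(last); /* sketch only: expose the last reported signal as exit status */
}

/* Fork the watchdog early in main(); returns its pid in the game process, -1 on failure. */
static pid_t start_watchdog(void)
{
    int fds[2];
    pid_t pid;

    if (pipe(fds) != 0)
        return -1;
    pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {          /* child: becomes the watchdog, never returns */
        close(fds[1]);
        watchdog_loop(fds[0]);
    }
    close(fds[0]);           /* parent: carries on as the game */
    report_fd = fds[1];
    return pid;
}
```

The watchdog sees end-of-file when the game exits or closes report_fd, which gives it a natural point to write out a crash report or show a dialog.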
48
Agenda
● Build system improvements
● Signal handlers
● Memory debugging with Valgrind
● OpenGL debugging techniques
49
Is this even related to porting?
● Yes! Portability bugs easily overlooked
● Hardcoded struct sizes/offsets
● OpenGL buffers
● Incorrect binary packing/unpacking
● “How did we/they manage to ship that?!”
50
What is Valgrind?
● Framework for dynamic, runtime analysis
● Dynamic recompilation
● machine code → IR → tool → machine code
● Performance typically at 20-25% of unmodified
code
● Worse if heavily threaded – execution is serialized
51
What is Valgrind? (cont.)
● Many tools in it:
● Memory error detectors (Memcheck,
SGcheck)
● Cache profilers (Cachegrind, Callgrind)
● Thread error detectors (Helgrind, DRD)
● Heap profilers (Massif, DHAT)
52
Memcheck basics
● Basic usage extremely simple
● …as long as you use the vanilla libc malloc()
● valgrind ./app
● Will probably report a ton of errors on the
first run!
● Again: “How did they manage to ship that?!”
53
Memcheck basics (cont.)
● Many false positives, esp. in 3rd-party
code
● Xlib, NVIDIA driver
● Can suppress them via suppression files
● Call Valgrind with --gen-suppressions=yes to
generate suppression definitions
● Be careful with that! Can let OpenGL bugs slip!
54
Contrived example
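#include <stdlib.h>
int main(int argc, char *argv[]) {
    int foo, *ptr1 = &foo;            /* foo is never initialised */
    int *ptr2 = malloc(sizeof(int));  /* room for exactly one int */
    if (*ptr1)                        /* branch on an uninitialised value */
        ptr2[1] = 0xabad1dea;         /* write past the end of the block */
    else
        ptr2[1] = 0x15bad700;         /* same overrun on the other branch */
    ptr2[0] = ptr2[2];                /* read past the end of the block */
    return *ptr1;                     /* uninitialised exit status; ptr2 is leaked */
}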
55
Demo code:
is.gd/GDCE14Linux
Valgrind output for such code
==8925== Conditional jump or move depends on
uninitialised value(s)
==8925== Invalid write of size 4
==8925== Invalid read of size 4
==8925== Syscall param exit_group(status)
contains uninitialised byte(s)
==8925== LEAK SUMMARY:
==8925== definitely lost: 4 bytes in 1 blocks
56
What about custom allocators?
● Custom memory pool & allocation algo
● Valgrind only “sees” mmap()/munmap() of
multiples of entire memory pages
● All access within those pages – now valid!
● How to track errors?
57
Client requests
● Allow annotation of custom allocators
● ~20 C macros defined in valgrind.h
● Common and per-tool requests exist
● Can be cut out with -DNVALGRIND
● Detailed description in [VALGRIND01]
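To give an idea of what such annotation looks like, here is a sketch of a toy bump allocator reporting its blocks to Memcheck. The allocator itself (pool_init(), pool_alloc(), pool_free(), the 4 kB pool) is invented for illustration and is far simpler than dlmalloc; only the VALGRIND_* macros are real client requests. The no-op fallback definitions are an assumption added so the sketch also builds on machines without the Valgrind headers.

```c
#include <stddef.h>

/* Use the real client requests when the Valgrind headers are installed. */
#if defined(__has_include)
#  if __has_include(<valgrind/memcheck.h>)
#    include <valgrind/memcheck.h>
#    define HAVE_VALGRIND_HEADERS 1
#  endif
#endif
#ifndef HAVE_VALGRIND_HEADERS /* fallback: compile the annotations out */
#  define VALGRIND_MAKE_MEM_NOACCESS(addr, len)             ((void)0)
#  define VALGRIND_MALLOCLIKE_BLOCK(addr, size, rz, zeroed) ((void)0)
#  define VALGRIND_FREELIKE_BLOCK(addr, rz)                 ((void)0)
#endif

#define POOL_SIZE 4096
static unsigned char pool[POOL_SIZE];
static size_t pool_used = 0;

/* Hypothetical setup: mark the whole pool as off-limits until handed out. */
static void pool_init(void)
{
    pool_used = 0;
    (void)VALGRIND_MAKE_MEM_NOACCESS(pool, POOL_SIZE);
}

/* Hypothetical bump allocator; each block is reported like a malloc() result. */
static void *pool_alloc(size_t size)
{
    void *ptr;

    if (size == 0 || size > POOL_SIZE)
        return NULL;
    size = (size + 15u) & ~(size_t)15u; /* round up to a multiple of 16 */
    if (size > POOL_SIZE - pool_used)
        return NULL;
    ptr = pool + pool_used;
    pool_used += size;
    VALGRIND_MALLOCLIKE_BLOCK(ptr, size, 0, 0); /* no red zone, not zeroed */
    return ptr;
}

/* The toy pool never reuses memory; freeing only informs Memcheck. */
static void pool_free(void *ptr)
{
    (void)ptr;
    VALGRIND_FREELIKE_BLOCK(ptr, 0);
}
```

With the annotations in place, overruns, use-after-free and leaks inside the pool should be reported much as they would be for libc malloc().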
58
Example: Instrumenting dlmalloc
● 2.8.4 instrumentation from [CRYSTAL01]
● Demo code: is.gd/GDCE14Linux
● Compile the sample with -DDLMALLOC
● Similar results to libc malloc()
59
Other uses of client requests
● Pointer validation
● Is address mapped? Is it defined?
● Mid-session leak checks
● Level transitions
60
Other uses of client req. (cont.)
● Poisoning memory regions
● Ensuring signal handlers don't touch the heap
● Ensuring geometry buffers aren't read on CPU
61
Source: [LEA01]
Debugging inside Valgrind
● A gdbserver for “remote” debugging
● SIGTRAP (breakpoint) on every error
● Unlimited memory watchpoints!
● Data breakpoints in Visual Studio parlance
● Cf. 4 single-word hardware debug registers on
x86
62
Debugging inside Valgrind (cont.)
● Terminal A:
● valgrind --vgdb=yes --vgdb-error=0
./MyGame
● Terminal B:
● gdb ./MyGame
● target remote | vgdb
63
Agenda
● Build system improvements
● Signal handlers
● Memory debugging with Valgrind
● OpenGL debugging techniques
64
Ye Olde Way
● Call glGetError() after each OpenGL call
● Get 1 of 8 (sic!) error codes
● Look up the call in the manual
● See what this particular error means in
this particular context…
65
Ye Olde Way (cont.)
● …Then check what was actually the case
● 6 possible reasons for GL_INVALID_VALUE in
glTexImage*() alone! See [OPENGL01]
● Usually: attach a debugger, replay the
scenario…
● This used to suck ☺
67
Debug callback
● Never call glGetError() again!
● Much more detailed information
● Incl. performance tips from the driver
● Good to check what different drivers say
● May not work without a debug OpenGL
context (GLX_CONTEXT_DEBUG_BIT_ARB)
68
Debug callback (cont.)
● Provided by either of (ABI-compatible):
GL_KHR_debug [OPENGL03],
GL_ARB_debug_output [OPENGL02]
69
                   OpenGL  OpenGL ES  NVIDIA      AMD         Intel   AMD
                                      (official)  (official)  (Mesa)  (Mesa)
ARB_debug_output   √       ×          √           √           √       √
KHR_debug          √       √          √           √           ×       ×
Debug callback (cont.)
void callback(GLenum source,
              GLenum type,
              GLuint id,
              GLenum severity,
              GLsizei length,
              const GLchar* message,
              const void* userParam);
70
Filter by source, type, severity or individual messages
Debug callback (cont.)
● Verbosity can be controlled (filtering)
● glDebugMessageControl[ARB]()
● [OPENGL02][OPENGL03]
● Turn to 11 (GL_DONT_CARE) for valuable
perf information!
● Memory type for buffers, unused mip levels…
71
API call tracing
● Record a trace of the run of the application
● Replay and review the trace
● Look up OpenGL state at a particular call
● Inspect state variables, resources and objects:
textures, shaders, buffers...
● apitrace or VOGL
72
Well, this is not helpful...
73
Much better!
74
                   KHR_debug  EXT_debug_marker  EXT_debug_label  GREMEDY_string_marker  GREMEDY_frame_terminator
One-off messages   √          √                 ×                √                      ×
Call grouping      √          √                 ×                ×                      ×
Object labels      √          ×                 √                ×                      ×
Frame terminators  ×          ×                 ×                ×                      √
Support            Good       Limited           Limited          Limited                Limited
Annotating the call stream
75
Annotating the call stream (cont.)
● All aforementioned extensions supported
by apitrace regardless of driver
● Recommended: GL_KHR_debug
76
Annotating the call stream (cont.)
● Call grouping
● glPushDebugGroup()/glPopDebugGroup()
● One-off messages
● glDebugMessageInsert[ARB]()
● glStringMarkerGREMEDY()
77
Object labelling
● glObjectLabel(), glGetObjectLabel()
● Buffer, shader, program, vertex array, query, program
pipeline, transform feedback, sampler, texture, render buffer,
frame buffer, display list
● glObjectPtrLabel(),
glGetObjectPtrLabel()
● Sync objects
78
Annotation caveats
● Multi-threaded grouping may break
hierarchy
● glDebugMessageInsert() calls the debug
callback, polluting error streams
● Workaround: drop if source ==
GL_DEBUG_SOURCE_APPLICATION
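A sketch of that workaround in a debug callback. To keep it compilable without OpenGL headers, the GL types and the two enum values are declared locally as stand-ins; in real code they come from the GL headers or a loader such as GLEW. The names debug_callback() and messages_logged are illustrative.

```c
#include <stdio.h>

/* Stand-ins for the OpenGL headers (assumption made for this sketch only). */
typedef unsigned int GLenum;
typedef unsigned int GLuint;
typedef int GLsizei;
typedef char GLchar;
#define GL_DEBUG_SOURCE_API         0x8246
#define GL_DEBUG_SOURCE_APPLICATION 0x824A

static unsigned int messages_logged = 0; /* illustrative counter */

/* Debug callback that drops the application's own annotations. */
static void debug_callback(GLenum source, GLenum type, GLuint id,
                           GLenum severity, GLsizei length,
                           const GLchar *message, const void *userParam)
{
    (void)type;
    (void)id;
    (void)severity;
    (void)length;
    (void)userParam;

    /* Markers inserted via glDebugMessageInsert() come back through here. */
    if (source == GL_DEBUG_SOURCE_APPLICATION)
        return;

    fprintf(stderr, "GL debug: %s\n", message);
    ++messages_logged;
}
```

In a real renderer the function would be registered once with glDebugMessageCallback(debug_callback, NULL) after creating a debug context.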
79
Example 1: PIX events emulation
#define D3DPERF_BeginEvent(colour, name) \
    do { \
        if (GLEW_KHR_debug && threadOwnsDevice()) \
            glPushDebugGroup(GL_DEBUG_SOURCE_APPLICATION, \
                             (GLuint)(colour), -1, (name)); \
    } while (0)
#define D3DPERF_EndEvent() \
    do { \
        if (GLEW_KHR_debug && threadOwnsDevice()) \
            glPopDebugGroup(); \
    } while (0)
80
Example 2: Game tech demo
● University assignment
from 2009 ☺
● Annotated OpenGL 1.4
● Demo code:
is.gd/GDCE14Linux
81
Takeaway
● gcc-multilib is the prerequisite for 32/64-
bit cross-compilation
● Switching back and forth between Clang
and gcc is easy and useful
● Link times can be greatly improved by
using gold
82
Takeaway (cont.)
● Caching the gdb-index improves
debugging experience
● Crash handling is easy to do, tricky to get
right
83
Takeaway (cont.)
● Valgrind is an enormous aid in memory
debugging
● Even when employing custom allocators
● OpenGL debugging experience can be
vastly improved using some extensions
84
Questions?
Email: lgodlewski@nordicgames.at
Twitter: @TheIneQuation
Web: inequation.org
85
Thank you!
Further Nordic Games information:
www.nordicgames.at
Development information:
www.grimloregames.com
86
References
● LARABEL13 – Larabel, M. “Clang 3.4 Performance Very Strong Against GCC 4.9”
● GNU01 – “Index Files Speed Up GDB”
● GNU02 – “Options for Debugging Your Program or GCC”
● BENYOSSEF08 – Ben-Yossef, G. “Crash N' Burn: Writing Linux application fault handlers”
● LEA01 – Lea, D. “A Memory Allocator”
● VALGRIND01 – “The Client Request mechanism”
● CRYSTAL01 – “Crystal Space 3D SDK”
● OPENGL01 – “glTexImage2D”
● OPENGL02 – “ARB_debug_output”
● OPENGL03 – “KHR_debug”
● XDG01 – “XDG Base Directory Specification”
● <page>(<section>), e.g. sigaction(2) – “Linux Programmer's Manual”; to view, type man
<section> <page> into a terminal or a web search engine
87
Special thanks
Fabian Giesen
Katarzyna Griksa
Damian Kobiałka
Jetro Lauha
Eric Lengyel
Krzysztof Narkowicz
Reinhard Pollice
Bartłomiej Wroński
Kacper Ząber
88
Bonus slides!
● OpenGL resource leak checking
● Intel i965 driver vs stack
● Locating user data according to
FreeDesktop.org guidelines
● Thread priorities in Linux
● Additional/new debug features
89
FREE!!!
OpenGL resource leak checking
Courtesy of Eric Lengyel & Fabian Giesen
static void check_for_leaks()
{
    GLuint max_id = 10000; // better idea would be to keep track of assigned names.
    GLuint id;
    // if brute force doesn't work, you're not applying it hard enough
    for ( id = 1 ; id <= max_id ; id++ )
    {
#define CHECK( type ) if ( glIs##type( id ) ) fprintf( stderr, "GLX: leaked " #type " handle 0x%x\n", (unsigned int) id )
        CHECK( Texture );
        CHECK( Buffer );
        CHECK( Framebuffer );
        CHECK( Renderbuffer );
        CHECK( VertexArray );
        CHECK( Shader );
        CHECK( Program );
        CHECK( ProgramPipeline );
#undef CHECK
    }
}
90
Intel i965 vs stack
● Been chasing a segfault on a call instruction down
_mesa_Clear() (glClear())
● Region of code copy/pasted from D3D renderer
● Address mapped, so not an invalid jump...
● Only 16 function frames – surely this can't be a stack
overflow?
91
Intel i965 vs stack (cont.)
● Oh no, wait:
● Check ESP against /proc/[pid]/maps
● Yup, encroaching on unmapped address space
● Moral: cut your render thread some stack slack
(160+ kB), or Mesa will blow it up with
locals (e.g. in clear shader generation)
92
Locating user data
● There is a spec for that – see [XDG01]
● Savegames, screenshots, options etc.:
● $XDG_CONFIG_HOME or ~/.config/<app>
● Caches of all kinds:
● $XDG_CACHE_HOME or ~/.cache/<app>
● Per-user persistent data (e.g. DLC):
● $XDG_DATA_HOME or ~/.local/share/<app>
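A minimal sketch of resolving these directories. It assumes the rule from the XDG spec that an unset, empty or relative value is ignored in favour of the default under $HOME; the function name xdg_app_dir() and its parameters are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: build "<base>/<app>" in out, where <base> is the value
 * of the environment variable xdg_var if it holds an absolute path, and
 * "$HOME/<fallback>" otherwise.
 * Returns 0 on success, -1 if no base is available or out is too small.
 */
static int xdg_app_dir(const char *xdg_var, const char *fallback,
                       const char *app, char *out, size_t out_size)
{
    const char *base = getenv(xdg_var);
    const char *home = getenv("HOME");
    int n;

    if (base != NULL && base[0] == '/')
        n = snprintf(out, out_size, "%s/%s", base, app);
    else if (home != NULL && home[0] != '\0')
        n = snprintf(out, out_size, "%s/%s/%s", home, fallback, app);
    else
        return -1;
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

For example, xdg_app_dir("XDG_CACHE_HOME", ".cache", "mygame", buf, sizeof(buf)) yields the cache directory for an application called mygame; creating the directory is left to the caller.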
93
Locating user data (cont.)
● <app> subdirectory currently unregulated
● De-facto standard: simplified or “Unix name”
● Lowercase, “safe” ASCII characters, e.g. blender
● When asked, XDG people suggest rev-DNS
● com.company.appname
94
Thread priorities in Linux
● Priority elevation requires root permissions ☹
● No user will ever grant you root (scary!)
● Reason: DoS protection in servers (probably)
● Priority can be tweaked with nice()
● Think “how nice the process is to others”
● Being nice to everyone will starve your process
● Niceness can be negative (but only with root)
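A small sketch of reading and raising niceness, which is the direction that needs no privileges. The helper names current_niceness() and be_nicer() are made up for illustration:

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <sys/resource.h>
#include <unistd.h>

/* Hypothetical helper: niceness of the calling process, in the range -20..19. */
static int current_niceness(void)
{
    errno = 0; /* -1 is a legitimate value, so errors show up only in errno */
    return getpriority(PRIO_PROCESS, 0);
}

/*
 * Hypothetical helper: add increment to the niceness.
 * Returns the new niceness; as above, check errno rather than the return
 * value to detect failure.
 */
static int be_nicer(int increment)
{
    errno = 0;
    return nice(increment);
}
```

A positive increment lowers the process priority and is always permitted; a negative one is expected to fail with EPERM for an unprivileged process.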
95
Thread priorities in Linux (cont.)
● Why not sched_setscheduler(2)?
● Also sets scheduling algorithm → here be dragons
● Priority values have different meaning per scheduler
● Still needs root
● What about capabilities(7)?
● This might actually work if your users trust you
● Demo code: is.gd/GDCE14Linux
96
Thread priorities in Linux (cont.)
● Don't all threads in a process share
niceness?
● They should, according to POSIX, but they
don't!
● One of the few cases where Linux is non-
compliant
97
Additional/new debug features
● Additional debug info: -g3
● Including #defines (macros)
● Better debugger performance [GNU02]:
● -fdebug-types-section: improved layout
● -gpubnames: new format for index
98
Shane Coughlan7 views
AI and Ml presentation .pptx by FayazAli87
AI and Ml presentation .pptxAI and Ml presentation .pptx
AI and Ml presentation .pptx
FayazAli8714 views
Dapr Unleashed: Accelerating Microservice Development by Miroslav Janeski
Dapr Unleashed: Accelerating Microservice DevelopmentDapr Unleashed: Accelerating Microservice Development
Dapr Unleashed: Accelerating Microservice Development
Miroslav Janeski15 views
How Workforce Management Software Empowers SMEs | TraQSuite by TraQSuite
How Workforce Management Software Empowers SMEs | TraQSuiteHow Workforce Management Software Empowers SMEs | TraQSuite
How Workforce Management Software Empowers SMEs | TraQSuite
TraQSuite6 views
predicting-m3-devopsconMunich-2023.pptx by Tier1 app
predicting-m3-devopsconMunich-2023.pptxpredicting-m3-devopsconMunich-2023.pptx
predicting-m3-devopsconMunich-2023.pptx
Tier1 app8 views
How to build dyanmic dashboards and ensure they always work by Wiiisdom
How to build dyanmic dashboards and ensure they always workHow to build dyanmic dashboards and ensure they always work
How to build dyanmic dashboards and ensure they always work
Wiiisdom14 views
Understanding HTML terminology by artembondar5
Understanding HTML terminologyUnderstanding HTML terminology
Understanding HTML terminology
artembondar57 views
Bootstrapping vs Venture Capital.pptx by Zeljko Svedic
Bootstrapping vs Venture Capital.pptxBootstrapping vs Venture Capital.pptx
Bootstrapping vs Venture Capital.pptx
Zeljko Svedic15 views
Navigating container technology for enhanced security by Niklas Saari by Metosin Oy
Navigating container technology for enhanced security by Niklas SaariNavigating container technology for enhanced security by Niklas Saari
Navigating container technology for enhanced security by Niklas Saari
Metosin Oy15 views
Top-5-production-devconMunich-2023.pptx by Tier1 app
Top-5-production-devconMunich-2023.pptxTop-5-production-devconMunich-2023.pptx
Top-5-production-devconMunich-2023.pptx
Tier1 app9 views
Team Transformation Tactics for Holistic Testing and Quality (Japan Symposium... by Lisi Hocke
Team Transformation Tactics for Holistic Testing and Quality (Japan Symposium...Team Transformation Tactics for Holistic Testing and Quality (Japan Symposium...
Team Transformation Tactics for Holistic Testing and Quality (Japan Symposium...
Lisi Hocke35 views
Unlocking the Power of AI in Product Management - A Comprehensive Guide for P... by NimaTorabi2
Unlocking the Power of AI in Product Management - A Comprehensive Guide for P...Unlocking the Power of AI in Product Management - A Comprehensive Guide for P...
Unlocking the Power of AI in Product Management - A Comprehensive Guide for P...
NimaTorabi216 views
Electronic AWB - Electronic Air Waybill by Freightoscope
Electronic AWB - Electronic Air Waybill Electronic AWB - Electronic Air Waybill
Electronic AWB - Electronic Air Waybill
Freightoscope 5 views
ADDO_2022_CICID_Tom_Halpin.pdf by TomHalpin9
ADDO_2022_CICID_Tom_Halpin.pdfADDO_2022_CICID_Tom_Halpin.pdf
ADDO_2022_CICID_Tom_Halpin.pdf
TomHalpin95 views

Advanced Linux Game Programming

  • 1. Advanced Linux Game Programming Leszek Godlewski Programmer, Nordic Games
  • 2. Nordic Games GmbH • Started in 2011 as a sister company to Nordic Games Publishing (We Sing) • Base IP acquired from JoWooD and DreamCatcher (SpellForce, The Guild, Aquanox, Painkiller) • Initially focusing on smaller, niche games • Acquired THQ IPs in 2013 (Darksiders, Titan Quest, Red Faction, MX vs. ATV) • Now shifting towards being a production company with internal devs • Since fall 2013: internal studio in Munich, Germany (Grimlore Games) 2
  • 3. Leszek Godlewski Programmer, Nordic Games • Ports ● Painkiller Hell & Damnation (The Farm 51) ● Deadfall Adventures (The Farm 51) ● Darksiders (Nordic Games) • Formerly generalist programmer on PKHD & DA at TF51 3
  • 4. Objective of this talk Your game engine on Linux, before porting: 4 Missing!
  • 5. Objective of this talk (cont.) Your first “working” Linux port: 5 Oops. Bat-Signal!
  • 6. Objective of this talk (cont.) Where I want to try helping you get to: 6
  • 7. In other words, from this: 7
  • 9. And that's mostly debugging All sorts of debuggers! 9 apitrace
  • 11. Intended takeaway ● Build system improvements ● Signal handlers ● Memory debugging with Valgrind ● OpenGL debugging techniques 11
  • 12. Intended takeaway Agenda ● Build system improvements ● Signal handlers ● Memory debugging with Valgrind ● OpenGL debugging techniques 12
  • 13. Build systems What I had initially with UE3: ● Copy/paste of the Mac OS X toolchain ● It worked, but... ● Slow ● Huge binaries because of debug symbols ● Problematic linking of circular dependencies 13
  • 14. Build systems (cont.) ● 32-bit binaries required for feature/hardware parity with Windows ● Original solution: a chroot jail with an entire 32-bit Ubuntu system just for building 14
  • 15. Cross-compiling for 32/64-bit ● gcc -m32/-m64 is not enough! ● Only sets target code generation ● Not headers & libraries (CRT, OpenMP, libgcc etc.) ● Fixed by installing gcc-multilib ● Dependency package for non-default architectures (i.e. i386 on an amd64 system and vice versa) 15
  • 16. Clang (ad nauseam) ● Clang is faster ● gcc: 3m47s ● Clang: 3m05s ● More benchmarks at Phoronix [LARABEL13] ● Clang has different diagnostics than gcc 16
  • 17. Clang (cont.) ● Preprocessor macro compatibility ● Declares __GNUC__ etc. ● Command line compatibility ● Easily switch back & forth between Clang & gcc 17
  • 18. Clang – caveats ● C++ object files may be incompatible with gcc & fail to link (need full rebuilds) ● Clang is not as mature as gcc ● Occasionally has generated faulty code for me (YMMV) 18
  • 19. Clang – caveats (cont.) ● Slight inconsistencies in C++ standard strictness ● Templates ● Anonymous structs/unions ● May need to add this-> in some places ● May need to name some anonymous types 19
  • 20. So: Clang or gcc? Both: ● Clang – quick iterations during development ● gcc – final shipping binaries 20
  • 21. Linking – GNU ld ● Default linker on Linux ● Ancient ● Single-threaded ● Requires specification of libraries in the order of reverse dependency... ● We are not doomed to use it! 21
  • 22. Linking – GNU gold ● Multi-threaded linker for ELF binaries ● ld: 18s ● gold: 5s ● Developed at Google, now officially part of GNU binutils 22
  • 23. Linking – GNU gold (cont.) ● Drop-in replacement for ld ● May need an additional parameter or toolchain setup ● clang++ -B/usr/lib/gold-ld ... ● g++ -fuse-ld=gold ... ● Still needs libs in the order of reverse dependency... 23
  • 24. Linking – reverse dependency ● Major headache/game-breaker with circular dependencies ● “Proper” fix: re-specify the same libraries over and over again ● gcc app.o -lA -lB -lA 24
  • 25. Linking – reverse dep. (cont.) 25 app A B
  • 26. Linking – reverse dep. (cont.) 26 app A B
  • 27. Linking – reverse dep. (cont.) 27 app A B
  • 28. Linking – reverse dep. (cont.) 28 app A B A Just the missing symbols
  • 29. Linking – library groups ● Declare library groups instead ● Wrap library list with --start-group, --end-group ● Shorthand: -(, -) ● g++ foo.obj -Wl,-( -lA -lB -Wl,-) ● Results in exhaustive search for symbols 29
  • 30. Linking – library groups (cont.) ● Actually used for non-library objects (TUs) ● Caveat: the exhaustive search! ● Manual warns of possible performance hit ● Not observed here, but keep that in mind! 30
  • 31. Running the binary in debugger inequation@spearhead:~/projects/largebinary$ gdb --silent largebinary Reading symbols from /home/inequation/projects/largebinary/largebinary... [zzz... several minutes later...] done. (gdb) 31
  • 32. Caching the gdb-index ● Large codebases generate heavy debug symbols (hundreds of MBs) ● GDB does symbol indexing at every single startup ● Massive waste of time! 32
  • 33. Caching the gdb-index (cont.) ● Solution: fold indexing into the build process ● Old linkers: as described in [GNU01] ● New linkers (i.e. gold): --gdb-index ● May need to forward from compiler driver: -Wl,--gdb-index 33
  • 34. Agenda ● Build system improvements ● Signal handlers ● Memory debugging with Valgrind ● OpenGL debugging techniques 34
  • 35. Signal handlers ● Unix signals are async notifications ● Sources can be: ● the process itself ● another process ● user ● kernel 35
  • 36. Signal handlers (cont.) ● A lot like interrupts ● Jump to handler upon first non-atomic op 36 [Diagram: normal program flow → SIGNAL → signal handler execution → normal flow resumes]
  • 37. Signal handlers (cont.) ● System installs a default handler ● Usually terminates and/or dumps core ● Core ≈ minidump in Windows parlance, but entire mapped address range is dumped (truncated to RLIMIT_CORE bytes) ● See signal(7) for default actions 37
  • 38. Signal handlers (cont.) ● Can (should!) specify custom handlers ● Get/set handlers via sigaction(2) ● void handler(int, siginfo_t *, void *); ● Needs SA_SIGINFO flag in sigaction() call ● Extensively covered in [BENYOSSEF08] 38
  • 39. Interesting siginfo_t fields ● si_code – reason for sending the signal ● Examples: signal source, FP over/underflow, memory permissions, unmapped address ● si_addr – memory location (if relevant) ● SIGILL, SIGFPE, SIGSEGV, SIGBUS and SIGTRAP 39
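A minimal sketch of the extended handler described on the last two slides, assuming a glibc/Linux target. The names on_signal, install_handler, last_signal and last_code are made up for illustration; only sigaction(2), SA_SIGINFO and the siginfo_t fields come from the slides.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Written by the handler, read by normal code. sig_atomic_t is the only
 * type that is safe to share this way. (Hypothetical names.) */
static volatile sig_atomic_t last_signal = 0;
static volatile sig_atomic_t last_code = 0;

/* Three-argument handler; the third argument points at the interrupted
 * context (a ucontext_t) and is unused in this sketch. */
static void on_signal(int signum, siginfo_t *info, void *context)
{
    (void)context;
    last_signal = signum;
    last_code = info->si_code; /* reason: SI_USER, SEGV_MAPERR, FPE_INTDIV... */
    /* For SIGSEGV, SIGBUS, SIGILL, SIGFPE and SIGTRAP, info->si_addr
     * would hold the relevant memory location. */
}

/* Installs on_signal() for signum. Returns 0 on success, -1 on error. */
static int install_handler(int signum)
{
    struct sigaction sa;

    assert(signum > 0);
    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = on_signal; /* sa_sigaction, not sa_handler */
    sa.sa_flags = SA_SIGINFO;    /* request the siginfo_t argument */
    sigemptyset(&sa.sa_mask);    /* block nothing extra while handling */
    return sigaction(signum, &sa, NULL);
}
```

Calling install_handler(SIGUSR1) and then raise(SIGUSR1) should leave SIGUSR1 in last_signal and a user-origin si_code (zero or negative on Linux) in last_code.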
  • 40. Interesting signals ● Worth catching ● SIGSEGV, SIGILL, SIGHUP, SIGQUIT, SIGTRAP, SIGIOT, SIGBUS, SIGFPE, SIGTERM, SIGINT ● Worth ignoring ● signal(signum, SIG_IGN); ● SIGCHLD, SIGPIPE 40
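To illustrate the "worth ignoring" half of the slide, here is a small sketch that ignores SIGPIPE and then writes into a pipe with no reader. The function name write_to_broken_pipe is hypothetical. With the signal ignored, write(2) is expected to fail with EPIPE instead of the process being terminated.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/* Ignores SIGPIPE, then writes into a pipe whose read end is closed.
 * Returns the errno reported by write(), 0 if the write unexpectedly
 * succeeded, or -1 if the setup failed. */
static int write_to_broken_pipe(void)
{
    int fds[2];
    ssize_t written;
    int err;

    if (signal(SIGPIPE, SIG_IGN) == SIG_ERR)
        return -1;
    if (pipe(fds) != 0)
        return -1;
    close(fds[0]); /* no reader left: the pipe is now "broken" */

    errno = 0;
    written = write(fds[1], "x", 1);
    err = errno;
    assert(written == 1 || written == -1);
    close(fds[1]);
    return written < 0 ? err : 0;
}
```

Without the signal(SIGPIPE, SIG_IGN) call, the same write would deliver SIGPIPE, whose default action terminates the process.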
  • 41. Signal handling caveats ● Prone to race conditions ● Signals may be nested 41 [Diagram: normal flow → SIGNAL → signal handler → SIGNAL → nested signal handler]
  • 42. Signal handling caveats (cont.) ● Prone to race conditions ● Can't share locks with the main program 42 [Diagram: normal flow locks mutex → SIGNAL → signal handler locks the same mutex → deadlock ☹]
  • 43. Signal handling caveats (cont.) ● Prone to race conditions ● Can't call async-unsafe/non-reentrant functions ● See signal(7) for a list of safe ones ● Notable functions not on the list: ● printf() and friends (formatted output) ● malloc() and free() 43
  • 44. Signal handling caveats (cont.) ● Not safe to allocate or free heap memory 44 [Figure: allocator chunk layout with every field marked “STOMPED?”. Source: [LEA01]]
  • 45. Signal handling caveats (cont.) ● Custom handlers do not dump core ● At handler installation time: ● Raise RLIMIT_CORE to desired core size ● Inside handler, after custom logging: ● Restore default handler using signal(2) or sigaction(2) ● raise(signum); 45
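The recipe on this slide could look roughly like the sketch below. All function names are hypothetical, and it handles a single signal for brevity (one static stash per signal would be needed otherwise). The helper crash_in_child exists only so that the crash can be observed from a surviving parent process; whether a core file actually appears still depends on permissions and on the system's core_pattern.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Static stash of the disposition that was active before ours, so the
 * handler can restore it without building anything on the fly. */
static struct sigaction previous_action;

static void crash_handler(int signum, siginfo_t *info, void *context)
{
    static const char msg[] = "crash handler: logged, re-raising\n";
    sigset_t unblock;
    ssize_t ignored;

    (void)info;
    (void)context;
    /* Custom logging goes here; write(2) is async-signal-safe. */
    ignored = write(STDERR_FILENO, msg, sizeof(msg) - 1);
    (void)ignored;

    /* Restore the previous (normally default) disposition... */
    sigaction(signum, &previous_action, NULL);
    /* ...unmask the signal, which is blocked while its handler runs
     * (a multi-threaded program would use pthread_sigmask)... */
    sigemptyset(&unblock);
    sigaddset(&unblock, signum);
    sigprocmask(SIG_UNBLOCK, &unblock, NULL);
    /* ...and re-raise so the default action can dump core. */
    raise(signum);
}

static int install_crash_handler(int signum)
{
    struct rlimit limit;
    struct sigaction sa;

    /* Best effort: lift the soft core size limit up to the hard limit. */
    if (getrlimit(RLIMIT_CORE, &limit) == 0) {
        limit.rlim_cur = limit.rlim_max;
        setrlimit(RLIMIT_CORE, &limit);
    }
    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = crash_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(signum, &sa, &previous_action);
}

/* Demo helper: crashes a child process and returns the number of the
 * signal that terminated it, or -1 if it did not die from a signal. */
static int crash_in_child(int signum)
{
    int status = 0;
    pid_t pid = fork();

    assert(signum > 0);
    if (pid < 0)
        return -1;
    if (pid == 0) {
        if (install_crash_handler(signum) != 0)
            _exit(1);
        raise(signum);
        _exit(2); /* only reached if the re-raise did not terminate us */
    }
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

crash_in_child(SIGSEGV) is expected to return SIGSEGV: the child logs through the custom handler first and is then terminated by the restored default action.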
  • 46. Safe stack walking ● glibc provides backtrace(3) and friends ● Symbols are read from the dynamic symbol table ● Pass -rdynamic at compile-time to populate 46
  • 47. Safe stack walking (cont.) ● backtrace_symbols() internally calls malloc() ● Not safe... ☹ ● Still, can get away with it most of the time (YMMV) 47
  • 48. Example “proper” solution ● Fork a watchdog process in main() ● Communicate over a FIFO pipe ● In signal handler: ● Collect & send information down the pipe ● backtrace_symbols_fd() down the pipe ● Demo code: is.gd/GDCE14Linux 48
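A rough sketch of the pipe-based approach follows; it is not the demo code from the talk. For brevity the roles are swapped compared to the slide: the calling process plays the watchdog and forks a stand-in "game" child that crashes. All names are hypothetical. The warm-up backtrace() call reflects the backtrace(3) man page's note that libgcc is loaded lazily on first use, which may allocate memory.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <execinfo.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Write end of the pipe to the watchdog; set before installing the handler. */
static int report_fd = -1;

static void crash_handler(int signum)
{
    static void *frames[64]; /* static storage, nothing allocated here */
    int count = backtrace(frames, 64);

    /* Unlike backtrace_symbols(), the _fd variant does not call malloc(). */
    backtrace_symbols_fd(frames, count, report_fd);
    _exit(128 + signum);
}

/* Plays the watchdog: forks a "game" child that crashes, then reads the
 * report the child's handler sends down the pipe. Returns the number of
 * bytes stored in buf (NUL-terminated), or -1 on setup failure. */
static ssize_t collect_crash_report(char *buf, size_t size)
{
    int fds[2];
    size_t total = 0;
    ssize_t n;
    pid_t pid;

    assert(buf != NULL && size > 1);
    if (pipe(fds) != 0)
        return -1;
    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) { /* the "game" process */
        struct sigaction sa;
        void *warmup[1];

        close(fds[0]);
        report_fd = fds[1];
        backtrace(warmup, 1); /* force libgcc to load outside the handler */
        memset(&sa, 0, sizeof(sa));
        sa.sa_handler = crash_handler;
        sigemptyset(&sa.sa_mask);
        sigaction(SIGSEGV, &sa, NULL);
        raise(SIGSEGV); /* stand-in for a real crash */
        _exit(0);
    }
    close(fds[1]); /* watchdog only reads */
    while (total + 1 < size &&
           (n = read(fds[0], buf + total, size - 1 - total)) > 0)
        total += (size_t)n;
    buf[total] = '\0';
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return (ssize_t)total;
}
```

A real watchdog would outlive the game, write the report to a log or crash dialog, and define its own message framing; function names in the trace only appear for symbols present in the dynamic symbol table (see -rdynamic above).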
  • 49. Agenda ● Build system improvements ● Signal handlers ● Memory debugging with Valgrind ● OpenGL debugging techniques 49
  • 50. Is this even related to porting? ● Yes! Portability bugs easily overlooked ● Hardcoded struct sizes/offsets ● OpenGL buffers ● Incorrect binary packing/unpacking ● “How did we/they manage to ship that?!” 50
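One cheap guard against the "hardcoded struct sizes/offsets" class of bugs is to state the assumptions at compile time. The sketch below uses C11 static_assert; the struct layouts are invented for illustration and do not come from any of the ported games.

```c
#include <assert.h> /* static_assert (C11) */
#include <stddef.h> /* offsetof */
#include <stdint.h>

/* Hypothetical vertex layout shared between file format and GPU buffer.
 * Fixed-width types keep it identical on 32- and 64-bit targets. */
struct packed_vertex {
    float    position[3];
    uint32_t colour;
    uint16_t uv[2];
};

/* If any of these fire, a hardcoded stride or offset elsewhere is wrong. */
static_assert(sizeof(float) == 4, "float is assumed to be 32-bit");
static_assert(sizeof(struct packed_vertex) == 20, "unexpected vertex size");
static_assert(offsetof(struct packed_vertex, colour) == 12, "colour moved");
static_assert(offsetof(struct packed_vertex, uv) == 16, "uv moved");

/* Stride to hand to the renderer instead of a magic number. */
static size_t vertex_stride(void)
{
    return sizeof(struct packed_vertex);
}

/* Counter-example: 'long' is 4 bytes on i386 Linux but 8 on amd64, so
 * this header changes size (8 vs 16 bytes) between the two builds. */
struct save_header {
    char tag;
    long payload_size;
};

static size_t save_header_size(void)
{
    return sizeof(struct save_header);
}
```

The first struct passes the checks on both architectures, while save_header_size() differs between a -m32 and a -m64 build, which is exactly the kind of discrepancy that breaks binary packing and unpacking.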
  • 51. What is Valgrind? ● Framework for dynamic, runtime analysis ● Dynamic recompilation ● machine code → IR → tool → machine code ● Performance typically at 25-20% of unmodified code ● Worse if heavily threaded – execution is serialized 51
  • 52. What is Valgrind? (cont.) ● Many tools in it: ● Memory error detectors (Memcheck, SGcheck) ● Cache profilers (Cachegrind, Callgrind) ● Thread error detectors (Helgrind, DRD) ● Heap profilers (Massif, DHAT) 52
  • 53. Memcheck basics ● Basic usage extremely simple ● …as long as you use the vanilla libc malloc() ● valgrind ./app ● Will probably report a ton of errors on the first run! ● Again: “How did they manage to ship that?!” 53
  • 54. Memcheck basics (cont.) ● Many false positives, esp. in 3rd parties ● Xlib, NVIDIA driver ● Can suppress them via suppress files ● Call Valgrind with --gen-suppressions=yes to generate suppression definitions ● Be careful with that! Can let OpenGL bugs slip! 54
  • 55. Contrived example #include <stdlib.h> int main(int argc, char *argv[]) { int foo, *ptr1 = &foo; int *ptr2 = malloc(sizeof(int)); if (*ptr1) ptr2[1] = 0xabad1dea; else ptr2[1] = 0x15bad700; ptr2[0] = ptr2[2]; return *ptr1; } 55 Demo code: is.gd/GDCE14Linux
  • 56. Valgrind output for such ==8925== Conditional jump or move depends on uninitialised value(s) ==8925== Invalid write of size 4 ==8925== Invalid read of size 4 ==8925== Syscall param exit_group(status) contains uninitialised byte(s) ==8925== LEAK SUMMARY: ==8925== definitely lost: 4 bytes in 1 blocks 56
  • 57. What about custom allocators? ● Custom memory pool & allocation algo ● Valgrind only “sees” mmap()/munmap() of multiples of entire memory pages ● All access within those pages – now valid! ● How to track errors? 57
  • 58. Client requests ● Allow annotation of custom allocators ● ~20 C macros defined in valgrind.h ● Common and per-tool requests exist ● Can be cut out with -DNVALGRIND ● Detailed description in [VALGRIND01] 58
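As a sketch of what annotating a custom allocator might look like, here is a toy bump allocator using the VALGRIND_MALLOCLIKE_BLOCK and VALGRIND_FREELIKE_BLOCK requests. The allocator itself (pool_alloc, pool_free, a static pool rather than an mmap()ed one) is invented for illustration. The no-op fallback macros are an assumption added so the sample builds where the Valgrind headers are not installed; in a real build you would include the headers directly and rely on -DNVALGRIND.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__has_include)
#  if __has_include(<valgrind/memcheck.h>)
#    include <valgrind/memcheck.h>
#    define HAVE_VALGRIND_HEADERS 1
#  endif
#endif
#ifndef HAVE_VALGRIND_HEADERS
/* Assumed fallbacks: do nothing when the headers are unavailable. */
#  define VALGRIND_MAKE_MEM_NOACCESS(addr, len) ((void)(addr), (void)(len), 0)
#  define VALGRIND_MALLOCLIKE_BLOCK(addr, size, rz, zeroed) \
       ((void)(addr), (void)(size), (void)(rz), (void)(zeroed))
#  define VALGRIND_FREELIKE_BLOCK(addr, rz) ((void)(addr), (void)(rz))
#endif

#define POOL_SIZE 4096u
#define REDZONE   16u /* guard bytes before and after every block */

static _Alignas(16) unsigned char pool[POOL_SIZE];
static size_t pool_used = 0;
static int pool_ready = 0;

/* Bump allocator: hands out consecutive chunks and never reuses them. */
static void *pool_alloc(size_t size)
{
    size_t rounded;
    unsigned char *block;

    if (!pool_ready) {
        /* The whole pool is off limits until a block is handed out. */
        (void)VALGRIND_MAKE_MEM_NOACCESS(pool, POOL_SIZE);
        pool_ready = 1;
    }
    if (size == 0 || size > POOL_SIZE)
        return NULL;
    rounded = (size + 15u) & ~(size_t)15u;
    if (rounded + 2u * REDZONE > POOL_SIZE - pool_used)
        return NULL;

    block = pool + pool_used + REDZONE;
    pool_used += rounded + 2u * REDZONE;
    assert(pool_used <= POOL_SIZE);
    /* Tell Memcheck this range now behaves like a malloc() result:
     * addressable but undefined, with red zones on either side. */
    VALGRIND_MALLOCLIKE_BLOCK(block, size, REDZONE, 0);
    return block;
}

static void pool_free(void *block)
{
    /* The memory is not recycled; this only marks the block as freed so
     * that later accesses are reported as use-after-free. */
    if (block != NULL)
        VALGRIND_FREELIKE_BLOCK(block, REDZONE);
}
```

Run natively, the requests cost next to nothing; under Memcheck, overruns into the red zones and accesses after pool_free() should be reported much like they are for libc malloc().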
  • 59. Example: Instrumenting dlmalloc ● 2.8.4 instrumentation from [CRYSTAL01] ● Demo code: is.gd/GDCE14Linux ● Compile the sample with -DDLMALLOC ● Similar results to libc malloc() 59
  • 60. Other uses of client requests ● Pointer validation ● Is address mapped? Is it defined? ● Mid-session leak checks ● Level transitions 60
  • 61. Other uses of client req. (cont.) ● Poisoning memory regions ● Ensuring signal handlers don't touch the heap ● Ensuring geometry buffers aren't read on CPU 61 Source: [LEA01]
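The other uses listed on these two slides map onto a handful of Memcheck requests. The sketch below poisons a stand-in "geometry buffer", validates it, and triggers a mid-session leak check. The buffer and function names are hypothetical, and the fallback macros are an assumption for builds without the Valgrind headers; outside Valgrind every request is a no-op that reports "OK".

```c
#include <assert.h>
#include <stddef.h>

#if defined(__has_include)
#  if __has_include(<valgrind/memcheck.h>)
#    include <valgrind/memcheck.h>
#    define HAVE_VALGRIND_HEADERS 1
#  endif
#endif
#ifndef HAVE_VALGRIND_HEADERS
/* Assumed fallbacks mirroring the values the real macros yield when the
 * program is not running under Valgrind. */
#  define RUNNING_ON_VALGRIND 0
#  define VALGRIND_MAKE_MEM_NOACCESS(addr, len) ((void)(addr), (void)(len), 0)
#  define VALGRIND_MAKE_MEM_DEFINED(addr, len) ((void)(addr), (void)(len), 0)
#  define VALGRIND_CHECK_MEM_IS_ADDRESSABLE(addr, len) \
       ((void)(addr), (void)(len), 0)
#  define VALGRIND_DO_LEAK_CHECK ((void)0)
#endif

/* Stand-in for a buffer the CPU must not read once handed to the GPU. */
static unsigned char geometry[256];

/* Any CPU access after this point is reported by Memcheck. */
static void geometry_poison(void)
{
    (void)VALGRIND_MAKE_MEM_NOACCESS(geometry, sizeof(geometry));
}

/* Makes the buffer accessible (and defined) again. */
static void geometry_unpoison(void)
{
    (void)VALGRIND_MAKE_MEM_DEFINED(geometry, sizeof(geometry));
}

/* Pointer validation: 1 if the whole range is addressable, else 0.
 * Under Memcheck a failed check is also logged as an error. */
static int geometry_is_accessible(void)
{
    return VALGRIND_CHECK_MEM_IS_ADDRESSABLE(geometry, sizeof(geometry)) == 0;
}

/* Mid-session leak check, e.g. on a level transition. */
static void on_level_transition(void)
{
    VALGRIND_DO_LEAK_CHECK;
}
```

The same poisoning pattern could wrap an allocator's arena while a signal handler runs, so that any heap access from the handler shows up as an error.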
  • 62. Debugging inside Valgrind ● A gdbserver for “remote” debugging ● SIGTRAP (breakpoint) on every error ● Unlimited memory watchpoints! ● Data breakpoints in Visual Studio parlance ● Cf. 4 single-word hardware debug registers on x86 62
  • 63. Debugging inside Valgrind (cont.) ● Terminal A: ● valgrind --vgdb=yes --vgdb-error=0 ./MyGame ● Terminal B: ● gdb ./MyGame ● target remote | vgdb 63
  • 64. Agenda ● Build system improvements ● Signal handlers ● Memory debugging with Valgrind ● OpenGL debugging techniques 64
  • 65. Ye Olde Way ● Call glGetError() after each OpenGL call ● Get 1 of 8 (sic!) error codes ● Look up the call in the manual ● See what this particular error means in this particular context… 65
  • 66. Ye Olde Way (cont.) ● …Then check what was actually the case ● 6 possible reasons for GL_INVALID_VALUE in glTexImage*() alone! See [OPENGL01] ● Usually: attach a debugger, replay the scenario… ● This sucks! 66
  • 67. Ye Olde Way (cont.) ● …Then check what was actually the case ● 6 possible reasons for GL_INVALID_VALUE in glTexImage*() alone! See [OPENGL01] ● Usually: attach a debugger, replay the scenario… ● This sucks! used to suck ☺ 67
  • 68. Debug callback ● Never call glGetError() again! ● Much more detailed information ● Incl. performance tips from the driver ● Good to check what different drivers say ● May not work without a debug OpenGL context (GLX_CONTEXT_DEBUG_BIT_ARB) 68
  • 69. Debug callback (cont.) ● Provided by either of (ABI-compatible): GL_KHR_debug [OPENGL02], GL_ARB_debug_output [OPENGL03] 69
      Extension        | OpenGL | OpenGL ES | NVIDIA (official) | AMD (official) | Intel (Mesa) | AMD (Mesa)
      ARB_debug_output | √      | ×         | √                 | √              | √            | √
      KHR_debug        | √      | √         | √                 | √              | ×            | ×
  • 70. Debug callback (cont.) void callback(GLenum source, GLenum type, GLuint id, GLenum severity, GLsizei length, const GLchar* message, const void* userParam); 70 Filter by source, type, severity or individual messages
  • 71. Debug callback (cont.) ● Verbosity can be controlled (filtering) ● glDebugMessageControl[ARB]() ● [OPENGL02][OPENGL03] ● Turn to 11 (GL_DONT_CARE) for valuable perf information! ● Memory type for buffers, unused mip levels… 71
  • 72. API call tracing ● Record a trace of the run of the application ● Replay and review the trace ● Look up OpenGL state at a particular call ● Inspect state variables, resources and objects: textures, shaders, buffers... ● apitrace or VOGL 72
  • 73. Well, this is not helpful... 73
  • 75. Annotating the call stream 75
      Feature           | KHR_debug | EXT_debug_marker | EXT_debug_label | GREMEDY_string_marker | GREMEDY_frame_terminator
      One-off messages  | √         | √                | ×               | √                     | ×
      Call grouping     | √         | √                | ×               | ×                     | ×
      Object labels     | √         | ×                | √               | ×                     | ×
      Frame terminators | ×         | ×                | ×               | ×                     | √
      Support           | Good      | Limited          | Limited         | Limited               | Limited
  • 76. Annotating the call stream (cont.) ● All aforementioned extensions supported by apitrace regardless of driver ● Recommended: GL_KHR_debug 76
  • 77. Annotating the call stream (cont.) ● Call grouping ● glPushDebugGroup()/glPopDebugGroup() ● One-off messages ● glDebugMessageInsert[ARB]() ● glStringMarkerGREMEDY() 77
  • 78. Object labelling ● glObjectLabel(), glGetObjectLabel() ● Buffer, shader, program, vertex array, query, program pipeline, transform feedback, sampler, texture, render buffer, frame buffer, display list ● glObjectPtrLabel(), glGetObjectPtrLabel() ● Sync objects 78
  • 79. Annotation caveats ● Multi-threaded grouping may break hierarchy ● glDebugMessageInsert() calls the debug callback, polluting error streams ● Workaround: drop if source == GL_DEBUG_SOURCE_APPLICATION 79
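The workaround from this slide, sketched as a debug callback that drops the application's own annotations. To keep the sample buildable without OpenGL headers, the GL typedefs and enum values are declared locally when missing; that part is an assumption for illustration, and a real build would take them from the GL headers and add the APIENTRY calling convention to the callback. The counter driver_messages is hypothetical.

```c
#include <assert.h>
#include <stdio.h>

#ifndef GL_DEBUG_SOURCE_APPLICATION
/* Local stand-ins, only used when no OpenGL header has been included. */
typedef unsigned int GLenum;
typedef unsigned int GLuint;
typedef int GLsizei;
typedef char GLchar;
#define GL_DEBUG_SOURCE_API            0x8246
#define GL_DEBUG_SOURCE_APPLICATION    0x824A
#define GL_DEBUG_TYPE_ERROR            0x824C
#define GL_DEBUG_TYPE_MARKER           0x8268
#define GL_DEBUG_SEVERITY_NOTIFICATION 0x826B
#define GL_DEBUG_SEVERITY_HIGH         0x9146
#endif

/* Number of messages that made it past the filter. */
static unsigned int driver_messages = 0;

static void debug_callback(GLenum source, GLenum type, GLuint id,
                           GLenum severity, GLsizei length,
                           const GLchar *message, const void *userParam)
{
    (void)length;
    (void)userParam;
    /* Our own glDebugMessageInsert() annotations come back through this
     * callback too; drop them so they don't pollute the error stream. */
    if (source == GL_DEBUG_SOURCE_APPLICATION)
        return;
    ++driver_messages;
    fprintf(stderr, "GL debug: source=0x%x type=0x%x id=%u severity=0x%x: %s\n",
            source, type, id, severity, message);
}
```

With a debug context current, the callback would be registered with glDebugMessageCallback(debug_callback, NULL). The filter only works if every annotation is inserted with source set to GL_DEBUG_SOURCE_APPLICATION.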
  • 80. Example 1: PIX events emulation #define D3DPERF_BeginEvent(colour, name) if (GLEW_KHR_debug && threadOwnsDevice()) glPushDebugGroup(GL_DEBUG_SOURCE_APPLICATION, (GLuint)colour, -1, name) #define D3DPERF_EndEvent() if (GLEW_KHR_debug && threadOwnsDevice()) glPopDebugGroup() 80
  • 81. Example 2: Game tech demo ● University assignment from 2009 ☺ ● Annotated OpenGL 1.4 ● Demo code: is.gd/GDCE14Linux 81
  • 82. Takeaway ● gcc-multilib is the prerequisite for 32/64-bit cross-compilation ● Switching back and forth between Clang and gcc is easy and useful ● Link times can be greatly improved by using gold 82
  • 83. Takeaway (cont.) ● Caching the gdb-index improves debugging experience ● Crash handling is easy to do, tricky to get right 83
  • 84. Takeaway (cont.) ● Valgrind is an enormous aid in memory debugging ● Even when employing custom allocators ● OpenGL debugging experience can be vastly improved using some extensions 84
  • 86. Thank you! Further Nordic Games information: www.nordicgames.at Development information: www.grimloregames.com 86
  • 87. References ● LARABEL13 – Larabel, M. “Clang 3.4 Performance Very Strong Against GCC 4.9” [link] ● GNU01 – “Index Files Speed Up GDB” [link] ● GNU02 – “Options for Debugging Your Program or GCC” [link] ● BENYOSSEF08 – Ben-Yossef, G. “Crash N' Burn: Writing Linux application fault handlers” [link] ● LEA01 – Lea, D. “A Memory Allocator” [link] ● VALGRIND01 – “The Client Request mechanism” [link] ● CRYSTAL01 – “Crystal Space 3D SDK” [link] ● OPENGL01 – “glTexImage2D” [link] ● OPENGL02 – “ARB_debug_output” [link] ● OPENGL03 – “KHR_debug” [link] ● XDG01 – “XDG Base Directory Specification” [link] ● <page>(<section>), e.g. sigaction(2) – “Linux Programmer's Manual”; to view, type man <section> <page> into a terminal or a web search engine 87
  • 88. Special thanks Fabian Giesen Katarzyna Griksa Damian Kobiałka Jetro Lauha Eric Lengyel Krzysztof Narkowicz Reinhard Pollice Bartłomiej Wroński Kacper Ząber 88
  • 89. Bonus slides! ● OpenGL resource leak checking ● Intel i965 driver vs stack ● Locating user data according to FreeDesktop.org guidelines ● Thread priorities in Linux ● Additional/new debug features 89 FREE!!!FREE!!!
  • 90. OpenGL resource leak checking Courtesy of Eric Lengyel & Fabian Giesen static void check_for_leaks() { GLuint max_id = 10000; // better idea would be to keep track of assigned names. GLuint id; // if brute force doesn't work, you're not applying it hard enough for ( id = 1 ; id <= max_id ; id++ ) { #define CHECK( type ) if ( glIs##type( id ) ) fprintf( stderr, "GLX: leaked " #type " handle 0x%x\n", (unsigned int) id ) CHECK( Texture ); CHECK( Buffer ); CHECK( Framebuffer ); CHECK( Renderbuffer ); CHECK( VertexArray ); CHECK( Shader ); CHECK( Program ); CHECK( ProgramPipeline ); #undef CHECK } } 90
  • 91. Intel i965 vs stack ● Been chasing a segfault on a call instruction down _mesa_Clear() (glClear()) ● Region of code copy/pasted from D3D renderer ● Address mapped, so not an invalid jump... ● Only 16 function frames – surely this can't be a stack overflow? 91
  • 92. Intel i965 vs stack (cont.) ● Oh no, wait: ● Check ESP against /proc/[pid]/maps ● Yup, encroaching on unmapped address space ● Moral: cut your render some stack slack (160+ kB), or Mesa will blow it up with locals (e.g. in clear shader generation) 92
  • 93. Locating user data ● There is a spec for that – see [XDG01] ● Savegames, screenshots, options etc.: ● $XDG_CONFIG_HOME or ~/.config/<app> ● Caches of all kinds: ● $XDG_CACHE_HOME or ~/.cache/<app> ● Per-user persistent data (e.g. DLC): ● $XDG_DATA_HOME or ~/.local/share/<app> 93
  • 94. Locating user data (cont.) ● <app> subdirectory currently unregulated ● De-facto standard: simplified or “Unix name” ● Lowercase, “safe” ASCII characters, e.g. blender ● When asked, XDG people suggest rev-DNS ● com.company.appname 94
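The lookup rules from these two slides fit in one small helper. This is a sketch with a hypothetical function name (xdg_dir) and a made-up app name in the usage; it follows the specification's rule that unset, empty or relative values are ignored in favour of the default under $HOME.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Builds "<base>/<app>" into out, where <base> is the value of env_var
 * if it holds an absolute path, or "$HOME/<home_fallback>" otherwise.
 * Returns 0 on success, -1 if HOME is unusable or out is too small. */
static int xdg_dir(const char *env_var, const char *home_fallback,
                   const char *app, char *out, size_t out_size)
{
    const char *base = getenv(env_var);
    int n;

    /* The <app> part is a single directory name, not a path. */
    assert(app != NULL && app[0] != '\0' && strchr(app, '/') == NULL);

    if (base != NULL && base[0] == '/') {
        n = snprintf(out, out_size, "%s/%s", base, app);
    } else {
        const char *home = getenv("HOME");
        if (home == NULL || home[0] == '\0')
            return -1;
        n = snprintf(out, out_size, "%s/%s/%s", home, home_fallback, app);
    }
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}
```

For example, xdg_dir("XDG_CONFIG_HOME", ".config", "mygame", path, sizeof(path)) would cover savegames and options, with ("XDG_CACHE_HOME", ".cache") and ("XDG_DATA_HOME", ".local/share") for the other two categories. Creating the directory is left to the caller.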
  • 95. Thread priorities in Linux ● Priority elevation requires root permissions ● No user will ever grant you root (scary!) ● Reason: DoS protection in servers (probably) ● Priority can be tweaked with nice() ● Think “how nice the process is to others” ● Being nice to everyone will starve your process ● Niceness can be negative (but only with root) 95
  • 96. Thread priorities in Linux (cont.) ● Why not setpriority(2)? ● Also sets scheduling algorithm → here be dragons ● Priority values have different meaning per scheduler ● Still needs root ● What about capabilities(7)? ● This might actually work if your users trust you ● Demo code: is.gd/GDCE14Linux 96
  • 97. Thread priorities in Linux (cont.) ● Don't all threads in a process share niceness? ● They should, according to POSIX, but they don't! ● One of the few cases where Linux is non-compliant 97
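The per-thread behaviour can be seen with a few lines of code. This sketch (hypothetical helper names, Linux-only) addresses the calling thread by its kernel thread ID, which on Linux makes getpriority(2)/setpriority(2) act on that thread alone. It only ever makes the thread nicer, since lowering niceness needs root or CAP_SYS_NICE.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sys/resource.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Kernel thread ID of the caller (equal to the PID in the main thread). */
static pid_t current_tid(void)
{
    return (pid_t)syscall(SYS_gettid);
}

/* Stores the calling thread's niceness in *out. Returns 0 or -1. */
static int thread_nice(int *out)
{
    int value;

    assert(out != NULL);
    errno = 0; /* -1 is a legal niceness, so errno tells errors apart */
    value = getpriority(PRIO_PROCESS, (id_t)current_tid());
    if (value == -1 && errno != 0)
        return -1;
    *out = value;
    return 0;
}

/* Adds delta to the calling thread's niceness and stores the result in
 * *out. Positive deltas need no privileges; the kernel clamps at 19. */
static int thread_renice(int delta, int *out)
{
    int value;

    if (thread_nice(&value) != 0)
        return -1;
    if (setpriority(PRIO_PROCESS, (id_t)current_tid(), value + delta) != 0)
        return -1;
    return thread_nice(out);
}
```

Calling thread_renice(1, &n) from a worker thread should change that thread's niceness only, leaving the rest of the process as it was.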
  • 98. Additional/new debug features ● Additional debug info: -g3 ● Including #defines (macros) ● Better debugger performance [GNU02]: ● -fdebug-types-section: improved layout ● -gpubnames: new format for index 98

Editor's Notes

  1. Hi everyone, my name is Leszek Godlewski and I'm here today to talk about some of the more advanced aspects of Linux game programming. Or, porting your game engine to Linux, to be more precise. These notes will only supplement the slides, they are not a transcript.
  2. I'm a programmer with Nordic Games. A bit of an unusual role with the company since it's mostly a publisher, but I'm the resident Linux guy and generalist programmer where necessary. I've got three ports under my belt: Painkiller Hell & Damnation and Deadfall Adventures for The Farm 51, and Darksiders for Nordic. I was also a generalist programmer at TF51 on the original teams for those two games.
  3. Who can guess what these are, apart from the obvious ones? :) Answers, clockwise from top left: GDB, Valgrind, apitrace (doh), Nsight (doh!), GPU PerfStudio, VOGL (doh...).
  4. The focus of this talk lies in these 4 areas. I'd like to share my findings about optimizing native code builds, signal (crash) handling, jump on to memory debugging stuff using the Valgrind tool, and finish off with OpenGL debugging techniques. There are some bonus slides if there is still time, too.
  5. Actually, let's adopt this as the agenda. :) The upcoming element will be in bold.
  6. This is based on the two UE3 titles, but also applied to Darksiders to some extent (there was no Mac toolchain, had to make one from scratch). It displayed the same problems.
  7. As much as I would love 32-bit to die, I had to support it to achieve full parity with Windows. A chroot is a bit like a VM – a separate, sandboxed view of the entire system environment (i.e. file system), but running on the same kernel. Host system is Debian amd64.
  8. Fortunately, I could get rid of that chroot jail thanks to gcc-multilib.
  9. “Did you mean x?” and colours for the win.
  10. C object files are unaffected. Clang seems to have slightly different C++ symbol name generation rules (pre-mangling).
  11. Officially in binutils → ships as an optional part of default toolchain on all modern distros.
  12. And not only circular – sometimes it's tricky to even get regular dependencies in the right order. Circular actually happened to me with the PhysX 2.8.4 loader. That was even worse because even though it seemed to have linked, the dynamic linker at runtime still failed to resolve everything...
  13. Here's how this reverse dependency thing works: you first declare the dependent object, then its dependency. Starting the construction of a building from the roof, if you will. First, an “app” object file is linked. It has unsatisfied imports for symbols from library A.
  14. Library A's exports satisfy the imports of app, but they import symbols from B.
  15. Library B's exports match library A's imports, but B now has unsatisfied imports of A's symbols.
  16. By re-specifying A, the linker satisfies the A imports in library B.
  17. This is your typical GDB session start at the beginning of porting.
  18. Old linkers need some shell script magic – run GDB in batch mode etc. Here's what I had been using before discovering --gdb-index:
      @echo Generating gdb-index in $(OUTPUT_PATH)/gdb-index/$(notdir $@).gdb-index
      @mkdir -p $(OUTPUT_PATH)/gdb-index
      @$(GDB) -batch -ex "save gdb-index $(OUTPUT_PATH)/gdb-index" $@
      @echo Merging gdb-index into $@
      @$(OBJCOPY) --add-section .gdb_index=$(OUTPUT_PATH)/gdb-index/$(notdir $@).gdb-index --set-section-flags .gdb_index=readonly $@ $@
  19. A program can possibly resume normal execution after a signal.
  20. Signal handler can do different things by default: terminate, dump core or stop (suspend). I'll talk about RLIMIT_CORE a bit later. It's actually preferable to perform a clean exit (instead of aborting) in response to SIGTERM and SIGQUIT.
  21. These fields are a good reason for using the extended information handlers (with siginfo_t), as opposed to the traditional signal(2) calls with void sighandler(int) handlers. si_code is an enumeration, not a bit field. Includes reasons general (sent by user, kernel, async I/O operation etc.) and specific to the signal (e.g. whether SIGILL was caused by invalid opcode, operand or addressing mode etc.).
  22. Not all signals are interesting. SIGTRAP is a programmatic breakpoint (you can also raise(SIGTRAP) from code to have the debugger stop). It's up to you to decide how to handle that. SIGCHLD is only of note if you plan on spawning sub-processes. Similarly for SIGPIPE, if you're interested in notifications for the other end breaking a FIFO pipe.
  23. In reality, this is slightly more complex than that because of masking and deferring of signals. See sigaction(2) for details. Worth noting that by default, further signalling of the same signal type will be deferred until the return of the handler, unless you specify SA_NODEFER or explicitly unmask that signal. Ben-Yossef recommends using a spinlock, but warns of deadlocking if contending against a thread with higher priority. In that case it's better to just try locking and sleep with pselect(), but that's overkill in most cases.
  24. What if your code is interrupted right after grabbing the lock? Unless your mutex is re-entrant and the lock happened on the signalled thread, of course.
  25. Speaking of re-entrant…
  26. Allocator's data structures may have been damaged in the fault.
  27. When using sigaction(2) to install your handler, you will need a stash of struct sigaction in static memory and populate it when installing your handler. You might not be able to bump RLIMIT_CORE due to permissions. You may also want to limit it, as core dumps are often huge because they include a dump of all the mapped memory ranges. However, smaller core size → truncation → not all structures/ranges can be dumped → dump may actually be useless (stack not in the dumped memory range). Important to get a stack trace in the handler anyway. The raise(3) call may not work unless you unmask the signal. This is demonstrated in the code example. Also worth noting that apps are not really in control of core dump filename (/proc/sys/kernel/core_pattern or sysctl).
  28. The backtrace() function is easy to use, just pass it a static array of void pointers and its size.
  29. Personally, I haven't had any problems with just calling backtrace_symbols() and calling it a day, but I imagine it may be a bad idea in memory-constrained situations or when nested signalling happens.
  30. Pipe communication protocol up to the developer.
  31. One such case was not setting the right OpenGL pixel packing/unpacking rules when uploading textures, resulting in reading beyond the buffer.
  32. Memcheck doesn't watch the stack or static variables, only the heap. SGcheck – the reverse.
  33. Valgrind will also watch out for allocations made by the driver.
  34. This very bad code makes several mistakes: it branches on an uninitialized variable; overwrites memory past the malloced area; reads past the malloced area; makes a syscall with uninitialized memory as a parameter (return code); and leaks the memory allocated to ptr2. Actual demo code in repo also has an example for use-after-free.
  35. Memcheck easily identifies all the bugs. Actual output includes call stacks, both for when the errors occur and for the allocation/freeing of the relevant memory chunk. Running with --leak-check=full also pinpoints the source of the leak.
  36. Valgrind has no way of knowing about individual allocations and frees from the pool. Thus it won't protect control structures etc. Worth noting that mmap() calls for mapping large address ranges will cause Valgrind to issue a warning (not sure about the threshold value, but a 512MB heap does trigger it).
  37. Client requests have negligible performance impact and are no-ops if Valgrind is not present.
  38. This instrumentation could be improved by using the MEMPOOL family of client requests instead, but it's still quite verbose and useful.
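A rough sketch of what instrumenting a custom pool with Memcheck client requests can look like. This is not the code from the slides: the bump allocator (pool_reset, pool_alloc, g_pool) is hypothetical, and the fallback macros exist only so the sketch compiles on machines without the Valgrind development headers.

```c
#include <stddef.h>

/* Assumption: <valgrind/memcheck.h> may or may not be installed. */
#if defined(__has_include)
#  if __has_include(<valgrind/memcheck.h>)
#    include <valgrind/memcheck.h>
#    define HAVE_MEMCHECK_H 1
#  endif
#endif
#ifndef HAVE_MEMCHECK_H
#  define VALGRIND_MAKE_MEM_NOACCESS(addr, len)  (0)
#  define VALGRIND_MAKE_MEM_UNDEFINED(addr, len) (0)
#endif

#define POOL_SIZE 4096
static unsigned char g_pool[POOL_SIZE];
static size_t g_pool_used;

/* Frees everything at once and tells Memcheck that any access to the
 * pool is now an error. */
void pool_reset(void)
{
    g_pool_used = 0;
    (void)VALGRIND_MAKE_MEM_NOACCESS(g_pool, POOL_SIZE);
}

/* Bump allocation; chunk sizes are rounded up to 16-byte multiples. */
void *pool_alloc(size_t size)
{
    size_t rounded = (size + 15u) & ~(size_t)15u;
    void *chunk;

    if (rounded < size || rounded > POOL_SIZE - g_pool_used)
        return NULL;

    chunk = g_pool + g_pool_used;
    g_pool_used += rounded;

    /* Only the requested bytes become addressable (but uninitialized);
     * the rounding slack stays off-limits and acts as a small red zone. */
    (void)VALGRIND_MAKE_MEM_UNDEFINED(chunk, size);
    return chunk;
}
```

Call pool_reset() once before the first allocation so the whole pool starts out inaccessible. Outside Valgrind the client requests do nothing at run time, so the instrumentation can stay in regular builds.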
  39. Pointer validation could be used in garbage collectors, for instance.
  40. I'm sure you'll find a ton of other uses for making sure some data stays untouched.
  41. I will deliberately not speak of IHV tools because they are complex enough to warrant separate talks. I'll leave that to the IHV's dev relations teams. :)
  42. No longer does, because we no longer need to do that.
  43. The extension spec says that it's only suggested that driver implementers make it available only in debug contexts. It might just work in a regular one with your driver, but better stay on the safe side.
  44. State of Mesa at the time of writing this, on my ThinkPad T500 with Debian amd64 testing.
  45. You don&amp;apos;t get just the message, but also some metadata and a user pointer. IDs are unregulated, so drivers may use them as they see fit. They will definitely vary from vendor to vendor.
  46. You can also filter the messages programmatically with ifs. Intel is pretty verbose by default; NVIDIA needs setting all channels to GL_DONT_CARE to spew out more performance hints.
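A sketch of a debug callback that uses the metadata and the user pointer from note 45 and filters with plain ifs as in note 46. The typedefs and constants are stand-ins so that the sketch compiles without GL headers (the values are the ones from the KHR_debug specification); struct debug_log_state and the filtering policy are made up for illustration.

```c
#include <stdio.h>

/* Stand-ins; in a real build these come from your GL headers or loader. */
typedef unsigned int GLenum;
typedef unsigned int GLuint;
typedef int GLsizei;
typedef char GLchar;
#ifndef GL_DEBUG_TYPE_PERFORMANCE
#define GL_DEBUG_SOURCE_APPLICATION    0x824A
#define GL_DEBUG_TYPE_ERROR            0x824C
#define GL_DEBUG_TYPE_PERFORMANCE      0x8250
#define GL_DEBUG_TYPE_OTHER            0x8251
#define GL_DEBUG_SEVERITY_HIGH         0x9146
#define GL_DEBUG_SEVERITY_NOTIFICATION 0x826B
#endif

/* Hypothetical application state handed over via the user pointer. */
struct debug_log_state {
    FILE *out;
    unsigned reported;
    unsigned dropped;
};

void debug_callback(GLenum source, GLenum type, GLuint id, GLenum severity,
                    GLsizei length, const GLchar *message,
                    const void *user_param)
{
    struct debug_log_state *state = (struct debug_log_state *)user_param;

    /* Programmatic filter: drop notification-level chatter unless it is
     * a performance hint. */
    if (severity == GL_DEBUG_SEVERITY_NOTIFICATION &&
        type != GL_DEBUG_TYPE_PERFORMANCE) {
        state->dropped++;
        return;
    }

    /* IDs are vendor-specific, so they are logged but never interpreted. */
    fprintf(state->out,
            "GL debug [source 0x%X, type 0x%X, id %u, severity 0x%X]: %.*s\n",
            source, type, id, severity, (int)length, message);
    state->reported++;
}
```

With real GL headers the function would be registered through glDebugMessageCallback[ARB](), passing a pointer to the state struct as the user parameter.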
  47. It's a flat list. Good luck finding a faulty call in a multi-pass renderer in it.
  48. Does not a tree view improve things?
  49. In multi-threaded apps, look at the thread ID from which you're trying to push or pop a debug group. Callback filtering can be achieved via glDebugMessageControl[ARB]() or programmatically. Just remember to always set source to GL_DEBUG_SOURCE_APPLICATION in your glDebugMessageInsert[ARB]() calls.
  50. There's a little trick here – since there are no set rules about message IDs, I'm casting the D3DCOLOR (32-bit pixel) to GLuint and using that as an ID. This was extremely cool to do, because I could instantly take advantage of the existing instrumentation on the whole engine.
  51. The reason this demo was written for such an old version of OpenGL is that it needed to run on an old mobile Intel GPU (i915 I think). Still works nicely with KHR_debug, though.
  52. Shout out to people who helped me (in various ways) to prepare this talk!
  53. This originally came together as a half-joke on Twitter between those two guys. Mind you that this would need adjustments for ES (probably going over the entire 32-bit range minus 0), as names are not guaranteed to be consecutive there, whereas desktop GL always returns the first free name. “If brute force doesn't work, you're not applying it hard enough.”
  54. These performance improvement features require a fairly recent version of GCC and GDB (probably 4.9 and 7.0, respectively).