C++ on the Web
A Tale from the Trenches
Head of Development, Berlin
GDC Europe 2013
What’s this about?
• the web as a new target platform for C++ code
• differences to traditional platforms
• differences between C++/web technologies
• porting problems and solutions
• Dragons Demo: minimal 3D skinned
character demo [show demo]
• Map Demo: more advanced 3D demo
• based on Nebula3 engine, also used in
Why develop for the web
HTML5 + WebGL?
Create Deploy Play
• no walled gardens, no gate-keepers, no certification process
• free choice of hosting & payment providers
• no installations, no updates, no plugins, no lengthy downloads
• multi-platform “for free”
• battle-hardened security infrastructure
The web is the most open and seamless
platform for users and developers.
C++ to web technologies
LLVM has opened up a lot of new usage
scenarios for C/C++...
...for instance running C/C++ code inside byte
code VMs and other sandboxed environments:
• OpenSource project, started in 2010
• .cpp → LLVM .bc → .js
• extremely active and responsive dev team
• lots of wrapper APIs (OpenGL, SDL, GLUT, ...)
• limited threading support (no pthreads)
• asm.js (highly optimizable subset of JS)
• massive compilation speed improvements
• OpenSource project, started in 2008
• .cpp → LLVM .bc → (deploy) x86/x64/ARM
• Google Chrome only
• safe sandbox for native code execution
• full pthreads implementation
• pNaCl finally ready for prime-time
• enabled in Chrome v.30 and up
• no longer restricted to Chrome Web Store apps
• formerly known as Alchemy and flascc
• started in 2008, recently open-sourced
• .cpp → LLVM .bc → AVM2 byte code
• runs in Flash plugin
• proprietary 3D API (Stage3D)
• incredibly slow and resource hungry build process :/
Will mostly talk about emscripten (and some pNaCl)
• emscripten has widest reach (all major browsers)
• emscripten progresses incredibly fast
• pNaCl currently has edge in threading support
• pNaCl and emscripten are actually quite similar from dev
asm.js generated code is probably faster than you think, and pNaCl
generated code is probably slower than you think (don’t have hard
benchmark numbers yet... sorry)
IMHO: for 3D games, the real performance gains will come
through WebGL extensions: the high call overhead requires
extensions that reduce the number of GL calls!
My [OSX] dev environment
• Xcode (for compiling/debugging native OSX and iOS apps)
• Eclipse (for emscripten and NaCl specific dev work)
• emscripten SDK
• NaCl SDK
• a local HTTP server (e.g. “python -m SimpleHTTPServer”)
Multiplatform Build System
CMakeLists.txt + make
cmake: flexible meta-build-system, generates IDE
project files and/or makefiles from generic
cmake toolchain files define
platform-specific tools, header/lib
search paths and compile options
CMakeLists.txt files define
compile targets and their source
code must be
32/64 bit clean
else is POSIX-ish
but D3D9 still relevant
porting to exotic
harder, not easier
• WinXP is still incredibly big in Eastern Europe & Asia
• 3D feature base-line is OpenGL ES 2.0 w/o extensions (== WebGL)
• go fully GL on all platforms? (GL driver quality? Win8 Metro apps? ANGLE?)
What keeps me awake at night:
N3 Multiplatform Philosophy
Platform-specific code lives in its
Platform-specific pre-processor defines
provided by build system.
class NaclDisplayCore: #if __NACL__ ... #endif
class EmscDisplayCore: #if __EMSCRIPTEN__ ... #endif
class IOSDisplayCore: #if __IOS__ ... #endif
Diamond-shape class hierarchy resolved at compile time:
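A minimal sketch of how such a compile-time selection could look. Only the three platform class names and the three defines come from the slide; DisplayCoreBase, PlatformDisplayCore, GenericDisplayCore, DisplayCore and all methods are assumptions made for illustration, not the actual Nebula3 classes.

```cpp
#include <cassert>
#include <string>

// Platform-independent base class (hypothetical name).
class DisplayCoreBase {
public:
    void SetSize(int w, int h) {
        assert(w > 0 && h > 0);
        width = w;
        height = h;
    }
    int Width() const { return width; }
    int Height() const { return height; }
private:
    int width = 0;
    int height = 0;
};

// Exactly one platform class is compiled in; the build system provides the define.
#if __NACL__
class NaclDisplayCore : public DisplayCoreBase {
public:
    std::string Platform() const { return "nacl"; }
};
typedef NaclDisplayCore PlatformDisplayCore;
#elif __EMSCRIPTEN__
class EmscDisplayCore : public DisplayCoreBase {
public:
    std::string Platform() const { return "emscripten"; }
};
typedef EmscDisplayCore PlatformDisplayCore;
#elif __IOS__
class IOSDisplayCore : public DisplayCoreBase {
public:
    std::string Platform() const { return "ios"; }
};
typedef IOSDisplayCore PlatformDisplayCore;
#else
// Fallback so the sketch also builds with a plain desktop compiler (assumption).
class GenericDisplayCore : public DisplayCoreBase {
public:
    std::string Platform() const { return "generic"; }
};
typedef GenericDisplayCore PlatformDisplayCore;
#endif

// Front-end class used by platform-independent code. It derives from
// whichever platform class the preprocessor selected above.
class DisplayCore : public PlatformDisplayCore {};
```

Because the platform class is picked by the preprocessor, no virtual dispatch is needed to reach the platform-specific code.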
Multiplatform Line Counts
Dragons Demo (~170k lines of code)
about ~2% platform-
The Callback Problem
Key Point to understand (and accept):
Browser runtime environment uses callback
model for asynchronous programming.
Start lengthy operation, provide callback which will be
called when operation is finished: becomes very messy
Games are usually frame-driven, not callback-driven.
This is the main riddle when trying to port a game
engine to browser platforms.
The Game Loop Problem
Most event-driven platforms don’t let you “own the game loop”.
Instead the application runs completely inside event
callback functions which must return quickly.
Failing to return quickly results in unresponsive
behaviour or even your app being killed.
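As one concrete instance of this model, emscripten lets the app register a per-frame callback with emscripten_set_main_loop() instead of running its own loop. The sketch below is hedged: GameState, OnFrame and RunFrames are names invented here, and the native branch is only a stand-in so the same code can be tried on a desktop compiler.

```cpp
#include <cassert>
#ifdef __EMSCRIPTEN__
#include <emscripten.h>
#endif

// Hypothetical per-frame state, for illustration only.
struct GameState {
    int frameCount = 0;
};
static GameState g_state;

// One frame of work. On callback-driven platforms this must return quickly.
void OnFrame() {
    g_state.frameCount++;
}

// Hands the frame callback to the platform. nativeFrames is only used by
// the desktop stand-in path.
void RunFrames(int nativeFrames) {
    assert(nativeFrames >= 0);
#ifdef __EMSCRIPTEN__
    (void)nativeFrames;
    // fps = 0: the browser decides when to call OnFrame.
    // simulate_infinite_loop = 1: this call does not return to its caller.
    emscripten_set_main_loop(OnFrame, 0, 1);
#else
    // Desktop stand-in: here the app still "owns" the loop.
    for (int i = 0; i < nativeFrames; i++) {
        OnFrame();
    }
#endif
}
```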
The Game Loop Problem
Best solution is to use the app main thread
exclusively for system event handling...
...and spawn a “Game Thread” which runs the actual
up on system
Runs your typical
Some platforms have restrictions on what OS functionality
is accessible from threads.
E.g. must call OpenGL or IO functions from the
main thread only.
Either run everything on main thread, or dispatch
“system calls” to run asynchronously on main thread.
All “PPAPI calls” must happen on main thread, and the
main thread must never block.
Threads can push function pointers for deferred
execution on main thread.
Deferred function calls and result callbacks execute in a simple
run-loop after your per-frame callback on the main thread.
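The deferred-call mechanism described above can be sketched as a small thread-safe queue. This is an assumption-laden illustration, not the PPAPI or Nebula3 code: MainThreadQueue, Push and RunPending are invented names, and std::function is used where the slide speaks of function pointers.

```cpp
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical queue of calls that must execute on the main thread.
class MainThreadQueue {
public:
    // May be called from any thread.
    void Push(std::function<void()> fn) {
        std::lock_guard<std::mutex> lock(mutex_);
        pending_.push_back(std::move(fn));
    }

    // Called on the main thread once per frame, after the per-frame
    // callback. Returns the number of deferred calls that were executed.
    int RunPending() {
        std::vector<std::function<void()>> batch;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            batch.swap(pending_);
        }
        // Run without holding the lock so a deferred call may Push() again.
        for (auto& fn : batch) {
            fn();
        }
        return static_cast<int>(batch.size());
    }

private:
    std::mutex mutex_;
    std::vector<std::function<void()>> pending_;
};
```

Calls pushed while a batch is running are picked up on the next frame, which keeps each RunPending() bounded.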
This primitive runloop/callback model makes it easy to shoot
yourself in the foot by waiting for events triggered by your own
callbacks. This stops the entire runloop and freezes the app.
But: All other threads can block as much as they want, waiting for events
triggered by callbacks on the main thread. Nice way to simulate blocking
Conclusion: pNaCl’s full threading support can be used to
work around many of its restrictions by moving the actual game logic
into its own thread, and using the main thread only for “system calls” and
their result callbacks.
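A hedged sketch of that pattern in portable C++: a game thread queues a call for the main thread and sleeps on a condition variable until the result arrives, while the main thread only services its run queue. RunQueue, BlockingLoad, RunDemo, the example URL and the fake "data:" result are all assumptions for illustration; a real pNaCl port would start the asynchronous operation inside the queued call and signal from its completion callback.

```cpp
#include <atomic>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>

// Hypothetical run queue that only the main thread services.
class RunQueue {
public:
    void Push(std::function<void()> fn) {
        std::lock_guard<std::mutex> lock(mutex_);
        calls_.push(std::move(fn));
    }
    // Main thread only. Returns false if nothing was queued.
    bool RunOne() {
        std::function<void()> fn;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            if (calls_.empty()) return false;
            fn = std::move(calls_.front());
            calls_.pop();
        }
        fn();
        return true;
    }
private:
    std::mutex mutex_;
    std::queue<std::function<void()>> calls_;
};

// Looks like a blocking load from the game thread's point of view.
std::string BlockingLoad(RunQueue& mainQueue, const std::string& url) {
    std::mutex m;
    std::condition_variable finished;
    bool done = false;
    std::string result;
    mainQueue.Push([&] {
        // Runs on the main thread. Stand-in for starting the async IO and
        // receiving its result callback.
        std::lock_guard<std::mutex> lock(m);
        result = "data:" + url;
        done = true;
        finished.notify_one();  // notify while locked: the waiter cannot return early
    });
    std::unique_lock<std::mutex> lock(m);
    finished.wait(lock, [&] { return done; });
    return result;
}

// Starts a game thread and pumps the main queue until the game thread is done.
std::string RunDemo() {
    RunQueue mainQueue;
    std::string loaded;
    std::atomic<bool> gameDone(false);
    std::thread game([&] {
        loaded = BlockingLoad(mainQueue, "http://example.com/level.bin");
        gameDone = true;
    });
    // The main thread never waits on the game thread's condition variable.
    while (!gameDone) {
        if (!mainQueue.RunOne()) std::this_thread::yield();
    }
    game.join();
    return loaded;
}
```

If BlockingLoad() were called on the main thread itself, the queued call would never run and the wait would never end, which is the foot-gun described above.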
launch Game Thread
begin async IO,
put StartIO func ptr on
main thread’s run
queue and wait for
finished-condvar is set,
continue Game Thread
invoke callbacks to
pNaCl app code:
invoke deferred funcs
invoke result callbacks
Similar restrictions to pNaCl, but can’t
easily use threads to work around them:
• most “interesting functions” (WebGL!) must be called from main thread
• main thread must not block
• no pthreads, only WebWorkers for threading
• WebWorkers have their own “address space”
Can’t move entire game loop into WebWorker
Browser vendors working towards more flexible
WebWorkers, but HTML5 standardization takes time.
All your code must run inside “slices”:
always return to the browser within 16 or 32 ms.
If something takes longer, either spread work
over several frames, or move into WebWorker.
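Spreading work over several frames can be sketched with a per-frame time budget. SlicedJob and its methods are hypothetical names, and the doubling of each item is a stand-in for real work; the point is only that Tick() returns once its budget is used up and resumes where it stopped.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical job that can be split into many small steps.
class SlicedJob {
public:
    explicit SlicedJob(std::size_t numItems) : items_(numItems, 0), next_(0) {}

    // Processes items until the job is done or the budget is used up.
    // Always processes at least one item, so the job makes progress.
    // Returns true once the whole job has finished.
    bool Tick(std::chrono::microseconds budget) {
        assert(budget.count() >= 0);
        const auto start = std::chrono::steady_clock::now();
        while (next_ < items_.size()) {
            items_[next_] = static_cast<int>(next_) * 2;  // stand-in work
            next_++;
            if (std::chrono::steady_clock::now() - start >= budget) break;
        }
        return next_ == items_.size();
    }

    std::size_t Processed() const { return next_; }

private:
    std::vector<int> items_;
    std::size_t next_;
};
```

The per-frame callback would call Tick() with a budget well below the 16 or 32 ms frame time and move on to the next piece of work once it returns true.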
N3 has new “PhasedApplication” model: app goes through phases,
which tick themselves forward when ﬁnished.
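A phased application of this kind can be pictured as a small state machine that is ticked once per frame. The phase names, counters and thresholds below are invented for illustration and are not Nebula3's actual PhasedApplication interface.

```cpp
// Hypothetical phases, for illustration only.
enum class Phase { Opening, Preloading, Running, Finished };

class PhasedApp {
public:
    explicit PhasedApp(int filesNeeded) : filesNeeded_(filesNeeded) {}

    // Called once per frame; each phase does a small amount of work and
    // switches to the next phase when it considers itself finished.
    void OnFrame() {
        switch (phase_) {
            case Phase::Opening:
                phase_ = Phase::Preloading;
                break;
            case Phase::Preloading:
                // Stand-in: pretend one file finishes loading per frame.
                if (++filesLoaded_ >= filesNeeded_) phase_ = Phase::Running;
                break;
            case Phase::Running:
                // Stand-in: the "game" ends after three frames.
                if (++framesRun_ >= 3) phase_ = Phase::Finished;
                break;
            case Phase::Finished:
                break;
        }
    }

    Phase CurrentPhase() const { return phase_; }

private:
    Phase phase_ = Phase::Opening;
    int filesNeeded_;
    int filesLoaded_ = 0;
    int framesRun_ = 0;
};
```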
max 16ms or 32ms (for 60 or 30 fps)
Failed approach: Try to wrap low-level threading code in some sort of
“co-operative thread scheduling” system.
Success: Move abstraction to a higher level (don’t wrap “low level
threads”, but wrap “parallel task system”).
2 uses for threading: hide blocking / make use of additional CPU cores.
Nebula3 parallel task system model
• Blocking: thread sleeps until a message arrives
• Timeout: block until messages arrive, or timeout occurs
• Run-through: infinite loop doing per-frame work, pull messages
emscripten port adds 2 “run modes”:
• Parallel: work is pushed to WebWorker threads (makes use of CPU cores)
• Sliced: runs on main-thread, work is “triggered” per frame (hides callback mess)
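The three message-handling behaviours listed above (blocking, timeout, run-through) can be sketched on top of one message queue. MessagePort and its methods are assumptions for illustration, not the Nebula3 API; the non-sleeping Poll() is also what a "sliced" handler on the browser main thread could use once per frame.

```cpp
#include <chrono>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <string>

// Hypothetical message port between a sender and one handler.
class MessagePort {
public:
    void Send(const std::string& msg) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push_back(msg);
        }
        arrived_.notify_one();
    }

    // Blocking: sleeps until a message arrives.
    std::string WaitBlocking() {
        std::unique_lock<std::mutex> lock(mutex_);
        arrived_.wait(lock, [this] { return !queue_.empty(); });
        return PopLocked();
    }

    // Timeout: sleeps until a message arrives or the timeout expires.
    bool WaitTimeout(std::chrono::milliseconds timeout, std::string& out) {
        std::unique_lock<std::mutex> lock(mutex_);
        if (!arrived_.wait_for(lock, timeout, [this] { return !queue_.empty(); })) {
            return false;
        }
        out = PopLocked();
        return true;
    }

    // Run-through: never sleeps, just pulls a message if one is there.
    bool Poll(std::string& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (queue_.empty()) return false;
        out = PopLocked();
        return true;
    }

private:
    // Caller must hold mutex_ and the queue must not be empty.
    std::string PopLocked() {
        std::string msg = queue_.front();
        queue_.pop_front();
        return msg;
    }

    std::mutex mutex_;
    std::condition_variable arrived_;
    std::deque<std::string> queue_;
};
```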
Nebula3 IO System
Closer to HTTP philosophy than fopen()/fclose():
• URLs instead of ﬁle system paths
• asynchronous IO is default, synchronous is special case
• pluggable filesystem handlers associated with URL scheme (http://, file://, ...)
• Stream objects with StreamReaders and StreamWriters
Stream object with
• Filesystem modules return Stream objects holding downloaded data
• Stream objects have typical Read/Seek/... methods
• IO response is a “Future” object, app code polls whether response has become valid
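The combination of URL-scheme handlers and polled "future" responses can be sketched as follows. All names (IoResponse, FileSystemHandler, FakeHandler, IoServer) and the example URL are hypothetical, a std::string stands in for a Stream object, and the fake handler simply completes every request on the next update instead of doing real IO.

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Future-like IO response: the app polls 'valid' once per frame.
struct IoResponse {
    bool valid = false;
    std::string data;  // stand-in for a Stream object
};

// Hypothetical filesystem handler interface, one per URL scheme.
class FileSystemHandler {
public:
    virtual ~FileSystemHandler() {}
    virtual void Begin(const std::string& url, std::shared_ptr<IoResponse> res) = 0;
    virtual void Update() = 0;  // called once per frame
};

// Stand-in handler: completes each request on the next Update().
class FakeHandler : public FileSystemHandler {
public:
    void Begin(const std::string& url, std::shared_ptr<IoResponse> res) override {
        pending_.push_back(std::make_pair(url, res));
    }
    void Update() override {
        for (auto& p : pending_) {
            p.second->data = "contents of " + p.first;
            p.second->valid = true;
        }
        pending_.clear();
    }
private:
    std::vector<std::pair<std::string, std::shared_ptr<IoResponse>>> pending_;
};

class IoServer {
public:
    void Register(const std::string& scheme, std::shared_ptr<FileSystemHandler> h) {
        handlers_[scheme] = h;
    }
    // Returns immediately. Returns null if no handler matches the URL scheme.
    std::shared_ptr<IoResponse> Request(const std::string& url) {
        const std::string::size_type pos = url.find("://");
        if (pos == std::string::npos) return nullptr;
        auto it = handlers_.find(url.substr(0, pos));
        if (it == handlers_.end()) return nullptr;
        auto res = std::make_shared<IoResponse>();
        it->second->Begin(url, res);
        return res;
    }
    void Update() {
        for (auto& kv : handlers_) kv.second->Update();
    }
private:
    std::map<std::string, std::shared_ptr<FileSystemHandler>> handlers_;
};
```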
Easy way: emscripten can pre-load assets into memory before app starts,
accessible through fopen() / fread()
Downside: delay on startup, memory cost - doesn’t work well for big asset sets.
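For the preloading route, the reading side is ordinary C stdio. With emscripten the files are typically packaged at build time (emcc has a --preload-file option for this) and then show up in a virtual filesystem. ReadWholeFile is a hypothetical helper written for this sketch.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Reads a whole file with plain C stdio. Returns false if it cannot be opened.
bool ReadWholeFile(const char* path, std::string& out) {
    assert(path != nullptr);
    std::FILE* f = std::fopen(path, "rb");
    if (!f) return false;
    out.clear();
    char buf[4096];
    std::size_t n;
    while ((n = std::fread(buf, 1, sizeof(buf), f)) > 0) {
        out.append(buf, n);
    }
    std::fclose(f);
    return true;
}
```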
Solution: need to stream and uncompress all assets on demand asynchronously
Problem: HTTP downloads much slower than loading from HDD, can’t
block while waiting for download to finish.
HTTP File System has platform-specific implementations:
• emscripten: emscripten_async_wget_data()
• pNaCl: pp::URLLoader
• OSX/iOS: NSURLRequest
• Linux / Windows: libcurl
• fallback: home-made HTTP client using raw TCP sockets (tricky!)
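For the emscripten backend, a minimal sketch of how emscripten_async_wget_data() could feed a pollable download record. The Download struct, OnLoad, OnError and StartDownload are hypothetical names; the native branch is only a stand-in that reports failure, since the other backends listed above are not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#ifdef __EMSCRIPTEN__
#include <emscripten.h>
#endif

// Hypothetical download record that the app polls once per frame.
struct Download {
    enum State { Pending, Done, Failed };
    State state = Pending;
    std::string data;
};

// Callbacks in the shape emscripten_async_wget_data() expects.
static void OnLoad(void* arg, void* buf, int size) {
    assert(arg != nullptr && size >= 0);
    Download* d = static_cast<Download*>(arg);
    // Copy the bytes: the buffer is only valid during this callback.
    d->data.assign(static_cast<const char*>(buf), static_cast<std::size_t>(size));
    d->state = Download::Done;
}

static void OnError(void* arg) {
    static_cast<Download*>(arg)->state = Download::Failed;
}

// Starts the download and returns immediately.
void StartDownload(const char* url, Download* d) {
#ifdef __EMSCRIPTEN__
    emscripten_async_wget_data(url, d, OnLoad, OnError);
#else
    // Native stand-in (assumption): no network code here, report failure.
    (void)url;
    OnError(d);
#endif
}
```

The Download object must stay alive until one of the two callbacks has fired, because emscripten only keeps the raw pointer.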
Loading Screen On
Problem: Sometimes asynchronous loading is too much hassle, or even impossible
(for instance when using 3rd party libs).
Solution: Have pre-loading app phases, show loading screen, download and pin
files into a memory filesystem, continue to next app phase when files have
Synchronous IO functions exclusively access data in memory filesystem, fail if file
hasn’t been preloaded.
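A hedged sketch of such a memory filesystem: the preloading phase pins downloaded bytes under their URL, and the synchronous read only ever looks at pinned data. MemoryFileSystem, Pin, Read and AllPinned are names invented for this illustration.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical in-memory filesystem filled during a preloading phase.
class MemoryFileSystem {
public:
    // Called by the preloading phase when a download has finished.
    void Pin(const std::string& url, const std::vector<unsigned char>& bytes) {
        files_[url] = bytes;
    }

    // Synchronous read: succeeds only for pinned files, never touches the network.
    bool Read(const std::string& url, std::vector<unsigned char>& out) const {
        auto it = files_.find(url);
        if (it == files_.end()) return false;
        out = it->second;
        return true;
    }

    // Lets the preloading phase decide when to move on to the next phase.
    bool AllPinned(const std::vector<std::string>& urls) const {
        for (const auto& url : urls) {
            if (files_.find(url) == files_.end()) return false;
        }
        return true;
    }

private:
    std::map<std::string, std::vector<unsigned char>> files_;
};
```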
Loading Screen Off
Only use this approach when absolutely
necessary and only for small files, not for
textures, geometry, audio, etc...
None of the C++ web solutions have really good
interactive debugging support (yet).
Develop and debug your app mainly as a native desktop app for
OSX or Windows inside Xcode or Visual Studio; this gives the best
turnaround time and “debugging experience”.
Only fall back to low-level debugging for platform-specific code.
emscripten debugging can be surprisingly easy:
• can inject debugging statements without recompiling
• see emscripten/src/settings.js for some interesting runtime debug options
JS Debugging with Source Maps
emcc -g4 generates source maps containing
reference data to the original C++ sources.
Interactively debug C++ code in the browser!
(still feels very rough around the edges though)
Too many slides, too little time...
Other interesting problem areas:
WebAudio vs Audio tag
format across browsers
WebSockets or WebRTC
much more restrictive
than Berkeley Sockets
Feels like back in the 90’s,
have to roll our own Audio and
Networking libs AGAIN :(
Too many slides, too little time...
have to settle on OpenGL ES 2 feature set
“it just works!”
...even on mobile: Main problem is
call overhead into
WebGL, but it’s