C++ on the Web (GDCE 2013)
My GDCE 2013 presentation about C++ on the web, a more detailed "remix" of the QuoVadis 2013 presentation.

  • 1. C++ on the Web: A Tale from the Trenches. Andre Weissflog, Head of Development, Berlin, Bigpoint GmbH. GDC Europe 2013
  • 2. What’s this about? • the web as a new target platform for C++ code • differences to traditional platforms • differences between C++/web technologies • porting problems and solutions
  • 3. Demos • Dragons Demo: minimal 3D skinned character demo [show demo] • Map Demo: more advanced 3D demo [show demo] • based on Nebula3 engine, also used in Drakensang Online
  • 4. Why develop for the web (HTML5 + WebGL)? Create, deploy, play: • no walled gardens, no gate-keepers, no certification process • free choice of hosting & payment providers • no installations, no updates, no plugins, no lengthy downloads • multi-platform “for free” • battle-hardened security infrastructure. The web is the most open and seamless platform for users and developers.
  • 5. C++ to web technologies: LLVM has opened up a lot of new usage scenarios for C/C++, for instance running C/C++ code inside byte code VMs and other sandboxed environments: • Google’s pNaCl • Mozilla’s emscripten • Adobe’s crossbridge
  • 6. Mozilla’s emscripten • OpenSource project, started in 2010 • .cpp → LLVM .bc → .js • extremely active and responsive dev team • lots of wrapper APIs (OpenGL, SDL, GLUT, ...) • limited threading support (no pthreads) Recent Developments: • asm.js (highly optimizable subset of JS) • massive compilation speed improvements • inline Javascript directly into C++
  • 7. Google’s pNaCl • OpenSource project, started in 2008 • .cpp → LLVM .bc → (deploy) x86/x64/ARM • Google Chrome only • safe sandbox for native code execution • full pthreads implementation Recent Developments: • pNaCl finally ready for prime-time • enabled in Chrome v.30 and up • no longer restricted to Chrome Web Store apps
  • 8. Adobe’s crossbridge • formerly known as Alchemy and flascc • started in 2008, recently open-sourced • .cpp → LLVM .bc → AVM2 byte code • runs in Flash plugin • proprietary 3D API (Stage3D) • incredibly slow and resource hungry build process :/
  • 9. Focus... Will mostly talk about emscripten (and some pNaCl). Why: • emscripten has widest reach (all major browsers) • emscripten progresses incredibly fast • pNaCl currently has the edge in threading support • pNaCl and emscripten are actually quite similar from a dev perspective. But Javascript is slow, isn’t it? asm.js generated code is probably faster than you think, and pNaCl generated code is probably slower than you think (don’t have hard benchmark numbers yet... sorry). IMHO: for 3D games, the real performance gains will come through WebGL extensions: the high call overhead into WebGL requires extensions that reduce the number of GL calls!
  • 10. My [OSX] dev environment • Xcode (for compiling/debugging native OSX and iOS apps) • Eclipse (for emscripten and NaCl specific dev work) • emscripten SDK • NaCl SDK • cmake • a local HTTP server (e.g. “python -m SimpleHTTPServer”)
  • 11. Multiplatform Build System • cmake: flexible meta-build-system, generates IDE project files and/or makefiles from generic “CMakeLists” files • CMakeLists.txt files define compile targets and their source files • cmake toolchain files define platform-specific tools, header/lib search paths and compile options • one toolchain file per platform: ios.toolchain.cmake, osx.toolchain.cmake, pnacl.toolchain.cmake, emscripten.toolchain.cmake, android.toolchain.cmake, windows.toolchain.cmake • flow: CMakeLists.txt + toolchain file → cmake → make
  • 12. Multiplatform Ecosystem • 32 bit + 64 bit: code must be 32/64 bit clean • x86, x86_64, ARM: big-endian no longer matters • POSIX + Windows: Windows still big, everything else is POSIX-ish • OpenGL vs Direct3D? OpenGL renaissance, but D3D9 still relevant • no exceptions, no RTTI, no STL, no Boost: these make porting to exotic platforms often harder, not easier. What keeps me awake at night: • WinXP is still incredibly big in Eastern Europe & Asia • 3D feature base-line is OpenGL ES 2.0 w/o extensions (== WebGL) • go fully GL on all platforms? (GL driver quality? Win8 Metro apps? ANGLE?)
  • 13. N3 Multiplatform Philosophy • Platform-specific code lives in its own sub-directories. • Platform-specific pre-processor defines provided by build system: __POSIX__, __IOS__, __NACL__, __EMSCRIPTEN__, ... • Diamond-shape class hierarchy resolved at compile time: class DisplayCoreBase at the top; class NaclDisplayCore (#if __NACL__), class EmscDisplayCore (#if __EMSCRIPTEN__) and class IOSDisplayCore (#if __IOS__) in the middle; class DisplayCore at the bottom.
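
A minimal sketch of how such a compile-time "diamond" could look. The class names and defines are taken from the slide; the members, the DesktopDisplayCore fallback and DescribeDisplay() are hypothetical and only there so the example builds on a plain desktop compiler. This is not Nebula3 code.

```cpp
#include <string>
#include <type_traits>

// Common base shared by all platforms (members are illustrative only).
class DisplayCoreBase {
public:
    virtual ~DisplayCoreBase() = default;
    virtual std::string PlatformName() const = 0;
    void SetSize(int w, int h) { width_ = w; height_ = h; }
    int Width() const { return width_; }
    int Height() const { return height_; }
private:
    int width_ = 0;
    int height_ = 0;
};

// Exactly one platform class is compiled in, selected by the defines
// that the build system (toolchain file) provides.
#if defined(__NACL__)
class NaclDisplayCore : public DisplayCoreBase {
public:
    std::string PlatformName() const override { return "pnacl"; }
};
class DisplayCore : public NaclDisplayCore {};
#elif defined(__EMSCRIPTEN__)
class EmscDisplayCore : public DisplayCoreBase {
public:
    std::string PlatformName() const override { return "emscripten"; }
};
class DisplayCore : public EmscDisplayCore {};
#elif defined(__IOS__)
class IOSDisplayCore : public DisplayCoreBase {
public:
    std::string PlatformName() const override { return "ios"; }
};
class DisplayCore : public IOSDisplayCore {};
#else
// Assumed fallback (not on the slide) for ordinary desktop builds.
class DesktopDisplayCore : public DisplayCoreBase {
public:
    std::string PlatformName() const override { return "desktop"; }
};
class DisplayCore : public DesktopDisplayCore {};
#endif

static_assert(std::is_base_of<DisplayCoreBase, DisplayCore>::value,
              "DisplayCore must derive from the shared base on every platform");

// Platform-agnostic code only ever names DisplayCore.
std::string DescribeDisplay() {
    DisplayCore display;
    display.SetSize(1280, 720);
    return display.PlatformName() + " " + std::to_string(display.Width()) +
           "x" + std::to_string(display.Height());
}
```

The point of the pattern is that no virtual dispatch decides the platform at run time: the preprocessor picks the middle class, and everything above it in the engine just uses DisplayCore.
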
  • 14. Multiplatform Line Counts Dragons Demo (~170k lines of code): platform-agnostic: 148k • POSIX/CRT: 7k • OpenGL: 6.7k • emscripten: 3k • pNaCl: 3.5k • OSX/iOS: 2.2k. Roughly 2% platform-specific code per target platform.
  • 15. Size Comparisons (Dragons Demo), ~170k lines of C++ code, sizes in kByte:
      build                                         orig   +no asserts   +stripped   +gzipped
      OSX (-arch i386 -O3)                          2027   1457          1237        413
      OSX (-arch x86_64 -O3)                        2134   1663          1427        460
      iOS (-arch armv7 -O3)                         1542   1196           972        395
      pNaCl (-O2)                                   1654   1333          1333        842
      emscripten (-O2 --llvm-opts 3 --llvm-lto 3)   5414   2154          1951 (*)    486
      (*) closure pass instead of stripping; closure is Google’s JS optimizer/minifier.
    Slide annotations: “wow, surprisingly compact!”, “smaller than expected”, “bigger than expected”.
  • 16. The Callback Problem Key point to understand (and accept): the browser runtime environment uses a callback model for asynchronous programming. Start lengthy operation, provide callback which will be called when operation is finished. This becomes very messy very quickly. Games are usually frame-driven, not callback-driven. This is the main riddle when trying to port a game engine to browser platforms.
  • 17. The Game Loop Problem (pNaCl) Most event-driven platforms don’t let you “own the game loop”. Instead the application runs completely inside event callback functions which must return quickly. Failing to return quickly results in unresponsive behaviour or even your app being killed.
  • 18. The Game Loop Problem Best solution is to use the app main thread exclusively for system event handling, and spawn a “Game Thread” which runs the actual game loop. Main Thread (only wakes up on system events) → input events, display change events, quit event → Game Thread (runs your typical “infinite” game loop).
  • 19. The CallOnMainThread-Problem (pNaCl) Some platforms restrict what OS functionality is accessible from threads, e.g. OpenGL or IO functions must be called from the main thread only. Either run everything on main thread, or dispatch “system calls” to run asynchronously on main thread.
  • 20. CallOnMainThread problem (pNaCl) All “PPAPI calls” must happen on main thread, and the main thread must never block. Threads can push function pointers for deferred execution on main thread. Deferred function calls and result callbacks execute in a simple run-loop after your per-frame callback on the main thread. This primitive runloop/callback model makes it easy to shoot yourself in the foot by waiting for events triggered by your own callbacks. This stops the entire runloop and freezes the app. But: all other threads can block as much as they want, waiting for events triggered by callbacks on the main thread. Nice way to simulate blocking I/O. Conclusion: pNaCl’s full threading support can be used to work around many of its restrictions by moving the actual game logic into its own thread, and using the main thread only for “system calls” and their result callbacks.
  • 21. CallOnMainThread visualized (pNaCl) • pNaCl runtime (runloop/callbacks) invokes callbacks into pNaCl app code: initialization, deferred funcs, result callbacks, ... • Main Thread (your pNaCl main-thread code): Init(): launch Game Thread; StartIO(): begin async IO, set finish-callback to FinishIO(); FinishIO(): set finished-condvar • Game Thread (new thread): CallOnMainThread(StartIO): put StartIO func ptr on main thread’s run queue and wait for finished-condvar; ...blocked...; finished-condvar is set, continue Game Thread
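
The flow above can be simulated with standard C++11 threads. This is a sketch under stated assumptions, not PPAPI code: MainThreadRunLoop, BlockingRead and RunDemo are hypothetical names, the "async IO" is faked by deferring the finish callback to a later run-loop iteration, and the URL is a placeholder. In a real pNaCl app the runtime owns the run loop.

```cpp
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <string>
#include <thread>
#include <utility>

// Stand-in for the main thread's run queue.
class MainThreadRunLoop {
public:
    // Callable from any thread: defer a function to the main thread.
    void CallOnMainThread(std::function<void()> func) {
        std::lock_guard<std::mutex> lock(mutex_);
        queue_.push_back(std::move(func));
        wakeup_.notify_one();
    }
    // Callable from any thread: make Run() return.
    void Quit() { CallOnMainThread([this] { quit_ = true; }); }
    // Runs on the main thread: sleeps until work arrives, then executes it.
    // It never waits for anything a deferred function is supposed to trigger.
    void Run() {
        while (!quit_) {
            std::function<void()> func;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                wakeup_.wait(lock, [this] { return !queue_.empty(); });
                func = std::move(queue_.front());
                queue_.pop_front();
            }
            func();
        }
    }
private:
    std::mutex mutex_;
    std::condition_variable wakeup_;
    std::deque<std::function<void()>> queue_;
    bool quit_ = false;  // only read and written on the main thread
};

// Runs on the game thread. Looks like blocking IO to the caller, but the
// operation is started and finished on the main thread.
std::string BlockingRead(MainThreadRunLoop& mainLoop, const std::string& url) {
    std::mutex m;
    std::condition_variable finished;
    bool done = false;
    std::string result;

    mainLoop.CallOnMainThread([&] {
        // StartIO(): begin the async operation (simulated) and register
        // FinishIO() as its result callback.
        mainLoop.CallOnMainThread([&] {
            // FinishIO(): publish the result and set the finished-condvar.
            std::lock_guard<std::mutex> lock(m);
            result = "data for " + url;
            done = true;
            finished.notify_one();
        });
    });

    // ...blocked... until FinishIO() has run on the main thread.
    std::unique_lock<std::mutex> lock(m);
    finished.wait(lock, [&] { return done; });
    return result;
}

std::string RunDemo() {
    MainThreadRunLoop mainLoop;
    std::string loaded;
    // Init(): launch the game thread, then hand the main thread to the loop.
    std::thread gameThread([&] {
        loaded = BlockingRead(mainLoop, "http://example.com/level.bin");
        mainLoop.Quit();
    });
    mainLoop.Run();
    gameThread.join();
    return loaded;
}
```

Calling BlockingRead() from the main thread itself would reproduce the freeze described on the previous slide: the run loop would wait for a callback that only the run loop can deliver.
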
  • 22. Limitations (emscripten) Similar restrictions as pNaCl, but can’t easily use threads to work around them: • most “interesting functions” (WebGL!) must be called from main thread • main thread must not block • no pthreads, only WebWorkers for threading • WebWorkers have their own “address space”. Can’t move entire game loop into WebWorker thread (yet?). Browser vendors are working towards more flexible WebWorkers, but HTML5 standardization takes time.
  • 23. Limitation Workarounds All your code must run inside “slices”, always return within 16 or 32 ms to browser. If something takes longer, either spread work over several frames, or move into WebWorker. N3 has new “PhasedApplication” model: app goes through phases, which tick themselves forward when finished: OnInit → OnPreloading → OnOpening → OnRunning → OnClosing → OnQuit. The emscripten runtime environment calls OnFrame, which may take max 16ms or 32ms (for 60 or 30 fps).
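
A rough sketch of the phased idea, assuming a single OnFrame() entry point that does one slice of work and returns. PhasedApp, its constructor parameters and RunUntilQuit() are hypothetical and not the actual N3 PhasedApplication class; the counters stand in for real loading and game work.

```cpp
// Phase names follow the slide (OnInit ... OnQuit).
enum class Phase { Init, Preloading, Opening, Running, Closing, Quit };

class PhasedApp {
public:
    PhasedApp(int assetsToPreload, int framesToRun)
        : assetsPending_(assetsToPreload), framesLeft_(framesToRun) {}

    Phase CurrentPhase() const { return phase_; }
    bool Finished() const { return phase_ == Phase::Quit; }

    // Called once per frame by the host. Each call performs a small slice
    // of work and returns; a phase advances itself when it is done.
    void OnFrame() {
        switch (phase_) {
            case Phase::Init:
                phase_ = Phase::Preloading;
                break;
            case Phase::Preloading:
                // Pretend one asset finishes downloading per frame.
                if (assetsPending_ > 0) --assetsPending_;
                if (assetsPending_ == 0) phase_ = Phase::Opening;
                break;
            case Phase::Opening:
                phase_ = Phase::Running;
                break;
            case Phase::Running:
                // One game frame per call.
                if (framesLeft_ > 0) --framesLeft_;
                if (framesLeft_ == 0) phase_ = Phase::Closing;
                break;
            case Phase::Closing:
                phase_ = Phase::Quit;
                break;
            case Phase::Quit:
                break;
        }
    }

private:
    Phase phase_ = Phase::Init;
    int assetsPending_;
    int framesLeft_;
};

// Desktop stand-in for the browser: on a native platform the app owns this
// loop, in the browser the runtime would call OnFrame() instead.
int RunUntilQuit(PhasedApp& app) {
    int frames = 0;
    while (!app.Finished()) {
        app.OnFrame();
        ++frames;
    }
    return frames;
}
```

Because no phase ever waits inside OnFrame(), the same application code can be driven by a native loop or by a per-frame browser callback.
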
  • 24. Threading Workarounds 2 uses for threading: hide blocking / make use of additional CPU cores. Failed approach: try to wrap low-level threading code in some sort of “co-operative thread scheduling” system. Success: move abstraction to a higher level (don’t wrap “low level threads”, but wrap “parallel task system”). Nebula3 parallel task system model: a Dispatcher passes requests to Worker Thread(s), results come back through a response queue. 3 Flavours: • Blocking: thread sleeps until messages arrive • Timeout: block until messages arrive, or timeout occurs • Run-through: infinite loop doing per-frame work, pull messages. emscripten port adds 2 “run modes”: • Parallel: work is pushed to WebWorker threads (makes use of CPU cores) • Sliced: runs on main-thread, work is “triggered” per frame (hides callback mess)
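
To illustrate the “Sliced” run mode only: a sketch in which the code that would normally live in a worker thread is triggered once per frame on the main thread. SlicedHandler, its int messages and the per-frame message budget are assumptions made for this example, not the Nebula3 interface.

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

class SlicedHandler {
public:
    using WorkFunc = std::function<int(int)>;

    explicit SlicedHandler(WorkFunc work) : work_(std::move(work)) {}

    // App code posts a request exactly as it would to a threaded handler.
    void Send(int request) { requests_.push_back(request); }

    // Called once per frame on the main thread. Handles at most
    // maxMessages requests so the frame callback returns quickly.
    void Trigger(std::size_t maxMessages) {
        for (std::size_t i = 0; i < maxMessages && !requests_.empty(); ++i) {
            responses_.push_back(work_(requests_.front()));
            requests_.pop_front();
        }
    }

    // App code polls for results; returns false if none is ready yet.
    bool PollResponse(int& out) {
        if (responses_.empty()) return false;
        out = responses_.front();
        responses_.pop_front();
        return true;
    }

    std::size_t PendingRequests() const { return requests_.size(); }

private:
    WorkFunc work_;
    std::deque<int> requests_;
    std::deque<int> responses_;
};
```

The calling code only sees Send() and PollResponse(), so whether the work runs in a real thread, in a WebWorker or in per-frame slices stays an implementation detail of the task system.
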
  • 25. Nebula3 IO System Closer to HTTP philosophy than fopen()/fclose(): • URLs instead of file system paths • asynchronous IO is default, synchronous is special case • pluggable filesystem handlers associated with URL scheme (http://, file://, ...) • Stream objects with StreamReaders and StreamWriters • Filesystem modules return Stream objects holding downloaded data • Stream objects have typical Read/Seek/... methods • IO response is a “Future” object, app code polls whether response has become valid. Diagram: App Code sends an IO request to the IO System, which routes it by URL (http://.., file://...) to the HTTP File System or the Local File System; the IO response comes back with a Stream object holding the file data.
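
A small sketch of the polling “Future” idea with handlers selected by URL scheme. Stream, IoResponse and IoSystem here are simplified, hypothetical stand-ins (not the Nebula3 classes), and completion is simulated by an Update() call once per frame instead of a real asynchronous download.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Holds loaded data and offers simple Read/Seek access.
class Stream {
public:
    void SetData(std::string data) { data_ = std::move(data); pos_ = 0; }
    void Seek(std::size_t pos) { pos_ = pos < data_.size() ? pos : data_.size(); }
    std::string Read(std::size_t numBytes) {
        std::string chunk = data_.substr(pos_, numBytes);
        pos_ += chunk.size();
        return chunk;
    }
private:
    std::string data_;
    std::size_t pos_ = 0;
};

// The "Future": app code keeps it and polls IsValid() each frame.
class IoResponse {
public:
    bool IsValid() const { return state_ == State::Valid; }
    bool HasFailed() const { return state_ == State::Failed; }
    Stream& GetStream() { return stream_; }
private:
    friend class IoSystem;
    enum class State { Pending, Valid, Failed };
    State state_ = State::Pending;
    Stream stream_;
};

class IoSystem {
public:
    // A filesystem handler fills the stream and reports success.
    using Handler = std::function<bool(const std::string& url, Stream& out)>;

    void RegisterScheme(const std::string& scheme, Handler handler) {
        handlers_[scheme] = std::move(handler);
    }

    // Never blocks: returns a pending response immediately.
    std::shared_ptr<IoResponse> Request(const std::string& url) {
        auto response = std::make_shared<IoResponse>();
        pending_.push_back({url, response});
        return response;
    }

    // Called once per frame; completes outstanding requests.
    void Update() {
        for (auto& entry : pending_) {
            const std::string scheme = entry.first.substr(0, entry.first.find("://"));
            auto it = handlers_.find(scheme);
            const bool ok = it != handlers_.end() &&
                            it->second(entry.first, entry.second->stream_);
            entry.second->state_ = ok ? IoResponse::State::Valid
                                      : IoResponse::State::Failed;
        }
        pending_.clear();
    }

private:
    std::map<std::string, Handler> handlers_;
    std::vector<std::pair<std::string, std::shared_ptr<IoResponse>>> pending_;
};
```

App code issues Request() in one frame and checks IsValid() in later frames, so nothing ever has to block while data is in flight.
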
  • 26. Asset Loading • Easy way: emscripten can pre-load assets into memory before app starts, accessible through fopen() / fread(). Downside: delay on startup, memory cost; doesn’t work well for big asset sets. • Solution: need to stream and uncompress all assets on demand, asynchronously. • Problem: HTTP downloads are much slower than loading from HDD, and we can’t block while waiting for a download to finish. • Diagram: App Code → IO request → HTTP File System → HTTP request → Web Server; HTTP response → Uncompress WebWorker → IO response → App Code. • HTTP File System has platform-specific implementations: emscripten: emscripten_async_wget_data(); pNaCl: pp::URLLoader; OSX/iOS: NSURLRequest; Linux / Windows: libCURL; fallback: home-made HTTP client using raw TCP sockets (tricky!)
  • 27. Preloading Phase Problem: Sometimes asynchronous loading is too much hassle, or even impossible (for instance when using 3rd party libs). Solution: Have pre-loading app phases, show loading screen, download and pin files into a memory filesystem, continue to next app phase when files have finished downloading. Synchronous IO functions exclusively access data in memory filesystem, fail if file hasn’t been preloaded. Sequence: Loading Screen On → Preloading (Web Server populates Memory File System via HTTP) → Loading Screen Off → Running (fread() from Memory File System), repeated for each preloading phase. Only use this approach when absolutely necessary and only for small files, not for textures, geometry, audio, etc...
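
A sketch of the pin-then-read idea, assuming the downloads themselves happen elsewhere and report back through a callback. MemoryFileSystem and PreloadingPhase are hypothetical names for this example; the real engine classes may look different.

```cpp
#include <map>
#include <set>
#include <string>
#include <utility>

class MemoryFileSystem {
public:
    // Pin downloaded data under a path so synchronous code can find it.
    void Pin(const std::string& path, std::string data) {
        files_[path] = std::move(data);
    }
    // Synchronous read: succeeds only for files that were preloaded.
    bool Read(const std::string& path, std::string& out) const {
        auto it = files_.find(path);
        if (it == files_.end()) return false;
        out = it->second;
        return true;
    }
private:
    std::map<std::string, std::string> files_;
};

class PreloadingPhase {
public:
    PreloadingPhase(MemoryFileSystem& fs, std::set<std::string> wanted)
        : fs_(fs), outstanding_(std::move(wanted)) {}

    // Result callback of the (not shown) asynchronous HTTP download.
    // Files that were not requested for this phase are ignored.
    void OnDownloaded(const std::string& path, std::string data) {
        if (outstanding_.erase(path) > 0) {
            fs_.Pin(path, std::move(data));
        }
    }

    // The app moves on to the next phase once this returns true.
    bool Finished() const { return outstanding_.empty(); }
    bool LoadingScreenVisible() const { return !Finished(); }

private:
    MemoryFileSystem& fs_;
    std::set<std::string> outstanding_;
};
```

Code that insists on synchronous file access (for instance a 3rd party lib) then reads through MemoryFileSystem::Read() and simply fails for anything that was never preloaded.
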
  • 28. Debugging None of the C++ web solutions have really good interactive debugging support (yet). Develop and debug your app mainly as a native desktop app for OSX or Windows inside XCode or VStudio, this gives the best turn-around time and “debugging experience”. Only fall back to low-level debugging for platform-specific code. emscripten debugging can be surprisingly easy: • generated Javascript can be made very readable (see -g options in emcc) • can inject debugging statements without recompiling • see emscripten/src/settings.js for some interesting runtime debug options
  • 29. JS Debugging with Source Maps emcc -g4 generates source maps containing reference data to the original C++ sources. Interactively debug C++ code in the browser! (still feels very rough around the edges though)
  • 30. Too many slides, too little time... Other interesting problem areas: • Audio: WebAudio vs Audio tag; no common compressed audio format across browsers • Networking: WebSockets or WebRTC; much more restrictive than Berkeley sockets (security reasons). Feels like back in the 90’s, have to roll our own Audio and Networking libs AGAIN :(
  • 31. Too many slides, too little time... OpenGL: have to settle on OpenGL ES 2 feature set, but “it just works!” ...even on mobile. Main problem is call overhead into WebGL, but it’s still surprisingly fast.
  • 32. Questions? Resources: http://flohofwoe.blogspot.com, http://www.flohofwoe.net/demos.html