Public version 10 Parallel Futures of a Game Engine Johan Andersson Rendering Architect, DICE
Background DICE Stockholm, Sweden ~250 employees Part of Electronic Arts Battlefield & Mirror’s Edge game series Frostbite Proprietary game engine used at DICE & EA Developed by DICE over the last 5 years 2
http://badcompany2.ea.com/ 3
http://badcompany2.ea.com/ 4
Outline Game engine 101 Current parallelism Futures Q&A 5
Game engine 101 6
Game development 2 year development cycle New IP often takes much longer, 3-5 years Engine is continuously in development & used AAA teams of 70-90 people 50% artists 30% designers 20% programmers 10% audio Budgets $20-40 million Cross-platform development is market reality Xbox 360 and PlayStation 3 PC DX10 and DX11 (and sometimes Mac) Current consoles will stay with us for many more years 7
Game engine requirements (1/2) Stable real-time performance Frame-driven updates, 30 fps Few threads, instead per-frame jobs/tasks for everything Predictable memory usage Fixed budgets for systems & content, fail if over Avoid runtime allocations Love unified memory! Cross-platform The consoles determine our base tech level & focus PS3 is design target, most difficult and good potential Scale up for PC, dual core is min spec (slow!) 8
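The "fixed budgets, fail if over" and "avoid runtime allocations" points can be illustrated with a small arena that reserves its whole budget up front. This is only a sketch under assumptions: `FixedBudgetArena` and its methods are hypothetical names, not Frostbite code, and a real engine would add per-system tracking and reporting.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical fixed-budget arena: the full budget is reserved once at
// construction, and a request that does not fit fails instead of growing.
class FixedBudgetArena {
public:
    explicit FixedBudgetArena(std::size_t budgetBytes)
        : storage_(budgetBytes), used_(0) {}

    // Returns nullptr when the request exceeds the remaining budget.
    // `alignment` must be a power of two.
    void* allocate(std::size_t bytes, std::size_t alignment = 16) {
        if (bytes > storage_.size()) return nullptr;
        const std::uintptr_t base =
            reinterpret_cast<std::uintptr_t>(storage_.data());
        const std::uintptr_t current = base + used_;
        const std::uintptr_t aligned =
            (current + alignment - 1) &
            ~(static_cast<std::uintptr_t>(alignment) - 1);
        const std::size_t newUsed =
            static_cast<std::size_t>(aligned - base) + bytes;
        if (newUsed > storage_.size()) return nullptr;  // over budget: fail
        used_ = newUsed;
        return reinterpret_cast<void*>(aligned);
    }

    // Releases everything at once, e.g. at the end of a frame or level.
    void reset() { used_ = 0; }

    std::size_t used() const { return used_; }
    std::size_t budget() const { return storage_.size(); }

private:
    std::vector<unsigned char> storage_;
    std::size_t used_;
};
```

A system would create one arena per budget at load time and treat a `nullptr` result as a content error rather than falling back to the heap.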
Game engine requirements (2/2) Full system profiling/debugging Engine is a vertical solution, touches everywhere PIX, xbtracedump, SN Tuner, ETW, GPUView Quick iterations Essential in order to be creative Fast building & fast loading, hot-swapping resources Affects both the tools and the game Middleware Use when it makes sense, cross-platform & optimized Parallelism has to go through our systems 9
Current parallelism 10
Levels of code in Frostbite Editor (C#) Pipeline (C++) Game code (C++) System CPU-jobs (C++) System SPU-jobs (C++/asm) Generated shaders (HLSL) Compute kernels (HLSL) Offline CPU Runtime GPU 11
Editor & Pipeline Editor (”FrostEd 2”) WYSIWYG editor for content C#, Windows only Basic threading / tasks Pipeline Offline/background data-processing & conversion C++, some MC++, Windows only Typically IO-bound A few compute-heavy steps use CPU-jobs Texture compression uses CUDA, would prefer OpenCL or CS Lighting pre-calculation using IncrediBuild over 100+ machines CPU parallelism models are generally not a problem here 13
General ”game code” (1/2) This is the majority of our 1.5 million lines of C++ Runs on Win32, Win64, Xbox 360 and PS3 Similar to general application code Huge amount of code & logic to maintain + continue to develop Low compute density ”Glue code” Scattered in memory (pointer chasing) Difficult to efficiently parallelize Out-of-order execution is a big help, but consoles are in-order  Key to be able to quickly iterate & change This is the actual game logic & glue that builds the game C++ not ideal, but has the invested infrastructure 15
General ”game code” (2/2) PS3 is one of the main challenges Standard CPU parallelization doesn’t help CELL only has 2 HW threads on the PPU Split the code in 2: game code & system code Game logic, policy and glue code only on CPU ”If it runs well on the PS3 PPU, it runs well everywhere” Lower-level systems on PS3 SPUs Main goals going forward: Simplify & structure code base Reduce coupling with lower-level systems Increase in task parallelism for PC CELL processor 16
Job-based parallelism Essential to utilize the cores on our target platforms Xbox 360: 6 HW threads PlayStation 3: 2 HW threads + 6 powerful SPUs PC: 2-16 HW threads (Nehalem HT is great!)  Divide up system work into Jobs (a.k.a. Tasks) 15-200k C++ code each. 25k is common Can depend on each other (if needed) Dependencies create job graph All HW threads consume jobs  ~200-300 / frame 18
What is a Job for us? An asynchronous function call Function ptr + 4 uintptr_t parameters Cross-platform scheduler: EA JobManager Often uses work stealing 2 types of Jobs in Frostbite: CPU job (good) General code moved into job instead of threads SPU job (great!) Stateless pure functions, no side effects Data-oriented, explicit memory DMA to local store Designed to run on the PS3 SPUs = also very fast on in-order CPU Can hot-swap for quick iterations 19
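The "function ptr + 4 uintptr_t parameters" description can be sketched as follows. This is not the EA JobManager API: `JobFunc`, `Job`, `ToyJobQueue`, `SumJobData` and `sumJob` are hypothetical names, and the queue runs jobs on the calling thread in FIFO order instead of scheduling them across hardware threads.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>

// Hypothetical job signature: four pointer-sized parameters, no return value.
using JobFunc = void (*)(std::uintptr_t, std::uintptr_t,
                         std::uintptr_t, std::uintptr_t);

// A job is just the function pointer plus its four parameters.
struct Job {
    JobFunc func;
    std::uintptr_t params[4];
};

// Toy stand-in for a scheduler: runs queued jobs on the calling thread.
class ToyJobQueue {
public:
    void push(JobFunc func, std::uintptr_t p0 = 0, std::uintptr_t p1 = 0,
              std::uintptr_t p2 = 0, std::uintptr_t p3 = 0) {
        jobs_.push_back(Job{func, {p0, p1, p2, p3}});
    }

    // Executes every queued job and returns how many were run.
    std::size_t runAll() {
        std::size_t executed = 0;
        while (!jobs_.empty()) {
            Job job = jobs_.front();
            jobs_.pop_front();
            job.func(job.params[0], job.params[1],
                     job.params[2], job.params[3]);
            ++executed;
        }
        return executed;
    }

private:
    std::deque<Job> jobs_;
};

// Example job data: all inputs and the output slot live in one struct,
// so the job needs no global state.
struct SumJobData {
    const int* values;
    std::size_t count;
    long long result;
};

// Example job: the first parameter carries the address of its data struct.
void sumJob(std::uintptr_t dataAddress, std::uintptr_t,
            std::uintptr_t, std::uintptr_t) {
    SumJobData* data = reinterpret_cast<SumJobData*>(dataAddress);
    long long sum = 0;
    for (std::size_t i = 0; i < data->count; ++i) sum += data->values[i];
    data->result = sum;
}
```

Passing one address to a self-contained data struct mirrors the pattern in the EntityRenderCull example, where the first parameter is the address of the job data.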
EntityRenderCull job example
Frustum culling of dynamic entities in sphere tree

    struct FB_ALIGN(16) EntityRenderCullJobData
    {
        enum
        {
            MaxSphereTreeCount = 2,
            MaxStaticCullTreeCount = 2
        };

        uint sphereTreeCount;
        const SphereNode* sphereTrees[MaxSphereTreeCount];

        u8 viewCount;
        u8 frustumCount;
        u8 viewIntersectFlags[32];
        Frustum frustums[32];

        .... (cut out 2/3 of struct for display size)

        u32 maxOutEntityCount;

        // Output data, pre-allocated by callee
        u32 outEntityCount;
        EntityRenderCullInfo* outEntities;
    };

    void entityRenderCullJob(EntityRenderCullJobData* data);
    void validate(const EntityRenderCullJobData& data);

struct contains all input data needed
Max output data pre-allocated by callee
Single job function
Compile both as CPU & SPU job
Optional struct validation func
20-24
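The slides do not show the culling test itself. A common way to cull spheres against a frustum is to compare the signed distance to each plane with the sphere radius, sketched below. Everything here is an assumption for illustration: `Plane`, `Sphere`, `ToyFrustum`, `cullSpheres` and `makeUnitBoxFrustum` are hypothetical names, the planes face inward, and a flat sphere array replaces the sphere tree. Output goes into a pre-allocated array with a maximum count, in the spirit of `outEntities` and `maxOutEntityCount`.

```cpp
#include <cassert>
#include <cstdint>

// Plane with inward-facing normal: points with n.p + d >= 0 are inside.
struct Plane  { float nx, ny, nz, d; };
struct Sphere { float x, y, z, radius; };
struct ToyFrustum { Plane planes[6]; };

// A sphere is rejected only if it lies fully behind at least one plane.
bool sphereTouchesFrustum(const Sphere& s, const ToyFrustum& f) {
    for (int i = 0; i < 6; ++i) {
        const Plane& p = f.planes[i];
        const float distance = p.nx * s.x + p.ny * s.y + p.nz * s.z + p.d;
        if (distance < -s.radius) return false;
    }
    return true;
}

// Writes the indices of surviving spheres into a pre-allocated output array
// and never writes more than maxOutCount entries.
std::uint32_t cullSpheres(const Sphere* spheres, std::uint32_t sphereCount,
                          const ToyFrustum& frustum,
                          std::uint32_t* outIndices,
                          std::uint32_t maxOutCount) {
    std::uint32_t outCount = 0;
    for (std::uint32_t i = 0; i < sphereCount; ++i) {
        if (!sphereTouchesFrustum(spheres[i], frustum)) continue;
        if (outCount == maxOutCount) break;  // output budget exhausted
        outIndices[outCount++] = i;
    }
    return outCount;
}

// Test helper: a "frustum" that is simply the box [-1, 1]^3.
ToyFrustum makeUnitBoxFrustum() {
    ToyFrustum f = {{
        { 1.0f,  0.0f,  0.0f, 1.0f}, {-1.0f,  0.0f,  0.0f, 1.0f},
        { 0.0f,  1.0f,  0.0f, 1.0f}, { 0.0f, -1.0f,  0.0f, 1.0f},
        { 0.0f,  0.0f,  1.0f, 1.0f}, { 0.0f,  0.0f, -1.0f, 1.0f}
    }};
    return f;
}
```

The plane test is conservative: a sphere near a frustum corner can pass even though it is outside, which is usually acceptable for culling.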
EntityRenderCull SPU setup

    // local store variables
    EntityRenderCullJobData g_jobData;
    float g_zBuffer[256*114];
    u16 g_terrainHeightData[64*64];

    int main(uintptr_t dataEa, uintptr_t, uintptr_t, uintptr_t)
    {
        dmaBlockGet("jobData", &g_jobData, dataEa, sizeof(g_jobData));
        validate(g_jobData);

        if (g_jobData.zBufferTestEnable)
        {
            dmaAsyncGet("zBuffer", g_zBuffer, g_jobData.zBuffer,
                        g_jobData.zBufferResX*g_jobData.zBufferResY*4);
            g_jobData.zBuffer = g_zBuffer;

            if (g_jobData.zBufferShadowTestEnable && g_jobData.terrainHeightData)
            {
                dmaAsyncGet("terrainHeight", g_terrainHeightData,
                            g_jobData.terrainHeightData,
                            g_jobData.terrainHeightDataSize);
                g_jobData.terrainHeightData = g_terrainHeightData;
            }

            dmaWaitAll(); // block on both DMAs
        }

        // run the actual job, will internally do streaming DMAs to the output entity list
        entityRenderCullJob(&g_jobData);

        // put back the data because we changed outEntityCount
        dmaBlockPut(dataEa, &g_jobData, sizeof(g_jobData));

        return 0;
    }

25
Frostbite CPU job graph
Build big job graphs:
Mix CPU- & SPU-jobs
Future: Mix in low-latency GPU-jobs
Job dependencies determine:
Sync points
Load balancing
i.e. the effective parallelism
Intermixed task- & data-parallelism
aka Nested Data-Parallelism
aka Tasks and Kernels
26
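How dependencies turn a set of jobs into a graph with an execution order can be sketched with per-job dependency counters. This is an illustrative assumption, not the Frostbite or EA JobManager design: `ToyJobGraph` and `runDiamondExample` are hypothetical, and ready jobs run one at a time on the calling thread, whereas a real scheduler would hand them to all hardware threads.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Toy job graph: each job counts its unfinished dependencies and becomes
// ready when that count reaches zero.
class ToyJobGraph {
public:
    using JobId = std::size_t;

    JobId addJob(std::function<void()> work) {
        nodes_.push_back(Node{std::move(work), {}, 0});
        return nodes_.size() - 1;
    }

    // Declares that `after` may only start once `before` has finished.
    void addDependency(JobId before, JobId after) {
        nodes_[before].dependents.push_back(after);
        ++nodes_[after].dependencyCount;
    }

    // Runs all jobs in dependency order and returns how many were executed.
    // A result smaller than the job count indicates a dependency cycle.
    std::size_t run() {
        std::vector<std::size_t> pending(nodes_.size());
        std::queue<JobId> ready;
        for (JobId id = 0; id < nodes_.size(); ++id) {
            pending[id] = nodes_[id].dependencyCount;
            if (pending[id] == 0) ready.push(id);
        }
        std::size_t executed = 0;
        while (!ready.empty()) {
            const JobId id = ready.front();
            ready.pop();
            nodes_[id].work();
            ++executed;
            for (JobId dependent : nodes_[id].dependents) {
                if (--pending[dependent] == 0) ready.push(dependent);
            }
        }
        return executed;
    }

private:
    struct Node {
        std::function<void()> work;
        std::vector<JobId> dependents;
        std::size_t dependencyCount;
    };
    std::vector<Node> nodes_;
};

// Usage example: a diamond-shaped graph A -> {B, C} -> D.
// B and C are independent of each other; D is a sync point for both.
std::string runDiamondExample() {
    std::string order;
    ToyJobGraph graph;
    const ToyJobGraph::JobId a = graph.addJob([&order] { order += 'A'; });
    const ToyJobGraph::JobId b = graph.addJob([&order] { order += 'B'; });
    const ToyJobGraph::JobId c = graph.addJob([&order] { order += 'C'; });
    const ToyJobGraph::JobId d = graph.addJob([&order] { order += 'D'; });
    graph.addDependency(a, b);
    graph.addDependency(a, c);
    graph.addDependency(b, d);
    graph.addDependency(c, d);
    graph.run();
    return order;
}
```

In the diamond, only B and C could run concurrently, so the dependencies cap the effective parallelism at two regardless of how many hardware threads are available.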
Data-parallel jobs 27
Task-parallel algorithms & coordination 28
Timing view Example: PC, 4 CPU cores, 2 GPUs in AFR (AMD Radeon 4870x2) Real-time in-game overlay   See timing events & effective parallelism On CPU, SPU & GPU – for all platforms Use to reduce sync-points & optimize load balancing GPU timing through DX event queries Our main performance tool! 29
Rendering jobs Rendering systems are heavily divided up into CPU- & SPU-jobs Jobs: Terrain geometry [3] Undergrowth generation [2] Decal projection [4] Particle simulation Frustum culling Occlusion culling Occlusion rasterization Command buffer generation [6] PS3: Triangle culling [6] Most will move to GPU Eventually.. A few have already! Latency wall, more power and GPU memory access Mostly one-way data flow 30
Occlusion culling job example Problem: Buildings & env occlude large amounts of objects Obscured objects still have to: Update logic & animations Generate command buffer Processed on CPU & GPU = expensive & wasteful  Difficult to implement full culling: Destructible buildings Dynamic occludees Difficult to precompute  From Battlefield: Bad Company PS3 31
Solution: Software occlusion culling Rasterize coarse zbuffer on SPU/CPU 256x114 float Low-poly occluder meshes 100 m view distance Max 10000 vertices/frame Parallel vertex & raster SPU-jobs Cost: a few milliseconds Cull all objects against zbuffer Screen-space bounding-box test Before passed to all other systems Big performance savings! 32
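The screen-space bounding-box test against the coarse z-buffer can be sketched as below. Only the 256x114 float resolution comes from the slide. The rest is assumed for illustration: smaller depth means closer, empty texels hold 1.0, `ScreenBounds`, `isOccluded` and the helpers are hypothetical names, and a rectangle fill stands in for rasterizing the low-poly occluder meshes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr int kZBufferWidth = 256;
constexpr int kZBufferHeight = 114;

// Screen-space bounding box of an object (inclusive texel coordinates)
// together with the depth of its closest point.
struct ScreenBounds {
    int minX, minY, maxX, maxY;
    float nearestDepth;
};

// Empty buffer: every texel is at the far plane (1.0), so nothing occludes.
std::vector<float> makeEmptyZBuffer() {
    return std::vector<float>(
        static_cast<std::size_t>(kZBufferWidth) * kZBufferHeight, 1.0f);
}

// Stand-in for occluder rasterization: writes a constant-depth rectangle,
// keeping the closest depth per texel.
void fillOccluderRect(std::vector<float>& zbuffer, int minX, int minY,
                      int maxX, int maxY, float depth) {
    const int x0 = std::max(minX, 0), x1 = std::min(maxX, kZBufferWidth - 1);
    const int y0 = std::max(minY, 0), y1 = std::min(maxY, kZBufferHeight - 1);
    for (int y = y0; y <= y1; ++y) {
        for (int x = x0; x <= x1; ++x) {
            float& texel =
                zbuffer[static_cast<std::size_t>(y) * kZBufferWidth + x];
            texel = std::min(texel, depth);
        }
    }
}

// The object is occluded only if every covered texel holds an occluder
// that is closer than the object's nearest point.
bool isOccluded(const std::vector<float>& zbuffer, const ScreenBounds& b) {
    const int x0 = std::max(b.minX, 0), x1 = std::min(b.maxX, kZBufferWidth - 1);
    const int y0 = std::max(b.minY, 0), y1 = std::min(b.maxY, kZBufferHeight - 1);
    if (x0 > x1 || y0 > y1) return false;  // off-screen: not this test's job
    for (int y = y0; y <= y1; ++y) {
        for (int x = x0; x <= x1; ++x) {
            const float occluderDepth =
                zbuffer[static_cast<std::size_t>(y) * kZBufferWidth + x];
            if (b.nearestDepth <= occluderDepth) return false;  // may be visible
        }
    }
    return true;
}
```

Using the nearest depth of the whole box keeps the test conservative: an object is only culled when no part of its bounds could be in front of the occluders.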
GPU occlusion culling Ideally want to use the GPU, but current APIs are limited: Occlusion queries introduce overhead & latency Conditional rendering only helps GPU Compute Shader impl. possible, but same latency wall Future 1: Low-latency GPU execution context Rasterization and testing done on GPU where it belongs Lockstep with CPU, need to read back within a few ms Possible on Larrabee, want standard on PC Potential WDDM issue Future 2: Move entire cull & rendering to ”GPU” World, cull, systems, dispatch. End goal 33
Shader types Generated shaders [1] Graph-based surface shaders Treated as content, not code Artist created Generates HLSL code Used by all meshes and 3d surfaces Graphics / Compute kernels Hand-coded & optimized HLSL Statically linked in with C++ Pixel- & compute-shaders Lighting, post-processing & special effects Graph-based surface shader in FrostEd 2 35
Futures 36
Challenges 3 major challenges/goals going forward: How do we make it easier to develop, maintain & parallelize general game code? What do we need to continue to innovate & scale up real-time computational graphics? How can we move & scale up advanced simulation and non-graphics tasks to data-parallel manycore processors? Most likely the same solution(s)! 37-38
Challenge 1 “How do we make it easier to develop, maintain & parallelize general game code?” Shared State Concurrency is a killer Not a big believer in Software Transactional Memory either  Because of performance and too ”optimistic” flow A more strict & adapted C++ model Support for true immutable & r/w-only memory access Per-thread/task memory access opt-in To reduce the possibility for side effects in parallel code As much compile-time validation as possible Micro-threads / coroutines as first class citizens More? (we are used to not having much, for us, practical innovation here) Other languages? 39
Challenge 1 - Task parallelism Multiple task libraries EA JobManager Current solution, designed primarily within SPU-job limitations MS ConcRT, Apple GCD, Intel TBB All have some good parts! None works on all of our platforms, key requirement OpenMP We don’t use it. Tiny band aid, doesn’t satisfy our control needs Need C++ enhancements to simplify usage C++0x lambdas / GCD blocks Glacial C++ development & deployment Want on all platforms, so lost on this console generation Moving away from semi-static job graphs Instead more dynamic on-demand job graphs 40
Challenge 2 - Definition Goal: ”Real-time interactive graphics & simulation at a Pixar level of quality” Needed visual features: Global indirect lighting & reflections Complete anti-aliasing (frame buffers & shader) Sub-pixel geometry OIT Huge improvements in character animation These require massively more compute, BW and improved model! (animation can’t be solved with just more/better compute, so pretend it doesn’t exist for now) 41
Challenge 2 - Problems Problems & limitations with current model: MSAA sample storage doesn’t scale to 16x+ Esp. with HDR & deferred shading GPU is handicapped by being spoon-fed by CPU Irregular workloads are difficult / inefficient Current HLSL is a limited language & model 42
Challenge 2 - Solutions Sounds like a job for a high-throughput oriented massive data-parallel processor With a highly flexible programming model The CPU, as we know it, and its APIs are only in the way Pure software solution not practical as next step after DX11 PC 1) Multi-vendor & multi-architecture marketplace Skeptical we will reach a multi-vendor standard ISA within 3+ years Future consoles on the other hand, this would be preferred And would love to be proven wrong by the IHVs! Want a rich high-level compute model as next step Efficiently target both SW- & HW-pipeline architectures Even if we had 100% SW solution, to simplify development 1) Depending on the time frame 43
”Pipelined Compute Shaders” Queues as streaming I/O between compute kernels Simple & expressive model supporting irregular workloads Keeps data on chip, supports variable sized caches & cores Can target multiple types of HW & architectures Hybrid graphics/compute user-defined pipelines Language/API defining fixed stages inputs & outputs Pipelines can feed other pipelines (similar to DrawIndirect) Reyes-style Rendering with Ray Tracing Shade Split Raster Sub-D Prims Tess Frame Buffer Trace 44
”Pipelined Compute Shaders” Wanted for next DirectX and OpenCL/OpenGL As a standard, as soon as possible My main request/wish! Run on all: GPU, manycore and CPU IHV-specific solutions can be good start for R&D Model is also a good fit for many of our CPU/SPU jobs Parts of job graph can be seen as queues between stages Easier to write kernels/jobs with streaming I/O  Instead of explicit fixed-buffers and ”memory passes” Or dynamic memory allocation 45
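The idea of queues as streaming I/O between kernels, with irregular amounts of output per input, can be sketched on the CPU as follows. This is a toy model under assumptions, not a proposed API: `Patch`, `splitKernel`, `shadeKernel` and `runToyPipeline` are hypothetical, the pipeline has only a split stage and a shade stage loosely inspired by the Reyes-style diagram, and everything runs on one thread.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>

// Work item flowing through the pipeline: a square patch of a given size.
struct Patch {
    float size;
};

// Split kernel: a patch that is small enough is forwarded to the shade
// queue; otherwise four half-size children are fed back into the split
// queue. The amount of output per input is data-dependent.
void splitKernel(const Patch& in, float maxSize,
                 std::queue<Patch>& splitQueue,
                 std::queue<Patch>& shadeQueue) {
    if (in.size <= maxSize) {
        shadeQueue.push(in);
        return;
    }
    for (int i = 0; i < 4; ++i) {
        splitQueue.push(Patch{in.size * 0.5f});
    }
}

// Shade kernel: stands in for real shading and only counts its inputs.
void shadeKernel(const Patch&, std::size_t& shadedCount) {
    ++shadedCount;
}

// Drives both stages until all queues are empty and returns the number of
// shaded patches. Downstream work is drained first to keep queues short,
// similar in spirit to keeping data on chip.
std::size_t runToyPipeline(float rootSize, float maxSize) {
    if (maxSize <= 0.0f) return 0;  // guard: splitting would never terminate
    std::queue<Patch> splitQueue;
    std::queue<Patch> shadeQueue;
    splitQueue.push(Patch{rootSize});
    std::size_t shadedCount = 0;
    while (!splitQueue.empty() || !shadeQueue.empty()) {
        if (!shadeQueue.empty()) {
            const Patch patch = shadeQueue.front();
            shadeQueue.pop();
            shadeKernel(patch, shadedCount);
        } else {
            const Patch patch = splitQueue.front();
            splitQueue.pop();
            splitKernel(patch, maxSize, splitQueue, shadeQueue);
        }
    }
    return shadedCount;
}
```

Neither kernel needs to know up front how many items it will produce or consume, which is the contrast with explicit fixed buffers and separate "memory passes" drawn on the slide.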
Language? Language for this model is a big question But the concepts & infrastructure are what is important! Could be an extended HLSL or ”data-parallel C++” Data-oriented imperative language (i.e. not standard C++) Think HLSL would probably be easier & the most explicit Amount of code is small and written from scratch SIMT-style implicit vectorization is preferred over explicit vectorization Easier to target multiple evolving architectures implicitly Our CPU code is still stuck at SSE2 46
Language (cont.) Requirements: Full rich debugging, ideally in Visual Studio Asserts Internal kernel profiling Hot-swapping / edit-and-continue of kernels Opportunity for IHVs and platform providers to innovate here! Try to aim for an eventual cross-vendor standard Think of the co-development of Nvidia Cg and HLSL 47

More Related Content

What's hot

Taking Killzone Shadow Fall Image Quality Into The Next Generation
Taking Killzone Shadow Fall Image Quality Into The Next GenerationTaking Killzone Shadow Fall Image Quality Into The Next Generation
Taking Killzone Shadow Fall Image Quality Into The Next GenerationGuerrilla
 
SPU-Based Deferred Shading in BATTLEFIELD 3 for Playstation 3
SPU-Based Deferred Shading in BATTLEFIELD 3 for Playstation 3SPU-Based Deferred Shading in BATTLEFIELD 3 for Playstation 3
SPU-Based Deferred Shading in BATTLEFIELD 3 for Playstation 3Electronic Arts / DICE
 
Physically Based and Unified Volumetric Rendering in Frostbite
Physically Based and Unified Volumetric Rendering in FrostbitePhysically Based and Unified Volumetric Rendering in Frostbite
Physically Based and Unified Volumetric Rendering in FrostbiteElectronic Arts / DICE
 
Low-level Shader Optimization for Next-Gen and DX11 by Emil Persson
Low-level Shader Optimization for Next-Gen and DX11 by Emil PerssonLow-level Shader Optimization for Next-Gen and DX11 by Emil Persson
Low-level Shader Optimization for Next-Gen and DX11 by Emil PerssonAMD Developer Central
 
Graphics Gems from CryENGINE 3 (Siggraph 2013)
Graphics Gems from CryENGINE 3 (Siggraph 2013)Graphics Gems from CryENGINE 3 (Siggraph 2013)
Graphics Gems from CryENGINE 3 (Siggraph 2013)Tiago Sousa
 
Moving Frostbite to Physically Based Rendering
Moving Frostbite to Physically Based RenderingMoving Frostbite to Physically Based Rendering
Moving Frostbite to Physically Based RenderingElectronic Arts / DICE
 
Rendering AAA-Quality Characters of Project A1
Rendering AAA-Quality Characters of Project A1Rendering AAA-Quality Characters of Project A1
Rendering AAA-Quality Characters of Project A1Ki Hyunwoo
 
Five Rendering Ideas from Battlefield 3 & Need For Speed: The Run
Five Rendering Ideas from Battlefield 3 & Need For Speed: The RunFive Rendering Ideas from Battlefield 3 & Need For Speed: The Run
Five Rendering Ideas from Battlefield 3 & Need For Speed: The RunElectronic Arts / DICE
 
Khronos Munich 2018 - Halcyon and Vulkan
Khronos Munich 2018 - Halcyon and VulkanKhronos Munich 2018 - Halcyon and Vulkan
Khronos Munich 2018 - Halcyon and VulkanElectronic Arts / DICE
 
4K Checkerboard in Battlefield 1 and Mass Effect Andromeda
4K Checkerboard in Battlefield 1 and Mass Effect Andromeda4K Checkerboard in Battlefield 1 and Mass Effect Andromeda
4K Checkerboard in Battlefield 1 and Mass Effect AndromedaElectronic Arts / DICE
 
Developing and optimizing a procedural game: The Elder Scrolls Blades- Unite ...
Developing and optimizing a procedural game: The Elder Scrolls Blades- Unite ...Developing and optimizing a procedural game: The Elder Scrolls Blades- Unite ...
Developing and optimizing a procedural game: The Elder Scrolls Blades- Unite ...Unity Technologies
 
Destruction Masking in Frostbite 2 using Volume Distance Fields
Destruction Masking in Frostbite 2 using Volume Distance FieldsDestruction Masking in Frostbite 2 using Volume Distance Fields
Destruction Masking in Frostbite 2 using Volume Distance FieldsElectronic Arts / DICE
 
The Rendering Technology of 'Lords of the Fallen' (Game Connection Europe 2014)
The Rendering Technology of 'Lords of the Fallen' (Game Connection Europe 2014)The Rendering Technology of 'Lords of the Fallen' (Game Connection Europe 2014)
The Rendering Technology of 'Lords of the Fallen' (Game Connection Europe 2014)Philip Hammer
 
Stable SSAO in Battlefield 3 with Selective Temporal Filtering
Stable SSAO in Battlefield 3 with Selective Temporal FilteringStable SSAO in Battlefield 3 with Selective Temporal Filtering
Stable SSAO in Battlefield 3 with Selective Temporal FilteringElectronic Arts / DICE
 
Parallel Graphics in Frostbite - Current & Future (Siggraph 2009)
Parallel Graphics in Frostbite - Current & Future (Siggraph 2009)Parallel Graphics in Frostbite - Current & Future (Siggraph 2009)
Parallel Graphics in Frostbite - Current & Future (Siggraph 2009)Johan Andersson
 
Rendering Technologies from Crysis 3 (GDC 2013)
Rendering Technologies from Crysis 3 (GDC 2013)Rendering Technologies from Crysis 3 (GDC 2013)
Rendering Technologies from Crysis 3 (GDC 2013)Tiago Sousa
 
Terrain Rendering in Frostbite using Procedural Shader Splatting (Siggraph 2007)
Terrain Rendering in Frostbite using Procedural Shader Splatting (Siggraph 2007)Terrain Rendering in Frostbite using Procedural Shader Splatting (Siggraph 2007)
Terrain Rendering in Frostbite using Procedural Shader Splatting (Siggraph 2007)Johan Andersson
 

What's hot (20)

DirectX 11 Rendering in Battlefield 3
DirectX 11 Rendering in Battlefield 3DirectX 11 Rendering in Battlefield 3
DirectX 11 Rendering in Battlefield 3
 
Taking Killzone Shadow Fall Image Quality Into The Next Generation
Taking Killzone Shadow Fall Image Quality Into The Next GenerationTaking Killzone Shadow Fall Image Quality Into The Next Generation
Taking Killzone Shadow Fall Image Quality Into The Next Generation
 
SPU-Based Deferred Shading in BATTLEFIELD 3 for Playstation 3
SPU-Based Deferred Shading in BATTLEFIELD 3 for Playstation 3SPU-Based Deferred Shading in BATTLEFIELD 3 for Playstation 3
SPU-Based Deferred Shading in BATTLEFIELD 3 for Playstation 3
 
Shiny PC Graphics in Battlefield 3
Shiny PC Graphics in Battlefield 3Shiny PC Graphics in Battlefield 3
Shiny PC Graphics in Battlefield 3
 
Physically Based and Unified Volumetric Rendering in Frostbite
Physically Based and Unified Volumetric Rendering in FrostbitePhysically Based and Unified Volumetric Rendering in Frostbite
Physically Based and Unified Volumetric Rendering in Frostbite
 
Low-level Shader Optimization for Next-Gen and DX11 by Emil Persson
Low-level Shader Optimization for Next-Gen and DX11 by Emil PerssonLow-level Shader Optimization for Next-Gen and DX11 by Emil Persson
Low-level Shader Optimization for Next-Gen and DX11 by Emil Persson
 
Graphics Gems from CryENGINE 3 (Siggraph 2013)
Graphics Gems from CryENGINE 3 (Siggraph 2013)Graphics Gems from CryENGINE 3 (Siggraph 2013)
Graphics Gems from CryENGINE 3 (Siggraph 2013)
 
Moving Frostbite to Physically Based Rendering
Moving Frostbite to Physically Based RenderingMoving Frostbite to Physically Based Rendering
Moving Frostbite to Physically Based Rendering
 
Rendering AAA-Quality Characters of Project A1
Rendering AAA-Quality Characters of Project A1Rendering AAA-Quality Characters of Project A1
Rendering AAA-Quality Characters of Project A1
 
Five Rendering Ideas from Battlefield 3 & Need For Speed: The Run
Five Rendering Ideas from Battlefield 3 & Need For Speed: The RunFive Rendering Ideas from Battlefield 3 & Need For Speed: The Run
Five Rendering Ideas from Battlefield 3 & Need For Speed: The Run
 
Khronos Munich 2018 - Halcyon and Vulkan
Khronos Munich 2018 - Halcyon and VulkanKhronos Munich 2018 - Halcyon and Vulkan
Khronos Munich 2018 - Halcyon and Vulkan
 
4K Checkerboard in Battlefield 1 and Mass Effect Andromeda
4K Checkerboard in Battlefield 1 and Mass Effect Andromeda4K Checkerboard in Battlefield 1 and Mass Effect Andromeda
4K Checkerboard in Battlefield 1 and Mass Effect Andromeda
 
Lighting you up in Battlefield 3
Lighting you up in Battlefield 3Lighting you up in Battlefield 3
Lighting you up in Battlefield 3
 
Developing and optimizing a procedural game: The Elder Scrolls Blades- Unite ...
Developing and optimizing a procedural game: The Elder Scrolls Blades- Unite ...Developing and optimizing a procedural game: The Elder Scrolls Blades- Unite ...
Developing and optimizing a procedural game: The Elder Scrolls Blades- Unite ...
 
Destruction Masking in Frostbite 2 using Volume Distance Fields
Destruction Masking in Frostbite 2 using Volume Distance FieldsDestruction Masking in Frostbite 2 using Volume Distance Fields
Destruction Masking in Frostbite 2 using Volume Distance Fields
 
The Rendering Technology of 'Lords of the Fallen' (Game Connection Europe 2014)
The Rendering Technology of 'Lords of the Fallen' (Game Connection Europe 2014)The Rendering Technology of 'Lords of the Fallen' (Game Connection Europe 2014)
The Rendering Technology of 'Lords of the Fallen' (Game Connection Europe 2014)
 
Stable SSAO in Battlefield 3 with Selective Temporal Filtering
Stable SSAO in Battlefield 3 with Selective Temporal FilteringStable SSAO in Battlefield 3 with Selective Temporal Filtering
Stable SSAO in Battlefield 3 with Selective Temporal Filtering
 
Parallel Graphics in Frostbite - Current & Future (Siggraph 2009)
Parallel Graphics in Frostbite - Current & Future (Siggraph 2009)Parallel Graphics in Frostbite - Current & Future (Siggraph 2009)
Parallel Graphics in Frostbite - Current & Future (Siggraph 2009)
 
Rendering Technologies from Crysis 3 (GDC 2013)
Rendering Technologies from Crysis 3 (GDC 2013)Rendering Technologies from Crysis 3 (GDC 2013)
Rendering Technologies from Crysis 3 (GDC 2013)
 
Terrain Rendering in Frostbite using Procedural Shader Splatting (Siggraph 2007)
Terrain Rendering in Frostbite using Procedural Shader Splatting (Siggraph 2007)Terrain Rendering in Frostbite using Procedural Shader Splatting (Siggraph 2007)
Terrain Rendering in Frostbite using Procedural Shader Splatting (Siggraph 2007)
 

Viewers also liked

The Intersection of Game Engines & GPUs: Current & Future (Graphics Hardware ...
The Intersection of Game Engines & GPUs: Current & Future (Graphics Hardware ...The Intersection of Game Engines & GPUs: Current & Future (Graphics Hardware ...
The Intersection of Game Engines & GPUs: Current & Future (Graphics Hardware ...Johan Andersson
 
The Rendering Pipeline - Challenges & Next Steps
The Rendering Pipeline - Challenges & Next StepsThe Rendering Pipeline - Challenges & Next Steps
The Rendering Pipeline - Challenges & Next StepsJohan Andersson
 
High Dynamic Range color grading and display in Frostbite
High Dynamic Range color grading and display in FrostbiteHigh Dynamic Range color grading and display in Frostbite
High Dynamic Range color grading and display in FrostbiteElectronic Arts / DICE
 
Photogrammetry and Star Wars Battlefront
Photogrammetry and Star Wars BattlefrontPhotogrammetry and Star Wars Battlefront
Photogrammetry and Star Wars BattlefrontElectronic Arts / DICE
 
Executable Bloat - How it happens and how we can fight it
Executable Bloat - How it happens and how we can fight itExecutable Bloat - How it happens and how we can fight it
Executable Bloat - How it happens and how we can fight itElectronic Arts / DICE
 
5 Major Challenges in Real-time Rendering (2012)
5 Major Challenges in Real-time Rendering (2012)5 Major Challenges in Real-time Rendering (2012)
5 Major Challenges in Real-time Rendering (2012)Electronic Arts / DICE
 
Audio for Multiplayer & Beyond - Mixing Case Studies From Battlefield: Bad Co...
Audio for Multiplayer & Beyond - Mixing Case Studies From Battlefield: Bad Co...Audio for Multiplayer & Beyond - Mixing Case Studies From Battlefield: Bad Co...
Audio for Multiplayer & Beyond - Mixing Case Studies From Battlefield: Bad Co...Electronic Arts / DICE
 
5 Major Challenges in Interactive Rendering
5 Major Challenges in Interactive Rendering5 Major Challenges in Interactive Rendering
5 Major Challenges in Interactive RenderingElectronic Arts / DICE
 
Shadows & Decals: D3D10 Techniques in Frostbite (GDC'09)
 	 Shadows & Decals: D3D10 Techniques in Frostbite (GDC'09) 	 Shadows & Decals: D3D10 Techniques in Frostbite (GDC'09)
Shadows & Decals: D3D10 Techniques in Frostbite (GDC'09)Johan Andersson
 
How High Dynamic Range Audio Makes Battlefield: Bad Company Go BOOM
How High Dynamic Range Audio Makes Battlefield: Bad Company Go BOOMHow High Dynamic Range Audio Makes Battlefield: Bad Company Go BOOM
How High Dynamic Range Audio Makes Battlefield: Bad Company Go BOOMAnders Clerwall
 

Viewers also liked (16)

The Intersection of Game Engines & GPUs: Current & Future (Graphics Hardware ...
The Intersection of Game Engines & GPUs: Current & Future (Graphics Hardware ...The Intersection of Game Engines & GPUs: Current & Future (Graphics Hardware ...
The Intersection of Game Engines & GPUs: Current & Future (Graphics Hardware ...
 
Bending the Graphics Pipeline
Bending the Graphics PipelineBending the Graphics Pipeline
Bending the Graphics Pipeline
 
The Rendering Pipeline - Challenges & Next Steps
The Rendering Pipeline - Challenges & Next StepsThe Rendering Pipeline - Challenges & Next Steps
The Rendering Pipeline - Challenges & Next Steps
 
Lighting the City of Glass
Lighting the City of GlassLighting the City of Glass
Lighting the City of Glass
 
Parallel Futures of a Game Engine

  • 1. Public version 10. Parallel Futures of a Game Engine. Johan Andersson, Rendering Architect, DICE
  • 2. Background
      • DICE
          • Stockholm, Sweden
          • ~250 employees
          • Part of Electronic Arts
          • Battlefield & Mirror’s Edge game series
      • Frostbite
          • Proprietary game engine used at DICE & EA
          • Developed by DICE over the last 5 years
  • 5. Outline
      • Game engine 101
      • Current parallelism
      • Futures
      • Q&A
  • 7. Game development
      • 2 year development cycle
          • New IP often takes much longer, 3-5 years
          • Engine is continuously in development & used
      • AAA teams of 70-90 people
          • 50% artists
          • 30% designers
          • 20% programmers
          • 10% audio
      • Budgets $20-40 million
      • Cross-platform development is market reality
          • Xbox 360 and PlayStation 3
          • PC DX10 and DX11 (and sometimes Mac)
          • Current consoles will stay with us for many more years
  • 8. Game engine requirements (1/2)
      • Stable real-time performance
          • Frame-driven updates, 30 fps
          • Few threads, instead per-frame jobs/tasks for everything
      • Predictable memory usage
          • Fixed budgets for systems & content, fail if over
          • Avoid runtime allocations
          • Love unified memory!
      • Cross-platform
          • The consoles determine our base tech level & focus
          • PS3 is design target, most difficult and good potential
          • Scale up for PC, dual core is min spec (slow!)
  • 9. Game engine requirements (2/2)
      • Full system profiling/debugging
          • Engine is a vertical solution, touches everywhere
          • PIX, xbtracedump, SN Tuner, ETW, GPUView
      • Quick iterations
          • Essential in order to be creative
          • Fast building & fast loading, hot-swapping resources
          • Affects both the tools and the game
      • Middleware
          • Use when it makes sense, cross-platform & optimized
          • Parallelism has to go through our systems
  • 11. Levels of code in Frostbite
      • Editor (C#)
      • Pipeline (C++)
      • Game code (C++)
      • System CPU-jobs (C++)
      • System SPU-jobs (C++/asm)
      • Generated shaders (HLSL)
      • Compute kernels (HLSL)
      • Diagram labels: Offline, Runtime; CPU, GPU
  • 13. Editor & Pipeline
      • Editor (“FrostEd 2”)
          • WYSIWYG editor for content
          • C#, Windows only
          • Basic threading / tasks
      • Pipeline
          • Offline/background data-processing & conversion
          • C++, some MC++, Windows only
          • Typically IO-bound
          • A few compute-heavy steps use CPU-jobs
          • Texture compression uses CUDA, would prefer OpenCL or CS
          • Lighting pre-calculation using IncrediBuild over 100+ machines
      • CPU parallelism models are generally not a problem here
  • 15. General “game code” (1/2)
      • This is the majority of our 1.5 million lines of C++
          • Runs on Win32, Win64, Xbox 360 and PS3
          • Similar to general application code
      • Huge amount of code & logic to maintain + continue to develop
          • Low compute density
          • “Glue code”
          • Scattered in memory (pointer chasing)
          • Difficult to efficiently parallelize
          • Out-of-order execution is a big help, but consoles are in-order
      • Key to be able to quickly iterate & change
          • This is the actual game logic & glue that builds the game
          • C++ not ideal, but has the invested infrastructure
  • 16. General “game code” (2/2)
      • PS3 is one of the main challenges
          • Standard CPU parallelization doesn’t help
          • CELL only has 2 HW threads on the PPU
      • Split the code in 2: game code & system code
          • Game logic, policy and glue code only on CPU
          • “If it runs well on the PS3 PPU, it runs well everywhere”
          • Lower-level systems on PS3 SPUs
      • Main goals going forward:
          • Simplify & structure code base
          • Reduce coupling with lower-level systems
          • Increase in task parallelism for PC
      • (Image: CELL processor)
  • 18. Job-based parallelism
      • Essential to utilize the cores on our target platforms
          • Xbox 360: 6 HW threads
          • PlayStation 3: 2 HW threads + 6 powerful SPUs
          • PC: 2-16 HW threads (Nehalem HT is great!)
      • Divide up system work into Jobs (a.k.a. Tasks)
          • 15-200k C++ code each. 25k is common
          • Can depend on each other (if needed)
          • Dependencies create job graph
      • All HW threads consume jobs
      • ~200-300 / frame
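The "dependencies create job graph" idea can be sketched as follows. This is an illustration only, not Frostbite or EA JobManager code: `GraphJob`, `JobGraph`, `depend` and `demoOrder` are hypothetical names, and the graph is drained on a single thread to keep the example deterministic, whereas the slide describes all hardware threads consuming jobs.

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical job node: the work to run plus the jobs that wait on it.
struct GraphJob {
    std::function<void()> work;
    std::vector<std::size_t> dependents;  // indices of jobs waiting on this one
    int dependencyCount;                  // number of jobs this one waits on
};

class JobGraph {
public:
    std::size_t add(std::function<void()> work) {
        jobs_.push_back(GraphJob{std::move(work), {}, 0});
        return jobs_.size() - 1;
    }

    // Declare that `after` may only start once `before` has finished.
    void depend(std::size_t after, std::size_t before) {
        jobs_[before].dependents.push_back(after);
        ++jobs_[after].dependencyCount;
    }

    // Runs every reachable job once, in dependency order.
    // Returns the number of jobs executed.
    std::size_t run() const {
        std::vector<int> pending;
        pending.reserve(jobs_.size());
        for (const GraphJob& job : jobs_) pending.push_back(job.dependencyCount);

        std::queue<std::size_t> ready;
        for (std::size_t i = 0; i < jobs_.size(); ++i)
            if (pending[i] == 0) ready.push(i);

        std::size_t executed = 0;
        while (!ready.empty()) {
            std::size_t current = ready.front();
            ready.pop();
            jobs_[current].work();
            ++executed;
            // Finishing a job may release the jobs that were waiting on it.
            for (std::size_t next : jobs_[current].dependents)
                if (--pending[next] == 0) ready.push(next);
        }
        return executed;
    }

private:
    std::vector<GraphJob> jobs_;
};

// Jobs are added in reverse order on purpose; the dependencies fix the order.
std::vector<int> demoOrder() {
    std::vector<int> order;
    JobGraph graph;
    std::size_t submit = graph.add([&order] { order.push_back(2); });
    std::size_t build = graph.add([&order] { order.push_back(1); });
    std::size_t cull = graph.add([&order] { order.push_back(0); });
    graph.depend(submit, build);
    graph.depend(build, cull);
    graph.run();
    return order;
}
```

In a multi-threaded scheduler the ready queue would be shared between workers and the pending counters would need to be atomic; both are left out here.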
  • 19. What is a Job for us?
      • An asynchronous function call
          • Function ptr + 4 uintptr_t parameters
          • Cross-platform scheduler: EA JobManager
          • Often uses work stealing
      • 2 types of Jobs in Frostbite:
          • CPU job (good)
              • General code moved into job instead of threads
          • SPU job (great!)
              • Stateless pure functions, no side effects
              • Data-oriented, explicit memory DMA to local store
              • Designed to run on the PS3 SPUs = also very fast on in-order CPU
              • Can hot-swap for quick iterations
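A minimal sketch of the "function ptr + 4 uintptr_t parameters" shape described above. All names here (`Job`, `JobFunc`, `SumJobData`, `sumJob`, `runAll`, `sumWithJob`) are hypothetical and are not the EA JobManager API; the serial `runAll` loop only stands in for a real scheduler.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical job record: a function pointer plus four pointer-sized parameters.
using JobFunc = void (*)(std::uintptr_t, std::uintptr_t, std::uintptr_t, std::uintptr_t);

struct Job {
    JobFunc func;
    std::uintptr_t params[4];
};

// Everything the job reads or writes is reachable from this one struct,
// so the job function itself keeps no hidden state.
struct SumJobData {
    const int* values;
    std::uint32_t count;
    long long result;  // output, written by the job
};

// The job entry point: the first parameter carries the address of the job data.
void sumJob(std::uintptr_t dataAddr, std::uintptr_t, std::uintptr_t, std::uintptr_t) {
    SumJobData* data = reinterpret_cast<SumJobData*>(dataAddr);
    long long total = 0;
    for (std::uint32_t i = 0; i < data->count; ++i) total += data->values[i];
    data->result = total;
}

// Stand-in for a scheduler: runs the queued jobs in order on the calling thread.
void runAll(const std::vector<Job>& jobs) {
    for (const Job& job : jobs)
        job.func(job.params[0], job.params[1], job.params[2], job.params[3]);
}

long long sumWithJob(const int* values, std::uint32_t count) {
    SumJobData data{values, count, 0};
    std::vector<Job> jobs;
    jobs.push_back(Job{&sumJob, {reinterpret_cast<std::uintptr_t>(&data), 0, 0, 0}});
    runAll(jobs);
    return data.result;
}
```

Passing only plain addresses and integers keeps the record trivially copyable, which is one plausible reason for this shape when the same job has to be handed to a processor with its own local memory.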
  • 24. EntityRenderCull job example: frustum culling of dynamic entities in sphere tree

        struct FB_ALIGN(16) EntityRenderCullJobData
        {
            enum
            {
                MaxSphereTreeCount = 2,
                MaxStaticCullTreeCount = 2
            };

            uint sphereTreeCount;
            const SphereNode* sphereTrees[MaxSphereTreeCount];
            u8 viewCount;
            u8 frustumCount;
            u8 viewIntersectFlags[32];
            Frustum frustums[32];
            .... (cut out 2/3 of struct for display size)
            u32 maxOutEntityCount;

            // Output data, pre-allocated by callee
            u32 outEntityCount;
            EntityRenderCullInfo* outEntities;
        };

        void entityRenderCullJob(EntityRenderCullJobData* data);
        void validate(const EntityRenderCullJobData& data);

      • struct contains all input data needed
      • Max output data pre-allocated by callee
      • Single job function
          • Compile both as CPU & SPU job
      • Optional struct validation func
  • 25. EntityRenderCull SPU setup

        // local store variables
        EntityRenderCullJobData g_jobData;
        float g_zBuffer[256*114];
        u16 g_terrainHeightData[64*64];

        int main(uintptr_t dataEa, uintptr_t, uintptr_t, uintptr_t)
        {
            dmaBlockGet("jobData", &g_jobData, dataEa, sizeof(g_jobData));
            validate(g_jobData);

            if (g_jobData.zBufferTestEnable)
            {
                dmaAsyncGet("zBuffer", g_zBuffer, g_jobData.zBuffer,
                            g_jobData.zBufferResX*g_jobData.zBufferResY*4);
                g_jobData.zBuffer = g_zBuffer;

                if (g_jobData.zBufferShadowTestEnable && g_jobData.terrainHeightData)
                {
                    dmaAsyncGet("terrainHeight", g_terrainHeightData,
                                g_jobData.terrainHeightData, g_jobData.terrainHeightDataSize);
                    g_jobData.terrainHeightData = g_terrainHeightData;
                }

                dmaWaitAll(); // block on both DMAs
            }

            // run the actual job, will internally do streaming DMAs to the output entity list
            entityRenderCullJob(&g_jobData);

            // put back the data because we changed outEntityCount
            dmaBlockPut(dataEa, &g_jobData, sizeof(g_jobData));
            return 0;
        }
  • 27. Mix CPU- & SPU-jobs
  • 33. aka Tasks and Kernels
  • 35. Task-parallel algorithms & coordination
  • 36. Timing view
      • Example: PC, 4 CPU cores, 2 GPUs in AFR (AMD Radeon 4870x2)
      • Real-time in-game overlay
          • See timing events & effective parallelism
          • On CPU, SPU & GPU – for all platforms
          • Use to reduce sync-points & optimize load balancing
      • GPU timing through DX event queries
      • Our main performance tool!
  • 37. Rendering jobs
      • Rendering systems are heavily divided up into CPU- & SPU-jobs
      • Jobs:
          • Terrain geometry [3]
          • Undergrowth generation [2]
          • Decal projection [4]
          • Particle simulation
          • Frustum culling
          • Occlusion culling
          • Occlusion rasterization
          • Command buffer generation [6]
          • PS3: Triangle culling [6]
      • Most will move to GPU
          • Eventually.. A few have already!
          • Latency wall, more power and GPU memory access
          • Mostly one-way data flow
  • 38. Occlusion culling job example
      • Problem: Buildings & env occlude large amounts of objects
      • Obscured objects still have to:
          • Update logic & animations
          • Generate command buffer
          • Processed on CPU & GPU = expensive & wasteful
      • Difficult to implement full culling:
          • Destructible buildings
          • Dynamic occludees
          • Difficult to precompute
      • (Image: From Battlefield: Bad Company PS3)
  • 39. Solution: Software occlusion culling
      • Rasterize coarse zbuffer on SPU/CPU
          • 256x114 float
          • Low-poly occluder meshes
          • 100 m view distance
          • Max 10000 vertices/frame
          • Parallel vertex & raster SPU-jobs
          • Cost: a few milliseconds
      • Cull all objects against zbuffer
          • Screen-space bounding-box test
          • Before passed to all other systems
      • Big performance savings!
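The screen-space bounding-box test against the coarse zbuffer might look roughly like the sketch below. Only the 256x114 float resolution comes from the slide. The type and function names, the depth convention (smaller means closer, 1.0 is the far plane) and the rectangle-filling stand-in for the occluder rasterizer are assumptions made for illustration.

```cpp
#include <algorithm>
#include <vector>

const int kZBufferWidth = 256;   // resolution quoted on the slide
const int kZBufferHeight = 114;

// Coarse depth buffer holding the nearest occluder depth per pixel.
// Assumed convention: smaller depth is closer, 1.0 means nothing was drawn.
struct CoarseZBuffer {
    std::vector<float> depth;
    CoarseZBuffer() : depth(kZBufferWidth * kZBufferHeight, 1.0f) {}
};

// Screen-space bounds of an object: inclusive pixel rectangle plus the depth
// of the object's nearest point.
struct ScreenBounds {
    int minX, minY, maxX, maxY;
    float nearestDepth;
};

// Stand-in for the occluder rasterizer: writes a flat rectangle at one depth.
void fillOccluderRect(CoarseZBuffer& zb, int x0, int y0, int x1, int y1, float depth) {
    for (int y = std::max(y0, 0); y <= std::min(y1, kZBufferHeight - 1); ++y) {
        for (int x = std::max(x0, 0); x <= std::min(x1, kZBufferWidth - 1); ++x) {
            float& stored = zb.depth[y * kZBufferWidth + x];
            stored = std::min(stored, depth);
        }
    }
}

// An object counts as visible if, at any pixel its bounds cover, its nearest
// point is not behind the stored occluder depth.
bool isVisible(const CoarseZBuffer& zb, ScreenBounds b) {
    b.minX = std::max(b.minX, 0);
    b.minY = std::max(b.minY, 0);
    b.maxX = std::min(b.maxX, kZBufferWidth - 1);
    b.maxY = std::min(b.maxY, kZBufferHeight - 1);
    if (b.minX > b.maxX || b.minY > b.maxY) return false;  // entirely off-screen

    for (int y = b.minY; y <= b.maxY; ++y)
        for (int x = b.minX; x <= b.maxX; ++x)
            if (b.nearestDepth <= zb.depth[y * kZBufferWidth + x]) return true;
    return false;
}
```

Testing the nearest depth of the bounds against every covered pixel is conservative: an object is only culled when every pixel it could touch is already covered by something closer.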
  • 40. GPU occlusion culling
      • Ideally want to use the GPU, but current APIs are limited:
          • Occlusion queries introduce overhead & latency
          • Conditional rendering only helps GPU
          • Compute Shader impl. possible, but same latency wall
      • Future 1: Low-latency GPU execution context
          • Rasterization and testing done on GPU where it belongs
          • Lockstep with CPU, need to read back within a few ms
          • Possible on Larrabee, want standard on PC
          • Potential WDDM issue
      • Future 2: Move entire cull & rendering to “GPU”
          • World, cull, systems, dispatch. End goal
  • 42. Shader types
      • Generated shaders [1]
          • Graph-based surface shaders
          • Treated as content, not code
          • Artist created
          • Generates HLSL code
          • Used by all meshes and 3d surfaces
      • Graphics / Compute kernels
          • Hand-coded & optimized HLSL
          • Statically linked in with C++
          • Pixel- & compute-shaders
          • Lighting, post-processing & special effects
      • (Image: Graph-based surface shader in FrostEd 2)
  • 45. Challenges
      • 3 major challenges/goals going forward:
          • How do we make it easier to develop, maintain & parallelize general game code?
          • What do we need to continue to innovate & scale up real-time computational graphics?
          • How can we move & scale up advanced simulation and non-graphics tasks to data-parallel manycore processors?
      • Most likely the same solution(s)!
  • 46. Challenge 1
      • “How do we make it easier to develop, maintain & parallelize general game code?”
      • Shared State Concurrency is a killer
          • Not a big believer in Software Transactional Memory either
          • Because of performance and too “optimistic” flow
      • A more strict & adapted C++ model
          • Support for true immutable & r/w-only memory access
          • Per-thread/task memory access opt-in
          • To reduce the possibility for side effects in parallel code
          • As much compile-time validation as possible
      • Micro-threads / coroutines as first class citizens
      • More? (we are used to not having much, for us, practical innovation here)
      • Other languages?
  • 47. Challenge 1 - Task parallelism
      • Multiple task libraries
          • EA JobManager
              • Current solution, designed primarily within SPU-job limitations
          • MS ConcRT, Apple GCD, Intel TBB
              • All have some good parts!
              • None works on all of our platforms, key requirement
          • OpenMP
              • We don’t use it. Tiny band aid, doesn’t satisfy our control needs
      • Need C++ enhancements to simplify usage
          • C++0x lambdas / GCD blocks
          • Glacial C++ development & deployment
          • Want on all platforms, so lost on this console generation
      • Moving away from semi-static job graphs
          • Instead more dynamic on-demand job graphs
  • 48. Challenge 2 - Definition
      • Goal: “Real-time interactive graphics & simulation at a Pixar level of quality”
      • Needed visual features:
          • Global indirect lighting & reflections
          • Complete anti-aliasing (frame buffers & shader)
          • Sub-pixel geometry
          • OIT (order-independent transparency)
          • Huge improvements in character animation
      • These require massively more compute, BW and improved model!
      • (animation can’t be solved with just more/better compute, so pretend it doesn’t exist for now)
  • 49. Challenge 2 - Problems Problems & limitations with current model: MSAA sample storage doesn’t scale to 16x+ Esp. with HDR & deferred shading GPU is handicapped by being spoon-fed by CPU Irregular workloads are difficult / inefficient Current HLSL is a limited language & model 42
  • 50. Challenge 2 - Solutions Sounds like a job for a high-throughput oriented massive data-parallel processor With a highly flexible programming model The CPU, as we know it, and its APIs are only in the way Pure software solution not practical as next step after DX11 PC 1) Multi-vendor & multi-architecture marketplace Skeptical we will reach a multi-vendor standard ISA within 3+ years Future consoles on the other hand, this would be preferred And would love to be proven wrong by the IHVs! Want a rich high-level compute model as next step Efficiently target both SW- & HW-pipeline architectures Even if we had 100% SW solution, to simplify development 1) Depending on the time frame 43
  • 51. ”Pipelined Compute Shaders” Queues as streaming I/O between compute kernels Simple & expressive model supporting irregular workloads Keeps data on chip, supports variable sized caches & cores Can target multiple types of HW & architectures Hybrid graphics/compute user-defined pipelines Language/API defining fixed stages inputs & outputs Pipelines can feed other pipelines (similar to DrawIndirect) Reyes-style Rendering with Ray Tracing Shade Split Raster Sub-D Prims Tess Frame Buffer Trace 44
  • 52. ”Pipelined Compute Shaders” Wanted for next DirectX and OpenCL/OpenGL As a standard, as soon as possible My main request/wish! Run on all: GPU, manycore and CPU IHV-specific solutions can be good start for R&D Model is also a good fit for many of our CPU/SPU jobs Parts of job graph can be seen as queues between stages Easier to write kernels/jobs with streaming I/O Instead of explicit fixed-buffers and ”memory passes” Or dynamic memory allocation 45
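The idea of queues as streaming I/O between kernels can be sketched on the CPU. The code below is a serial toy under stated assumptions, not a proposed API: `Patch`, `splitKernel`, `shadeKernel` and `runPipeline` are hypothetical names loosely modelled on the Split and Shade stages of the Reyes-style diagram. The split kernel emits a variable number of outputs per input, which is the kind of irregular workload that fixed buffers and separate memory passes handle poorly.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical work item: a 1D patch covering samples [lo, hi).
struct Patch { int lo; int hi; };

// Split kernel: consumes one patch. Large patches are halved and fed back
// into the split queue; small ones are forwarded to the shade queue.
void splitKernel(const Patch& p, int maxSize,
                 std::deque<Patch>& splitQueue, std::deque<Patch>& shadeQueue)
{
    if (p.hi - p.lo > maxSize) {
        const int mid = p.lo + (p.hi - p.lo) / 2;
        splitQueue.push_back({p.lo, mid});
        splitQueue.push_back({mid, p.hi});
    } else {
        shadeQueue.push_back(p);
    }
}

// Shade kernel: consumes one small patch and touches each covered sample once.
void shadeKernel(const Patch& p, std::vector<int>& frameBuffer)
{
    for (int i = p.lo; i < p.hi; ++i)
        frameBuffer[static_cast<std::size_t>(i)] += 1;
}

// Minimal serial scheduler: drains both queues until the pipeline is empty
// and returns the number of patches that reached the shade stage.
std::size_t runPipeline(Patch root, int maxSize, std::vector<int>& frameBuffer)
{
    assert(maxSize > 0);
    assert(root.lo >= 0 && root.lo <= root.hi);
    assert(static_cast<std::size_t>(root.hi) <= frameBuffer.size());

    std::deque<Patch> splitQueue{root};
    std::deque<Patch> shadeQueue;
    std::size_t shaded = 0;

    while (!splitQueue.empty() || !shadeQueue.empty()) {
        if (!shadeQueue.empty()) {
            // Prefer downstream work so the queues stay short.
            const Patch p = shadeQueue.front();
            shadeQueue.pop_front();
            shadeKernel(p, frameBuffer);
            ++shaded;
        } else {
            const Patch p = splitQueue.front();
            splitQueue.pop_front();
            splitKernel(p, maxSize, splitQueue, shadeQueue);
        }
    }
    return shaded;
}
```

Draining the downstream queue first keeps intermediate data small, which is the software analogue of keeping data on chip; a hardware or multi-threaded scheduler would make the same choice for different reasons.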
  • 53. Language? Language for this model is a big question But the concepts & infrastructure are what is important! Could be an extended HLSL or ”data-parallel C++” Data-oriented imperative language (i.e. not standard C++) Think HLSL would probably be easier & the most explicit Amount of code is small and written from scratch SIMT-style implicit vectorization is preferred over explicit vectorization Easier to target multiple evolving architectures implicitly Our CPU code is still stuck at SSE2 46
  • 54. Language (cont.) Requirements: Full rich debugging, ideally in Visual Studio Asserts Internal kernel profiling Hot-swapping / edit-and-continue of kernels Opportunity for IHVs and platform providers to innovate here! Try to aim for an eventual cross-vendor standard Think of the co-development of Nvidia Cg and HLSL 47
  • 55. Unified development environment Want to debug/profile task- & data-parallel code seamlessly On all processors! CPU, GPU & manycore From any vendor = requires standard APIs or ISAs Visual Studio 2010 looks promising for task-parallel PC code Usable by our offline tools & hopefully PC runtime Want to integrate our own JobManager Nvidia Nexus looks great for data-parallel GPU code Eventual must have for all HW, how? Huge step forward! 48 VS2010 Parallel Tasks
  • 56. Future hardware (1/2) 2015 = 50 TFLOPS, we would spend it on: 80% graphics 15% simulation 4% misc 1% game (wouldn’t use all 500 GFLOPS for game logic & glue!) OOE CPUs more efficient for the majority of our game code But for the vast majority of our FLOPS these are fully irrelevant Can evolve to a small dot on a sea of DP cores Or run on scalar ISA wasting vector instructions on a few cores In other words: no need for separate CPU and GPU! 49
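For reference, the percentage split on this slide works out as follows against the assumed 50 TFLOPS total, which is where the 500 GFLOPS figure for game code comes from:

```latex
\begin{aligned}
\text{graphics}   &= 0.80 \times 50\ \text{TFLOPS} = 40\ \text{TFLOPS} \\
\text{simulation} &= 0.15 \times 50\ \text{TFLOPS} = 7.5\ \text{TFLOPS} \\
\text{misc}       &= 0.04 \times 50\ \text{TFLOPS} = 2\ \text{TFLOPS} \\
\text{game}       &= 0.01 \times 50\ \text{TFLOPS} = 0.5\ \text{TFLOPS} = 500\ \text{GFLOPS}
\end{aligned}
```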
  • 57. Future hardware (2/2) Single main memory & address space Critical to share resources between graphics, simulation and game in immersive dynamic worlds Configurable kernel local stores / cache Similar to Nvidia Fermi & Intel Larrabee Local stores = reliability & good for regular loads Caches = essential for irregular data structures Cache coherency? Not always important for kernels But essential for general code, can partition? 50
  • 58. Conclusions Developer productivity can’t be limited by model It should enhance productivity & perf on all levels Tools & language constructs play a critical role Lots of opportunity for innovation and standardization! We are willing to go to great lengths to utilize any HW If that platform is part of our core business target and can make a difference We for one welcome our parallel future! 51
  • 59. Thanks to DICE, EA and the Frostbite team The graphics/gamedev community on Twitter Steve McCalla, Mike Burrows Chas Boyd Nicolas Thibieroz, Mark Leather Dan Wexler, Yury Uralsky Kayvon Fatahalian 52
  • 60. References Previous Frostbite-related talks: [1] Johan Andersson. “Frostbite Rendering Architecture and Real-time Procedural Shading & Texturing Techniques”. GDC 2007. http://repi.blogspot.com/2009/01/conference-slides.html [2] Natasha Tatarchuk & Johan Andersson. “Rendering Architecture and Real-time Procedural Shading & Texturing Techniques”. GDC 2007. http://developer.amd.com/Assets/Andersson-Tatarchuk-FrostbiteRenderingArchitecture(GDC07_AMD_Session).pdf [3] Johan Andersson. “Terrain Rendering in Frostbite using Procedural Shader Splatting”. Siggraph 2007. http://developer.amd.com/media/gpu_assets/Andersson-TerrainRendering(Siggraph07).pdf [4] Daniel Johansson & Johan Andersson. “Shadows & Decals – D3D10 techniques from Frostbite”. GDC 2009. http://repi.blogspot.com/2009/03/gdc09-shadows-decals-d3d10-techniques.html [5] Bill Bilodeau & Johan Andersson. “Your Game Needs Direct3D 11, So Get Started Now!”. GDC 2009. http://repi.blogspot.com/2009/04/gdc09-your-game-needs-direct3d-11-so.html [6] Johan Andersson. “Parallel Graphics in Frostbite”. Siggraph 2009, Beyond Programmable Shading course. http://repi.blogspot.com/2009/08/siggraph09-parallel-graphics-in.html 53
  • 61. Questions? Email: johan.andersson@dice.se Blog: http://repi.se Twitter: @repi 54 Contact me. I do not bite, much..

Editor's Notes

  1. Xbox 360 version + offline super-high AA and resolution
  2. Isolate systems/areas. We do not use exceptions, nor much error handling
  3. Havok & Kynapse
  4. ”Glue code”
  5. Better code structure! Gustafson’s Law. Fixed 33 ms/f
  6. Data-layout
  7. We have a lot of jobs across the engine, too many to go through, so I chose to focus more on some of the rendering. Our intention is to move all of these to the GPU
  8. Conservative
  9. Want to rasterize on GPU, not CPU. CPU ⇔ GPU job dependencies
  10. Simple keywords such as override & sealed have been a help in other areas
  11. Would require multi-vendor open ISA standard
  12. Bindings to C very interesting!
  13. OpenCL