The next generation of GPU APIs for Game Engines
Pooya Eimandar
Fanap
March 2018
POOYA EIMANDAR
• Lead developer of Wolf.Engine (an open source 3D game engine)
• Project Manager and lead developer of Project Falcon since 2017
• CEO at BaziPardaz Ltd since 2011
• Founder at WolfSource.io
• Member of Microsoft Partner Network since 2013
• Author at Packt Publishing and GameDev.Net
• Lecturer at The University of Applied Science and Technology National Foundation of
Computer Games (2013 – 2015)
• Lecturer at Iran Game Development Institute (2014 – 2015)
• Lead developer of Persian Game Engine (2010 - 2014)
• Member of IGDF jury panel for the best computer games technology. (2014 - 2015)
PUBLICATIONS
• https://persianengine.codeplex.com (codename Black Kitten)
• https://github.com/WolfSource/Wolf.Engine
• The use of motion sensors in the medical and health industry (Jan 27, 2014, First Conference of Game & Medical Health)
• DirectX Graphics Diagnostic (Oct 11, 2013, GameDev.Net)
HISTORY
• Oct 1958: Physicist William Higinbotham created one of the first video games
• 1970s: Golden age of arcade games, powered by chips such as Fujitsu's MB14241 and the Atari 2600's Television Interface Adaptor
• 1992: Silicon Graphics Inc. started developing OpenGL in 1991 and released version 1.0 in 1992
• 1995: Microsoft DirectX released as the Windows Game SDK for Windows 95
HLSL PIPELINE
vs_1_1 supports 128 instructions, such as: add, vs, log, mov, max, min, m4x4
vs_2_0 supports 256 instructions
vs_3_0 supports a minimum of 512 instructions and up to the number of slots in D3DCAPS9.MaxVertexShader30InstructionSlots
vs_4_0 and later versions: no restriction
Stages (each can access Memory Resources: Buffer, Texture):
Input Assembler → Vertex Shader → Hull Shader → Tessellator → Domain Shader → Geometry Shader → Stream Output / Rasterizer → Pixel Shader → Output Merger
GLSL PIPELINE
Same stages under GLSL names: Hull Shader = Control Shader, Domain Shader = Evaluation Shader, Pixel Shader = Fragment Shader
THE EVOLUTION OF GPU APIS
• 1992 OpenGL: Fixed Functions Pipeline
• 1995 DirectX on Microsoft Windows 95; Microsoft stopped updating its own OpenGL implementation (still at v1.1 today)
• 1996 3dfx’s Glide: Geometry & Texture Mapping
• 1998 DirectX 6: IHV independent + Multi Texturing
• 1999 DirectX 7: Hardware Texturing & Lighting + Cube Maps
• 2000 DirectX 8: Programmable Shaders
• 2002 DirectX 9: Floating point texture mapping, multiple RTs, Multiple-Element Textures, texture lookups
in the vertex shader and stencil buffer techniques
• 2004 OpenGL 2.0: GLSL
• 2006 DirectX 10: Major Update
• 2009 DirectX 11: Compute Shader
• 2010 OpenGL 3.3 + OpenGL 4.0: It was designed to target hardware able to support Direct3D 10/11
• 2012 DirectX 11.1: Direct2D + Direct3D, integrated with WinRT
• 2013 DirectX 11.2: Direct2D geometry realizations, swap chain composition
• 2014 DirectX 11.3: Xbox One
• 2014 Apple Metal: released for iOS 8
• 2015 Apple Metal: released for Mac OS X El Capitan
• 2015 DirectX 12: Windows 10
• 2016 Vulkan : The next generation of OpenGL
• 2017 OpenGL 4.6: released at 25th Anniversary of OpenGL
THE NEXT GENERATION
• 2013 : AMD originally developed Mantle in cooperation with DICE
• Mantle was designed as an alternative to Direct3D and OpenGL
• 2015: Mantle's public SDK was suspended, as DirectX 12 and the Mantle-derived
Vulkan (Next Generation of OpenGL) rose in popularity
• DirectX 12: Xbox One (DX11.X), Xbox One X, Windows 10
• Metal: since 2014 on Apple iOS 8, since 2015 on Mac OS X El Capitan
• Vulkan: Windows 7/8/8.1/10, Linux, Android (almost cross-platform)
WHY DO WE NEED TO MIGRATE TO NEW APIS?
• Low driver overhead
• Minimal runtime validation
• Multithreaded GPU command buffer recording from multiple CPU cores
• Explicit memory management: host-local memory, device-local memory, and memory shared between CPU and GPU
• Explicit access to multiple GPUs
Traditional APIs (DirectX 11, OpenGL, OpenGL ES): Application → graphics driver, which controls the GPU and manages memory, contexts, etc. → GPU
Next Gen APIs (DirectX 12, Vulkan, Metal): Application, responsible for memory allocation and for thread management to record command buffers → direct GPU control → GPU
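The point about multithreaded command buffer recording can be pictured with a CPU-only sketch. Nothing here is a real graphics API: `CommandList`, `record_in_parallel` and `execute_commands` are hypothetical stand-ins for secondary command buffers, worker-thread recording and the main buffer's "Execute Commands" step. Commands are plain strings, and one worker thread per list is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Hypothetical CPU-side stand-in for a secondary command buffer.
struct CommandList {
    std::vector<std::string> commands;
};

// Each worker thread records into its own CommandList, so no locking is needed.
std::vector<CommandList> record_in_parallel(std::size_t worker_count,
                                            std::size_t draws_per_worker)
{
    assert(worker_count > 0);
    std::vector<CommandList> lists(worker_count);
    std::vector<std::thread> workers;
    for (std::size_t w = 0; w < worker_count; ++w) {
        workers.emplace_back([&lists, w, draws_per_worker] {
            for (std::size_t d = 0; d < draws_per_worker; ++d) {
                lists[w].commands.push_back(
                    "draw " + std::to_string(w * draws_per_worker + d));
            }
        });
    }
    for (std::thread& t : workers) t.join();
    return lists;
}

// The main command buffer then references the recorded lists in a fixed order.
std::vector<std::string> execute_commands(const std::vector<CommandList>& lists)
{
    std::vector<std::string> main_buffer;
    for (const CommandList& list : lists) {
        main_buffer.insert(main_buffer.end(),
                           list.commands.begin(), list.commands.end());
    }
    return main_buffer;
}
```

Recording happens in parallel, but the order in which the lists are executed is decided on one thread, so the result stays deterministic.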
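SAMPLE CODE
OpenGL
// _texture_id is assumed to be a GLuint created earlier with glGenTextures
void load_texture_from_memory_rgba(_In_ uint8_t* pRGBAData)
{
    glBindTexture(GL_TEXTURE_2D, _texture_id);
    glTexImage2D(GL_TEXTURE_2D,
        0,                  // mip level
        GL_SRGB8_ALPHA8,    // internal format
        _width, _height,
        0,                  // border, must be 0
        GL_RGBA,            // format of the source data
        GL_UNSIGNED_BYTE,   // type of the source data
        pRGBAData);
    glGenerateMipmap(GL_TEXTURE_2D);
}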
SAMPLE CODE
Vulkan
void create_image()
{
    const VkImageCreateInfo _image_create_info = {
        VK_STRUCTURE_TYPE_IMAGE_CREATE_INFO, // Type
        nullptr,                             // Next
        0,                                   // Flags
        _image_type,                         // ImageType
        _format,                             // Format
        { _width, _height, _depth },         // Extent
        _mip_map_levels,                     // MipLevels
        _layer_count,                        // ArrayLayers
        VK_SAMPLE_COUNT_1_BIT,               // Samples
        VK_IMAGE_TILING_OPTIMAL,             // Tiling
        _usage_flags,                        // Usage
        VK_SHARING_MODE_EXCLUSIVE,           // SharingMode
        0,                                   // QueueFamilyIndexCount
        nullptr,                             // QueueFamilyIndices
        VK_IMAGE_LAYOUT_UNDEFINED            // InitialLayout
    };
    // vkCreateImage outputs a VkImage handle (_image); the image view is created separately
    vkCreateImage(vk_device, &_image_create_info, nullptr, &_image);
}

void load_texture_from_memory_rgba(_In_ uint8_t* pRGBAData)
{
    create_image();
    allocate_memory();
    // bind the image to its memory
    vkBindImageMemory(vk_device, _image, _memory, 0);
    copy_data_to_texture_2D(pRGBAData);
    create_sampler();
    create_image_view();
}
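The slide does not show what allocate_memory() does. One step such a function usually needs is picking a memory type that the resource allows and that has the wanted properties. The sketch below shows that selection logic only, under stated assumptions: `MemoryType`, the flag constants and `find_memory_type` are local stand-ins defined so the code compiles without the Vulkan headers, and they are not the author's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Local stand-ins for memory property flags (assumption: not the real Vulkan headers).
constexpr uint32_t DEVICE_LOCAL  = 0x1;
constexpr uint32_t HOST_VISIBLE  = 0x2;
constexpr uint32_t HOST_COHERENT = 0x4;

struct MemoryType {
    uint32_t property_flags;
};

// Returns the index of the first memory type that is allowed by
// allowed_type_bits (bit i set means type i is usable for the resource)
// and has every required property flag, or -1 if none matches.
int find_memory_type(const std::vector<MemoryType>& types,
                     uint32_t allowed_type_bits,
                     uint32_t required_properties)
{
    assert(types.size() <= 32);
    for (std::size_t i = 0; i < types.size(); ++i) {
        const bool allowed = (allowed_type_bits & (1u << i)) != 0;
        const bool has_properties =
            (types[i].property_flags & required_properties) == required_properties;
        if (allowed && has_properties) return static_cast<int>(i);
    }
    return -1;
}
```

In a real application the table of types and the allowed-type bits would come from the device and from the image's memory requirements rather than being built by hand.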
DIRECT GPU COMPONENTS
• Device, Queue
• Heap Memory: Image (with Image Views), Buffer, Sampler
• Frame Buffer, Render Pass
• Descriptor Set Pool: Descriptor Sets
• Command Buffer Pool: Main Command Buffer, Secondary Command Buffer
• Graphics Pipeline
• Main Command Buffer: Barrier Synchronization → Begin Render Pass → Bind Graphics Pipeline → Set Dynamic States → Bind to Buffers → Update Buffer → Bind Descriptor Sets → Draw → Execute Commands → End Render Pass
MEMORY ALLOCATION
Heap Memory
• Memory Chunk 1: Buffer 1 (Vertex Buffer)
• Memory Chunk 2: Buffer 2 (Index Buffer)
• Memory Chunk 3: Buffer 3 (Uniform)
MEMORY ALLOCATION
• One Memory Chunk shared by Buffer 1 (Vertex Buffer), Buffer 2 (Index Buffer) and Buffer 3 (Uniform)
MEMORY ALLOCATION
• One Memory Chunk holding a single Buffer that contains Vertex Buffer, Index Buffer and Uniform data
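The progression on these slides, from one allocation per buffer to one chunk shared by several resources, comes down to sub-allocating aligned offsets inside a single large block. The sketch below is a minimal bump allocator that hands out such offsets. `ChunkAllocator` and `kOutOfMemory` are hypothetical names, and a real engine would also need freeing and per-resource alignment rules, which are left out here.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint64_t kOutOfMemory = UINT64_MAX;

// Hypothetical bump sub-allocator over one large memory chunk.
class ChunkAllocator {
public:
    explicit ChunkAllocator(uint64_t chunk_size) : size_(chunk_size), used_(0) {}

    // Returns the offset of the sub-allocation inside the chunk,
    // or kOutOfMemory if it does not fit. alignment must be a power of two.
    uint64_t allocate(uint64_t bytes, uint64_t alignment)
    {
        assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
        // Round the current position up to the next multiple of alignment.
        const uint64_t offset = (used_ + alignment - 1) & ~(alignment - 1);
        if (offset > size_ || bytes > size_ - offset) return kOutOfMemory;
        used_ = offset + bytes;
        return offset;
    }

private:
    uint64_t size_;
    uint64_t used_;
};
```

Each returned offset is what would be passed as the memory offset when binding a buffer, or as the byte offset of the vertex, index or uniform region inside one shared buffer.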
THREAD SYNCHRONIZATION
Four synchronization types:
• Fences communicate to the application that command buffers submitted to a queue have finished executing.
• Semaphores are generally associated with resources or groups of resources and can be used to marshal ownership of shared data between queues. Their status is not visible to the host.
• Events provide a finer-grained synchronization primitive that can be signaled at command-level granularity by both device and host, and can be waited upon by either.
• Barriers provide execution and memory synchronization between sets of commands.
[Diagram: batches of cmd buffers whose completion is reported through Fence 1 and Fence 2; Barrier Synchronization and Execute Commands inside the Main Command Buffer]
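The fence idea (work is submitted, then the application blocks until completion is reported back) can be modelled on the CPU. This is only an analogy built from standard C++ threading primitives, not the Vulkan API: `Fence` and `submit_and_wait` are hypothetical, and a worker thread plays the role of the device queue.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical CPU-side model of a fence: signaled by the "device",
// waited on by the host.
class Fence {
public:
    void signal()
    {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            signaled_ = true;
        }
        cv_.notify_all();
    }

    void wait()
    {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return signaled_; });
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    bool signaled_ = false;
};

// Stand-in for a queue submission: the work runs on another thread,
// which signals the fence when it has finished.
int submit_and_wait(int input)
{
    Fence fence;
    int result = 0;
    std::thread device([&] {
        result = input * 2;  // the "GPU work"
        fence.signal();
    });
    fence.wait();   // the host blocks here until the work is complete
    device.join();
    return result;
}
```

Reading `result` after `wait()` is safe because the mutex inside the fence orders the worker's write before the host's read.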
GRAPHICS PIPELINE
Main Command Buffer: Bind Graphics Pipeline
Graphics Pipeline
• A snapshot of all GPU states:
  • Rasterization state
  • Shader Stage
  • Vertex Input
  • Tessellation state
  • Multi Sample State
  • Depth & Stencil State
  • Color Blend State
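Because a pipeline is a snapshot of all state, any change to one state means a different pipeline object. A common response is to cache pipelines by their full state description. The sketch below shows that idea with a small, made-up subset of states. `PipelineState`, `PipelineCache` and the integer handles are illustrative assumptions, not part of any graphics API or of Wolf Engine.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <tuple>

// Hypothetical, heavily reduced description of the states baked into a pipeline.
struct PipelineState {
    uint32_t shader_id;
    bool     depth_test;
    bool     alpha_blend;
    uint32_t sample_count;

    bool operator<(const PipelineState& o) const
    {
        return std::tie(shader_id, depth_test, alpha_blend, sample_count)
             < std::tie(o.shader_id, o.depth_test, o.alpha_blend, o.sample_count);
    }
};

class PipelineCache {
public:
    // Returns the handle for this exact state, creating a new
    // pipeline only the first time the state is seen.
    uint32_t get_or_create(const PipelineState& state)
    {
        auto it = pipelines_.find(state);
        if (it != pipelines_.end()) return it->second;
        const uint32_t handle = next_handle_++;
        pipelines_.emplace(state, handle);
        return handle;
    }

    std::size_t size() const { return pipelines_.size(); }

private:
    std::map<PipelineState, uint32_t> pipelines_;
    uint32_t next_handle_ = 1;
};
```

Two draws that differ only in blending therefore get two pipelines, while repeated requests for the same state reuse one.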
DRAW
Main Command Buffer: Draw
Draw methods:
• Direct Draw
  • Set vertex and index buffers
  • Call Draw
  • Too slow: many draw calls
• Instanced Draw
  • Set up an instance buffer
  • Draw all objects with the same instance buffer
  • Every object is drawn with the same number of vertices and indices
• Indirect Draw
  • The buffer can be generated and updated offline, with no need to update the command buffers that contain the actual drawing functions
  • One indirect call draws all objects with their associated vertex and index buffers
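For the indirect path, the "buffer" is an array of per-draw parameter records that the GPU reads at draw time. The sketch below builds such an array on the CPU. The record's field order mirrors Vulkan's VkDrawIndexedIndirectCommand, but it is declared locally so the example compiles without the Vulkan headers. `MeshRange` and `build_indirect_buffer` are hypothetical helpers, and uploading the array and issuing the indirect call are not shown.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Field order mirrors Vulkan's VkDrawIndexedIndirectCommand;
// declared locally so the sketch compiles without the Vulkan headers.
struct DrawIndexedIndirectCommand {
    uint32_t index_count;
    uint32_t instance_count;
    uint32_t first_index;
    int32_t  vertex_offset;
    uint32_t first_instance;
};
static_assert(sizeof(DrawIndexedIndirectCommand) == 20,
              "five tightly packed 32-bit fields");

// Hypothetical description of one mesh packed into shared vertex/index buffers.
struct MeshRange {
    uint32_t index_count;
    uint32_t first_index;
    int32_t  vertex_offset;
};

// Builds one draw record per mesh, each drawn as a single instance.
std::vector<DrawIndexedIndirectCommand>
build_indirect_buffer(const std::vector<MeshRange>& meshes)
{
    std::vector<DrawIndexedIndirectCommand> commands;
    commands.reserve(meshes.size());
    for (const MeshRange& m : meshes) {
        commands.push_back({ m.index_count, 1, m.first_index, m.vertex_offset, 0 });
    }
    return commands;
}
```

Changing which objects are drawn then means rewriting this array, while the recorded command buffer that issues the indirect draw stays untouched.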
WOLF ENGINE
• http://WolfSource.io
• https://github.com/WolfSource/Wolf.Engine
• Wolf Engine is the next generation of Persian Game Engine: a cross-platform, open source game engine built as a comprehensive set of C++ open source libraries for rendering and game development.
• Scripting language: Lua
• Language bindings: PyWolf, a Python binding for Wolf Engine
CONTACT ME
• https://Fanap.ir
• http://WolfSource.io
• http://PooyaEimandar.com
• Twitter.com/Wolf_Engine
• Linkedin.com/in/PooyaEimandar
• Github.com/PooyaEimandar
• T.me/WolfSource.io
THANK YOU!
• We are hiring at FANAP!
• Jobs.fanap.ir
