SlideShare a Scribd company logo
1 of 32
Download to read offline
Make Your Own Ray
Tracing GPU with
FPGA
COSCUP 2023
Owen Wu
fallingcat@gmail.com
Agenda
● Self-Intro
● Session Overview
● How to Start
○ HDL
○ EDA
○ FPGA
● Think Differently
● Ray Tracing
● HomebrewGPU projrect
Self-Intro
Self-Intro
● Game developer
○ Game engine development
○ PC
○ Console
○ Mobile
● GPU software engineer
○ Optimization from the perspective of hardware
○ AMD
○ Arm
● https://tinyurl.com/owenwu
Session Overview
Session Overview
● This session is for software engineer
● This session is NOT for hardware engineer
● Basic intro for the beginner
● Making a chip is very easy and cheap nowaday
○ 6 months from zero to a workable GPU
● Turn you algorithm into hardware, think
differently
● Many open sourced projects on GitHub
How to Start
How to Start
● HDL (Haardware Description Language)
○ Language
● EDA (Electronic Design Automation)
○ Compiler
● FPGA (Field Programmable Gate Array)
○ Hardware
HDL
Hardware Description
Language
HDL
● VHDL
○ Ada
● Verilog
○ C
● Connect modules to design a whole chip
● Clock is the way to sync all modules
○ 100M Hz - generate 100M signals in one second
● You can think module as a function in C
● Every module can only do very limited works
○ Works need to be finished in one clock
● Books for begineer
○ Programming FPGAs: Getting Started with Verilog
○ Introduction to Verilog
EDA
Electronic Design
Automation
EDA
● Convert HDL to bitstream file
● Upload bitstream file to FPGA to execute
● Many steps
○ IC Design
○ Synthesis
○ Verification
○ Physical Design
● You can think EDA as a compiler
● FPGA makers provide basic EDA for free
○ Xilinx Vivado
● You can also try EDA online
○ https://8bitworkshop.com/v3.10.1/?platform=verilog&file=cloc
k_divider.v
FPGA
Field Programmable Gate
Array
FPGA
● Upload bitstream file to configure logic blocks
● FPGA development board integrate many components
○ VGA/HDMI output
○ Memory
○ LED
○ 7 segment disply
○ SD Card
○ SoC
● FPGA has different number of logic cells
○ Which decides how complex the design can be
● Elbert V2 for beginner
● Nexys A7 for more complex design
Think Differently
Parallel
● Software is serial
● Hardware is parallel
● Every modules work simultaneously
● Software optimizatios may not work with
hardware
● Don’t use software thinking when designing
hardware
B()
MUX
A()
R
C
Latency v.s. Throughput
● CPU performance depends on latency
● Low latency means that the instruction can be completed quickly
● Instruction has order dependency
● GPU only care how long it takes to finish a frame
● Pixel doesn’t have order dependency
● Pixels can be executed simultaneously
● If the latency of one pixel is 32 clock
● The hardware executes 64 streams simultaneously
● The throughput will be 2 pixel per clock
Pipeline
● Hardware has many different modules
● All modules need to work simultaneously to get
the best performance
● Use pipeline to split the tasks
● Every module process different pixel at the
same time
Surface (5 clk) Shadow (5 clk) Shading (5 clk)
15 clk/pixel
Surface Shadow Shading
pixel 1
pixel2 pixel 1
pixel3 pixel2 pixel 1
pixel 4 pixel 3 pixel 2
5 clk/pixel
clk 0
clk 5
clk 10
clk 15
Ray Tracing
Ray Tracing
● Ray tracing is easy to implement
● Ray hit objects then reflect to camera
● Invert the ray
● Cast a ray from each pixel of the screen
● Find a closest hit of ray and objects
● Decide the color of the pixel
○ If there is a hit, the color of hit object
○ If there is no hit, the color of background
● Ray Tracing in One Weekend at GitHub
Ray Tracing - Reflection
● For the hit on a object
● If the object is reflective
● Cast a reflection ray from hit point
● Find the closet hit of the reflection ray
● Recursively cast reflection rays
● Blend the colosr of reflected objects into
final shading
Ray Tracing - Shadow
● For the hit on a object
● Cast a new ray from hit
point toward light
● Is there is any hit of the new
ray
● If yes, the pixel is in shadow
● Otherwise the pixel is not in
shadow
BVH(Bounding Volume Hierarchy)
● Accelerate the hit detection between ray and primitives
○ Quickly exclude the nodes which don’t have intersection
● Use AABB(Axis-aligned Bounding Box) to split the space
● Traverse the AABB until reach the leaf
● Detect the hits of ray between the primitive in leaf
AABB
AABB AABB
AABB AABB
Left
Node
Right
Node
HomebrewGPU Project
HomebrewGPU project
● Open sourced project
○ https://github.com/fallingcat/HomebrewGPU
● Implement a basic ray tracing GPU
○ Voxel based rendering
○ BVH acceleration
○ Shading/Reflection/Refraction/Shadow
● 6 months from zero to complete
HomebrewGPU project
Architecture
Architecture
● Thread Generator
○ Generate one thread per clock for each ray core
○ Each thread presents one pixel
○ The thread will go through ray core and output the final color
● BVH Structure
○ BVH structure stores the BVH tree structure data
○ Accepts the node or leaf query from ray core
○ Output the node or leaf data to ray core
● Primitive Unit
○ Primitive Unit stores the raw data of all primitives
○ Accepts the query from ray core and output primitive data
● Ray Core
○ Ray core process one thread to output the final color
○ Accepts the thread from thread generator or reflection/refraction
ray
● Frame Buffer Writer
○ Cache the output of ray cores and write the pixel to frame buffer
○ Some threads with reflection/refraction take longer to get the final
color
○ Wait util all threads in one cache set are finished then write the
data to the frame buffer
Architecture
● Surface stage
○ Process the ray from camera and find the closest hit of the ray
○ Pass the hit information to next stage
● Shadow stage
○ Cast a ray from closest hit position to light source
○ It will pass the shadow information to next stage
● Shade stage
○ Use the closest hit information to decide if it's the final color or
reflection/refraction will occur
○ Cast a reflection/refraction ray and pass the data back to surface
stage
○ Recursively feed back to surface stage
Design Verification
Q & A
● Owen (fallingcat@gmail.com)
● https://github.com/fallingcat/Ho
mebrewGPU
Thanks!

More Related Content

What's hot

Killzone Shadow Fall Demo Postmortem
Killzone Shadow Fall Demo PostmortemKillzone Shadow Fall Demo Postmortem
Killzone Shadow Fall Demo PostmortemGuerrilla
 
A Bit More Deferred Cry Engine3
A Bit More Deferred   Cry Engine3A Bit More Deferred   Cry Engine3
A Bit More Deferred Cry Engine3guest11b095
 
Advanced Scenegraph Rendering Pipeline
Advanced Scenegraph Rendering PipelineAdvanced Scenegraph Rendering Pipeline
Advanced Scenegraph Rendering PipelineNarann29
 
Taking Killzone Shadow Fall Image Quality Into The Next Generation
Taking Killzone Shadow Fall Image Quality Into The Next GenerationTaking Killzone Shadow Fall Image Quality Into The Next Generation
Taking Killzone Shadow Fall Image Quality Into The Next GenerationGuerrilla
 
Triangle Visibility buffer
Triangle Visibility bufferTriangle Visibility buffer
Triangle Visibility bufferWolfgang Engel
 
Parallel Graphics in Frostbite - Current & Future (Siggraph 2009)
Parallel Graphics in Frostbite - Current & Future (Siggraph 2009)Parallel Graphics in Frostbite - Current & Future (Siggraph 2009)
Parallel Graphics in Frostbite - Current & Future (Siggraph 2009)Johan Andersson
 
[0122 구경원]게임에서의 충돌처리
[0122 구경원]게임에서의 충돌처리[0122 구경원]게임에서의 충돌처리
[0122 구경원]게임에서의 충돌처리KyeongWon Koo
 
[Unite2015 박민근] 유니티 최적화 테크닉 총정리
[Unite2015 박민근] 유니티 최적화 테크닉 총정리[Unite2015 박민근] 유니티 최적화 테크닉 총정리
[Unite2015 박민근] 유니티 최적화 테크닉 총정리MinGeun Park
 
[KGC2011_박민근] 신입 게임 개발자가 알아야 할 것들
[KGC2011_박민근] 신입 게임 개발자가 알아야 할 것들[KGC2011_박민근] 신입 게임 개발자가 알아야 할 것들
[KGC2011_박민근] 신입 게임 개발자가 알아야 할 것들MinGeun Park
 
Dx11 performancereloaded
Dx11 performancereloadedDx11 performancereloaded
Dx11 performancereloadedmistercteam
 
Secrets of CryENGINE 3 Graphics Technology
Secrets of CryENGINE 3 Graphics TechnologySecrets of CryENGINE 3 Graphics Technology
Secrets of CryENGINE 3 Graphics TechnologyTiago Sousa
 
Siggraph2016 - The Devil is in the Details: idTech 666
Siggraph2016 - The Devil is in the Details: idTech 666Siggraph2016 - The Devil is in the Details: idTech 666
Siggraph2016 - The Devil is in the Details: idTech 666Tiago Sousa
 
Unity optimize mobile game performance
Unity optimize mobile game performanceUnity optimize mobile game performance
Unity optimize mobile game performanceRiver Wang
 
Extending the Animation Rigging package with C# – Unite Copenhagen 2019
Extending the Animation Rigging package with C# – Unite Copenhagen 2019Extending the Animation Rigging package with C# – Unite Copenhagen 2019
Extending the Animation Rigging package with C# – Unite Copenhagen 2019Unity Technologies
 
Optimizing the Graphics Pipeline with Compute, GDC 2016
Optimizing the Graphics Pipeline with Compute, GDC 2016Optimizing the Graphics Pipeline with Compute, GDC 2016
Optimizing the Graphics Pipeline with Compute, GDC 2016Graham Wihlidal
 
Next generation graphics programming on xbox 360
Next generation graphics programming on xbox 360Next generation graphics programming on xbox 360
Next generation graphics programming on xbox 360VIKAS SINGH BHADOURIA
 
Moving Frostbite to Physically Based Rendering
Moving Frostbite to Physically Based RenderingMoving Frostbite to Physically Based Rendering
Moving Frostbite to Physically Based RenderingElectronic Arts / DICE
 
The Guerrilla Guide to Game Code
The Guerrilla Guide to Game CodeThe Guerrilla Guide to Game Code
The Guerrilla Guide to Game CodeGuerrilla
 

What's hot (20)

Killzone Shadow Fall Demo Postmortem
Killzone Shadow Fall Demo PostmortemKillzone Shadow Fall Demo Postmortem
Killzone Shadow Fall Demo Postmortem
 
A Bit More Deferred Cry Engine3
A Bit More Deferred   Cry Engine3A Bit More Deferred   Cry Engine3
A Bit More Deferred Cry Engine3
 
Advanced Scenegraph Rendering Pipeline
Advanced Scenegraph Rendering PipelineAdvanced Scenegraph Rendering Pipeline
Advanced Scenegraph Rendering Pipeline
 
Taking Killzone Shadow Fall Image Quality Into The Next Generation
Taking Killzone Shadow Fall Image Quality Into The Next GenerationTaking Killzone Shadow Fall Image Quality Into The Next Generation
Taking Killzone Shadow Fall Image Quality Into The Next Generation
 
Triangle Visibility buffer
Triangle Visibility bufferTriangle Visibility buffer
Triangle Visibility buffer
 
Parallel Graphics in Frostbite - Current & Future (Siggraph 2009)
Parallel Graphics in Frostbite - Current & Future (Siggraph 2009)Parallel Graphics in Frostbite - Current & Future (Siggraph 2009)
Parallel Graphics in Frostbite - Current & Future (Siggraph 2009)
 
OpenGL ES 3.1 Reference Card
OpenGL ES 3.1 Reference CardOpenGL ES 3.1 Reference Card
OpenGL ES 3.1 Reference Card
 
[0122 구경원]게임에서의 충돌처리
[0122 구경원]게임에서의 충돌처리[0122 구경원]게임에서의 충돌처리
[0122 구경원]게임에서의 충돌처리
 
[Unite2015 박민근] 유니티 최적화 테크닉 총정리
[Unite2015 박민근] 유니티 최적화 테크닉 총정리[Unite2015 박민근] 유니티 최적화 테크닉 총정리
[Unite2015 박민근] 유니티 최적화 테크닉 총정리
 
[KGC2011_박민근] 신입 게임 개발자가 알아야 할 것들
[KGC2011_박민근] 신입 게임 개발자가 알아야 할 것들[KGC2011_박민근] 신입 게임 개발자가 알아야 할 것들
[KGC2011_박민근] 신입 게임 개발자가 알아야 할 것들
 
Dx11 performancereloaded
Dx11 performancereloadedDx11 performancereloaded
Dx11 performancereloaded
 
Secrets of CryENGINE 3 Graphics Technology
Secrets of CryENGINE 3 Graphics TechnologySecrets of CryENGINE 3 Graphics Technology
Secrets of CryENGINE 3 Graphics Technology
 
Siggraph2016 - The Devil is in the Details: idTech 666
Siggraph2016 - The Devil is in the Details: idTech 666Siggraph2016 - The Devil is in the Details: idTech 666
Siggraph2016 - The Devil is in the Details: idTech 666
 
Unity optimize mobile game performance
Unity optimize mobile game performanceUnity optimize mobile game performance
Unity optimize mobile game performance
 
Extending the Animation Rigging package with C# – Unite Copenhagen 2019
Extending the Animation Rigging package with C# – Unite Copenhagen 2019Extending the Animation Rigging package with C# – Unite Copenhagen 2019
Extending the Animation Rigging package with C# – Unite Copenhagen 2019
 
Optimizing the Graphics Pipeline with Compute, GDC 2016
Optimizing the Graphics Pipeline with Compute, GDC 2016Optimizing the Graphics Pipeline with Compute, GDC 2016
Optimizing the Graphics Pipeline with Compute, GDC 2016
 
Next generation graphics programming on xbox 360
Next generation graphics programming on xbox 360Next generation graphics programming on xbox 360
Next generation graphics programming on xbox 360
 
Moving Frostbite to Physically Based Rendering
Moving Frostbite to Physically Based RenderingMoving Frostbite to Physically Based Rendering
Moving Frostbite to Physically Based Rendering
 
The Guerrilla Guide to Game Code
The Guerrilla Guide to Game CodeThe Guerrilla Guide to Game Code
The Guerrilla Guide to Game Code
 
Bending the Graphics Pipeline
Bending the Graphics PipelineBending the Graphics Pipeline
Bending the Graphics Pipeline
 

Similar to COSCUP 2023 - Make Your Own Ray Tracing GPU with FPGA

Smedberg niklas bringing_aaa_graphics
Smedberg niklas bringing_aaa_graphicsSmedberg niklas bringing_aaa_graphics
Smedberg niklas bringing_aaa_graphicschangehee lee
 
【Unite 2018 Tokyo】スクリプタブルレンダーパイプライン入門
【Unite 2018 Tokyo】スクリプタブルレンダーパイプライン入門【Unite 2018 Tokyo】スクリプタブルレンダーパイプライン入門
【Unite 2018 Tokyo】スクリプタブルレンダーパイプライン入門Unity Technologies Japan K.K.
 
Developing games and graphic visualizations in Pascal
Developing games and graphic visualizations in PascalDeveloping games and graphic visualizations in Pascal
Developing games and graphic visualizations in PascalMichalis Kamburelis
 
Fun with Network Interfaces
Fun with Network InterfacesFun with Network Interfaces
Fun with Network InterfacesKernel TLV
 
Computer Graphics - Lecture 01 - 3D Programming I
Computer Graphics - Lecture 01 - 3D Programming IComputer Graphics - Lecture 01 - 3D Programming I
Computer Graphics - Lecture 01 - 3D Programming I💻 Anton Gerdelan
 
High Performance Rust UI.pdf
High Performance Rust UI.pdfHigh Performance Rust UI.pdf
High Performance Rust UI.pdfmraaaaa
 
GDCE 2015: Blueprint Components to C++
GDCE 2015: Blueprint Components to C++GDCE 2015: Blueprint Components to C++
GDCE 2015: Blueprint Components to C++Gerke Max Preussner
 
BKK16-306 ART ii
BKK16-306 ART iiBKK16-306 ART ii
BKK16-306 ART iiLinaro
 
Mirko Damiani - An Embedded soft real time distributed system in Go
Mirko Damiani - An Embedded soft real time distributed system in GoMirko Damiani - An Embedded soft real time distributed system in Go
Mirko Damiani - An Embedded soft real time distributed system in Golinuxlab_conf
 
Power of WebGL (FSTO 2014)
Power of WebGL (FSTO 2014)Power of WebGL (FSTO 2014)
Power of WebGL (FSTO 2014)Verold
 
Kernel Recipes 2016 - Speeding up development by setting up a kernel build farm
Kernel Recipes 2016 - Speeding up development by setting up a kernel build farmKernel Recipes 2016 - Speeding up development by setting up a kernel build farm
Kernel Recipes 2016 - Speeding up development by setting up a kernel build farmAnne Nicolas
 
LMG Lightning Talks - SFO17-205
LMG Lightning Talks - SFO17-205LMG Lightning Talks - SFO17-205
LMG Lightning Talks - SFO17-205Linaro
 
Yocto and IoT - a retrospective
Yocto and IoT - a retrospectiveYocto and IoT - a retrospective
Yocto and IoT - a retrospectiveOpen-RnD
 
The rendering technology of 'lords of the fallen' philip hammer
The rendering technology of 'lords of the fallen'   philip hammerThe rendering technology of 'lords of the fallen'   philip hammer
The rendering technology of 'lords of the fallen' philip hammerMary Chan
 
MOVED: The challenge of SVE in QEMU - SFO17-103
MOVED: The challenge of SVE in QEMU - SFO17-103MOVED: The challenge of SVE in QEMU - SFO17-103
MOVED: The challenge of SVE in QEMU - SFO17-103Linaro
 
Dfrws eu 2014 rekall workshop
Dfrws eu 2014 rekall workshopDfrws eu 2014 rekall workshop
Dfrws eu 2014 rekall workshopTamas K Lengyel
 
NetflixOSS meetup lightning talks and roadmap
NetflixOSS meetup lightning talks and roadmapNetflixOSS meetup lightning talks and roadmap
NetflixOSS meetup lightning talks and roadmapRuslan Meshenberg
 
The Next Generation of PhyreEngine
The Next Generation of PhyreEngineThe Next Generation of PhyreEngine
The Next Generation of PhyreEngineSlide_N
 
How to build Kick Ass Games in the Cloud
How to build Kick Ass Games in the CloudHow to build Kick Ass Games in the Cloud
How to build Kick Ass Games in the CloudChris Schalk
 
Castle Game Engine and the joy of making and using a custom game engine
Castle Game Engine and the joy  of making and using a custom game engineCastle Game Engine and the joy  of making and using a custom game engine
Castle Game Engine and the joy of making and using a custom game engineMichalis Kamburelis
 

Similar to COSCUP 2023 - Make Your Own Ray Tracing GPU with FPGA (20)

Smedberg niklas bringing_aaa_graphics
Smedberg niklas bringing_aaa_graphicsSmedberg niklas bringing_aaa_graphics
Smedberg niklas bringing_aaa_graphics
 
【Unite 2018 Tokyo】スクリプタブルレンダーパイプライン入門
【Unite 2018 Tokyo】スクリプタブルレンダーパイプライン入門【Unite 2018 Tokyo】スクリプタブルレンダーパイプライン入門
【Unite 2018 Tokyo】スクリプタブルレンダーパイプライン入門
 
Developing games and graphic visualizations in Pascal
Developing games and graphic visualizations in PascalDeveloping games and graphic visualizations in Pascal
Developing games and graphic visualizations in Pascal
 
Fun with Network Interfaces
Fun with Network InterfacesFun with Network Interfaces
Fun with Network Interfaces
 
Computer Graphics - Lecture 01 - 3D Programming I
Computer Graphics - Lecture 01 - 3D Programming IComputer Graphics - Lecture 01 - 3D Programming I
Computer Graphics - Lecture 01 - 3D Programming I
 
High Performance Rust UI.pdf
High Performance Rust UI.pdfHigh Performance Rust UI.pdf
High Performance Rust UI.pdf
 
GDCE 2015: Blueprint Components to C++
GDCE 2015: Blueprint Components to C++GDCE 2015: Blueprint Components to C++
GDCE 2015: Blueprint Components to C++
 
BKK16-306 ART ii
BKK16-306 ART iiBKK16-306 ART ii
BKK16-306 ART ii
 
Mirko Damiani - An Embedded soft real time distributed system in Go
Mirko Damiani - An Embedded soft real time distributed system in GoMirko Damiani - An Embedded soft real time distributed system in Go
Mirko Damiani - An Embedded soft real time distributed system in Go
 
Power of WebGL (FSTO 2014)
Power of WebGL (FSTO 2014)Power of WebGL (FSTO 2014)
Power of WebGL (FSTO 2014)
 
Kernel Recipes 2016 - Speeding up development by setting up a kernel build farm
Kernel Recipes 2016 - Speeding up development by setting up a kernel build farmKernel Recipes 2016 - Speeding up development by setting up a kernel build farm
Kernel Recipes 2016 - Speeding up development by setting up a kernel build farm
 
LMG Lightning Talks - SFO17-205
LMG Lightning Talks - SFO17-205LMG Lightning Talks - SFO17-205
LMG Lightning Talks - SFO17-205
 
Yocto and IoT - a retrospective
Yocto and IoT - a retrospectiveYocto and IoT - a retrospective
Yocto and IoT - a retrospective
 
The rendering technology of 'lords of the fallen' philip hammer
The rendering technology of 'lords of the fallen'   philip hammerThe rendering technology of 'lords of the fallen'   philip hammer
The rendering technology of 'lords of the fallen' philip hammer
 
MOVED: The challenge of SVE in QEMU - SFO17-103
MOVED: The challenge of SVE in QEMU - SFO17-103MOVED: The challenge of SVE in QEMU - SFO17-103
MOVED: The challenge of SVE in QEMU - SFO17-103
 
Dfrws eu 2014 rekall workshop
Dfrws eu 2014 rekall workshopDfrws eu 2014 rekall workshop
Dfrws eu 2014 rekall workshop
 
NetflixOSS meetup lightning talks and roadmap
NetflixOSS meetup lightning talks and roadmapNetflixOSS meetup lightning talks and roadmap
NetflixOSS meetup lightning talks and roadmap
 
The Next Generation of PhyreEngine
The Next Generation of PhyreEngineThe Next Generation of PhyreEngine
The Next Generation of PhyreEngine
 
How to build Kick Ass Games in the Cloud
How to build Kick Ass Games in the CloudHow to build Kick Ass Games in the Cloud
How to build Kick Ass Games in the Cloud
 
Castle Game Engine and the joy of making and using a custom game engine
Castle Game Engine and the joy  of making and using a custom game engineCastle Game Engine and the joy  of making and using a custom game engine
Castle Game Engine and the joy of making and using a custom game engine
 

More from Owen Wu

Unreal Fest 2023 - Lumen with Immortalis
Unreal Fest 2023 - Lumen with ImmortalisUnreal Fest 2023 - Lumen with Immortalis
Unreal Fest 2023 - Lumen with ImmortalisOwen Wu
 
Unity mobile game performance profiling – using arm mobile studio
Unity mobile game performance profiling – using arm mobile studioUnity mobile game performance profiling – using arm mobile studio
Unity mobile game performance profiling – using arm mobile studioOwen Wu
 
[Unite Seoul 2020] Mobile Graphics Best Practices for Artists
[Unite Seoul 2020] Mobile Graphics Best Practices for Artists[Unite Seoul 2020] Mobile Graphics Best Practices for Artists
[Unite Seoul 2020] Mobile Graphics Best Practices for ArtistsOwen Wu
 
[TGDF 2020] Mobile Graphics Best Practices for Artist
[TGDF 2020] Mobile Graphics Best Practices for Artist[TGDF 2020] Mobile Graphics Best Practices for Artist
[TGDF 2020] Mobile Graphics Best Practices for ArtistOwen Wu
 
[Unity Forum 2019] Mobile Graphics Optimization Guides
[Unity Forum 2019] Mobile Graphics Optimization Guides[Unity Forum 2019] Mobile Graphics Optimization Guides
[Unity Forum 2019] Mobile Graphics Optimization GuidesOwen Wu
 
[TGDF 2019] Mali GPU Architecture and Mobile Studio
[TGDF 2019] Mali GPU Architecture and Mobile Studio[TGDF 2019] Mali GPU Architecture and Mobile Studio
[TGDF 2019] Mali GPU Architecture and Mobile StudioOwen Wu
 
[Unite Seoul 2019] Mali GPU Architecture and Mobile Studio
[Unite Seoul 2019] Mali GPU Architecture and Mobile Studio [Unite Seoul 2019] Mali GPU Architecture and Mobile Studio
[Unite Seoul 2019] Mali GPU Architecture and Mobile Studio Owen Wu
 
[GDC 2012] Enhancing Graphics in Unreal Engine 3 Titles Using AMD Code Submis...
[GDC 2012] Enhancing Graphics in Unreal Engine 3 Titles Using AMD Code Submis...[GDC 2012] Enhancing Graphics in Unreal Engine 3 Titles Using AMD Code Submis...
[GDC 2012] Enhancing Graphics in Unreal Engine 3 Titles Using AMD Code Submis...Owen Wu
 
[TGDF 2014] 進階Shader技術
[TGDF 2014] 進階Shader技術[TGDF 2014] 進階Shader技術
[TGDF 2014] 進階Shader技術Owen Wu
 

More from Owen Wu (9)

Unreal Fest 2023 - Lumen with Immortalis
Unreal Fest 2023 - Lumen with ImmortalisUnreal Fest 2023 - Lumen with Immortalis
Unreal Fest 2023 - Lumen with Immortalis
 
Unity mobile game performance profiling – using arm mobile studio
Unity mobile game performance profiling – using arm mobile studioUnity mobile game performance profiling – using arm mobile studio
Unity mobile game performance profiling – using arm mobile studio
 
[Unite Seoul 2020] Mobile Graphics Best Practices for Artists
[Unite Seoul 2020] Mobile Graphics Best Practices for Artists[Unite Seoul 2020] Mobile Graphics Best Practices for Artists
[Unite Seoul 2020] Mobile Graphics Best Practices for Artists
 
[TGDF 2020] Mobile Graphics Best Practices for Artist
[TGDF 2020] Mobile Graphics Best Practices for Artist[TGDF 2020] Mobile Graphics Best Practices for Artist
[TGDF 2020] Mobile Graphics Best Practices for Artist
 
[Unity Forum 2019] Mobile Graphics Optimization Guides
[Unity Forum 2019] Mobile Graphics Optimization Guides[Unity Forum 2019] Mobile Graphics Optimization Guides
[Unity Forum 2019] Mobile Graphics Optimization Guides
 
[TGDF 2019] Mali GPU Architecture and Mobile Studio
[TGDF 2019] Mali GPU Architecture and Mobile Studio[TGDF 2019] Mali GPU Architecture and Mobile Studio
[TGDF 2019] Mali GPU Architecture and Mobile Studio
 
[Unite Seoul 2019] Mali GPU Architecture and Mobile Studio
[Unite Seoul 2019] Mali GPU Architecture and Mobile Studio [Unite Seoul 2019] Mali GPU Architecture and Mobile Studio
[Unite Seoul 2019] Mali GPU Architecture and Mobile Studio
 
[GDC 2012] Enhancing Graphics in Unreal Engine 3 Titles Using AMD Code Submis...
[GDC 2012] Enhancing Graphics in Unreal Engine 3 Titles Using AMD Code Submis...[GDC 2012] Enhancing Graphics in Unreal Engine 3 Titles Using AMD Code Submis...
[GDC 2012] Enhancing Graphics in Unreal Engine 3 Titles Using AMD Code Submis...
 
[TGDF 2014] 進階Shader技術
[TGDF 2014] 進階Shader技術[TGDF 2014] 進階Shader技術
[TGDF 2014] 進階Shader技術
 

Recently uploaded

College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...RajaP95
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxpranjaldaimarysona
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingrakeshbaidya232001
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxupamatechverse
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Dr.Costas Sachpazis
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxupamatechverse
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxupamatechverse
 
GDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSCAESB
 
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024hassan khalil
 
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
High Profile Call Girls Nashik Megha 7001305949 Independent Escort Service Na...
High Profile Call Girls Nashik Megha 7001305949 Independent Escort Service Na...High Profile Call Girls Nashik Megha 7001305949 Independent Escort Service Na...
High Profile Call Girls Nashik Megha 7001305949 Independent Escort Service Na...Call Girls in Nagpur High Profile
 
Internship report on mechanical engineering
Internship report on mechanical engineeringInternship report on mechanical engineering
Internship report on mechanical engineeringmalavadedarshan25
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escortsranjana rawat
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSSIVASHANKAR N
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINESIVASHANKAR N
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 

Recently uploaded (20)

College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
 
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptx
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writing
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptx
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptx
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptx
 
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINEDJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
 
GDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentation
 
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024
 
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
 
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
 
High Profile Call Girls Nashik Megha 7001305949 Independent Escort Service Na...
High Profile Call Girls Nashik Megha 7001305949 Independent Escort Service Na...High Profile Call Girls Nashik Megha 7001305949 Independent Escort Service Na...
High Profile Call Girls Nashik Megha 7001305949 Independent Escort Service Na...
 
Internship report on mechanical engineering
Internship report on mechanical engineeringInternship report on mechanical engineering
Internship report on mechanical engineering
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
 

COSCUP 2023 - Make Your Own Ray Tracing GPU with FPGA

  • 1. Make Your Own Ray Tracing GPU with FPGA COSCUP 2023 Owen Wu fallingcat@gmail.com
  • 2. Agenda ● Self-Intro ● Session Overview ● How to Start ○ HDL ○ EDA ○ FPGA ● Think Differently ● Ray Tracing ● HomebrewGPU projrect
  • 4. Self-Intro ● Game developer ○ Game engine development ○ PC ○ Console ○ Mobile ● GPU software engineer ○ Optimization from the perspective of hardware ○ AMD ○ Arm ● https://tinyurl.com/owenwu
  • 6. Session Overview ● This session is for software engineer ● This session is NOT for hardware engineer ● Basic intro for the beginner ● Making a chip is very easy and cheap nowaday ○ 6 months from zero to a workable GPU ● Turn you algorithm into hardware, think differently ● Many open sourced projects on GitHub
  • 8. How to Start ● HDL (Haardware Description Language) ○ Language ● EDA (Electronic Design Automation) ○ Compiler ● FPGA (Field Programmable Gate Array) ○ Hardware
  • 10. HDL ● VHDL ○ Ada ● Verilog ○ C ● Connect modules to design a whole chip ● Clock is the way to sync all modules ○ 100M Hz - generate 100M signals in one second ● You can think module as a function in C ● Every module can only do very limited works ○ Works need to be finished in one clock ● Books for begineer ○ Programming FPGAs: Getting Started with Verilog ○ Introduction to Verilog
  • 12. EDA ● Convert HDL to bitstream file ● Upload bitstream file to FPGA to execute ● Many steps ○ IC Design ○ Synthesis ○ Verification ○ Physical Design ● You can think EDA as a compiler ● FPGA makers provide basic EDA for free ○ Xilinx Vivado ● You can also try EDA online ○ https://8bitworkshop.com/v3.10.1/?platform=verilog&file=cloc k_divider.v
  • 14. FPGA ● Upload bitstream file to configure logic blocks ● FPGA development board integrate many components ○ VGA/HDMI output ○ Memory ○ LED ○ 7 segment disply ○ SD Card ○ SoC ● FPGA has different number of logic cells ○ Which decides how complex the design can be ● Elbert V2 for beginner ● Nexys A7 for more complex design
  • 16. Parallel ● Software is serial ● Hardware is parallel ● Every modules work simultaneously ● Software optimizatios may not work with hardware ● Don’t use software thinking when designing hardware B() MUX A() R C
  • 17. Latency v.s. Throughput ● CPU performance depends on latency ● Low latency means that the instruction can be completed quickly ● Instruction has order dependency ● GPU only care how long it takes to finish a frame ● Pixel doesn’t have order dependency ● Pixels can be executed simultaneously ● If the latency of one pixel is 32 clock ● The hardware executes 64 streams simultaneously ● The throughput will be 2 pixel per clock
  • 18. Pipeline ● Hardware has many different modules ● All modules need to work simultaneously to get the best performance ● Use pipeline to split the tasks ● Every module process different pixel at the same time Surface (5 clk) Shadow (5 clk) Shading (5 clk) 15 clk/pixel Surface Shadow Shading pixel 1 pixel2 pixel 1 pixel3 pixel2 pixel 1 pixel 4 pixel 3 pixel 2 5 clk/pixel clk 0 clk 5 clk 10 clk 15
  • 20. Ray Tracing ● Ray tracing is easy to implement ● Ray hit objects then reflect to camera ● Invert the ray ● Cast a ray from each pixel of the screen ● Find a closest hit of ray and objects ● Decide the color of the pixel ○ If there is a hit, the color of hit object ○ If there is no hit, the color of background ● Ray Tracing in One Weekend at GitHub
  • 21. Ray Tracing - Reflection ● For the hit on a object ● If the object is reflective ● Cast a reflection ray from hit point ● Find the closet hit of the reflection ray ● Recursively cast reflection rays ● Blend the colosr of reflected objects into final shading
  • 22. Ray Tracing - Shadow ● For the hit on a object ● Cast a new ray from hit point toward light ● Is there is any hit of the new ray ● If yes, the pixel is in shadow ● Otherwise the pixel is not in shadow
  • 23. BVH(Bounding Volume Hierarchy) ● Accelerate the hit detection between ray and primitives ○ Quickly exclude the nodes which don’t have intersection ● Use AABB(Axis-aligned Bounding Box) to split the space ● Traverse the AABB until reach the leaf ● Detect the hits of ray between the primitive in leaf AABB AABB AABB AABB AABB Left Node Right Node
  • 25. HomebrewGPU project ● Open sourced project ○ https://github.com/fallingcat/HomebrewGPU ● Implement a basic ray tracing GPU ○ Voxel based rendering ○ BVH acceleration ○ Shading/Reflection/Refraction/Shadow ● 6 months from zero to complete
  • 28. Architecture ● Thread Generator ○ Generate one thread per clock for each ray core ○ Each thread presents one pixel ○ The thread will go through ray core and output the final color ● BVH Structure ○ BVH structure stores the BVH tree structure data ○ Accepts the node or leaf query from ray core ○ Output the node or leaf data to ray core ● Primitive Unit ○ Primitive Unit stores the raw data of all primitives ○ Accepts the query from ray core and output primitive data ● Ray Core ○ Ray core process one thread to output the final color ○ Accepts the thread from thread generator or reflection/refraction ray ● Frame Buffer Writer ○ Cache the output of ray cores and write the pixel to frame buffer ○ Some threads with reflection/refraction take longer to get the final color ○ Wait util all threads in one cache set are finished then write the data to the frame buffer
  • 29. Architecture ● Surface stage ○ Process the ray from camera and find the closest hit of the ray ○ Pass the hit information to next stage ● Shadow stage ○ Cast a ray from closest hit position to light source ○ It will pass the shadow information to next stage ● Shade stage ○ Use the closest hit information to decide if it's the final color or reflection/refraction will occur ○ Cast a reflection/refraction ray and pass the data back to surface stage ○ Recursively feed back to surface stage
  • 31. Q & A ● Owen (fallingcat@gmail.com) ● https://github.com/fallingcat/Ho mebrewGPU