SlideShare a Scribd company logo
Make Your Own Ray
Tracing GPU with
FPGA
COSCUP 2023
Owen Wu
fallingcat@gmail.com
Agenda
● Self-Intro
● Session Overview
● How to Start
○ HDL
○ EDA
○ FPGA
● Think Differently
● Ray Tracing
● HomebrewGPU projrect
Self-Intro
Self-Intro
● Game developer
○ Game engine development
○ PC
○ Console
○ Mobile
● GPU software engineer
○ Optimization from the perspective of hardware
○ AMD
○ Arm
● https://tinyurl.com/owenwu
Session Overview
Session Overview
● This session is for software engineer
● This session is NOT for hardware engineer
● Basic intro for the beginner
● Making a chip is very easy and cheap nowaday
○ 6 months from zero to a workable GPU
● Turn you algorithm into hardware, think
differently
● Many open sourced projects on GitHub
How to Start
How to Start
● HDL (Haardware Description Language)
○ Language
● EDA (Electronic Design Automation)
○ Compiler
● FPGA (Field Programmable Gate Array)
○ Hardware
HDL
Hardware Description
Language
HDL
● VHDL
○ Ada
● Verilog
○ C
● Connect modules to design a whole chip
● Clock is the way to sync all modules
○ 100M Hz - generate 100M signals in one second
● You can think module as a function in C
● Every module can only do very limited works
○ Works need to be finished in one clock
● Books for begineer
○ Programming FPGAs: Getting Started with Verilog
○ Introduction to Verilog
EDA
Electronic Design
Automation
EDA
● Convert HDL to bitstream file
● Upload bitstream file to FPGA to execute
● Many steps
○ IC Design
○ Synthesis
○ Verification
○ Physical Design
● You can think EDA as a compiler
● FPGA makers provide basic EDA for free
○ Xilinx Vivado
● You can also try EDA online
○ https://8bitworkshop.com/v3.10.1/?platform=verilog&file=cloc
k_divider.v
FPGA
Field Programmable Gate
Array
FPGA
● Upload bitstream file to configure logic blocks
● FPGA development board integrate many components
○ VGA/HDMI output
○ Memory
○ LED
○ 7 segment disply
○ SD Card
○ SoC
● FPGA has different number of logic cells
○ Which decides how complex the design can be
● Elbert V2 for beginner
● Nexys A7 for more complex design
Think Differently
Parallel
● Software is serial
● Hardware is parallel
● Every modules work simultaneously
● Software optimizatios may not work with
hardware
● Don’t use software thinking when designing
hardware
B()
MUX
A()
R
C
Latency v.s. Throughput
● CPU performance depends on latency
● Low latency means that the instruction can be completed quickly
● Instruction has order dependency
● GPU only care how long it takes to finish a frame
● Pixel doesn’t have order dependency
● Pixels can be executed simultaneously
● If the latency of one pixel is 32 clock
● The hardware executes 64 streams simultaneously
● The throughput will be 2 pixel per clock
Pipeline
● Hardware has many different modules
● All modules need to work simultaneously to get
the best performance
● Use pipeline to split the tasks
● Every module process different pixel at the
same time
Surface (5 clk) Shadow (5 clk) Shading (5 clk)
15 clk/pixel
Surface Shadow Shading
pixel 1
pixel2 pixel 1
pixel3 pixel2 pixel 1
pixel 4 pixel 3 pixel 2
5 clk/pixel
clk 0
clk 5
clk 10
clk 15
Ray Tracing
Ray Tracing
● Ray tracing is easy to implement
● Ray hit objects then reflect to camera
● Invert the ray
● Cast a ray from each pixel of the screen
● Find a closest hit of ray and objects
● Decide the color of the pixel
○ If there is a hit, the color of hit object
○ If there is no hit, the color of background
● Ray Tracing in One Weekend at GitHub
Ray Tracing - Reflection
● For the hit on a object
● If the object is reflective
● Cast a reflection ray from hit point
● Find the closet hit of the reflection ray
● Recursively cast reflection rays
● Blend the colosr of reflected objects into
final shading
Ray Tracing - Shadow
● For the hit on a object
● Cast a new ray from hit
point toward light
● Is there is any hit of the new
ray
● If yes, the pixel is in shadow
● Otherwise the pixel is not in
shadow
BVH(Bounding Volume Hierarchy)
● Accelerate the hit detection between ray and primitives
○ Quickly exclude the nodes which don’t have intersection
● Use AABB(Axis-aligned Bounding Box) to split the space
● Traverse the AABB until reach the leaf
● Detect the hits of ray between the primitive in leaf
AABB
AABB AABB
AABB AABB
Left
Node
Right
Node
HomebrewGPU Project
HomebrewGPU project
● Open sourced project
○ https://github.com/fallingcat/HomebrewGPU
● Implement a basic ray tracing GPU
○ Voxel based rendering
○ BVH acceleration
○ Shading/Reflection/Refraction/Shadow
● 6 months from zero to complete
HomebrewGPU project
Architecture
Architecture
● Thread Generator
○ Generate one thread per clock for each ray core
○ Each thread presents one pixel
○ The thread will go through ray core and output the final color
● BVH Structure
○ BVH structure stores the BVH tree structure data
○ Accepts the node or leaf query from ray core
○ Output the node or leaf data to ray core
● Primitive Unit
○ Primitive Unit stores the raw data of all primitives
○ Accepts the query from ray core and output primitive data
● Ray Core
○ Ray core process one thread to output the final color
○ Accepts the thread from thread generator or reflection/refraction
ray
● Frame Buffer Writer
○ Cache the output of ray cores and write the pixel to frame buffer
○ Some threads with reflection/refraction take longer to get the final
color
○ Wait util all threads in one cache set are finished then write the
data to the frame buffer
Architecture
● Surface stage
○ Process the ray from camera and find the closest hit of the ray
○ Pass the hit information to next stage
● Shadow stage
○ Cast a ray from closest hit position to light source
○ It will pass the shadow information to next stage
● Shade stage
○ Use the closest hit information to decide if it's the final color or
reflection/refraction will occur
○ Cast a reflection/refraction ray and pass the data back to surface
stage
○ Recursively feed back to surface stage
Design Verification
Q & A
● Owen (fallingcat@gmail.com)
● https://github.com/fallingcat/Ho
mebrewGPU
Thanks!

More Related Content

What's hot

DirectX 11 Rendering in Battlefield 3
DirectX 11 Rendering in Battlefield 3DirectX 11 Rendering in Battlefield 3
DirectX 11 Rendering in Battlefield 3
Electronic Arts / DICE
 
Three dimensional transformations
Three dimensional transformationsThree dimensional transformations
Three dimensional transformations
Nareek
 
BKK16-315 Graphics Stack Update
BKK16-315 Graphics Stack UpdateBKK16-315 Graphics Stack Update
BKK16-315 Graphics Stack Update
Linaro
 
Holy smoke! Faster Particle Rendering using Direct Compute by Gareth Thomas
Holy smoke! Faster Particle Rendering using Direct Compute by Gareth ThomasHoly smoke! Faster Particle Rendering using Direct Compute by Gareth Thomas
Holy smoke! Faster Particle Rendering using Direct Compute by Gareth Thomas
AMD Developer Central
 
Android Internals
Android InternalsAndroid Internals
Android Internals
Opersys inc.
 
Linux-Internals-and-Networking
Linux-Internals-and-NetworkingLinux-Internals-and-Networking
Linux-Internals-and-Networking
Emertxe Information Technologies Pvt Ltd
 
LLVM 總是打開你的心:從電玩模擬器看編譯器應用實例
LLVM 總是打開你的心:從電玩模擬器看編譯器應用實例LLVM 總是打開你的心:從電玩模擬器看編譯器應用實例
LLVM 總是打開你的心:從電玩模擬器看編譯器應用實例
National Cheng Kung University
 
淺談探索 Linux 系統設計之道
淺談探索 Linux 系統設計之道 淺談探索 Linux 系統設計之道
淺談探索 Linux 系統設計之道
National Cheng Kung University
 
smallpt: Global Illumination in 99 lines of C++
smallpt:  Global Illumination in 99 lines of C++smallpt:  Global Illumination in 99 lines of C++
smallpt: Global Illumination in 99 lines of C++
鍾誠 陳鍾誠
 
Live2D with Unity - 그녀들을 움직이게 하는 기술 (알콜코더 박민근)
Live2D with Unity - 그녀들을 움직이게 하는 기술 (알콜코더 박민근)Live2D with Unity - 그녀들을 움직이게 하는 기술 (알콜코더 박민근)
Live2D with Unity - 그녀들을 움직이게 하는 기술 (알콜코더 박민근)
MinGeun Park
 
Intrinsics: Low-level engine development with Burst - Unite Copenhagen 2019
Intrinsics: Low-level engine development with Burst - Unite Copenhagen 2019 Intrinsics: Low-level engine development with Burst - Unite Copenhagen 2019
Intrinsics: Low-level engine development with Burst - Unite Copenhagen 2019
Unity Technologies
 
用十分鐘 向jserv學習作業系統設計
用十分鐘  向jserv學習作業系統設計用十分鐘  向jserv學習作業系統設計
用十分鐘 向jserv學習作業系統設計
鍾誠 陳鍾誠
 
Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14
Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14
Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14
AMD Developer Central
 
【Unite Tokyo 2018】『崩壊3rd』開発者が語るアニメ風レンダリングの極意
【Unite Tokyo 2018】『崩壊3rd』開発者が語るアニメ風レンダリングの極意【Unite Tokyo 2018】『崩壊3rd』開発者が語るアニメ風レンダリングの極意
【Unite Tokyo 2018】『崩壊3rd』開発者が語るアニメ風レンダリングの極意
UnityTechnologiesJapan002
 
The Android graphics path, in depth
The Android graphics path, in depthThe Android graphics path, in depth
The Android graphics path, in depth
Chris Simmonds
 
CEDEC 2016 Metal と Vulkan を用いた水彩画レンダリング技法の紹介
CEDEC 2016 Metal と Vulkan を用いた水彩画レンダリング技法の紹介CEDEC 2016 Metal と Vulkan を用いた水彩画レンダリング技法の紹介
CEDEC 2016 Metal と Vulkan を用いた水彩画レンダリング技法の紹介
Drecom Co., Ltd.
 
Kernel Recipes 2017 - 20 years of Linux Virtual Memory - Andrea Arcangeli
Kernel Recipes 2017 - 20 years of Linux Virtual Memory - Andrea ArcangeliKernel Recipes 2017 - 20 years of Linux Virtual Memory - Andrea Arcangeli
Kernel Recipes 2017 - 20 years of Linux Virtual Memory - Andrea Arcangeli
Anne Nicolas
 
NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis
NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis
NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis
taeseon ryu
 
Sw update elce2017
Sw update elce2017Sw update elce2017
Sw update elce2017
Stefano Babic
 
"The OpenCV Open Source Computer Vision Library: What’s New and What’s Coming...
"The OpenCV Open Source Computer Vision Library: What’s New and What’s Coming..."The OpenCV Open Source Computer Vision Library: What’s New and What’s Coming...
"The OpenCV Open Source Computer Vision Library: What’s New and What’s Coming...
Edge AI and Vision Alliance
 

What's hot (20)

DirectX 11 Rendering in Battlefield 3
DirectX 11 Rendering in Battlefield 3DirectX 11 Rendering in Battlefield 3
DirectX 11 Rendering in Battlefield 3
 
Three dimensional transformations
Three dimensional transformationsThree dimensional transformations
Three dimensional transformations
 
BKK16-315 Graphics Stack Update
BKK16-315 Graphics Stack UpdateBKK16-315 Graphics Stack Update
BKK16-315 Graphics Stack Update
 
Holy smoke! Faster Particle Rendering using Direct Compute by Gareth Thomas
Holy smoke! Faster Particle Rendering using Direct Compute by Gareth ThomasHoly smoke! Faster Particle Rendering using Direct Compute by Gareth Thomas
Holy smoke! Faster Particle Rendering using Direct Compute by Gareth Thomas
 
Android Internals
Android InternalsAndroid Internals
Android Internals
 
Linux-Internals-and-Networking
Linux-Internals-and-NetworkingLinux-Internals-and-Networking
Linux-Internals-and-Networking
 
LLVM 總是打開你的心:從電玩模擬器看編譯器應用實例
LLVM 總是打開你的心:從電玩模擬器看編譯器應用實例LLVM 總是打開你的心:從電玩模擬器看編譯器應用實例
LLVM 總是打開你的心:從電玩模擬器看編譯器應用實例
 
淺談探索 Linux 系統設計之道
淺談探索 Linux 系統設計之道 淺談探索 Linux 系統設計之道
淺談探索 Linux 系統設計之道
 
smallpt: Global Illumination in 99 lines of C++
smallpt:  Global Illumination in 99 lines of C++smallpt:  Global Illumination in 99 lines of C++
smallpt: Global Illumination in 99 lines of C++
 
Live2D with Unity - 그녀들을 움직이게 하는 기술 (알콜코더 박민근)
Live2D with Unity - 그녀들을 움직이게 하는 기술 (알콜코더 박민근)Live2D with Unity - 그녀들을 움직이게 하는 기술 (알콜코더 박민근)
Live2D with Unity - 그녀들을 움직이게 하는 기술 (알콜코더 박민근)
 
Intrinsics: Low-level engine development with Burst - Unite Copenhagen 2019
Intrinsics: Low-level engine development with Burst - Unite Copenhagen 2019 Intrinsics: Low-level engine development with Burst - Unite Copenhagen 2019
Intrinsics: Low-level engine development with Burst - Unite Copenhagen 2019
 
用十分鐘 向jserv學習作業系統設計
用十分鐘  向jserv學習作業系統設計用十分鐘  向jserv學習作業系統設計
用十分鐘 向jserv學習作業系統設計
 
Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14
Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14
Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14
 
【Unite Tokyo 2018】『崩壊3rd』開発者が語るアニメ風レンダリングの極意
【Unite Tokyo 2018】『崩壊3rd』開発者が語るアニメ風レンダリングの極意【Unite Tokyo 2018】『崩壊3rd』開発者が語るアニメ風レンダリングの極意
【Unite Tokyo 2018】『崩壊3rd』開発者が語るアニメ風レンダリングの極意
 
The Android graphics path, in depth
The Android graphics path, in depthThe Android graphics path, in depth
The Android graphics path, in depth
 
CEDEC 2016 Metal と Vulkan を用いた水彩画レンダリング技法の紹介
CEDEC 2016 Metal と Vulkan を用いた水彩画レンダリング技法の紹介CEDEC 2016 Metal と Vulkan を用いた水彩画レンダリング技法の紹介
CEDEC 2016 Metal と Vulkan を用いた水彩画レンダリング技法の紹介
 
Kernel Recipes 2017 - 20 years of Linux Virtual Memory - Andrea Arcangeli
Kernel Recipes 2017 - 20 years of Linux Virtual Memory - Andrea ArcangeliKernel Recipes 2017 - 20 years of Linux Virtual Memory - Andrea Arcangeli
Kernel Recipes 2017 - 20 years of Linux Virtual Memory - Andrea Arcangeli
 
NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis
NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis
NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis
 
Sw update elce2017
Sw update elce2017Sw update elce2017
Sw update elce2017
 
"The OpenCV Open Source Computer Vision Library: What’s New and What’s Coming...
"The OpenCV Open Source Computer Vision Library: What’s New and What’s Coming..."The OpenCV Open Source Computer Vision Library: What’s New and What’s Coming...
"The OpenCV Open Source Computer Vision Library: What’s New and What’s Coming...
 

Similar to COSCUP 2023 - Make Your Own Ray Tracing GPU with FPGA

Smedberg niklas bringing_aaa_graphics
Smedberg niklas bringing_aaa_graphicsSmedberg niklas bringing_aaa_graphics
Smedberg niklas bringing_aaa_graphics
changehee lee
 
【Unite 2018 Tokyo】スクリプタブルレンダーパイプライン入門
【Unite 2018 Tokyo】スクリプタブルレンダーパイプライン入門【Unite 2018 Tokyo】スクリプタブルレンダーパイプライン入門
【Unite 2018 Tokyo】スクリプタブルレンダーパイプライン入門
Unity Technologies Japan K.K.
 
Developing games and graphic visualizations in Pascal
Developing games and graphic visualizations in PascalDeveloping games and graphic visualizations in Pascal
Developing games and graphic visualizations in Pascal
Michalis Kamburelis
 
Fun with Network Interfaces
Fun with Network InterfacesFun with Network Interfaces
Fun with Network Interfaces
Kernel TLV
 
Computer Graphics - Lecture 01 - 3D Programming I
Computer Graphics - Lecture 01 - 3D Programming IComputer Graphics - Lecture 01 - 3D Programming I
Computer Graphics - Lecture 01 - 3D Programming I
💻 Anton Gerdelan
 
High Performance Rust UI.pdf
High Performance Rust UI.pdfHigh Performance Rust UI.pdf
High Performance Rust UI.pdf
mraaaaa
 
GDCE 2015: Blueprint Components to C++
GDCE 2015: Blueprint Components to C++GDCE 2015: Blueprint Components to C++
GDCE 2015: Blueprint Components to C++
Gerke Max Preussner
 
BKK16-306 ART ii
BKK16-306 ART iiBKK16-306 ART ii
BKK16-306 ART ii
Linaro
 
Mirko Damiani - An Embedded soft real time distributed system in Go
Mirko Damiani - An Embedded soft real time distributed system in GoMirko Damiani - An Embedded soft real time distributed system in Go
Mirko Damiani - An Embedded soft real time distributed system in Go
linuxlab_conf
 
Power of WebGL (FSTO 2014)
Power of WebGL (FSTO 2014)Power of WebGL (FSTO 2014)
Power of WebGL (FSTO 2014)
Verold
 
Kernel Recipes 2016 - Speeding up development by setting up a kernel build farm
Kernel Recipes 2016 - Speeding up development by setting up a kernel build farmKernel Recipes 2016 - Speeding up development by setting up a kernel build farm
Kernel Recipes 2016 - Speeding up development by setting up a kernel build farm
Anne Nicolas
 
LMG Lightning Talks - SFO17-205
LMG Lightning Talks - SFO17-205LMG Lightning Talks - SFO17-205
LMG Lightning Talks - SFO17-205
Linaro
 
Yocto and IoT - a retrospective
Yocto and IoT - a retrospectiveYocto and IoT - a retrospective
Yocto and IoT - a retrospective
Open-RnD
 
The Rendering Technology of 'Lords of the Fallen' (Game Connection Europe 2014)
The Rendering Technology of 'Lords of the Fallen' (Game Connection Europe 2014)The Rendering Technology of 'Lords of the Fallen' (Game Connection Europe 2014)
The Rendering Technology of 'Lords of the Fallen' (Game Connection Europe 2014)
Philip Hammer
 
The rendering technology of 'lords of the fallen' philip hammer
The rendering technology of 'lords of the fallen'   philip hammerThe rendering technology of 'lords of the fallen'   philip hammer
The rendering technology of 'lords of the fallen' philip hammer
Mary Chan
 
MOVED: The challenge of SVE in QEMU - SFO17-103
MOVED: The challenge of SVE in QEMU - SFO17-103MOVED: The challenge of SVE in QEMU - SFO17-103
MOVED: The challenge of SVE in QEMU - SFO17-103
Linaro
 
Dfrws eu 2014 rekall workshop
Dfrws eu 2014 rekall workshopDfrws eu 2014 rekall workshop
Dfrws eu 2014 rekall workshop
Tamas K Lengyel
 
NetflixOSS meetup lightning talks and roadmap
NetflixOSS meetup lightning talks and roadmapNetflixOSS meetup lightning talks and roadmap
NetflixOSS meetup lightning talks and roadmap
Ruslan Meshenberg
 
The Next Generation of PhyreEngine
The Next Generation of PhyreEngineThe Next Generation of PhyreEngine
The Next Generation of PhyreEngine
Slide_N
 
How to build Kick Ass Games in the Cloud
How to build Kick Ass Games in the CloudHow to build Kick Ass Games in the Cloud
How to build Kick Ass Games in the Cloud
Chris Schalk
 

Similar to COSCUP 2023 - Make Your Own Ray Tracing GPU with FPGA (20)

Smedberg niklas bringing_aaa_graphics
Smedberg niklas bringing_aaa_graphicsSmedberg niklas bringing_aaa_graphics
Smedberg niklas bringing_aaa_graphics
 
【Unite 2018 Tokyo】スクリプタブルレンダーパイプライン入門
【Unite 2018 Tokyo】スクリプタブルレンダーパイプライン入門【Unite 2018 Tokyo】スクリプタブルレンダーパイプライン入門
【Unite 2018 Tokyo】スクリプタブルレンダーパイプライン入門
 
Developing games and graphic visualizations in Pascal
Developing games and graphic visualizations in PascalDeveloping games and graphic visualizations in Pascal
Developing games and graphic visualizations in Pascal
 
Fun with Network Interfaces
Fun with Network InterfacesFun with Network Interfaces
Fun with Network Interfaces
 
Computer Graphics - Lecture 01 - 3D Programming I
Computer Graphics - Lecture 01 - 3D Programming IComputer Graphics - Lecture 01 - 3D Programming I
Computer Graphics - Lecture 01 - 3D Programming I
 
High Performance Rust UI.pdf
High Performance Rust UI.pdfHigh Performance Rust UI.pdf
High Performance Rust UI.pdf
 
GDCE 2015: Blueprint Components to C++
GDCE 2015: Blueprint Components to C++GDCE 2015: Blueprint Components to C++
GDCE 2015: Blueprint Components to C++
 
BKK16-306 ART ii
BKK16-306 ART iiBKK16-306 ART ii
BKK16-306 ART ii
 
Mirko Damiani - An Embedded soft real time distributed system in Go
Mirko Damiani - An Embedded soft real time distributed system in GoMirko Damiani - An Embedded soft real time distributed system in Go
Mirko Damiani - An Embedded soft real time distributed system in Go
 
Power of WebGL (FSTO 2014)
Power of WebGL (FSTO 2014)Power of WebGL (FSTO 2014)
Power of WebGL (FSTO 2014)
 
Kernel Recipes 2016 - Speeding up development by setting up a kernel build farm
Kernel Recipes 2016 - Speeding up development by setting up a kernel build farmKernel Recipes 2016 - Speeding up development by setting up a kernel build farm
Kernel Recipes 2016 - Speeding up development by setting up a kernel build farm
 
LMG Lightning Talks - SFO17-205
LMG Lightning Talks - SFO17-205LMG Lightning Talks - SFO17-205
LMG Lightning Talks - SFO17-205
 
Yocto and IoT - a retrospective
Yocto and IoT - a retrospectiveYocto and IoT - a retrospective
Yocto and IoT - a retrospective
 
The Rendering Technology of 'Lords of the Fallen' (Game Connection Europe 2014)
The Rendering Technology of 'Lords of the Fallen' (Game Connection Europe 2014)The Rendering Technology of 'Lords of the Fallen' (Game Connection Europe 2014)
The Rendering Technology of 'Lords of the Fallen' (Game Connection Europe 2014)
 
The rendering technology of 'lords of the fallen' philip hammer
The rendering technology of 'lords of the fallen'   philip hammerThe rendering technology of 'lords of the fallen'   philip hammer
The rendering technology of 'lords of the fallen' philip hammer
 
MOVED: The challenge of SVE in QEMU - SFO17-103
MOVED: The challenge of SVE in QEMU - SFO17-103MOVED: The challenge of SVE in QEMU - SFO17-103
MOVED: The challenge of SVE in QEMU - SFO17-103
 
Dfrws eu 2014 rekall workshop
Dfrws eu 2014 rekall workshopDfrws eu 2014 rekall workshop
Dfrws eu 2014 rekall workshop
 
NetflixOSS meetup lightning talks and roadmap
NetflixOSS meetup lightning talks and roadmapNetflixOSS meetup lightning talks and roadmap
NetflixOSS meetup lightning talks and roadmap
 
The Next Generation of PhyreEngine
The Next Generation of PhyreEngineThe Next Generation of PhyreEngine
The Next Generation of PhyreEngine
 
How to build Kick Ass Games in the Cloud
How to build Kick Ass Games in the CloudHow to build Kick Ass Games in the Cloud
How to build Kick Ass Games in the Cloud
 

More from Owen Wu

Unreal Fest 2023 - Lumen with Immortalis
Unreal Fest 2023 - Lumen with ImmortalisUnreal Fest 2023 - Lumen with Immortalis
Unreal Fest 2023 - Lumen with Immortalis
Owen Wu
 
Unity mobile game performance profiling – using arm mobile studio
Unity mobile game performance profiling – using arm mobile studioUnity mobile game performance profiling – using arm mobile studio
Unity mobile game performance profiling – using arm mobile studio
Owen Wu
 
[Unite Seoul 2020] Mobile Graphics Best Practices for Artists
[Unite Seoul 2020] Mobile Graphics Best Practices for Artists[Unite Seoul 2020] Mobile Graphics Best Practices for Artists
[Unite Seoul 2020] Mobile Graphics Best Practices for Artists
Owen Wu
 
[TGDF 2020] Mobile Graphics Best Practices for Artist
[TGDF 2020] Mobile Graphics Best Practices for Artist[TGDF 2020] Mobile Graphics Best Practices for Artist
[TGDF 2020] Mobile Graphics Best Practices for Artist
Owen Wu
 
[Unity Forum 2019] Mobile Graphics Optimization Guides
[Unity Forum 2019] Mobile Graphics Optimization Guides[Unity Forum 2019] Mobile Graphics Optimization Guides
[Unity Forum 2019] Mobile Graphics Optimization Guides
Owen Wu
 
[TGDF 2019] Mali GPU Architecture and Mobile Studio
[TGDF 2019] Mali GPU Architecture and Mobile Studio[TGDF 2019] Mali GPU Architecture and Mobile Studio
[TGDF 2019] Mali GPU Architecture and Mobile Studio
Owen Wu
 
[Unite Seoul 2019] Mali GPU Architecture and Mobile Studio
[Unite Seoul 2019] Mali GPU Architecture and Mobile Studio [Unite Seoul 2019] Mali GPU Architecture and Mobile Studio
[Unite Seoul 2019] Mali GPU Architecture and Mobile Studio
Owen Wu
 
[GDC 2012] Enhancing Graphics in Unreal Engine 3 Titles Using AMD Code Submis...
[GDC 2012] Enhancing Graphics in Unreal Engine 3 Titles Using AMD Code Submis...[GDC 2012] Enhancing Graphics in Unreal Engine 3 Titles Using AMD Code Submis...
[GDC 2012] Enhancing Graphics in Unreal Engine 3 Titles Using AMD Code Submis...
Owen Wu
 
[TGDF 2014] 進階Shader技術
[TGDF 2014] 進階Shader技術[TGDF 2014] 進階Shader技術
[TGDF 2014] 進階Shader技術
Owen Wu
 

More from Owen Wu (9)

Unreal Fest 2023 - Lumen with Immortalis
Unreal Fest 2023 - Lumen with ImmortalisUnreal Fest 2023 - Lumen with Immortalis
Unreal Fest 2023 - Lumen with Immortalis
 
Unity mobile game performance profiling – using arm mobile studio
Unity mobile game performance profiling – using arm mobile studioUnity mobile game performance profiling – using arm mobile studio
Unity mobile game performance profiling – using arm mobile studio
 
[Unite Seoul 2020] Mobile Graphics Best Practices for Artists
[Unite Seoul 2020] Mobile Graphics Best Practices for Artists[Unite Seoul 2020] Mobile Graphics Best Practices for Artists
[Unite Seoul 2020] Mobile Graphics Best Practices for Artists
 
[TGDF 2020] Mobile Graphics Best Practices for Artist
[TGDF 2020] Mobile Graphics Best Practices for Artist[TGDF 2020] Mobile Graphics Best Practices for Artist
[TGDF 2020] Mobile Graphics Best Practices for Artist
 
[Unity Forum 2019] Mobile Graphics Optimization Guides
[Unity Forum 2019] Mobile Graphics Optimization Guides[Unity Forum 2019] Mobile Graphics Optimization Guides
[Unity Forum 2019] Mobile Graphics Optimization Guides
 
[TGDF 2019] Mali GPU Architecture and Mobile Studio
[TGDF 2019] Mali GPU Architecture and Mobile Studio[TGDF 2019] Mali GPU Architecture and Mobile Studio
[TGDF 2019] Mali GPU Architecture and Mobile Studio
 
[Unite Seoul 2019] Mali GPU Architecture and Mobile Studio
[Unite Seoul 2019] Mali GPU Architecture and Mobile Studio [Unite Seoul 2019] Mali GPU Architecture and Mobile Studio
[Unite Seoul 2019] Mali GPU Architecture and Mobile Studio
 
[GDC 2012] Enhancing Graphics in Unreal Engine 3 Titles Using AMD Code Submis...
[GDC 2012] Enhancing Graphics in Unreal Engine 3 Titles Using AMD Code Submis...[GDC 2012] Enhancing Graphics in Unreal Engine 3 Titles Using AMD Code Submis...
[GDC 2012] Enhancing Graphics in Unreal Engine 3 Titles Using AMD Code Submis...
 
[TGDF 2014] 進階Shader技術
[TGDF 2014] 進階Shader技術[TGDF 2014] 進階Shader技術
[TGDF 2014] 進階Shader技術
 

Recently uploaded

Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
IJECEIAES
 
Generative AI leverages algorithms to create various forms of content
Generative AI leverages algorithms to create various forms of contentGenerative AI leverages algorithms to create various forms of content
Generative AI leverages algorithms to create various forms of content
Hitesh Mohapatra
 
Introduction to AI Safety (public presentation).pptx
Introduction to AI Safety (public presentation).pptxIntroduction to AI Safety (public presentation).pptx
Introduction to AI Safety (public presentation).pptx
MiscAnnoy1
 
Transformers design and coooling methods
Transformers design and coooling methodsTransformers design and coooling methods
Transformers design and coooling methods
Roger Rozario
 
Comparative analysis between traditional aquaponics and reconstructed aquapon...
Comparative analysis between traditional aquaponics and reconstructed aquapon...Comparative analysis between traditional aquaponics and reconstructed aquapon...
Comparative analysis between traditional aquaponics and reconstructed aquapon...
bijceesjournal
 
22CYT12-Unit-V-E Waste and its Management.ppt
22CYT12-Unit-V-E Waste and its Management.ppt22CYT12-Unit-V-E Waste and its Management.ppt
22CYT12-Unit-V-E Waste and its Management.ppt
KrishnaveniKrishnara1
 
Engineering Drawings Lecture Detail Drawings 2014.pdf
Engineering Drawings Lecture Detail Drawings 2014.pdfEngineering Drawings Lecture Detail Drawings 2014.pdf
Engineering Drawings Lecture Detail Drawings 2014.pdf
abbyasa1014
 
john krisinger-the science and history of the alcoholic beverage.pptx
john krisinger-the science and history of the alcoholic beverage.pptxjohn krisinger-the science and history of the alcoholic beverage.pptx
john krisinger-the science and history of the alcoholic beverage.pptx
Madan Karki
 
一比一原版(CalArts毕业证)加利福尼亚艺术学院毕业证如何办理
一比一原版(CalArts毕业证)加利福尼亚艺术学院毕业证如何办理一比一原版(CalArts毕业证)加利福尼亚艺术学院毕业证如何办理
一比一原版(CalArts毕业证)加利福尼亚艺术学院毕业证如何办理
ecqow
 
CompEx~Manual~1210 (2).pdf COMPEX GAS AND VAPOURS
CompEx~Manual~1210 (2).pdf COMPEX GAS AND VAPOURSCompEx~Manual~1210 (2).pdf COMPEX GAS AND VAPOURS
CompEx~Manual~1210 (2).pdf COMPEX GAS AND VAPOURS
RamonNovais6
 
The Python for beginners. This is an advance computer language.
The Python for beginners. This is an advance computer language.The Python for beginners. This is an advance computer language.
The Python for beginners. This is an advance computer language.
sachin chaurasia
 
哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样
哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样
哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样
insn4465
 
IEEE Aerospace and Electronic Systems Society as a Graduate Student Member
IEEE Aerospace and Electronic Systems Society as a Graduate Student MemberIEEE Aerospace and Electronic Systems Society as a Graduate Student Member
IEEE Aerospace and Electronic Systems Society as a Graduate Student Member
VICTOR MAESTRE RAMIREZ
 
Manufacturing Process of molasses based distillery ppt.pptx
Manufacturing Process of molasses based distillery ppt.pptxManufacturing Process of molasses based distillery ppt.pptx
Manufacturing Process of molasses based distillery ppt.pptx
Madan Karki
 
Curve Fitting in Numerical Methods Regression
Curve Fitting in Numerical Methods RegressionCurve Fitting in Numerical Methods Regression
Curve Fitting in Numerical Methods Regression
Nada Hikmah
 
Properties Railway Sleepers and Test.pptx
Properties Railway Sleepers and Test.pptxProperties Railway Sleepers and Test.pptx
Properties Railway Sleepers and Test.pptx
MDSABBIROJJAMANPAYEL
 
Literature Review Basics and Understanding Reference Management.pptx
Literature Review Basics and Understanding Reference Management.pptxLiterature Review Basics and Understanding Reference Management.pptx
Literature Review Basics and Understanding Reference Management.pptx
Dr Ramhari Poudyal
 
Computational Engineering IITH Presentation
Computational Engineering IITH PresentationComputational Engineering IITH Presentation
Computational Engineering IITH Presentation
co23btech11018
 
NATURAL DEEP EUTECTIC SOLVENTS AS ANTI-FREEZING AGENT
NATURAL DEEP EUTECTIC SOLVENTS AS ANTI-FREEZING AGENTNATURAL DEEP EUTECTIC SOLVENTS AS ANTI-FREEZING AGENT
NATURAL DEEP EUTECTIC SOLVENTS AS ANTI-FREEZING AGENT
Addu25809
 
UNLOCKING HEALTHCARE 4.0: NAVIGATING CRITICAL SUCCESS FACTORS FOR EFFECTIVE I...
UNLOCKING HEALTHCARE 4.0: NAVIGATING CRITICAL SUCCESS FACTORS FOR EFFECTIVE I...UNLOCKING HEALTHCARE 4.0: NAVIGATING CRITICAL SUCCESS FACTORS FOR EFFECTIVE I...
UNLOCKING HEALTHCARE 4.0: NAVIGATING CRITICAL SUCCESS FACTORS FOR EFFECTIVE I...
amsjournal
 

Recently uploaded (20)

Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
 
Generative AI leverages algorithms to create various forms of content
Generative AI leverages algorithms to create various forms of contentGenerative AI leverages algorithms to create various forms of content
Generative AI leverages algorithms to create various forms of content
 
Introduction to AI Safety (public presentation).pptx
Introduction to AI Safety (public presentation).pptxIntroduction to AI Safety (public presentation).pptx
Introduction to AI Safety (public presentation).pptx
 
Transformers design and coooling methods
Transformers design and coooling methodsTransformers design and coooling methods
Transformers design and coooling methods
 
Comparative analysis between traditional aquaponics and reconstructed aquapon...
Comparative analysis between traditional aquaponics and reconstructed aquapon...Comparative analysis between traditional aquaponics and reconstructed aquapon...
Comparative analysis between traditional aquaponics and reconstructed aquapon...
 
22CYT12-Unit-V-E Waste and its Management.ppt
22CYT12-Unit-V-E Waste and its Management.ppt22CYT12-Unit-V-E Waste and its Management.ppt
22CYT12-Unit-V-E Waste and its Management.ppt
 
Engineering Drawings Lecture Detail Drawings 2014.pdf
Engineering Drawings Lecture Detail Drawings 2014.pdfEngineering Drawings Lecture Detail Drawings 2014.pdf
Engineering Drawings Lecture Detail Drawings 2014.pdf
 
john krisinger-the science and history of the alcoholic beverage.pptx
john krisinger-the science and history of the alcoholic beverage.pptxjohn krisinger-the science and history of the alcoholic beverage.pptx
john krisinger-the science and history of the alcoholic beverage.pptx
 
一比一原版(CalArts毕业证)加利福尼亚艺术学院毕业证如何办理
一比一原版(CalArts毕业证)加利福尼亚艺术学院毕业证如何办理一比一原版(CalArts毕业证)加利福尼亚艺术学院毕业证如何办理
一比一原版(CalArts毕业证)加利福尼亚艺术学院毕业证如何办理
 
CompEx~Manual~1210 (2).pdf COMPEX GAS AND VAPOURS
CompEx~Manual~1210 (2).pdf COMPEX GAS AND VAPOURSCompEx~Manual~1210 (2).pdf COMPEX GAS AND VAPOURS
CompEx~Manual~1210 (2).pdf COMPEX GAS AND VAPOURS
 
The Python for beginners. This is an advance computer language.
The Python for beginners. This is an advance computer language.The Python for beginners. This is an advance computer language.
The Python for beginners. This is an advance computer language.
 
哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样
哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样
哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样
 
IEEE Aerospace and Electronic Systems Society as a Graduate Student Member
IEEE Aerospace and Electronic Systems Society as a Graduate Student MemberIEEE Aerospace and Electronic Systems Society as a Graduate Student Member
IEEE Aerospace and Electronic Systems Society as a Graduate Student Member
 
Manufacturing Process of molasses based distillery ppt.pptx
Manufacturing Process of molasses based distillery ppt.pptxManufacturing Process of molasses based distillery ppt.pptx
Manufacturing Process of molasses based distillery ppt.pptx
 
Curve Fitting in Numerical Methods Regression
Curve Fitting in Numerical Methods RegressionCurve Fitting in Numerical Methods Regression
Curve Fitting in Numerical Methods Regression
 
Properties Railway Sleepers and Test.pptx
Properties Railway Sleepers and Test.pptxProperties Railway Sleepers and Test.pptx
Properties Railway Sleepers and Test.pptx
 
Literature Review Basics and Understanding Reference Management.pptx
Literature Review Basics and Understanding Reference Management.pptxLiterature Review Basics and Understanding Reference Management.pptx
Literature Review Basics and Understanding Reference Management.pptx
 
Computational Engineering IITH Presentation
Computational Engineering IITH PresentationComputational Engineering IITH Presentation
Computational Engineering IITH Presentation
 
NATURAL DEEP EUTECTIC SOLVENTS AS ANTI-FREEZING AGENT
NATURAL DEEP EUTECTIC SOLVENTS AS ANTI-FREEZING AGENTNATURAL DEEP EUTECTIC SOLVENTS AS ANTI-FREEZING AGENT
NATURAL DEEP EUTECTIC SOLVENTS AS ANTI-FREEZING AGENT
 
UNLOCKING HEALTHCARE 4.0: NAVIGATING CRITICAL SUCCESS FACTORS FOR EFFECTIVE I...
UNLOCKING HEALTHCARE 4.0: NAVIGATING CRITICAL SUCCESS FACTORS FOR EFFECTIVE I...UNLOCKING HEALTHCARE 4.0: NAVIGATING CRITICAL SUCCESS FACTORS FOR EFFECTIVE I...
UNLOCKING HEALTHCARE 4.0: NAVIGATING CRITICAL SUCCESS FACTORS FOR EFFECTIVE I...
 

COSCUP 2023 - Make Your Own Ray Tracing GPU with FPGA

  • 1. Make Your Own Ray Tracing GPU with FPGA COSCUP 2023 Owen Wu fallingcat@gmail.com
  • 2. Agenda ● Self-Intro ● Session Overview ● How to Start ○ HDL ○ EDA ○ FPGA ● Think Differently ● Ray Tracing ● HomebrewGPU projrect
  • 4. Self-Intro ● Game developer ○ Game engine development ○ PC ○ Console ○ Mobile ● GPU software engineer ○ Optimization from the perspective of hardware ○ AMD ○ Arm ● https://tinyurl.com/owenwu
  • 6. Session Overview ● This session is for software engineer ● This session is NOT for hardware engineer ● Basic intro for the beginner ● Making a chip is very easy and cheap nowaday ○ 6 months from zero to a workable GPU ● Turn you algorithm into hardware, think differently ● Many open sourced projects on GitHub
  • 8. How to Start ● HDL (Haardware Description Language) ○ Language ● EDA (Electronic Design Automation) ○ Compiler ● FPGA (Field Programmable Gate Array) ○ Hardware
  • 10. HDL ● VHDL ○ Ada ● Verilog ○ C ● Connect modules to design a whole chip ● Clock is the way to sync all modules ○ 100M Hz - generate 100M signals in one second ● You can think module as a function in C ● Every module can only do very limited works ○ Works need to be finished in one clock ● Books for begineer ○ Programming FPGAs: Getting Started with Verilog ○ Introduction to Verilog
  • 12. EDA ● Convert HDL to bitstream file ● Upload bitstream file to FPGA to execute ● Many steps ○ IC Design ○ Synthesis ○ Verification ○ Physical Design ● You can think EDA as a compiler ● FPGA makers provide basic EDA for free ○ Xilinx Vivado ● You can also try EDA online ○ https://8bitworkshop.com/v3.10.1/?platform=verilog&file=cloc k_divider.v
  • 14. FPGA ● Upload bitstream file to configure logic blocks ● FPGA development board integrate many components ○ VGA/HDMI output ○ Memory ○ LED ○ 7 segment disply ○ SD Card ○ SoC ● FPGA has different number of logic cells ○ Which decides how complex the design can be ● Elbert V2 for beginner ● Nexys A7 for more complex design
  • 16. Parallel ● Software is serial ● Hardware is parallel ● Every modules work simultaneously ● Software optimizatios may not work with hardware ● Don’t use software thinking when designing hardware B() MUX A() R C
  • 17. Latency v.s. Throughput ● CPU performance depends on latency ● Low latency means that the instruction can be completed quickly ● Instruction has order dependency ● GPU only care how long it takes to finish a frame ● Pixel doesn’t have order dependency ● Pixels can be executed simultaneously ● If the latency of one pixel is 32 clock ● The hardware executes 64 streams simultaneously ● The throughput will be 2 pixel per clock
  • 18. Pipeline ● Hardware has many different modules ● All modules need to work simultaneously to get the best performance ● Use pipeline to split the tasks ● Every module process different pixel at the same time Surface (5 clk) Shadow (5 clk) Shading (5 clk) 15 clk/pixel Surface Shadow Shading pixel 1 pixel2 pixel 1 pixel3 pixel2 pixel 1 pixel 4 pixel 3 pixel 2 5 clk/pixel clk 0 clk 5 clk 10 clk 15
  • 20. Ray Tracing ● Ray tracing is easy to implement ● Ray hit objects then reflect to camera ● Invert the ray ● Cast a ray from each pixel of the screen ● Find a closest hit of ray and objects ● Decide the color of the pixel ○ If there is a hit, the color of hit object ○ If there is no hit, the color of background ● Ray Tracing in One Weekend at GitHub
  • 21. Ray Tracing - Reflection ● For the hit on a object ● If the object is reflective ● Cast a reflection ray from hit point ● Find the closet hit of the reflection ray ● Recursively cast reflection rays ● Blend the colosr of reflected objects into final shading
  • 22. Ray Tracing - Shadow ● For the hit on a object ● Cast a new ray from hit point toward light ● Is there is any hit of the new ray ● If yes, the pixel is in shadow ● Otherwise the pixel is not in shadow
  • 23. BVH(Bounding Volume Hierarchy) ● Accelerate the hit detection between ray and primitives ○ Quickly exclude the nodes which don’t have intersection ● Use AABB(Axis-aligned Bounding Box) to split the space ● Traverse the AABB until reach the leaf ● Detect the hits of ray between the primitive in leaf AABB AABB AABB AABB AABB Left Node Right Node
  • 25. HomebrewGPU project ● Open sourced project ○ https://github.com/fallingcat/HomebrewGPU ● Implement a basic ray tracing GPU ○ Voxel based rendering ○ BVH acceleration ○ Shading/Reflection/Refraction/Shadow ● 6 months from zero to complete
  • 28. Architecture ● Thread Generator ○ Generate one thread per clock for each ray core ○ Each thread presents one pixel ○ The thread will go through ray core and output the final color ● BVH Structure ○ BVH structure stores the BVH tree structure data ○ Accepts the node or leaf query from ray core ○ Output the node or leaf data to ray core ● Primitive Unit ○ Primitive Unit stores the raw data of all primitives ○ Accepts the query from ray core and output primitive data ● Ray Core ○ Ray core process one thread to output the final color ○ Accepts the thread from thread generator or reflection/refraction ray ● Frame Buffer Writer ○ Cache the output of ray cores and write the pixel to frame buffer ○ Some threads with reflection/refraction take longer to get the final color ○ Wait util all threads in one cache set are finished then write the data to the frame buffer
  • 29. Architecture ● Surface stage ○ Process the ray from camera and find the closest hit of the ray ○ Pass the hit information to next stage ● Shadow stage ○ Cast a ray from closest hit position to light source ○ It will pass the shadow information to next stage ● Shade stage ○ Use the closest hit information to decide if it's the final color or reflection/refraction will occur ○ Cast a reflection/refraction ray and pass the data back to surface stage ○ Recursively feed back to surface stage
  • 31. Q & A ● Owen (fallingcat@gmail.com) ● https://github.com/fallingcat/Ho mebrewGPU