Anatomy of ROCgdb
(GDB for AMD GPUs)
GNU Cauldron 2022
Pedro Alves, Simon Marchi, Lancelot Six,
Zoran Zaric, Tony Tye, Laurent Morichetti
2 |
Cautionary Statement
This presentation contains forward-looking statements concerning Advanced Micro Devices, Inc. (AMD) such as AMD’s vision, mission and focus; AMD’s market
opportunity and total addressable markets; AMD’s technology and architecture roadmaps; the features, functionality, performance, availability, timing and expected
benefits of future AMD products and product roadmaps; AMD’s path forward in data center, PCs and gaming; AMD’s market and financial momentum; and the
expected benefits from the acquisition of Xilinx, Inc., which are made pursuant to the Safe Harbor provisions of the Private Securities Litigation Reform Act of 1995.
Forward-looking statements are commonly identified by words such as "would," "may," "expects," "believes," "plans," "intends," "projects" and other terms with
similar meaning. Investors are cautioned that the forward-looking statements in this presentation are based on current beliefs, assumptions and expectations, speak
only as of the date of this presentation and involve risks and uncertainties that could cause actual results to differ materially from current expectations. Such
statements are subject to certain known and unknown risks and uncertainties, many of which are difficult to predict and generally beyond AMD's control, that could
cause actual results and other future events to differ materially from those expressed in, or implied or projected by, the forward-looking information and statements.
Investors are urged to review in detail the risks and uncertainties in AMD’s Securities and Exchange Commission filings, including but not limited to AMD’s most
recent reports on Forms 10-K and 10-Q. AMD does not assume, and hereby disclaims, any obligation to update forward-looking statements made in this
presentation, except as may be required by law.
3 |
Outline
• What are ROCgdb / ROCm™ / HIP
• GPU compute kernels (HIP)
• GPU debugging challenges
• ROCgdb's component diagram
• GPU + Host threads under the same inferior, target stack
• SIMT lanes, commands, and lane divergence
• DWARF extensions
• Address spaces
• Other related contributions
• Upstreaming status
4 |
What is ROCgdb
• GDB port (+extras) targeting AMD GPUs
• Debug ROCm™ (Radeon Open Compute) applications
• HIP (Heterogeneous-compute Interface for Portability)
• OpenCL™
• Offload compute kernel workloads on AMD GPUs
5 |
What is HIP
• Heterogeneous-compute Interface for Portability
• C++ runtime API and kernel language
• Create portable applications:
• Run on AMD's accelerators as well as CUDA devices
• Uses the underlying Radeon Open Compute (ROCm™) or CUDA platform installed on a system
HIP:
• Is open-source
• Provides API to leverage GPU acceleration
• Syntactically similar to CUDA
• Good talk if you want to learn more:
• https://www.exascaleproject.org/event/amd-gpuprogramming-hip/
6 |
GPU compute kernels (HIP example)
__device__ void bar (int *out) { … }
__device__ void foo (int *out) { … }
__global__ void kernel (int *out) {
int tid_x = hipBlockIdx_x * hipBlockDim_x + hipThreadIdx_x;
if (tid_x % 2)
foo (out);
else
bar (out);
}
int main () {
int *device_out;
hipMalloc (&device_out, 16 * 16 * sizeof (int));
kernel<<<16, 16>>> (device_out);
std::vector<int> res (16 * 16);
hipMemcpy (res.data (), device_out, 16 * 16 * sizeof (int), hipMemcpyDeviceToHost);
}
(In the example above, kernel and the __device__ functions are GPU/device code; main is CPU/host code.)
7 |
GPU Debugging Challenges
• Multiple memory address spaces
• Many scalar registers
• Many wide vector registers
• Language threads of execution => lanes in a SIMD/SIMT execution model
[Figure: example GPU hardware. Vector registers VGPR 0 ... VGPR 255 hold a value of source variable X in each lane; a single source-language thread maps onto one lane of the SIMD/SIMT execution model (lanes 0 ... 63).]
8 |
ROCgdb's component diagram
+---------------------------+ +-------------+ +------------------+
| GDB | amd-dbgapi target | <-> | AMD | | Linux kernel |
| +-------------------+ | Debug | +--------+ |
| | amdgcn gdbarch | <-> | API | <=> | kfd | |
| +-------------------+ | | | driver | |
| | solib-rocm | <-> | (dbgapi.so) | +--------+---------+
+---------------------------+ +-------------+
^
|
+---------------------+
| Code Object Manager |
| (libamd_comgr.so) |
+---------------------+
BFD is not used // both target_ops and gdbarch talk to dbgapi
9 |
Host threads and GPU threads (waves) under single inferior
(gdb) info threads
Id Target Id Frame
1 Thread ... (LWP 476966) main () from libhsa-runtime64.so.1
2 Thread ... (LWP 476969) in ioctl () at syscall-template.S:78
4 Thread ... (LWP 477504) in ioctl () at syscall-template.S:78
* 5 AMDGPU Wave 1:1:1:1 (0,0,0)/0 my_kernel () at kernel.cc:41
6 AMDGPU Wave 1:1:1:2 (0,0,0)/1 my_kernel () at kernel.cc:41
7 AMDGPU Wave 1:1:1:3 (0,0,0)/2 my_kernel () at kernel.cc:41
...
• GDB GPU threads are mapped to GPU waves
• Same program & unified memory
• Those wave Id numbers will be explained shortly
10 |
GPU Threads' (waves') Target Id
(gdb) info threads
...
/- agent
| /- queue
| | /- dispatch
| | | /- wave id
| | | |
9 AMDGPU Wave 1:2:1:3 (0,0,0)/2 my_kernel () at kernel.cc:41
^^^^^ |
| - wave number in work group
- work group coordinates in work grid
...
(and yes, "info {agent, queue, dispatch}" commands)
11 |
Host + GPU threads under single inferior, target stack
(gdb) maint print target-stack
The current target stack is:
- amd-dbgapi (GPU debugging using the AMD Debugger API) # arch_stratum
- multi-thread (multi-threaded child process.) # thread_stratum
- native (Native process) # process_stratum
- exec (Local exec file) # file_stratum
- None (None) # dummy_stratum
(gdb)
• New target on top of the stack, in the arch_stratum slot
• Pushed when the (native) inferior is started, un-pushed on kill/detach/exit
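As a rough sketch of what "pushed/un-pushed" means in GDB terms: the target object reports arch_stratum from its stratum () method, so pushing it onto the inferior's target stack places it in the arch_stratum slot. The global instance name and the wrapper functions below are illustrative, not necessarily the actual ROCgdb code:

/* Illustrative sketch only.  inferior::push_target / unpush_target are
   real GDB methods; the_amd_dbgapi_target is an assumed global instance
   of the amd_dbgapi_target target_ops subclass.  */

static void
attach_amd_dbgapi (inferior *inf)
{
  /* The target's stratum () returns arch_stratum, so this lands it on
     top of the stack, above the native target.  */
  inf->push_target (&the_amd_dbgapi_target);
}

static void
detach_amd_dbgapi (inferior *inf)
{
  /* Called when the native inferior is killed, detached, or exits.  */
  inf->unpush_target (&the_amd_dbgapi_target);
}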
12 |
Host + GPU threads under single inferior, target stack
bool
amd_dbgapi_target::foo_target_method (....)
{
if (!ptid_is_gpu (inferior_ptid))
return beneath ()->foo_target_method (....);
// handle GPU things.
}
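For a concrete instance of this pattern, here is an illustrative sketch using the thread_alive target method (thread_alive, beneath () and ptid_t are real GDB interfaces; gpu_wave_is_alive is a hypothetical stand-in for the corresponding AMD Debugger API query):

/* Illustrative sketch, not the actual ROCgdb implementation.  */
bool
amd_dbgapi_target::thread_alive (ptid_t ptid)
{
  /* Host threads: defer to the target beneath (the native target).  */
  if (!ptid_is_gpu (ptid))
    return beneath ()->thread_alive (ptid);

  /* GPU waves: the wave id lives in ptid.tid (see next slide), so ask
     the AMD Debugger API whether that wave still exists.  */
  return gpu_wave_is_alive (ptid.tid ());
}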
13 |
ptid_is_gpu hack^W in detail
/* Return true if the given ptid is a GPU thread (wave) ptid. */
static inline bool ptid_is_gpu (ptid_t ptid) {
/* FIXME: Currently using values that are known not to conflict with
other processes to indicate if it is a GPU thread. ptid.pid 1 is
the init process and is the only process that could have a
ptid.lwp of 1. The init process cannot have a GPU. No other
process can have a ptid.lwp of 1. The GPU wave ID is stored in
the ptid.tid. */
return ptid.pid () != 1 && ptid.lwp () == 1;
}
• Same target stack as the native target => make sure gpu ptids don't collide with host threads
• gpu ptids: (process_id, 1, wave_id)
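In code, that convention might be captured by a small helper like the following (make_gpu_ptid is hypothetical; ptid_t and its (pid, lwp, tid) constructor are GDB's):

/* Illustrative helper: build a GPU wave ptid following the
   (process_id, 1, wave_id) convention recognized by ptid_is_gpu.  */
static ptid_t
make_gpu_ptid (pid_t process_id, ULONGEST wave_id)
{
  return ptid_t (process_id, /*lwp=*/1, /*tid=*/wave_id);
}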
14 |
Host + GPU threads under single inferior, target stack redesign?
• We've wished the amd-dbgapi target was a process_stratum target
• Not needing ptid_is_gpu would be great
• We've experimented with and/or debated solutions, including:
• Removing restriction of only one target per stratum => list of targets per stratum
• Making each inferior have a set of target stacks, one per device
• However:
• Changes are invasive and in core of gdb => not good to carry downstream
• OTOH, hard to justify changes upstream if no upstream port needs them
• Every problem we've run into is solvable in the current design (minus ptid hack)
• Our plan is to upstream using current stack design, ptid hack included
• Does not break any current target
• Does not prevent other targets from doing something different
• Target stack redesign can then happen upstream, w/ at least one port making use of it
15 |
SIMT Lanes
New entity under threads: threads become vectorized, multiple
lanes under one thread.
GDB threads are mapped to GPU waves. All lanes
progress side-by-side forming a wavefront.
One physical PC for the whole thread (for all lanes), but:
• Each lane works with its own slice of the register set, on its
share of data, its version of locals in scope.
• Lanes can be seen as multiple "regular" threads running in
lockstep.
(Note: lane divergence => provides illusion that different lanes
execute code at different PCs. More later.)
"current lane" concept added (augmenting "current inferior",
"current thread").
[Figure: same hardware diagram as on the GPU Debugging Challenges slide: VGPR 0 ... VGPR 255 with per-lane values of variable X; one source-language thread per SIMD/SIMT lane (lanes 0 ... 63).]
16 |
SIMT Lanes, command examples
(gdb) info lanes
Id State Target Id Frame
1 A AMDGPU Lane 1:1:1:1/1 (0,0,0)[1,0,0] my_kernel ...
2 A AMDGPU Lane 1:1:1:1/2 (0,0,0)[2,0,0] my_kernel ...
3 A AMDGPU Lane 1:1:1:1/3 (0,0,0)[3,0,0] my_kernel ...
...
63 A AMDGPU Lane 1:1:1:1/63 (0,0,0)[63,0,0] my_kernel ...
• Usage: info lanes [-all | -active | -inactive]... [ID]...
17 |
SIMT Lanes, command examples
(gdb) lane 2
[Switching to thread 5, lane 2 (AMDGPU Lane 1:1:1:1/2 (0,0,0)[2,0,0])]
#0 my_kernel (C_d=0x7fffe5c00000, ) at kernel.cc:41
41 size_t offset = (hipBlockIdx_x * hipBlockDim_x + hipThreadIdx_x);
(gdb) c
Continuing.
[Switching to thread 5, lane 0 (AMDGPU Lane 1:2:1:1/0 (0,0,0)[0,0,0])]
Thread 5 "dw2-lane-pc" hit Breakpoint 1, with lanes [0-4 10-20],
func (gid=0, in=..., out=...) at kernel.cc:89
89 {
(gdb) b 155 if $_lane > 3
18 |
SIMT Lanes, command examples
(gdb) lane 2
…
(gdb) print local_func_var
$1 = 123
(gdb) lane 3
…
(gdb) print local_func_var
$2 = 500
(gdb) lane apply -active all print local_func_var
Lane 2 (AMDGPU Lane 1:2:1:2/2 (1,0,0)[2,0,0]):
$3 = 123
Lane 3 (AMDGPU Lane 1:2:1:2/3 (1,0,0)[3,0,0]):
$4 = 500
19 |
SIMT Lanes' Target Id
/- agent
| /- queue
| | /- dispatch
| | | /- wave id
| | | |
| | | |
AMDGPU Lane 1:2:1:3/6 (0,0,0)[4,1,3]
| ^^^^^ ^^^^^
| | |
| | - work item coordinates in work group
| - work group coordinates in work grid
- lane index
20 |
Lane divergence
Original source:

    if (foo (lid)) {
      elem = in[lid] + 1;
    } else {
      elem = in[lid] + 3;
    }

Executed across a 32-lane wavefront (L0 ... L31), the wavefront splits into "then" lanes and "else" lanes; while one group executes its branch, the other branch's instructions are no-ops for it:

    // "Else" lanes                  // "Then" lanes
    // if (foo (lid)) {              // if (foo (lid)) {
    NoOp;                            elem = in[lid] + 1;
    // } else {                      // } else {
    elem = in[lid] + 3;              NoOp;
    // }                             // }

(A complete, buildable HIP example with the same kind of divergent branch is sketched below.)
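For reference, a minimal, self-contained HIP program exhibiting this kind of lane divergence. The __device__ function is the one used in the stepping examples on the following slides; the surrounding kernel, main, and buffer sizes are illustrative scaffolding, not taken from the presentation:

#include <hip/hip_runtime.h>
#include <cstdio>

__device__ void
function (unsigned tid, const int *in, int *out)
{
  int elem;
  if (tid % 2)
    elem = in[tid] + 1;   /* "then" lanes */
  else
    elem = in[tid] + 3;   /* "else" lanes */
  atomicAdd (out, elem);
}

__global__ void
kernel (const int *in, int *out)
{
  unsigned tid = hipBlockIdx_x * hipBlockDim_x + hipThreadIdx_x;
  function (tid, in, out);
}

int
main ()
{
  constexpr unsigned n = 64;   /* one wavefront's worth of work items */
  int *in = nullptr, *out = nullptr;
  hipMalloc (&in, n * sizeof (int));
  hipMalloc (&out, sizeof (int));
  hipMemset (in, 0, n * sizeof (int));
  hipMemset (out, 0, sizeof (int));

  kernel<<<1, n>>> (in, out);

  int result = 0;
  hipMemcpy (&result, out, sizeof (int), hipMemcpyDeviceToHost);
  std::printf ("result = %d\n", result);

  hipFree (in);
  hipFree (out);
  return 0;
}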
21 |
Without lane divergence support, step 1
Stepping stops in all branches => surprising
__device__ void
function (unsigned tid, const int *in, int *out)
{
int elem;
>> (1) if (tid % 2) <<<<<<<<<
(3) elem = in[tid] + 1;
else
(2) elem = in[tid] + 3;
(4) atomicAdd (out, elem);
}
22 |
Without lane divergence support, step 2
Stepping stops in all branches => surprising
__device__ void
function (unsigned tid, const int *in, int *out)
{
int elem;
(1) if (tid % 2)
(3) elem = in[tid] + 1;
else
>> (2) elem = in[tid] + 3; <<<<<<<<<
(4) atomicAdd (out, elem);
}
23 |
Without lane divergence support, step 3
Stepping stops in all branches => surprising
__device__ void
function (unsigned tid, const int *in, int *out)
{
int elem;
(1) if (tid % 2)
>> (3) elem = in[tid] + 1; <<<<<<<<<
else
(2) elem = in[tid] + 3;
(4) atomicAdd (out, elem);
}
24 |
Without lane divergence support, step 4
Stepping stops in all branches => surprising
__device__ void
function (unsigned tid, const int *in, int *out)
{
int elem;
(1) if (tid % 2)
(3) elem = in[tid] + 1;
else
(2) elem = in[tid] + 3;
>> (4) atomicAdd (out, elem); <<<<<<<<<
}
25 |
With lane divergence support, step 1
Stepping doesn't stop if current lane is inactive => intuitive
__device__ void
function (unsigned tid, const int *in, int *out)
{
int elem;
>> (1) if (tid % 2) <<<<<<<<<
(X) elem = in[tid] + 1;
else
(2) elem = in[tid] + 3;
(3) atomicAdd (out, elem);
}
26 |
With lane divergence support, step 2
Stepping doesn't stop if current lane is inactive => intuitive
__device__ void
function (unsigned tid, const int *in, int *out)
{
int elem;
(1) if (tid % 2)
(X) elem = in[tid] + 1;
else
>> (2) elem = in[tid] + 3; <<<<<<<<<
(3) atomicAdd (out, elem);
}
27 |
With lane divergence support, step 3
Stepping doesn't stop if current lane is inactive => intuitive
__device__ void
function (unsigned tid, const int *in, int *out)
{
int elem;
(1) if (tid % 2)
(X) elem = in[tid] + 1;
else
(2) elem = in[tid] + 3;
>> (3) atomicAdd (out, elem); <<<<<<<<<
}
28 |
Lane divergence, lane state
WITHOUT lane divergence debug info
(gdb) info lanes
Id State Target Id Frame
1 A AMDGPU Lane 1:1:1:1/1 (0,0,0)[1,0,0] kernel.cc:34
2 I AMDGPU Lane 1:1:1:1/2 (0,0,0)[2,0,0] <inactive>
3 A AMDGPU Lane 1:1:1:1/3 (0,0,0)[3,0,0] kernel.cc:34
...
63 A AMDGPU Lane 1:1:1:1/63 (0,0,0)[63,0,0] kernel.cc:41
A - active / I - inactive
29 |
Lane divergence, lane state
WITH lane divergence debug info
(gdb) info lanes
Id State Target Id Frame
1 A AMDGPU Lane 1:1:1:1/1 (0,0,0)[1,0,0] kernel.cc:34
2 D AMDGPU Lane 1:1:1:1/2 (0,0,0)[2,0,0] kernel.cc:41
3 A AMDGPU Lane 1:1:1:1/3 (0,0,0)[3,0,0] kernel.cc:34
...
63 A AMDGPU Lane 1:1:1:1/63 (0,0,0)[63,0,0] kernel.cc:41
A - active / D - divergent
30 |
Lane divergence, lane PC
Use source/logical PC instead of physical PC throughout
/* The frame's source/logical `resume' address. Returns the physical
thread-wide PC register. */
extern CORE_ADDR get_frame_pc (struct frame_info *);
+ /* The frame's source/logical `resume' address. This returns the
+ source/logical PC register, not the physical register. */
+ extern CORE_ADDR get_frame_lane_pc (struct frame_info *);
NEW!
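A hedged sketch of how a caller might use the new accessor; the helper name and the plain printf are illustrative, while get_frame_lane_pc is the function declared above and find_pc_line / symtab_to_filename_for_display are existing GDB symbol-table helpers:

/* Illustrative only: report the source position of the currently
   selected lane using its logical PC rather than the physical,
   thread-wide PC.  */
static void
print_lane_stop_location (struct frame_info *frame)
{
  CORE_ADDR lane_pc = get_frame_lane_pc (frame);
  symtab_and_line sal = find_pc_line (lane_pc, 0);
  if (sal.symtab != nullptr)
    printf ("%s:%d\n",
            symtab_to_filename_for_display (sal.symtab), sal.line);
}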
31 |
DWARF extensions
DWARF Extensions For Heterogeneous Debugging
https://llvm.org/docs/AMDGPUDwarfExtensionsForHeterogeneousDebugging.html
• Allow Location Description on the DWARF Expression Stack
• Generalize CFI & DWARF Base Objects to Allow Any locdesc Kind
• Generalize DWARF Operation Expressions to Support Multiple Places
• Generalize Offsetting of Location Descriptions
• General Support for Address Spaces
• Operations to create Vector Composite locdescs
• Support for Divergent Control Flow of SIMT Hardware
• More...
32 |
DWARF extensions, Cauldron 2021
• Last year's Cauldron presentation, by Tony Tye, Scott Linder, and Zoran Zaric:
• DWARF Extensions for Optimized SIMT/SIMD (GPU) Debugging
• https://www.youtube.com/watch?v=Iv2WO67nklc
33 |
Architecture address spaces
• Independent from the current global memory concept
• Not part of source language syntax => address space cannot be defined as type qualifier
• Part of the address information => CORE_ADDR concept needs extending
• Require special pointer and reference type handling (and potentially more)
• Should not be exposed in user expressions, except when creating an address directly
• Address spaces can be relative to lane/wave/core/device
[Figure: address space hierarchy. Several per-lane and per-wave memories sit beneath a core-local memory, illustrating address spaces relative to lane, wave, and core.]
34 |
Architecture address spaces
Not part of source language syntax => address space cannot be defined as type qualifier
__device__ int global_var;
__device__ void func (int *arg) { // arg can point to memory in any address space
int local_var[3] = { *arg };
// …
if (local_var[1] == global_var) {
// …
}
}
• Typically, local variables => private_lane address space
• But, for optimization reasons, they could be put elsewhere
• DWARF describes where that is
35 |
Architecture address spaces, CORE_ADDR
• CORE_ADDR and the typical bit hacks would work, though high bits won't always be free:
• Pointer/memory tagging on AArch64, soon x86 too (UAI for AMD, LAM for Intel)
• CORE_ADDR better represents offset into address space
• Introducing tuple to carry both address space and offset:
struct address {
addr_space_id addr_space;
CORE_ADDR offset;
};
• Needed mostly for addresses that come from DWARF debug info, and user expressions
• Many many places can infer global (default) address space from context
=> can continue working with CORE_ADDR
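A minimal sketch of that last point, assuming a default/global address space id (the enum and helper below are illustrative, not the actual ROCgdb types):

/* Illustrative only: contexts that deal exclusively with global memory
   keep passing CORE_ADDR around and convert at the boundary.  */
enum addr_space_id : unsigned
{
  ADDR_SPACE_GLOBAL = 0,   /* assumed default (global) address space */
  /* ... target-defined spaces such as private_lane would follow ...  */
};

struct address
{
  addr_space_id addr_space;
  CORE_ADDR offset;
};

static inline address
make_global_address (CORE_ADDR addr)
{
  return { ADDR_SPACE_GLOBAL, addr };
}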
36 |
Address spaces notation
Introducing the '#' operator to compose pointer from:
• An address space name (maintenance print address-spaces)
• An offset
(gdb) p &k
$1 = (int *) private_lane#0x0
(gdb) p private_lane#0x1
Operation: OP_ASPACE
Operation: OP_LONG
Type: int
Constant: 0x0000000000000001
String: private_lane
$2 = (void *) private_lane#0x1
37 |
Architecture address spaces in DWARF
• Address space location description (DW_OP_LLVM_form_aspace_address)
# Variable located at address 0x0 of private_lane (0x5) address space
DW_TAG_variable
…
DW_AT_location
DW_OP_lit0 # address 0x0
DW_OP_lit5 # address space 0x5 (private_lane)
DW_OP_LLVM_form_aspace_address # pops two arguments from stack
• Pointer and reference type DIE attribute (DW_AT_LLVM_address_space)
# Type of a pointer object which holds a private_lane (0x5) address
DW_TAG_pointer_type
…
DW_AT_LLVM_address_space 0x5 # different from DW_AT_address_class
38 |
Other related contributions
• Ctrl-C redesign, there's a separate talk for this:
• Redesigning GDB's Ctrl-C handling
• Performance improvements for large number of threads
• List of threads with pending status
• Per-inferior ptid -> thread map
• Commit-resumed
• Step over clone and thread exit
• tail end of kernel code contains kernel exit instruction
39 |
Upstreaming status
• Linux Kernel module (kfd)
• Module exists upstream, but does not support debug there
• Finalizing debug interface, and plan to upstream soon ™
• https://github.com/RadeonOpenCompute/ROCK-Kernel-Driver
• AMD Debug API (amd-dbgapi)
• https://github.com/ROCm-Developer-Tools/ROCdbgapi
• DWARF extensions
• DWARF for GPUs pseudo informal working group: AMD, Intel, Perforce (so far)
• Some bits agreed in group and filed for DWARF v6:
• DW_OP_push_lane, DW_AT_num_lanes
• Working on submitting rest
• GDB
• https://github.com/ROCm-Developer-Tools/ROCgdb
• Some BFD / binutils bits merged
• GDB submission in preparation
40 |
Disclaimer & Attribution
©2022 Advanced Micro Devices, Inc. All rights reserved.
AMD, the AMD Arrow logo, AMD Instinct™, AMD ROCm™, Radeon™, and combinations thereof are trademarks of Advanced Micro Devices, Inc. Other product
names used in this publication are for identification purposes only and may be trademarks of their respective companies.
The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions, and typographical errors. The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. Any computer system has risks of security vulnerabilities that cannot be completely prevented or mitigated. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes.
THIS INFORMATION IS PROVIDED "AS IS." AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS, OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION. AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY RELIANCE, DIRECT, INDIRECT, SPECIAL, OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
42 |
The end, time for questions
Anatomy of ROCgdb (GDB for AMD GPUs)