HSA Kernel Code 
(KFD v0.6) 
Advisor: Prof. 徐慰中
Student: 黃昱儒 
2014/7/25
Agenda 
● Introduction to HSA 
o hUMA 
o User Level Queueing 
● HSA Driver 
o Concepts 
▪ Flow Overview 
▪ User & Hardware Queues 
o Source Code Detail 
● IOMMU 
o Concepts 
▪ GCR3 
▪ PPR 
o Source Code Detail
hUMA
User Level Queuing - Before HSA
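User Level Queuing
[Diagram: each application owns its user-mode queues (Application 1: Queue 1 and Queue 2; Application 3: Queue 1), all submitting to one HSA Device]
1. AQL Packet: the application writes a packet
2. Ring: into its ring in user memory
3. Doorbell: then kicks the doorbell
● The HSA device accesses the application's ring
● IOMMU address translation (VA -> PA)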
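The three numbered steps can be sketched in user-space C. This is a minimal illustration, not the HSA runtime's code: `struct aql_packet`, `struct user_queue`, `RING_SLOTS`, and `enqueue_packet` are hypothetical names, and the doorbell is modeled as an ordinary memory location instead of a mapped MMIO register.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SLOTS 8u  /* assumed ring size; must be a power of two */

/* Hypothetical stand-in for a 64-byte AQL packet. */
struct aql_packet {
    uint16_t header;
    uint8_t  payload[62];
};

/* Hypothetical user-mode queue: the ring lives in application memory. */
struct user_queue {
    struct aql_packet  ring[RING_SLOTS];
    uint64_t           write_index;  /* next slot the application fills */
    uint64_t           read_index;   /* advanced by the device */
    volatile uint64_t *doorbell;     /* would be device MMIO mapped into the process */
};

/* Steps 1-3: copy the packet into the ring, then kick the doorbell.
 * Returns 0 on success, -1 if the ring is full.
 * A real implementation would also need a release barrier before the
 * doorbell write; that is omitted in this sketch. */
static int enqueue_packet(struct user_queue *q, const struct aql_packet *pkt)
{
    assert(q != NULL && pkt != NULL && q->doorbell != NULL);
    if (q->write_index - q->read_index >= RING_SLOTS)
        return -1;                                      /* ring full */
    q->ring[q->write_index & (RING_SLOTS - 1)] = *pkt;  /* steps 1 and 2 */
    q->write_index++;
    *q->doorbell = q->write_index;                      /* step 3: no system call */
    return 0;
}
```

The point of the design is that the submission path contains no system call: the driver is only involved when the queue is created or destroyed.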
HSA Software Stack
[Diagram: layers from top to bottom]
Application
Runtime Library
● open("/dev/kfd")
● ioctl(KFD_IOC_SET_MEMORY_POLICY)
● ioctl(KFD_IOC_CREATE_QUEUE)
● ioctl(KFD_IOC_DESTROY_QUEUE)
HSA-aware Kernel: KFD, IOMMU Driver
Hardware: HSA Device, IOMMU
Concepts - HSA Run Flow
Phase | Application | KFD Driver
Initialization | Create user queues | Create HW queue with user queue information
Computation (User - HW interaction) | Enqueue AQL packets, kick doorbell, and wait for signal | Nothing
Finish | Application finishes and destroys queues | Release HW queue
Scheduling Policy
1. Hardware scheduling that allows oversubscription (more queues than HW slots)
2. Hardware scheduling that does not allow oversubscription, so create_queue requests fail when we run out of HW slots
3. No hardware scheduling, so the driver manually assigns queues to HW slots by programming registers
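A minimal sketch of how the three policies differ when a new queue is requested. The enum names and `check_create_queue` are illustrative, not the driver's identifiers, and returning `-EBUSY` for policy 2 is an assumption (the slides only state that the request fails).

```c
#include <assert.h>
#include <errno.h>

/* Illustrative names for the three policies listed above. */
enum sched_policy {
    POLICY_HWS,                      /* 1: HW scheduler, oversubscription allowed */
    POLICY_HWS_NO_OVERSUBSCRIPTION,  /* 2: HW scheduler, bounded by HW slots */
    POLICY_NO_HWS                    /* 3: driver programs registers itself */
};

/* Returns 0 if a new queue may be created, or -EBUSY when the policy
 * forbids exceeding the number of hardware slots. */
static int check_create_queue(enum sched_policy policy,
                              unsigned int active_queues,
                              unsigned int hw_slots)
{
    assert(hw_slots > 0);
    if (policy == POLICY_HWS)
        return 0;  /* the hardware scheduler multiplexes the extra queues */
    return active_queues < hw_slots ? 0 : -EBUSY;
}
```

For example, with 56 slots already in use, `check_create_queue(POLICY_HWS, 56, 56)` succeeds while the other two policies reject the request.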
Software Scheduler
[Diagram: the driver assigns each process's user queues to hardware queues]
● Process pasid=0: queue_id=0 and queue_id=1, each with a ring_base_address and a doorbell
● Process pasid=1: queue_id=0 and queue_id=1, each with a ring_base_address and a doorbell
● Free hardware queue_id bitmap
● Queue acquire register: (pipe, queue)
● HSA GPU's configuration register MMIO address (Physical Address)
Hardware Scheduler
[Diagram: the driver programs only its own kernel_queue into the hardware]
● kernel_queue: ring_base_address and doorbell
● Queue acquire register: (pipe=4, queue=0)
● HSA GPU's configuration register MMIO address (Physical Address)
Hardware Scheduler - No Oversubscription
[Diagram: runlist for 3 processes]
● PM4 Packet (Type3) IT_RUN_LIST
o run_list
● PM4 Packet (Type3) IT_MAP_PROCESS
o page_table_base
o pasid
o sh_mem_config
● PM4 Packet (Type3) IT_MAP_QUEUES
o mqd_addr (Memory Queue Descriptor)
Hardware Scheduler - Oversubscription
[Diagram: runlist that ends with another IT_RUN_LIST packet]
● PM4 Packet (Type3) IT_RUN_LIST
o run_list
● PM4 Packet (Type3) IT_MAP_PROCESS
o page_table_base
o pasid
o sh_mem_config
● PM4 Packet (Type3) IT_MAP_QUEUES
o mqd_addr (Memory Queue Descriptor)
● PM4 Packet (Type3) IT_RUN_LIST
o run_list
[Diagram labels: Per Application; Per Device; Only for HW scheduling; Per HW Queue]
IOCTL Commands Provided by KFD
● KFD_IOC_CREATE_QUEUE
o Create a hardware queue from the application's information (e.g., ring base address)
● KFD_IOC_DESTROY_QUEUE
o Release the hardware queue
● KFD_IOC_UPDATE_QUEUE
● KFD_IOC_SET_MEMORY_POLICY
o Set the cache coherence policy
● KFD_IOC_GET_CLOCK_COUNTERS
o Get the GPU clock counter
● KFD_IOC_GET_PROCESS_APERTURES
o Get the GPU's aperture information
● KFD_IOC_PMC_ACQUIRE_ACCESS
● KFD_IOC_PMC_RELEASE_ACCESS
o Exclusive access to performance counters
HSA Driver Flow
● System initialization
○ module_init
○ device_init (called by radeon)
● Application opens the "/dev/kfd" device
● Application sends ioctl
○ KFD_IOC_SET_MEMORY_POLICY
○ KFD_IOC_CREATE_QUEUE
● Application sends ioctl
○ KFD_IOC_DESTROY_QUEUE
● Application termination
module_init(kfd_module_init) 
● radeon_kfd_pasid_init 
o Initialize PASID bitmap 
● radeon_kfd_chardev_init 
o register_chrdev: /dev/kfd 
o kfd_ops 
▪ Defines the open and ioctl file operations
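The PASID bitmap set up at module init can be pictured as a simple bit allocator. The sketch below is a user-space illustration under assumptions: `PASID_LIMIT`, `pasid_init`, `pasid_alloc`, and `pasid_free` are hypothetical names, and the real driver's limit and locking are not modeled.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define PASID_LIMIT   64u  /* assumed number of PASIDs for the sketch */
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* One bit per PASID: 1 = in use, 0 = free. */
static unsigned long pasid_bitmap[(PASID_LIMIT + BITS_PER_WORD - 1) / BITS_PER_WORD];

/* Marks every PASID as free. */
static void pasid_init(void)
{
    for (size_t i = 0; i < sizeof(pasid_bitmap) / sizeof(pasid_bitmap[0]); i++)
        pasid_bitmap[i] = 0;
}

/* Returns the lowest free PASID, or -1 when all are in use. */
static int pasid_alloc(void)
{
    for (unsigned int p = 0; p < PASID_LIMIT; p++) {
        unsigned long mask = 1ul << (p % BITS_PER_WORD);
        if (!(pasid_bitmap[p / BITS_PER_WORD] & mask)) {
            pasid_bitmap[p / BITS_PER_WORD] |= mask;
            return (int)p;
        }
    }
    return -1;
}

/* Returns a PASID to the pool when its process goes away. */
static void pasid_free(unsigned int pasid)
{
    assert(pasid < PASID_LIMIT);
    pasid_bitmap[pasid / BITS_PER_WORD] &= ~(1ul << (pasid % BITS_PER_WORD));
}
```

Each opened process would take one value from `pasid_alloc()` and give it back with `pasid_free()` on teardown.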
kgd2kfd_device_init 
● radeon_kfd_doorbell_init(kfd); 
● radeon_kfd_interrupt_init(kfd); 
● amd_iommu_set_invalidate_ctx_cb(kfd->pdev, 
iommu_pasid_shutdown_callback); 
● device_queue_manager_init(kfd); 
o dqm->initialize 
● dqm->start(kfd->dqm);
dqm->initialize For 
KFD_SCHED_POLICY_NO_HWS* 
● Prepare pipe, queue bitmap
kfd_open 
● radeon_kfd_create_process(current) 
o Create kfd_process 
o Assign PASID
KFD_IOC_SET_MEMORY_POLICY 
● Two policies
o cache_policy_coherent 
o cache_policy_noncoherent 
● Okra 
o default policy=cache_policy_coherent 
o alternate policy=cache_policy_noncoherent
radeon_kfd_bind_process_to_device
● Called when the user application sends an ioctl command
● amd_iommu_bind_pasid()
o Register this kfd_process with the IOMMU
KFD_IOC_CREATE_QUEUE
● Create a queue with information from userspace
● pqm_create_queue
● Return queue_id and doorbell_address to userspace
o queue_id is per kfd_process
o doorbell_address maps to a device MMIO address
pqm_create_queue 
● find_available_queue_slot 
o Assign qid (per kfd_process) 
● dqm->register_process 
o Register process to dqm (device queue manager) 
● create_cp_queue 
o Create with the queue_properties obtained from the application
o Map the doorbell MMIO address to the application
● dqm->create_queue 
● dqm->execute_queue
dqm->create_queue For 
KFD_SCHED_POLICY_NO_HWS 
● init_mqd (memory queue descriptor) 
o Store queue configuration from application 
● Find an unused (pipe, queue) from dqm (device queue manager)
o If none is available, return -EBUSY
o Maximum = 56
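The search for an unused (pipe, queue) pair can be sketched as a small bitmap allocator. Everything here is hypothetical: the struct and function names are invented for illustration, and the 7 x 8 split is an assumption chosen only so that the total matches the maximum of 56 stated above.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Assumed geometry: 7 pipes x 8 queues = 56 slots. */
#define NUM_PIPES       7u
#define QUEUES_PER_PIPE 8u

struct slot_allocator {
    uint8_t used[NUM_PIPES];  /* one bit per queue within each pipe */
};

/* Finds an unused (pipe, queue) pair. Returns 0 and fills the outputs,
 * or -EBUSY when every slot is taken. */
static int acquire_slot(struct slot_allocator *a, unsigned int *pipe,
                        unsigned int *queue)
{
    assert(a && pipe && queue);
    for (unsigned int p = 0; p < NUM_PIPES; p++) {
        for (unsigned int q = 0; q < QUEUES_PER_PIPE; q++) {
            if (!(a->used[p] & (1u << q))) {
                a->used[p] |= (uint8_t)(1u << q);
                *pipe = p;
                *queue = q;
                return 0;
            }
        }
    }
    return -EBUSY;
}

/* Marks a slot free again, as done when a queue is destroyed. */
static void release_slot(struct slot_allocator *a, unsigned int pipe,
                         unsigned int queue)
{
    assert(a && pipe < NUM_PIPES && queue < QUEUES_PER_PIPE);
    a->used[pipe] &= (uint8_t)~(1u << queue);
}
```

Under this policy a 57th request fails immediately, which is the behavior that hardware scheduling with oversubscription is meant to avoid.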
dqm->execute_queue For 
KFD_SCHED_POLICY_NO_HWS 
● Write queue configuration to device 
● load_mqd 
o ring_base_addr 
o doorbell_offset 
o queue_priority 
o ...
[Diagram: software scheduling of per-process queues onto hardware queues]
● Process pasid=0: queue_id=0 and queue_id=1, each with a ring_base_address and a doorbell
● Process pasid=1: queue_id=0 and queue_id=1, each with a ring_base_address and a doorbell
● Each process can have up to 1024 queues
● Free hardware queue_id bitmap
● Queue select register: (pipe, queue)
● HSA GPU's configuration register MMIO address (Physical Address)
kgd2kfd_device_init 
● radeon_kfd_doorbell_init(kfd); 
● radeon_kfd_interrupt_init(kfd); 
● device_iommu_pasid_init(kfd); 
● kfd_topology_add_device(kfd); 
● amd_iommu_set_invalidate_ctx_cb(kfd->pdev, 
iommu_pasid_shutdown_callback); 
● device_queue_manager_init(kfd); 
o dqm->initialize 
● dqm->start(kfd->dqm);
dqm->start For 
KFD_SCHED_POLICY_HWS* 
● pm_init (packet manager) 
● kernel_queue_init 
o kernel_queue doorbell 
o kernel_queue ring address 
o load_mqd to write kernel_queue configuration to 
device
dqm->create_queue For 
KFD_SCHED_POLICY_HWS* 
● init_mqd (memory queue descriptor) 
o Store queue configuration from application
dqm->execute_queue For 
KFD_SCHED_POLICY_HWS* 
● dqm->destroy_queues
● pm_send_runlist
o pm_create_runlist_ib
▪ Construct PM4 packets of the MAP_PROCESS and MAP_QUEUES types
● Packets contain the application's ring address
o pm->kernel_queue->acquire_packet_buffer
▪ Get an unused entry of the kernel_queue
o pm_create_runlist
▪ Construct a PM4 packet of the RUN_LIST type
o pm->kernel_queue->submit_packet
▪ Kick the kernel queue's doorbell
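The ordering of packets in the runlist buffer can be illustrated symbolically: one MAP_PROCESS entry per process, followed by one MAP_QUEUES entry per queue of that process. This sketch does not encode real PM4 headers or payloads; `struct pkt`, `struct process_desc`, and `build_runlist_ib` are hypothetical, and the RUN_LIST packet that would point at the finished buffer is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Symbolic packet kinds; real PM4 packets carry headers and payloads
 * that this sketch does not model. */
enum pkt_kind { PKT_MAP_PROCESS, PKT_MAP_QUEUES };

struct pkt {
    enum pkt_kind kind;
    unsigned int  pasid;     /* owning process */
    unsigned int  queue_id;  /* only meaningful for PKT_MAP_QUEUES */
};

struct process_desc {
    unsigned int pasid;
    unsigned int num_queues;
};

/* Fills ib[] with one MAP_PROCESS per process, each followed by one
 * MAP_QUEUES per queue. Returns the number of entries written, or 0
 * if ib[] is too small to hold the whole runlist. */
static size_t build_runlist_ib(const struct process_desc *procs, size_t nprocs,
                               struct pkt *ib, size_t capacity)
{
    size_t n = 0;

    assert(procs != NULL && ib != NULL);
    for (size_t i = 0; i < nprocs; i++) {
        if (n + 1 + procs[i].num_queues > capacity)
            return 0;
        ib[n++] = (struct pkt){ PKT_MAP_PROCESS, procs[i].pasid, 0 };
        for (unsigned int q = 0; q < procs[i].num_queues; q++)
            ib[n++] = (struct pkt){ PKT_MAP_QUEUES, procs[i].pasid, q };
    }
    return n;
}
```

Two processes with two queues and one queue respectively would yield five entries: MAP_PROCESS, MAP_QUEUES, MAP_QUEUES, MAP_PROCESS, MAP_QUEUES.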
Software Scheduling vs. Hardware Scheduling
Step | Software Scheduling | Hardware Scheduling
dqm->initialize | Prepare (pipe, queue) bitmap | -
dqm->start | - | pm_init; kernel_queue_init
kfd_open | Create kfd_process; assign PASID | Create kfd_process; assign PASID
ioctl(CREATE_QUEUE) | Get queue_id; map doorbell to application | Get queue_id; map doorbell to application
dqm->create_queue | init_mqd; find unused (pipe, queue) to assign HW queue_id | init_mqd
dqm->execute_queue | Write queue configuration to device | Create PM4 packet; kick kernel_queue's doorbell
Application Computation ...
● HW has the ring_base_addr (a userspace address)
o Application enqueues AQL packets and waits for the signal
● Application has the HW doorbell MMIO address
o Used to kick the hardware
● Driver does nothing
● Until the application sends ioctl(KFD_IOC_DESTROY_QUEUE) or the application finishes
Hardware Queue Deactivation
1. Application sends ioctl(KFD_IOC_DESTROY_QUEUE)
2. Task exit notifier
Hardware Queue Deactivation (1)
● ioctl(KFD_IOC_DESTROY_QUEUE) 
● pqm_destroy_queue 
o dqm->destroy_queue 
o Restore queue, pipe bitmap 
o dqm->execute_queues(dqm);
dqm->destroy_queue For 
KFD_SCHED_POLICY_NO_HWS 
● destroy_mqd 
o acquire_queue(kgd, pipe_id, queue_id); 
o write_register(kgd, 
CP_HQD_DEQUEUE_REQUEST, 
DEQUEUE_REQUEST_DRAIN);
dqm->destroy_queue For 
KFD_SCHED_POLICY_HWS* 
● dqm->destroy_queues 
o pm_send_unmap_queue 
▪ Send a pm4 packet of UNMAP_QUEUES 
o pm_send_query_status(KFD_FENCE_COMPLETED)
Hardware Queue Deactivation (2)
● The task exit notifier calls iommu_pasid_shutdown_callback
o Registered in kgd2kfd_device_init -> amd_iommu_set_invalidate_ctx_cb
o Called from the mmu_notifier's release function (the mmu_notifier is registered in radeon_kfd_bind_process_to_device -> amd_iommu_bind_pasid)
iommu_pasid_shutdown_callback 
● pqm_destroy_queue 
o dqm->destroy_queue 
o Restore queue, pipe bitmap 
o dqm->execute_queues(dqm);
Introduction to IOMMU
● The user application writes AQL packets into the ring, whose address is a virtual address
● Device accesses need VA-to-PA translation
[Diagram labels: Doorbell; Ring Address]
[Diagram: HSA GPU -> Device table -> GCR3 table. The GCR3 entry for PASID=2 is assigned kfd_process->mm->pgd (Physical Address)]
PRI & PPR
● The operating system is usually required to pin memory pages used for I/O.
● The IOMMU provides a mechanism that lets peripherals use unpinned pages for I/O.
● Only supported by AMD IOMMUv2
PRI & PPR
● PRI (Page Request Interface)
o A peripheral requests memory management service from the host OS (e.g., page fault service for the peripheral)
o Issued by the peripheral
● PPR (Peripheral Page Service Request)
o When the IOMMU receives a valid PRI request, it creates a PPR message in the request log to request changes to the virtual address space
o Issued by the IOMMU as an interrupt
● Used to request I/O page table changes
o The IOMMU driver can register a PPR notifier
module_init(amd_iommu_v2_init) 
● amd_iommu_register_ppr_notifier(&ppr_nb); 
o PPR callback 
▪ ppr_notifier function
Set IOMMU With PASID 
● amd_iommu_bind_pasid 
● Called when the kfd_process is created
o mmu_notifier_register(&pasid_state->mn, 
pasid_state->mm); 
o amd_iommu_domain_set_gcr3(dev_state->domain, 
pasid, __pa(pasid_state->mm->pgd));
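Conceptually, setting GCR3 for a PASID stores the physical address of that process's page-table root in a per-device table, which the IOMMU then consults for every access tagged with that PASID. The following sketch models only that idea; `struct gcr3_table`, `set_gcr3`, `lookup_root`, and `MAX_PASID` are hypothetical, and the real table is a multi-level hardware structure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PASID 16u  /* assumed small table for the sketch */

/* Hypothetical per-device table: root_pa[pasid] holds the physical
 * address of that process's page-table root (mm->pgd on the slide). */
struct gcr3_table {
    uint64_t root_pa[MAX_PASID];
};

/* Models binding a PASID to a page-table root. */
static void set_gcr3(struct gcr3_table *t, unsigned int pasid, uint64_t pgd_pa)
{
    assert(t != NULL && pasid < MAX_PASID);
    t->root_pa[pasid] = pgd_pa;
}

/* Models what the IOMMU does for a PASID-tagged device access: select
 * the page-table root. Returns 0 when the PASID is not bound. */
static uint64_t lookup_root(const struct gcr3_table *t, unsigned int pasid)
{
    assert(t != NULL);
    return pasid < MAX_PASID ? t->root_pa[pasid] : 0;
}
```

Because the device walks the same page tables as the CPU, a ring address handed over as a virtual address resolves to the same memory on both sides.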
PRI & PPR Flow
1. The peripheral issues a PRI request to the IOMMU
2. The IOMMU writes a PPR request to the PPR log (the log contains fault address, pasid, device_id, tag, flags)
3. The IOMMU sends an interrupt to the CPU
PPR Flow
● When the irq comes
o status = readl(iommu->mmio_base + MMIO_STATUS_OFFSET);
o if (status & MMIO_STATUS_PPR_INT_MASK)
● ppr_notifier
o Registered in amd_iommu_v2_init
● do_fault
do_fault
● Calls the get_user_pages() API to pin the faulting pages into memory
o mm_struct, fault_addr
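The interrupt path from status check to fault handler can be mimicked in user space. This is a sketch under assumptions: the status bit position, `struct ppr_entry` layout, and all function names are illustrative, and `record_fault` merely stands in for the ppr_notifier -> do_fault step without touching any page tables.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STATUS_PPR_INT (1u << 6)  /* assumed bit position for the sketch */

/* Fields the slides list for a PPR log entry; the layout is illustrative. */
struct ppr_entry {
    uint64_t fault_addr;
    uint32_t pasid;
    uint16_t device_id;
    uint16_t tag;
    uint16_t flags;
};

typedef int (*ppr_notifier_fn)(const struct ppr_entry *e);

static ppr_notifier_fn ppr_notifier;  /* set once, at module init */
static uint64_t last_fault_addr;      /* written by the sample notifier */

static void register_ppr_notifier(ppr_notifier_fn fn)
{
    ppr_notifier = fn;
}

/* Sample notifier standing in for ppr_notifier -> do_fault. */
static int record_fault(const struct ppr_entry *e)
{
    assert(e != NULL);
    last_fault_addr = e->fault_addr;
    return 0;
}

/* Runs "when the irq comes": walks the log only if the PPR bit is set.
 * Returns the number of entries the notifier handled successfully. */
static unsigned int handle_irq(uint32_t status, const struct ppr_entry *log,
                               unsigned int n)
{
    unsigned int handled = 0;

    if (!(status & STATUS_PPR_INT) || ppr_notifier == NULL)
        return 0;
    for (unsigned int i = 0; i < n; i++)
        if (ppr_notifier(&log[i]) == 0)
            handled++;
    return handled;
}
```

Calling `register_ppr_notifier(record_fault)` and then `handle_irq(STATUS_PPR_INT, log, n)` passes each log entry to the notifier, while a status word without the PPR bit is ignored.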
Flow Review
[Diagram: layers from top to bottom]
Application
Runtime Library
● open("/dev/kfd")
● ioctl(KFD_IOC_SET_MEMORY_POLICY)
● ioctl(KFD_IOC_CREATE_QUEUE)
● ioctl(KFD_IOC_DESTROY_QUEUE)
HSA-aware Kernel: KFD, IOMMU Driver
Hardware: HSA Device, IOMMU
Q&A 
Thanks!
Reference 
● https://github.com/HSAFoundation/HSA-Drivers-Linux-AMD
● http://www.hsafoundation.com/standards/

Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
 
Using IESVE for Room Loads Analysis - Australia & New Zealand
Using IESVE for Room Loads Analysis - Australia & New ZealandUsing IESVE for Room Loads Analysis - Australia & New Zealand
Using IESVE for Room Loads Analysis - Australia & New Zealand
 
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.ILBeyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
 
How Does XfilesPro Ensure Security While Sharing Documents in Salesforce?
How Does XfilesPro Ensure Security While Sharing Documents in Salesforce?How Does XfilesPro Ensure Security While Sharing Documents in Salesforce?
How Does XfilesPro Ensure Security While Sharing Documents in Salesforce?
 
Quarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden ExtensionsQuarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden Extensions
 
Explore Modern SharePoint Templates for 2024
Explore Modern SharePoint Templates for 2024Explore Modern SharePoint Templates for 2024
Explore Modern SharePoint Templates for 2024
 
Strategies for Successful Data Migration Tools.pptx
Strategies for Successful Data Migration Tools.pptxStrategies for Successful Data Migration Tools.pptx
Strategies for Successful Data Migration Tools.pptx
 
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
 
top nidhi software solution freedownload
top nidhi software solution freedownloadtop nidhi software solution freedownload
top nidhi software solution freedownload
 
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
 
Lecture 1 Introduction to games development
Lecture 1 Introduction to games developmentLecture 1 Introduction to games development
Lecture 1 Introduction to games development
 
GlobusWorld 2024 Opening Keynote session
GlobusWorld 2024 Opening Keynote sessionGlobusWorld 2024 Opening Keynote session
GlobusWorld 2024 Opening Keynote session
 
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
 
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital TransformationWSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
 

HSA Kernel Code (KFD v0.6)

  • 1. HSA Kernel Code (KFD v0.6) Advisor: 徐慰中教授 Student: 黃昱儒 2014/7/25
  • 2. Agenda ● Introduction to HSA o hUMA o User Level Queueing ● HSA Driver o Concepts ▪ Flow Overview ▪ User & Hardware Queues o Source Code Detail ● IOMMU o Concepts ▪ GCR3 ▪ PPR o Source Code Detail
  • 4. User Level Queuing - Before HSA
  • 6. Application 1 Queue 1 1. AQL Packet 2. Ring 3. Doorbell HSA Device Application 1 Queue 2 Application 3 Queue 1 Application 3 Queue 1 HSA device accesses the application’s ring Application kicks the doorbell IOMMU address translation (VA->PA)
  • 8. HSA Software Stack Application Runtime Library ● open(“/dev/kfd”) ● ioctl(KFD_IOC_SET_MEMORY_POLICY) ● ioctl(KFD_IOC_CREATE_QUEUE) ● ioctl(KFD_IOC_DESTROY_QUEUE) KFD IOMMU Driver HSA-aware Kernel HSA Device IOMMU
  • 9. Agenda ● Introduction to HSA o hUMA o User Level Queueing ● HSA Driver o Concepts ▪ Flow Overview ▪ User & Hardware Queues o Source Code Detail ● IOMMU o Concepts ▪ GCR3 ▪ PPR o Source Code Detail
  • 10. Concepts - HSA Run Flow Application KFD Driver Create user queues Create HW queue with user queue information Enqueue AQL packets, kick doorbell, and wait for signal Nothing Application finishes and destroys queues Release HW queue Initialization Computation Finish User - HW interaction
  • 11. Scheduling Policy 1. Hardware scheduling with oversubscription allowed (more queues than HW slots) 2. Hardware scheduling without oversubscription, so create_queue requests fail when the HW slots run out 3. No hardware scheduling, so the driver manually assigns queues to HW slots by programming registers
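
The three policies differ mainly in what happens when a create_queue request arrives and every hardware slot is taken. The following is a minimal sketch of that decision, not the driver's code: the enum and function names are hypothetical, and returning -EBUSY for policy 2 is an assumption (the slides only state -EBUSY for the no-HWS case).

```c
#include <errno.h>

/* Hypothetical names modelled on the three policies in the slide. */
enum sched_policy {
    SCHED_POLICY_HWS,                     /* 1: HW scheduler, oversubscription allowed */
    SCHED_POLICY_HWS_NO_OVERSUBSCRIPTION, /* 2: HW scheduler, no oversubscription */
    SCHED_POLICY_NO_HWS                   /* 3: driver assigns HW slots itself */
};

/* Returns 0 if one more queue may be created, -EBUSY otherwise. */
int check_create_queue(enum sched_policy policy,
                       unsigned int queues_in_use,
                       unsigned int hw_slots)
{
    /* With oversubscription the HW scheduler multiplexes queues onto slots. */
    if (policy == SCHED_POLICY_HWS)
        return 0;

    /* Otherwise each queue needs its own free hardware slot. */
    return queues_in_use < hw_slots ? 0 : -EBUSY;
}
```

For example, `check_create_queue(SCHED_POLICY_NO_HWS, 56, 56)` reports -EBUSY, while the same call under `SCHED_POLICY_HWS` succeeds.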
  • 12. pasid=1 queue_id=0 ring_base_address pasid=1 queue_id=1 ring_base_address HSA GPU’s configuration register mmio address Software Scheduler pasid=0 queue_id=0 ring_base_address doorbell Free hardware queue_id bitmap pasid=0 queue_id=1 ring_base_address doorbell doorbell doorbell queue acquire register (pipe, queue) Physical Address
  • 13. HSA GPU’s configuration register mmio address Hardware Scheduler kernel_queue ring_base_address doorbell queue acquire register (pipe=4, queue=0) Physical Address
  • 14. Hardware Scheduler - No Oversubscription PM4 Packet (Type3) IT_RUN_LIST run_list PM4 Packet (Type3) IT_MAP_PROCESS page_table_base pasid sh_mem_config PM4 Packet (Type3) IT_MAP_QUEUES mqd_addr (Memory Queue Descriptor) 3 Processes
  • 15. Hardware Scheduler - Oversubscription PM4 Packet (Type3) IT_RUN_LIST run_list PM4 Packet (Type3) IT_MAP_PROCESS page_table_base pasid sh_mem_config PM4 Packet (Type3) IT_MAP_QUEUES mqd_addr (Memory Queue Descriptor) PM4 Packet (Type3) IT_RUN_LIST run_list
  • 16. Per Application Per Device Only for HW scheduling Per HW Queue
  • 17. IOCTL Commands Provided by KFD ● KFD_IOC_CREATE_QUEUE o Create hardware queue from application’s information (e.g., ring base address) ● KFD_IOC_DESTROY_QUEUE o Release hardware queue ● KFD_IOC_UPDATE_QUEUE ● KFD_IOC_SET_MEMORY_POLICY o Set cache coherency policy ● KFD_IOC_GET_CLOCK_COUNTERS o Get GPU clock counter ● KFD_IOC_GET_PROCESS_APERTURES o Get aperture information of the GPU ● KFD_IOC_PMC_ACQUIRE_ACCESS ● KFD_IOC_PMC_RELEASE_ACCESS o Exclusive access for performance counters
  • 18. HSA Driver Flow ● System initialization ○ module_init ○ device_init (called by radeon) ● Application opens “/dev/kfd” device ● Application sends ioctl ○ KFD_IOC_SET_MEMORY_POLICY ○ KFD_IOC_CREATE_QUEUE ● Application sends ioctl ○ KFD_IOC_DESTROY_QUEUE ● Application termination
  • 19. module_init(kfd_module_init) ● radeon_kfd_pasid_init o Initialize PASID bitmap ● radeon_kfd_chardev_init o register_chrdev: /dev/kfd o kfd_ops ▪ Define open, ioctl member function
  • 20. kgd2kfd_device_init ● radeon_kfd_doorbell_init(kfd); ● radeon_kfd_interrupt_init(kfd); ● amd_iommu_set_invalidate_ctx_cb(kfd->pdev, iommu_pasid_shutdown_callback); ● device_queue_manager_init(kfd); o dqm->initialize ● dqm->start(kfd->dqm);
  • 21. dqm->initialize For KFD_SCHED_POLICY_NO_HWS* ● Prepare pipe, queue bitmap
  • 22. kfd_open ● radeon_kfd_create_process(current) o Create kfd_process o Assign PASID
  • 23. KFD_IOC_SET_MEMORY_POLICY ● Two policies o cache_policy_coherent o cache_policy_noncoherent ● Okra o default policy=cache_policy_coherent o alternate policy=cache_policy_noncoherent
  • 24. radeon_kfd_bind_process_to_device ● Called when the user application sends an ioctl command ● amd_iommu_bind_pasid() o Register the IOMMU with this kfd_process
  • 25. KFD_IOC_CREATE_QUEUE ● Create queue with information from userspace ● pqm_create_queue ● Return queue_id and doorbell_address to userspace o queue_id is per kfd_process o doorbell_address maps to the device mmio address
  • 26. pqm_create_queue ● find_available_queue_slot o Assign qid (per kfd_process) ● dqm->register_process o Register process to dqm (device queue manager) ● create_cp_queue o Create with queue_properties obtained from the application o Map doorbell mmio address to application ● dqm->create_queue ● dqm->execute_queue
  • 27. dqm->create_queue For KFD_SCHED_POLICY_NO_HWS ● init_mqd (memory queue descriptor) o Store queue configuration from application ● Find unused (pipe, queue) from dqm (device queue manager) o If no, return -EBUSY o Maximum = 56
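
The "find unused (pipe, queue)" step can be pictured as a scan over a small bitmap of free hardware slots. The sketch below is an illustration under assumptions, not the KFD implementation: the type and function names are hypothetical, and splitting the 56 slots into 7 pipes of 8 queues each is inferred from the editor's note "SW 7*8 = 56".

```c
#include <errno.h>
#include <stdint.h>

#define NUM_PIPES       7  /* assumed: 7 pipes ... */
#define QUEUES_PER_PIPE 8  /* ... of 8 queues each, 56 slots in total */

struct hqd_allocator {
    uint8_t free_queues[NUM_PIPES]; /* bit q set => queue q on that pipe is free */
};

/* Mark every (pipe, queue) slot as free. */
void hqd_allocator_init(struct hqd_allocator *a)
{
    for (int p = 0; p < NUM_PIPES; p++)
        a->free_queues[p] = 0xFF;
}

/* Claim the first free slot; returns 0 on success, -EBUSY if all are taken. */
int hqd_acquire(struct hqd_allocator *a, int *pipe, int *queue)
{
    for (int p = 0; p < NUM_PIPES; p++) {
        for (int q = 0; q < QUEUES_PER_PIPE; q++) {
            if (a->free_queues[p] & (1u << q)) {
                a->free_queues[p] &= (uint8_t)~(1u << q);
                *pipe = p;
                *queue = q;
                return 0;
            }
        }
    }
    return -EBUSY;
}

/* Return a slot to the bitmap, as done when a queue is destroyed. */
void hqd_release(struct hqd_allocator *a, int pipe, int queue)
{
    a->free_queues[pipe] |= (uint8_t)(1u << queue);
}
```

After 56 successful `hqd_acquire` calls the next one fails with -EBUSY until some slot is handed back through `hqd_release`.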
  • 28. dqm->execute_queue For KFD_SCHED_POLICY_NO_HWS ● Write queue configuration to device ● load_mqd o ring_base_addr o doorbell_offset o queue_priority o ...
  • 29. pasid=1 queue_id=0 ring_base_address pasid=1 queue_id=1 ring_base_address HSA GPU’s configuration register mmio address pasid=0 queue_id=0 ring_base_address Free hardware queue_id bitmap queue select register doorbell pasid=0 queue_id=1 ring_base_address doorbell doorbell doorbell Each process can have up to 1024 queues (pipe, queue) Physical Address
  • 30. kgd2kfd_device_init ● radeon_kfd_doorbell_init(kfd); ● radeon_kfd_interrupt_init(kfd); ● device_iommu_pasid_init(kfd); ● kfd_topology_add_device(kfd); ● amd_iommu_set_invalidate_ctx_cb(kfd->pdev, iommu_pasid_shutdown_callback); ● device_queue_manager_init(kfd); o dqm->initialize ● dqm->start(kfd->dqm);
  • 31. dqm->start For KFD_SCHED_POLICY_HWS* ● pm_init (packet manager) ● kernel_queue_init o kernel_queue doorbell o kernel_queue ring address o load_mqd to write kernel_queue configuration to device
  • 32. pqm_create_queue ● find_available_queue_slot o Assign qid (per kfd_process) ● dqm->register_process o Register process to dqm (device queue manager) ● create_cp_queue o Create with queue_properties obtained from the application o Map doorbell mmio address to application ● dqm->create_queue ● dqm->execute_queue
  • 33. dqm->create_queue For KFD_SCHED_POLICY_HWS* ● init_mqd (memory queue descriptor) o Store queue configuration from application
  • 34. dqm->execute_queue For KFD_SCHED_POLICY_HWS* ● dqm->destroy_queues ● pm_send_runlist o pm_create_runlist_ib ▪ Construct pm4 packets of MAP_PROCESS and MAP_QUEUES type ● Packet contains application’s ring address o pm->kernel_queue->acquire_packet_buffer ▪ Get an unused entry of kernel_queue o pm_create_runlist ▪ Construct pm4 packet of RUN_LIST type o pm->kernel_queue->submit_packet ▪ Kick kernel queue’s doorbell
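
The ordering of packets inside the runlist buffer can be sketched as follows. This is only an illustration of the layout the slide describes: it assumes one MAP_PROCESS packet per process followed by one MAP_QUEUES packet per queue of that process, uses symbolic opcodes rather than real PM4 encodings, and all names are hypothetical. The RUN_LIST packet that points at this buffer is submitted separately on the kernel queue.

```c
#include <stddef.h>

/* Symbolic opcodes only; these are not the real PM4 IT_OPCODE values. */
enum pm4_opcode { IT_MAP_PROCESS, IT_MAP_QUEUES };

struct process_desc {
    unsigned int pasid;
    unsigned int num_queues;
};

/*
 * Fill out[] with the opcode sequence of the runlist buffer.
 * Returns the number of packets written, or 0 if out[] is too small.
 */
size_t build_runlist_ib(const struct process_desc *procs, size_t nprocs,
                        enum pm4_opcode *out, size_t cap)
{
    size_t n = 0;

    for (size_t i = 0; i < nprocs; i++) {
        if (n == cap)
            return 0;
        out[n++] = IT_MAP_PROCESS;          /* carries pasid, page table base */

        for (unsigned int q = 0; q < procs[i].num_queues; q++) {
            if (n == cap)
                return 0;
            out[n++] = IT_MAP_QUEUES;       /* carries the queue's mqd_addr */
        }
    }
    return n;
}
```

Two processes with two queues and one queue respectively yield five packets: MAP_PROCESS, MAP_QUEUES, MAP_QUEUES, MAP_PROCESS, MAP_QUEUES.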
  • 35. Hardware Scheduler - No Oversubscription PM4 Packet (Type3) IT_RUN_LIST run_list PM4 Packet (Type3) IT_MAP_PROCESS page_table_base pasid sh_mem_config PM4 Packet (Type3) IT_MAP_QUEUES mqd_addr (Memory Queue Descriptor) 3 Processes
  • 36. Hardware Scheduler - Oversubscription PM4 Packet (Type3) IT_RUN_LIST run_list PM4 Packet (Type3) IT_MAP_PROCESS page_table_base pasid sh_mem_config PM4 Packet (Type3) IT_MAP_QUEUES mqd_addr (Memory Queue Descriptor) PM4 Packet (Type3) IT_RUN_LIST run_list
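
The editor's notes quote the condition that distinguishes the oversubscribed case: too many processes for the available VMIDs, or too many queues for the hardware-scheduled slots. A minimal sketch of that test is given below. The totals 8 and 24 come from the notes; the function name is hypothetical, and splitting 24 into 3 pipes of 8 queues is an assumption.

```c
#include <stdbool.h>

#define VMID_PER_DEVICE           8 /* value quoted in the editor's notes */
#define PIPE_PER_ME_CP_SCHEDULING 3 /* assumed split: 3 * 8 = 24 */
#define QUEUES_PER_PIPE           8

/* True when the runlist no longer fits the hardware in one pass. */
bool runlist_is_oversubscribed(unsigned int processes_count,
                               unsigned int queue_count)
{
    return processes_count >= VMID_PER_DEVICE ||
           queue_count >= PIPE_PER_ME_CP_SCHEDULING * QUEUES_PER_PIPE;
}
```

When this returns true, the slide shows a second IT_RUN_LIST packet appended to the runlist.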
  • 37. Software Scheduling: dqm->initialize ● Prepare (pipe, queue) bitmap; dqm->start; kfd_open ● Create kfd_process ● Assign PASID; ioctl(CREATE_QUEUE) ● Get queue_id ● Map doorbell to application; dqm->create_queue ● init_mqd ● Find unused (pipe, queue) to assign HW queue_id; dqm->execute_queue ● Write queue configuration to device. Hardware Scheduling: dqm->initialize; dqm->start ● pm_init ● kernel_queue_init; kfd_open ● Create kfd_process ● Assign PASID; ioctl(CREATE_QUEUE) ● Get queue_id ● Map doorbell to application; dqm->create_queue ● init_mqd; dqm->execute_queue ● Create pm4 packet ● Kick kernel_queue’s doorbell
  • 38. Application Computation ... ● HW holds ring_base_addr, a userspace address o Application enqueues AQL packets and waits for the signal ● Application holds the HW doorbell mmio address o Used to kick the hardware ● Driver does nothing ● Until the application sends ioctl(KFD_IOC_DESTROY_QUEUE) or finishes
  • 39. Hardware Queue Deactivation 1. Application sends ioctl(KFD_IOC_DESTROY_QUEUE) 2. Task exit notifier
  • 40. Hardware Queue Deactivation (1) ● ioctl(KFD_IOC_DESTROY_QUEUE) ● pqm_destroy_queue o dqm->destroy_queue o Restore queue, pipe bitmap o dqm->execute_queues(dqm);
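
During computation the driver is out of the picture: the application writes a packet into its ring and then writes to the mapped doorbell. The sketch below illustrates that sequence under assumptions. The struct and function names are hypothetical, the value written to the doorbell (the new write index) is an assumption, and a real runtime would also check for a full ring and issue memory barriers before ringing the doorbell.

```c
#include <stdint.h>
#include <string.h>

#define AQL_PACKET_SIZE 64u /* AQL packets are 64 bytes */

struct user_queue {
    uint8_t           *ring_base;   /* ring_base_address, a user virtual address */
    uint32_t           num_slots;   /* number of packet slots, a power of two */
    uint32_t           write_index; /* monotonically increasing */
    volatile uint32_t *doorbell;    /* doorbell mapped into the application */
};

/* Copy one packet into the next ring slot, then kick the doorbell. */
void enqueue_and_kick(struct user_queue *q, const void *packet)
{
    uint32_t slot = q->write_index & (q->num_slots - 1);

    memcpy(q->ring_base + (size_t)slot * AQL_PACKET_SIZE,
           packet, AQL_PACKET_SIZE);
    q->write_index++;

    /* Assumption: the doorbell carries the updated write index. */
    *q->doorbell = q->write_index;
}
```

In a test harness the doorbell can simply point at an ordinary variable; on real hardware it points at the mmio address returned by KFD_IOC_CREATE_QUEUE.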
  • 41. dqm->destroy_queue For KFD_SCHED_POLICY_NO_HWS ● destroy_mqd o acquire_queue(kgd, pipe_id, queue_id); o write_register(kgd, CP_HQD_DEQUEUE_REQUEST, DEQUEUE_REQUEST_DRAIN);
  • 42. dqm->destroy_queue For KFD_SCHED_POLICY_HWS* ● dqm->destroy_queues o pm_send_unmap_queue ▪ Send a pm4 packet of UNMAP_QUEUES o pm_send_query_status(KFD_FENCE_COMPLETED)
  • 43. Hardware Queue Deactivation (2) ● Task exit notifier will call iommu_pasid_shutdown_callback o Registered in kgd2kfd_device_init ->amd_iommu_set_invalidate_ctx_cb o Will be called in mmu_notifier’s release function (mmu_notifier is registered in radeon_kfd_bind_process_to_device ->amd_iommu_bind_pasid)
  • 44. iommu_pasid_shutdown_callback ● pqm_destroy_queue o dqm->destroy_queue o Restore queue, pipe bitmap o dqm->execute_queues(dqm);
  • 45. Agenda ● Introduction to HSA o hUMA o User Level Queueing ● HSA Driver o Concepts ▪ Flow Overview ▪ User & Hardware Queues o Source Code Detail ● IOMMU o Concepts ▪ GCR3 ▪ PPR o Source Code Detail
  • 46. Introduction to IOMMU ● The user application writes AQL packets to the ring address, which is a virtual address ● Device accesses need VA-to-PA translation Doorbell Ring Address
  • 47. HSA GPU Device table PASID=2 GCR3 Assign this entry with kfd_process->mm->pgd Physical Address
  • 48. PRI & PPR ● The operating system is usually required to pin memory pages used for I/O. ● The IOMMU provides a mechanism that lets peripherals use unpinned pages for I/O. ● Only supported in AMD IOMMU_v2
  • 49. PRI & PPR ● PRI (Page Request Interface) o A peripheral requests memory management service from the host OS (e.g., page fault service for the peripheral) o Issued by the peripheral ● PPR (Peripheral Page Service Request) o When the IOMMU receives a valid PRI request, it creates a PPR message in the request log to request changes to the virtual address space o Issued by the IOMMU as an interrupt ● Used to request I/O page table changes o The IOMMU driver can register a PPR notifier
  • 51. Set IOMMU With PASID ● amd_iommu_bind_pasid ● Called when the kfd_process is created o mmu_notifier_register(&pasid_state->mn, pasid_state->mm); o amd_iommu_domain_set_gcr3(dev_state->domain, pasid, __pa(pasid_state->mm->pgd));
  • 52. HSA GPU Device table PASID=2 GCR3 Assign this entry with kfd_process->mm->pgd
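
Conceptually, the GCR3 table lets the IOMMU find a process's page-table root from the PASID attached to a device request. The sketch below models it as a flat array purely for illustration. That layout, the table size, and all names are assumptions; it is not the hardware format and not the `amd_iommu_domain_set_gcr3` implementation.

```c
#include <errno.h>
#include <stdint.h>

#define MAX_PASIDS 16 /* hypothetical, small for illustration */

struct gcr3_table {
    uint64_t root[MAX_PASIDS]; /* physical address of the pgd; 0 = unbound */
};

/* Bind a PASID to the physical address of a process's page-table root. */
int gcr3_set(struct gcr3_table *t, unsigned int pasid, uint64_t pgd_pa)
{
    if (pasid >= MAX_PASIDS)
        return -EINVAL;
    t->root[pasid] = pgd_pa;
    return 0;
}

/* Look up the page-table root for a PASID; 0 means no binding. */
uint64_t gcr3_lookup(const struct gcr3_table *t, unsigned int pasid)
{
    return pasid < MAX_PASIDS ? t->root[pasid] : 0;
}
```

In the slide's terms, `gcr3_set(&table, 2, pa_of_pgd)` fills the PASID=2 entry with the physical address of `kfd_process->mm->pgd`.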
  • 53. PRI & PPR Flow Peripheral issues PRI to IOMMU IOMMU writes PPR request to PPR log (log contains fault address, pasid, device_id, tag, flags) IOMMU sends interrupt to CPU
  • 54. PPR Flow When an IRQ arrives: status = readl(iommu->mmio_base + MMIO_STATUS_OFFSET); if (status & MMIO_STATUS_PPR_INT_MASK) -> ppr_notifier (registered in amd_iommu_v2_init) -> do_fault
  • 55. do_fault ● Uses the get_user_pages() API to pin the faulting pages into memory o mm_struct, fault_addr
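
The PPR path amounts to draining a ring-shaped log and handing each entry to a fault handler. The following sketch shows that loop in isolation. The entry fields follow the list on the PRI & PPR Flow slide, but the field widths, log size, and every name here are hypothetical, and `record_fault` is only a stand-in for do_fault.

```c
#include <stdint.h>

#define PPR_LOG_ENTRIES 8u /* hypothetical log size */

struct ppr_entry {
    uint64_t fault_addr;
    uint32_t pasid;
    uint16_t device_id;
    uint16_t tag;
    uint16_t flags;
};

struct ppr_log {
    struct ppr_entry entries[PPR_LOG_ENTRIES];
    uint32_t head; /* next entry the driver will read */
    uint32_t tail; /* next entry the IOMMU will write */
};

typedef void (*ppr_handler)(const struct ppr_entry *e, void *ctx);

/* Process every pending entry; returns how many were handled. */
unsigned int ppr_log_drain(struct ppr_log *log, ppr_handler handle, void *ctx)
{
    unsigned int handled = 0;

    while (log->head != log->tail) {
        handle(&log->entries[log->head], ctx);
        log->head = (log->head + 1) % PPR_LOG_ENTRIES;
        handled++;
    }
    return handled;
}

/* Example handler: remembers the most recent fault address. */
void record_fault(const struct ppr_entry *e, void *ctx)
{
    *(uint64_t *)ctx = e->fault_addr;
}
```

A real handler would use the pasid to find the owning mm_struct and then resolve fault_addr against it, as the do_fault slide describes.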
  • 56. Flow Review Application Runtime Library ● open(“/dev/kfd”) ● ioctl(KFD_IOC_SET_MEMORY_POLICY) ● ioctl(KFD_IOC_CREATE_QUEUE) ● ioctl(KFD_IOC_DESTROY_QUEUE) KFD IOMMU Driver HSA-aware Kernel HSA Device IOMMU
  • 58. Reference ● https://github.com/HSAFoundation/HSA-Drivers-Linux-AMD ● http://www.hsafoundation.com/standards/

Editor's Notes

  1. User queue with information
  2. module_param: insmod can change sched_policy
  3. Driver’s help ring VA SW 7*8 = 56
  4. Oversubscription dqm->processes_count >= VMID_PER_DEVICE) || // 8 dqm->queue_count >= PIPE_PER_ME_CP_SCHEDULING * QUEUES_PER_PIPE))) // 24 SW 7*8 = 56
  5. http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2013/10/R6xx_R7xx_3D.pdf HSA-compliant HW needs to understand the PM4 packet format of Radeon http://www.spinics.net/linux/lists/kernel/msg1784187.html Type-0 packet: write N DWORDs in the information body to the N consecutive registers, or to the register, pointed to by the BASE_INDEX field of the packet header. Type-3 packet: carry out the operation indicated by field IT_OPCODE.
  6. http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2013/10/R6xx_R7xx_3D.pdf HSA-compliant HW needs to understand the PM4 packet format of Radeon http://www.spinics.net/linux/lists/kernel/msg1784187.html Radeon R7 for Kaveri. Type-0 packet: write N DWORDs in the information body to the N consecutive registers, or to the register, pointed to by the BASE_INDEX field of the packet header. Type-3 packet: carry out the operation indicated by field IT_OPCODE.
  7. per_device_data radeon_dev
  8. KFD is HSA driver!
  9. Start code
  10. kfd_topology_add_device: dev->gpu_id
  11. Wait for spec
  12. per_device_data
  13. Wrap all mmio access to radeon
  14. Driver’s help
  15. kfd_topology_add_device: dev->gpu_id
  16. packet_manager’s most important member: kernel_queue
  17. http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2013/10/R6xx_R7xx_3D.pdf HSA-compliant HW needs to understand the PM4 packet format of Radeon http://www.spinics.net/linux/lists/kernel/msg1784187.html Type-0 packet: write N DWORDs in the information body to the N consecutive registers, or to the register, pointed to by the BASE_INDEX field of the packet header. Type-3 packet: carry out the operation indicated by field IT_OPCODE.
  18. http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2013/10/R6xx_R7xx_3D.pdf HSA-compliant HW needs to understand the PM4 packet format of Radeon http://www.spinics.net/linux/lists/kernel/msg1784187.html Type-0 packet: write N DWORDs in the information body to the N consecutive registers, or to the register, pointed to by the BASE_INDEX field of the packet header. Type-3 packet: carry out the operation indicated by field IT_OPCODE.
  19. The query is also a packet
  20. SMMU functionality
  21. Previously this made no difference, because the IOMMU only handled device addresses. For now, the data in an AQL packet is a VA.