HSA Kernel Code 
(KFD v0.6) 
Advisor: Prof. 徐慰中 
Student: 黃昱儒 
2014/7/25
Agenda 
● Introduction to HSA 
o hUMA 
o User Level Queueing 
● HSA Driver 
o Concepts 
▪ Flow Overview 
▪ User & Hardware Queues 
o Source Code Detail 
● IOMMU 
o Concepts 
▪ GCR3 
▪ PPR 
o Source Code Detail
hUMA
User Level Queuing - Before HSA
User Level Queuing
[Diagram: Application 1 (Queue 1, Queue 2) and Application 3 (Queue 1) each reach the HSA device through their own user-level queues] 
Each queue involves: 
1. AQL Packet 
2. Ring 
3. Doorbell 
● The application kicks the doorbell 
● The HSA device accesses the application’s ring 
● IOMMU address translation (VA->PA)
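The packet, ring, and doorbell steps can be sketched as a small simulation. This is only an illustration under stated assumptions: `struct packet` is a hypothetical stand-in for a real AQL packet, the doorbell is an ordinary variable instead of a mapped MMIO register, and the names `user_queue`, `uq_submit`, and `uq_device_drain` are invented for this example.

```c
#include <stdint.h>

#define RING_SLOTS 8u   /* hypothetical ring size */

/* Hypothetical stand-in for an AQL packet (the real layout is defined by HSA). */
struct packet {
    uint32_t header;
    uint32_t payload;
};

struct user_queue {
    struct packet ring[RING_SLOTS]; /* lives in application memory (a VA) */
    uint32_t write_index;           /* advanced by the application */
    uint32_t read_index;            /* advanced by the (simulated) device */
    volatile uint32_t doorbell;     /* stands in for the MMIO doorbell */
};

/* Application side: put a packet into the ring, then kick the doorbell.
 * Returns 0 on success, -1 if the ring is full. */
static int uq_submit(struct user_queue *q, struct packet p)
{
    if (q->write_index - q->read_index == RING_SLOTS)
        return -1;
    q->ring[q->write_index % RING_SLOTS] = p;
    q->write_index++;
    q->doorbell = q->write_index;   /* tells the device how far it may read */
    return 0;
}

/* Simulated device side: consume every packet up to the doorbell value.
 * Adds each payload to *sum and returns the number of packets consumed. */
static unsigned uq_device_drain(struct user_queue *q, uint32_t *sum)
{
    unsigned consumed = 0;

    while (q->read_index != q->doorbell) {
        *sum += q->ring[q->read_index % RING_SLOTS].payload;
        q->read_index++;
        consumed++;
    }
    return consumed;
}
```

On real hardware the device reads the ring through the IOMMU, so the ring address stays a virtual address; the simulation skips that translation step.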
HSA Software Stack
HSA Software Stack 
Application 
Runtime Library 
● open(“/dev/kfd”) 
● ioctl(KFD_IOC_SET_MEMORY_POLICY) 
● ioctl(KFD_IOC_CREATE_QUEUE) 
● ioctl(KFD_IOC_DESTROY_QUEUE) 
KFD IOMMU Driver 
HSA-aware Kernel 
HSA 
Device 
IOMMU
Agenda 
● Introduction to HSA 
o hUMA 
o User Level Queueing 
● HSA Driver 
o Concepts 
▪ Flow Overview 
▪ User & Hardware Queues 
o Source Code Detail 
● IOMMU 
o Concepts 
▪ GCR3 
▪ PPR 
o Source Code Detail
Concepts - HSA Run Flow 
● Initialization 
o Application: create user queues 
o KFD driver: create HW queue with user queue information 
● Computation (user - HW interaction) 
o Application: enqueue AQL packets, kick doorbell, and wait for the signal 
o KFD driver: nothing 
● Finish 
o Application: finish and destroy queues 
o KFD driver: release HW queue
Scheduling Policy 
1. Hardware scheduling with oversubscription allowed (more queues than HW slots) 
2. Hardware scheduling without oversubscription, so create_queue requests fail when we run out of HW slots 
3. No hardware scheduling, so the driver manually assigns queues to HW slots by programming registers
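The three policies can be sketched as a small decision function. This is an assumption-laden illustration: the enumerator names are modelled on the KFD_SCHED_POLICY_* identifiers used later in the deck (the middle name is a guess), the limits 8, 24, and 56 are taken from the editor's notes, and `create_queue_allowed` and `is_oversubscribed` are hypothetical helpers, not driver functions.

```c
#include <stdbool.h>

/* Modelled on the KFD_SCHED_POLICY_* names; the middle name is an assumption. */
enum sched_policy {
    SCHED_POLICY_HWS,                     /* HW scheduling, oversubscription allowed */
    SCHED_POLICY_HWS_NO_OVERSUBSCRIPTION, /* HW scheduling, no oversubscription */
    SCHED_POLICY_NO_HWS                   /* driver programs HW slots itself */
};

/* Limits quoted in the editor's notes. */
enum {
    VMID_PER_DEVICE    = 8,   /* process limit before oversubscription */
    HWS_QUEUE_SLOTS    = 24,  /* queue limit under HW scheduling */
    NO_HWS_QUEUE_SLOTS = 56   /* 7 * 8 slots under software scheduling */
};

/* Oversubscription test in the shape given in the notes. */
static bool is_oversubscribed(unsigned processes, unsigned queues)
{
    return processes >= VMID_PER_DEVICE || queues >= HWS_QUEUE_SLOTS;
}

/* Simplified: would one more queue be accepted, given the current queue count? */
static bool create_queue_allowed(enum sched_policy policy, unsigned queues)
{
    switch (policy) {
    case SCHED_POLICY_HWS:
        return true;                      /* extra queues are time-multiplexed */
    case SCHED_POLICY_HWS_NO_OVERSUBSCRIPTION:
        return queues < HWS_QUEUE_SLOTS;  /* fail once HW slots run out */
    case SCHED_POLICY_NO_HWS:
        return queues < NO_HWS_QUEUE_SLOTS;
    }
    return false;
}
```

The sketch only looks at queue counts when deciding whether to accept a request; the real driver also tracks processes, which is why the oversubscription test is kept as a separate helper.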
Software Scheduler 
[Diagram: user queues (pasid=0 queue_id=0, pasid=0 queue_id=1, pasid=1 queue_id=0, pasid=1 queue_id=1), each with a ring_base_address and a doorbell. A free hardware queue_id bitmap and a queue acquire register select the (pipe, queue) slot behind the HSA GPU’s configuration register MMIO address (physical address).]
Hardware Scheduler 
[Diagram: a single kernel_queue with its ring_base_address and doorbell occupies (pipe=4, queue=0), selected through the queue acquire register behind the HSA GPU’s configuration register MMIO address (physical address).]
Hardware Scheduler - No Oversubscription 
[Diagram: run list for 3 processes] 
● PM4 Packet (Type3) IT_RUN_LIST 
o run_list 
● PM4 Packet (Type3) IT_MAP_PROCESS 
o page_table_base 
o pasid 
o sh_mem_config 
● PM4 Packet (Type3) IT_MAP_QUEUES 
o mqd_addr (Memory Queue Descriptor)
Hardware Scheduler - Oversubscription 
● PM4 Packet (Type3) IT_RUN_LIST 
o run_list 
● PM4 Packet (Type3) IT_MAP_PROCESS 
o page_table_base 
o pasid 
o sh_mem_config 
● PM4 Packet (Type3) IT_MAP_QUEUES 
o mqd_addr (Memory Queue Descriptor) 
● PM4 Packet (Type3) IT_RUN_LIST 
o run_list
Per Application 
Per Device 
Only for HW 
scheduling 
Per HW Queue
IOCTL Commands Provided by KFD 
● KFD_IOC_CREATE_QUEUE 
o Create a hardware queue from the application’s information (e.g. ring base address) 
● KFD_IOC_DESTROY_QUEUE 
o Release the hardware queue 
● KFD_IOC_UPDATE_QUEUE 
● KFD_IOC_SET_MEMORY_POLICY 
o Set the cache coherency policy 
● KFD_IOC_GET_CLOCK_COUNTERS 
o Get the GPU clock counter 
● KFD_IOC_GET_PROCESS_APERTURES 
o Get the GPU’s aperture information 
● KFD_IOC_PMC_ACQUIRE_ACCESS 
● KFD_IOC_PMC_RELEASE_ACCESS 
o Exclusive access to the performance counters
HSA Driver Flow 
● System initialization 
○ module_init 
○ device_init (called by radeon) 
● Application opens the “/dev/kfd” device 
● Application sends ioctl 
○ KFD_IOC_SET_MEMORY_POLICY 
○ KFD_IOC_CREATE_QUEUE 
● Application sends ioctl 
○ KFD_IOC_DESTROY_QUEUE 
● Application terminates
module_init(kfd_module_init) 
● radeon_kfd_pasid_init 
o Initialize PASID bitmap 
● radeon_kfd_chardev_init 
o register_chrdev: /dev/kfd 
o kfd_ops 
▪ Defines the open and ioctl member functions
kgd2kfd_device_init 
● radeon_kfd_doorbell_init(kfd); 
● radeon_kfd_interrupt_init(kfd); 
● amd_iommu_set_invalidate_ctx_cb(kfd->pdev, 
iommu_pasid_shutdown_callback); 
● device_queue_manager_init(kfd); 
o dqm->initialize 
● dqm->start(kfd->dqm);
dqm->initialize For 
KFD_SCHED_POLICY_NO_HWS* 
● Prepare pipe, queue bitmap
kfd_open 
● radeon_kfd_create_process(current) 
o Create kfd_process 
o Assign PASID
KFD_IOC_SET_MEMORY_POLICY 
● Two policies 
o cache_policy_coherent 
o cache_policy_noncoherent 
● Okra 
o default policy=cache_policy_coherent 
o alternate policy=cache_policy_noncoherent
radeon_kfd_bind_process_to_device 
● Called when the user application sends an ioctl command 
● amd_iommu_bind_pasid() 
o Binds this kfd_process to the IOMMU
KFD_IOC_CREATE_QUEUE 
● Create a queue with information from userspace 
● pqm_create_queue 
● Return queue_id and doorbell_address to userspace 
o queue_id is per kfd_process 
o doorbell_address maps to a device MMIO address
pqm_create_queue 
● find_available_queue_slot 
o Assign qid (per kfd_process) 
● dqm->register_process 
o Register process to dqm (device queue manager) 
● create_cp_queue 
o Create with the queue_properties obtained from the application 
o Map doorbell mmio address to application 
● dqm->create_queue 
● dqm->execute_queue
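The last two steps go through function pointers on the device queue manager, so the same pqm_create_queue path can serve both scheduling modes. The sketch below illustrates that dispatch pattern only. The struct layout, the trace strings, and every function name in it are hypothetical; they summarise what the later slides say each mode does and are not the driver's actual code.

```c
#include <string.h>

struct dqm;

/* Hypothetical operations table, one instance per scheduling mode. */
struct dqm_ops {
    int (*create_queue)(struct dqm *dqm);
    int (*execute_queues)(struct dqm *dqm);
};

struct dqm {
    const struct dqm_ops *ops;
    char trace[128];            /* records which steps ran, for illustration */
};

static void trace_step(struct dqm *dqm, const char *step)
{
    strncat(dqm->trace, step, sizeof(dqm->trace) - strlen(dqm->trace) - 1);
}

/* Software scheduling: the driver picks a HW slot and programs registers. */
static int nohws_create(struct dqm *dqm)  { trace_step(dqm, "init_mqd;alloc_slot;"); return 0; }
static int nohws_execute(struct dqm *dqm) { trace_step(dqm, "load_mqd;"); return 0; }

/* Hardware scheduling: the driver only describes queues in a run list. */
static int hws_create(struct dqm *dqm)    { trace_step(dqm, "init_mqd;"); return 0; }
static int hws_execute(struct dqm *dqm)   { trace_step(dqm, "send_runlist;"); return 0; }

static const struct dqm_ops nohws_ops = { nohws_create, nohws_execute };
static const struct dqm_ops hws_ops   = { hws_create, hws_execute };

/* Tail of the create-queue path: identical for both modes. */
static int create_queue_flow(struct dqm *dqm)
{
    int ret = dqm->ops->create_queue(dqm);

    if (ret)
        return ret;
    return dqm->ops->execute_queues(dqm);
}
```

Swapping the operations table is all that changes between the two modes in this sketch, which matches how the deck presents one pqm_create_queue slide followed by per-policy dqm slides.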
dqm->create_queue For 
KFD_SCHED_POLICY_NO_HWS 
● init_mqd (memory queue descriptor) 
o Store queue configuration from application 
● Find unused (pipe, queue) from dqm (device 
queue manager) 
o If none is free, return -EBUSY 
o Maximum = 56
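Finding an unused (pipe, queue) pair can be sketched with one small bitmap per pipe. The split into 7 pipes of 8 queues is an assumption based on the "7*8 = 56" remark in the editor's notes, and `hqd_allocator`, `hqd_init`, `hqd_alloc`, and `hqd_free` are hypothetical names.

```c
#include <errno.h>
#include <stdint.h>

enum { NUM_PIPES = 7, QUEUES_PER_PIPE = 8 };   /* assumed split of the 56 slots */

struct hqd_allocator {
    uint8_t free_mask[NUM_PIPES];   /* bit q set: queue q of that pipe is free */
};

static void hqd_init(struct hqd_allocator *a)
{
    for (int pipe = 0; pipe < NUM_PIPES; pipe++)
        a->free_mask[pipe] = 0xFF;  /* all 8 queues free */
}

/* Take the first free slot. Returns 0 and fills *pipe and *queue,
 * or -EBUSY when all 56 slots are in use. */
static int hqd_alloc(struct hqd_allocator *a, int *pipe, int *queue)
{
    for (int p = 0; p < NUM_PIPES; p++) {
        for (int q = 0; q < QUEUES_PER_PIPE; q++) {
            if (a->free_mask[p] & (1u << q)) {
                a->free_mask[p] &= (uint8_t)~(1u << q);
                *pipe = p;
                *queue = q;
                return 0;
            }
        }
    }
    return -EBUSY;
}

/* Give a slot back, as the destroy path does when it restores the bitmap. */
static void hqd_free(struct hqd_allocator *a, int pipe, int queue)
{
    a->free_mask[pipe] |= (uint8_t)(1u << queue);
}
```

A first-fit scan is used here for brevity; a driver may prefer to spread queues across pipes, which this sketch does not attempt.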
dqm->execute_queue For 
KFD_SCHED_POLICY_NO_HWS 
● Write queue configuration to device 
● load_mqd 
o ring_base_addr 
o doorbell_offset 
o queue_priority 
o ...
[Diagram, software scheduling: user queues (pasid=0 queue_id=0, pasid=0 queue_id=1, pasid=1 queue_id=0, pasid=1 queue_id=1), each with a ring_base_address and a doorbell. A free hardware queue_id bitmap and a queue select register choose the (pipe, queue) slot behind the HSA GPU’s configuration register MMIO address (physical address).] 
● Each process can have up to 1024 queues
kgd2kfd_device_init 
● radeon_kfd_doorbell_init(kfd); 
● radeon_kfd_interrupt_init(kfd); 
● device_iommu_pasid_init(kfd); 
● kfd_topology_add_device(kfd); 
● amd_iommu_set_invalidate_ctx_cb(kfd->pdev, 
iommu_pasid_shutdown_callback); 
● device_queue_manager_init(kfd); 
o dqm->initialize 
● dqm->start(kfd->dqm);
dqm->start For 
KFD_SCHED_POLICY_HWS* 
● pm_init (packet manager) 
● kernel_queue_init 
o kernel_queue doorbell 
o kernel_queue ring address 
o load_mqd to write kernel_queue configuration to 
device
pqm_create_queue 
● find_available_queue_slot 
o Assign qid (per kfd_process) 
● dqm->register_process 
o Register process to dqm (device queue manager) 
● create_cp_queue 
o Create with the queue_properties obtained from the application 
o Map doorbell mmio address to application 
● dqm->create_queue 
● dqm->execute_queue
dqm->create_queue For 
KFD_SCHED_POLICY_HWS* 
● init_mqd (memory queue descriptor) 
o Store queue configuration from application
dqm->execute_queue For 
KFD_SCHED_POLICY_HWS* 
● dqm->destroy_queues 
● pm_send_runlist 
o pm_create_runlist_ib 
▪ Construct pm4 packet of MAP_PROCESS and 
MAP_QUEUES type 
● Packet contains application’s ring address 
o pm->kernel_queue->acquire_packet_buffer 
▪ Get an unused entry of the kernel_queue 
o pm_create_runlist 
▪ Construct pm4 packet of RUN_LIST type 
o pm->kernel_queue->submit_packet 
▪ Kick kernel queue’s doorbell
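The run list construction can be sketched as follows. Several things here are assumptions: the type-3 header layout (type in bits 31:30, body DWORD count minus one in bits 29:16, opcode in bits 15:8) follows the radeon PM4 format as I understand it, the opcode values are illustrative placeholders, the process index stands in for the PASID, and `rl_packet` and `build_runlist` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative opcode values; check the driver headers for the real ones. */
enum {
    OP_MAP_PROCESS = 0xA1,
    OP_MAP_QUEUES  = 0xA2,
    OP_RUN_LIST    = 0xA5
};

/* Pack a PM4 type-3 header for a packet with `body_dwords` DWORDs of body. */
static uint32_t pm4_type3_header(uint32_t opcode, uint32_t body_dwords)
{
    return (3u << 30) |
           (((body_dwords - 1u) & 0x3FFFu) << 16) |
           ((opcode & 0xFFu) << 8);
}

/* Reduced packet record: enough to show the ordering inside a run list. */
struct rl_packet {
    uint32_t opcode;
    uint32_t pasid;   /* here simply the process index */
    uint32_t queue;   /* queue index within the process */
};

/* Emit one MAP_PROCESS per process, followed by one MAP_QUEUES per queue
 * of that process. Returns the number of packets, or 0 if `out` is too small. */
static size_t build_runlist(const unsigned *queues_per_process, size_t nproc,
                            struct rl_packet *out, size_t cap)
{
    size_t n = 0;

    for (size_t p = 0; p < nproc; p++) {
        if (n == cap)
            return 0;
        out[n++] = (struct rl_packet){ OP_MAP_PROCESS, (uint32_t)p, 0 };
        for (unsigned q = 0; q < queues_per_process[p]; q++) {
            if (n == cap)
                return 0;
            out[n++] = (struct rl_packet){ OP_MAP_QUEUES, (uint32_t)p, q };
        }
    }
    return n;
}
```

In the flow above, a RUN_LIST packet pointing at this buffer would then be written into the kernel queue and its doorbell kicked; the sketch stops at building the buffer.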
Hardware Scheduler - No Oversubscription 
[Diagram: run list for 3 processes] 
● PM4 Packet (Type3) IT_RUN_LIST 
o run_list 
● PM4 Packet (Type3) IT_MAP_PROCESS 
o page_table_base 
o pasid 
o sh_mem_config 
● PM4 Packet (Type3) IT_MAP_QUEUES 
o mqd_addr (Memory Queue Descriptor)
Hardware Scheduler - Oversubscription 
● PM4 Packet (Type3) IT_RUN_LIST 
o run_list 
● PM4 Packet (Type3) IT_MAP_PROCESS 
o page_table_base 
o pasid 
o sh_mem_config 
● PM4 Packet (Type3) IT_MAP_QUEUES 
o mqd_addr (Memory Queue Descriptor) 
● PM4 Packet (Type3) IT_RUN_LIST 
o run_list
Software Scheduling 
● dqm->initialize 
o Prepare (pipe, queue) bitmap 
● dqm->start 
● kfd_open 
o Create kfd_process 
o Assign PASID 
● ioctl(CREATE_QUEUE) 
o Get queue_id 
o Map doorbell to application 
● dqm->create_queue 
o init_mqd 
o Find unused (pipe, queue) to assign HW queue_id 
● dqm->execute_queue 
o Write queue configuration to device 
Hardware Scheduling 
● dqm->initialize 
● dqm->start 
o pm_init 
o kernel_queue_init 
● kfd_open 
o Create kfd_process 
o Assign PASID 
● ioctl(CREATE_QUEUE) 
o Get queue_id 
o Map doorbell to application 
● dqm->create_queue 
o init_mqd 
● dqm->execute_queue 
o Create pm4 packet 
o Kick kernel_queue’s doorbell
Application Computation ... 
● HW has the userspace address ring_base_addr 
o Application enqueues AQL packets and waits for the signal 
● Application has the HW doorbell MMIO address 
o Used to kick the hardware 
● Driver does nothing 
o Until the application sends ioctl(KFD_IOC_DESTROY_QUEUE) or the application finishes
Hardware Queue Deactivation 
1. Application sends ioctl(KFD_IOC_DESTROY_QUEUE) 
2. Task exit notifier
Hardware Queue Deactivation (1) 
● ioctl(KFD_IOC_DESTROY_QUEUE) 
● pqm_destroy_queue 
o dqm->destroy_queue 
o Restore queue, pipe bitmap 
o dqm->execute_queues(dqm);
dqm->destroy_queue For 
KFD_SCHED_POLICY_NO_HWS 
● destroy_mqd 
o acquire_queue(kgd, pipe_id, queue_id); 
o write_register(kgd, 
CP_HQD_DEQUEUE_REQUEST, 
DEQUEUE_REQUEST_DRAIN);
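The acquire-then-write pattern can be sketched against a simulated register file. Everything here is a stand-in: `fake_gpu` models per-(pipe, queue) register banks in plain memory, `REG_DEQUEUE_REQUEST` and `DEQUEUE_DRAIN` are placeholders for CP_HQD_DEQUEUE_REQUEST and DEQUEUE_REQUEST_DRAIN with made-up values, and `release_queue` and `destroy_hw_queue` are hypothetical helpers added for the example.

```c
#include <stdint.h>

enum { NUM_PIPES = 8, QUEUES_PER_PIPE = 8 };      /* hypothetical bank layout */
enum { REG_DEQUEUE_REQUEST = 0, REG_COUNT = 1 };  /* placeholder register index */
enum { DEQUEUE_DRAIN = 1 };                       /* placeholder register value */

struct fake_gpu {
    /* one register bank per (pipe, queue) */
    uint32_t regs[NUM_PIPES][QUEUES_PER_PIPE][REG_COUNT];
    int sel_pipe;    /* currently selected bank, -1 when none */
    int sel_queue;
};

/* Select which (pipe, queue) bank later register writes will hit. */
static void acquire_queue(struct fake_gpu *gpu, int pipe, int queue)
{
    gpu->sel_pipe = pipe;
    gpu->sel_queue = queue;
}

static void release_queue(struct fake_gpu *gpu)
{
    gpu->sel_pipe = -1;
    gpu->sel_queue = -1;
}

/* Write to the selected bank. Returns -1 if no bank is selected. */
static int write_register(struct fake_gpu *gpu, int reg, uint32_t value)
{
    if (gpu->sel_pipe < 0 || gpu->sel_queue < 0)
        return -1;
    gpu->regs[gpu->sel_pipe][gpu->sel_queue][reg] = value;
    return 0;
}

/* Ask one HW queue to drain and stop, leaving other queues untouched. */
static int destroy_hw_queue(struct fake_gpu *gpu, int pipe, int queue)
{
    int ret;

    acquire_queue(gpu, pipe, queue);
    ret = write_register(gpu, REG_DEQUEUE_REQUEST, DEQUEUE_DRAIN);
    release_queue(gpu);
    return ret;
}
```

The point of the selection step is that the same register offset exists once per hardware queue, so the driver must say which queue it means before writing.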
dqm->destroy_queue For 
KFD_SCHED_POLICY_HWS* 
● dqm->destroy_queues 
o pm_send_unmap_queue 
▪ Send a pm4 packet of UNMAP_QUEUES 
o pm_send_query_status(KFD_FENCE_COMPLETED)
Hardware Queue Deactivation (2) 
● Task exit notifier calls iommu_pasid_shutdown_callback 
o Registered in kgd2kfd_device_init -> amd_iommu_set_invalidate_ctx_cb 
o Called from the mmu_notifier’s release function (the mmu_notifier is registered in radeon_kfd_bind_process_to_device -> amd_iommu_bind_pasid)
iommu_pasid_shutdown_callback 
● pqm_destroy_queue 
o dqm->destroy_queue 
o Restore queue, pipe bitmap 
o dqm->execute_queues(dqm);
Agenda 
● Introduction to HSA 
o hUMA 
o User Level Queueing 
● HSA Driver 
o Concepts 
▪ Flow Overview 
▪ User & Hardware Queues 
o Source Code Detail 
● IOMMU 
o Concepts 
▪ GCR3 
▪ PPR 
o Source Code Detail
Introduction to IOMMU 
● The user application writes AQL packets into the ring, whose address is a virtual address 
● Device accesses therefore need VA-to-PA translation 
Doorbell 
Ring 
Address
[Diagram: the HSA GPU’s device table entry points to a GCR3 table indexed by PASID. The entry for PASID=2 is assigned kfd_process->mm->pgd (as a physical address).]
PRI & PPR 
● The operating system is usually required to 
pin memory pages used for I/O. 
● The IOMMU provides a mechanism that lets a peripheral use unpinned pages for I/O. 
● Only supported by AMD IOMMU_v2
PRI & PPR 
● PRI (Page Request Interface) 
o A peripheral requests memory management service from the host OS (e.g. page fault service for the peripheral) 
o Issued by the peripheral 
● PPR (Peripheral Page Service Request) 
o When the IOMMU receives a valid PRI request, it creates a PPR message in the request log to request changes to the virtual address space 
o Issued by the IOMMU as an interrupt 
● Used to request I/O page table changes 
o The IOMMU driver can register a PPR notifier
module_init(amd_iommu_v2_init) 
● amd_iommu_register_ppr_notifier(&ppr_nb); 
o PPR callback 
▪ ppr_notifier function
Set IOMMU With PASID 
● amd_iommu_bind_pasid 
● Called when the kfd_process is created 
o mmu_notifier_register(&pasid_state->mn, 
pasid_state->mm); 
o amd_iommu_domain_set_gcr3(dev_state->domain, 
pasid, __pa(pasid_state->mm->pgd));
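The effect of setting a GCR3 entry can be sketched as a lookup table from PASID to page-table root. This is a deliberately flattened model: a real GCR3 table has its own entry format and levels, which are not shown, and `gcr3_table`, `gcr3_set`, `gcr3_clear`, and `gcr3_lookup` are hypothetical names for illustration.

```c
#include <stdint.h>

#define MAX_PASID 16u   /* hypothetical table size */

/* Flattened model: one page-table root (physical address) per PASID. */
struct gcr3_table {
    uint64_t root[MAX_PASID];   /* 0 means "PASID not bound" */
};

/* Bind a PASID to the physical address of a process page-table root.
 * Returns 0 on success, -1 if the PASID is out of range or already bound. */
static int gcr3_set(struct gcr3_table *t, uint32_t pasid, uint64_t pgd_phys)
{
    if (pasid >= MAX_PASID || pgd_phys == 0 || t->root[pasid] != 0)
        return -1;
    t->root[pasid] = pgd_phys;
    return 0;
}

/* Unbind a PASID, for example when the process exits. */
static void gcr3_clear(struct gcr3_table *t, uint32_t pasid)
{
    if (pasid < MAX_PASID)
        t->root[pasid] = 0;
}

/* What the IOMMU conceptually does first when a request carries a PASID:
 * find the page-table root to walk. Returns 0 if the PASID is not bound. */
static uint64_t gcr3_lookup(const struct gcr3_table *t, uint32_t pasid)
{
    return pasid < MAX_PASID ? t->root[pasid] : 0;
}
```

Because the stored root is the process's own page table, the device and the CPU resolve the same virtual address to the same physical page, which is what lets ring addresses stay virtual.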
[Diagram: the HSA GPU’s device table entry points to a GCR3 table indexed by PASID. The entry for PASID=2 is assigned kfd_process->mm->pgd.]
PRI & PPR Flow 
1. The peripheral issues a PRI to the IOMMU 
2. The IOMMU writes a PPR request to the PPR log (the log contains fault address, pasid, device_id, tag, flags) 
3. The IOMMU sends an interrupt to the CPU
PPR Flow 
● When the IRQ arrives 
o status = readl(iommu->mmio_base + MMIO_STATUS_OFFSET); 
o if (status & MMIO_STATUS_PPR_INT_MASK), the PPR log is processed 
● ppr_notifier is called 
o Registered in amd_iommu_v2_init 
● ppr_notifier leads to do_fault
do_fault 
● Uses the get_user_pages() API to pin the faulting pages into memory 
o Arguments: mm_struct, fault_addr
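The path from interrupt to fault handling can be sketched with a simulated status word and PPR log. The log entry fields follow the list given earlier (fault address, pasid, device_id, tag, flags); the bit position of `STATUS_PPR_INT`, the log size, the 4 KiB page size, and the names `handle_irq`, `record_fault`, and `fault_record` are assumptions made for this example.

```c
#include <stdint.h>

#define STATUS_PPR_INT (1u << 6)   /* placeholder for MMIO_STATUS_PPR_INT_MASK */
#define LOG_SLOTS 8u               /* hypothetical PPR log size */

struct ppr_entry {
    uint64_t fault_addr;
    uint32_t pasid;
    uint16_t device_id;
    uint16_t tag;
    uint16_t flags;
};

struct ppr_log {
    struct ppr_entry e[LOG_SLOTS];
    unsigned head;   /* next entry the driver reads */
    unsigned tail;   /* next entry the IOMMU writes */
};

typedef void (*ppr_notifier_fn)(const struct ppr_entry *entry, void *ctx);

/* Interrupt handler: if the PPR bit is set, acknowledge it and hand every
 * pending log entry to the notifier. Returns the number of entries handled. */
static unsigned handle_irq(uint32_t *status, struct ppr_log *log,
                           ppr_notifier_fn notifier, void *ctx)
{
    unsigned handled = 0;

    if (!(*status & STATUS_PPR_INT))
        return 0;
    *status &= ~STATUS_PPR_INT;
    while (log->head != log->tail) {
        notifier(&log->e[log->head], ctx);
        log->head = (log->head + 1) % LOG_SLOTS;
        handled++;
    }
    return handled;
}

/* Stand-in for the fault path: records which page would be faulted in. */
struct fault_record {
    unsigned count;
    uint64_t last_page;
    uint32_t last_pasid;
};

static void record_fault(const struct ppr_entry *entry, void *ctx)
{
    struct fault_record *rec = ctx;

    rec->count++;
    rec->last_page = entry->fault_addr & ~0xFFFull;  /* assumes 4 KiB pages */
    rec->last_pasid = entry->pasid;
}
```

In the driver the notifier would resolve the pasid to an mm_struct and bring the page in; the sketch only records the page so the control flow is visible.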
Flow Review 
Application 
Runtime Library 
● open(“/dev/kfd”) 
● ioctl(KFD_IOC_SET_MEMORY_POLICY) 
● ioctl(KFD_IOC_CREATE_QUEUE) 
● ioctl(KFD_IOC_DESTROY_QUEUE) 
KFD IOMMU Driver 
HSA-aware Kernel 
HSA 
Device 
IOMMU
Q&A 
Thanks!
Reference 
● https://github.com/HSAFoundation/HSA-Drivers-Linux-AMD 
● http://www.hsafoundation.com/standards/

Editor's Notes

  • #11 User queue with information
  • #12 module_param: insmod can change sched_policy
  • #13 Driver’s help ring VA SW 7*8 = 56
  • #14 Oversubscription condition: (dqm->processes_count >= VMID_PER_DEVICE /* 8 */) || (dqm->queue_count >= PIPE_PER_ME_CP_SCHEDULING * QUEUES_PER_PIPE /* 24 */). SW: 7*8 = 56
  • #15 http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2013/10/R6xx_R7xx_3D.pdf HSA-compliant HW needs to understand the PM4 packet format of radeon. http://www.spinics.net/linux/lists/kernel/msg1784187.html Type-0 packet: write N DWORDs in the information body to the N consecutive registers, or to the register, pointed to by the BASE_INDEX field of the packet header. Type-3 packet: carry out the operation indicated by field IT_OPCODE.
  • #16 http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2013/10/R6xx_R7xx_3D.pdf HSA-compliant HW needs to understand the PM4 packet format of radeon. http://www.spinics.net/linux/lists/kernel/msg1784187.html Radeon R7 for Kaveri. Type-0 packet: write N DWORDs in the information body to the N consecutive registers, or to the register, pointed to by the BASE_INDEX field of the packet header. Type-3 packet: carry out the operation indicated by field IT_OPCODE.
  • #17 per_device_data radeon_dev
  • #18 KFD is HSA driver!
  • #19 Start code
  • #21 kfd_topology_add_device: dev->gpu_id
  • #24 Wait for spec
  • #25 per_device_data
  • #28 Wrap all mmio access to radeon
  • #30 Driver’s help
  • #31 kfd_topology_add_device: dev->gpu_id
  • #32 packet_manager’s most important member: kernel_queue
  • #36 http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2013/10/R6xx_R7xx_3D.pdf HSA-compliant HW needs to understand the PM4 packet format of radeon. http://www.spinics.net/linux/lists/kernel/msg1784187.html Type-0 packet: write N DWORDs in the information body to the N consecutive registers, or to the register, pointed to by the BASE_INDEX field of the packet header. Type-3 packet: carry out the operation indicated by field IT_OPCODE.
  • #37 http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2013/10/R6xx_R7xx_3D.pdf HSA-compliant HW needs to understand the PM4 packet format of radeon. http://www.spinics.net/linux/lists/kernel/msg1784187.html Type-0 packet: write N DWORDs in the information body to the N consecutive registers, or to the register, pointed to by the BASE_INDEX field of the packet header. Type-3 packet: carry out the operation indicated by field IT_OPCODE.
  • #43 Query also a packet
  • #47 SMMU functionality
  • #49 Previously it made no difference, because the IOMMU only handled device addresses. For now, the data in an AQL packet is a VA.