ISCA final presentation - Runtime
 

  • Queue type
    • HSA_QUEUE_TYPE_MULTI = 0: multiple producers are supported.
    • HSA_QUEUE_TYPE_SINGLE = 1: only a single producer is supported.
    Queue features
    • HSA_QUEUE_FEATURE_DISPATCH = 1: the queue supports dispatch packets.
    • HSA_QUEUE_FEATURE_AGENT_DISPATCH = 2: the queue supports agent dispatch packets.
    service_queue
    • A pointer to another user mode queue that can be used by the HSAIL kernel to request system services.
  • Service_queue_type:
    • NONE: no service queue.
    • COMMON: a runtime-provided service queue that is shared.
    • NEW: requires the runtime to create a new queue.
  • acquire_fence_scope: determines the scope and type of the memory fence operation applied before the packet enters the active phase.
    release_fence_scope: determines the scope and type of the memory fence operation applied after kernel completion but before the packet is completed.
    • HSA_FENCE_SCOPE_NONE = 0: no scope. Only valid for barrier packets.
    • HSA_FENCE_SCOPE_COMPONENT = 1: the fence is applied with component scope for the global segment.
    • HSA_FENCE_SCOPE_SYSTEM = 2: the fence is applied with system scope for the global segment.


  • HSA RUNTIME YEN-CHING CHUNG, NATIONAL TSING HUA UNIVERSITY
  • OUTLINE
    • Introduction
    • HSA Core Runtime API (Pre-release 1.0 provisional)
      • Initialization and Shut Down
      • Notifications (Synchronous/Asynchronous)
      • Agent Information
      • Signals and Synchronization (Memory-Based)
      • Queues and Architected Dispatch
    • Summary
    © Copyright 2014 HSA Foundation. All Rights Reserved
  • INTRODUCTION (1)
    • The HSA core runtime is a thin, user-mode API that provides the interface the host needs to launch compute kernels on the available HSA components.
    • The overall goal of the HSA core runtime design is to provide a high-performance dispatch mechanism that is portable across multiple HSA vendor architectures.
    • The dispatch mechanism differentiates the HSA runtime from other language runtimes: argument setting and kernel launching are architected at the hardware and specification level.
    • The HSA core runtime API is standard across all HSA vendors, so languages that use the HSA runtime can run on any vendor's platform that supports the API.
    • The implementation of the HSA runtime may include kernel-level components (required for some hardware, for example AMD Kaveri) or may be entirely user-space (for example, simulators or CPU implementations).
  • INTRODUCTION (2)
    • The software architecture stack without HSA runtime
      [Figure: the programming-model layer (OpenCL App, Java App, OpenMP App, DSL App, …) sits on the language-runtime layer (OpenCL Runtime, Java Runtime, OpenMP Runtime, DSL Runtime, …), which sits on the drivers of Vendor 1 … Vendor m, each covering Component 1 … Component N.]
    • The software architecture stack with HSA runtime
      [Figure: the same two upper layers sit on HSA Vendor 1 … HSA Vendor m, each providing an HSA Runtime and an HSA Finalizer over Component 1 … Component N.]
  • INTRODUCTION (3)
    [Figure: program flow from Start Program to Exit Program, shown for the OpenCL Runtime and for the HSA Runtime. The figure also carries the label "Agent".]
    • OpenCL Runtime boxes: Platform, Device, and Context Initialization; Build Kernel; SVM Allocation and Kernel Arguments Setting; Command Queue; Resource Deallocation.
    • HSA Runtime boxes: HSA Runtime Initialization and Topology Discovery; HSAIL Finalization and Linking; HSA Memory Allocation; Enqueue Dispatch Packet; HSA Runtime Close.
  • INTRODUCTION (4)
    • HSA Platform System Architecture Specification support
      • Runtime initialization and shutdown
      • Notifications (synchronous/asynchronous)
      • Agent information
      • Signals and synchronization (memory-based)
      • Queues and Architected dispatch
      • Memory management
    • HSAIL support
      • Finalization, linking, and debugging
    • Image and Sampler support
    [Figure: the HSA Runtime boxes from the previous slide: HSA Runtime Initialization and Topology Discovery; HSAIL Finalization and Linking; HSA Memory Allocation; Enqueue Dispatch Packet; HSA Runtime Close.]
  • RUNTIME INITIALIZATION AND SHUTDOWN
  • OUTLINE
    • Runtime Initialization API
      • hsa_init
    • Runtime Shut Down API
      • hsa_shut_down
    • Examples
  • HSA RUNTIME INITIALIZATION
    • When the API is invoked for the first time in a given process, a runtime instance is created.
    • A typical runtime instance may contain information about the platform, topology, reference count, queues, signals, etc.
    • The API can be called multiple times by applications.
      • Only a single runtime instance will exist for a given process.
      • Whenever the API is invoked, the reference count is increased by one.
  • HSA RUNTIME SHUT DOWN
    • When the API is invoked, the reference count is decreased by 1.
    • When the reference count < 1
      • All the resources associated with the runtime instance (queues, signals, topology information, etc.) are considered invalid, and any attempt to reference them in subsequent API calls results in undefined behavior.
      • The user might call hsa_init to initialize the HSA runtime again.
      • The HSA runtime might release resources associated with it.
  • EXAMPLE – RUNTIME INITIALIZATION (1)
    [Code figure; annotations:]
    • Data structure for runtime instance
    • If hsa_init is called more than once, increase the ref_count by 1
  • EXAMPLE – RUNTIME INITIALIZATION (2)
    [Code figure; annotations:]
    • hsa_init is called the first time, allocate resources and set the reference count
    • Get the number of HSA agents
    • Initialize agents
    • Create an empty agent list
    • If initialization failed, release resources
    • Create topology table
  • EXAMPLE - RUNTIME INSTANCE (1)
    [Figure: contents of a runtime instance. Platform Name: Generic. Agent: 2, Memory: 1, Cache: 1.]
    • Agent-0: node_id 0, id 0, type CPU, vendor Generic, name Generic, wavefront_size 0, queue_size 200, group_memory 0, fbarrier_max_count 1, is_pic_supported 0, …
    • Agent-1: node_id 0, id 0, type GPU, vendor Generic, name Generic, wavefront_size 64, queue_size 200, group_memory 64, fbarrier_max_count 1, is_pic_supported 1, …
    • Memory: node_id 0, id 0, segment_type 111111, address_base 0x0001, size 2048 MB, peak_bandwidth 6553.6 mbps
    • Cache: node_id 0, id 0, levels 1, associativity 1, cache size 64KB, cache line size 4, is_inclusive 1
  • EXAMPLE - RUNTIME INSTANCE (2)
    [Figure: the runtime instance laid out as a platform header that points to NULL-terminated descriptor lists.]
    • Platform Header File: *base_address = 0x00001, Size = 248, system_timestamp_frequency_mhz = 200, signal_maximum_wait = 1/200, *node_id, no_nodes = 1, *agent_list, no_agent = 2, *memory_descriptor_list, no_memory_descriptor = 1, *cache_descriptor_list, no_cache_descriptor = 1
    • Agent-0: node_id = 0, id = 0, agent_type = 1 (CPU), vendor[16] = Generic, name[16] = Generic, wavefront_size = 0, queue_size = 200, group_memory_size_bytes = 0, fbarrier_max_count = 1, is_pic_supported = 0, …
    • Agent-1: node_id = 0, id = 0, agent_type = 2 (GPU), vendor[16] = Generic, name[16] = Generic, wavefront_size = 64, queue_size = 200, group_memory_size_bytes = 64, fbarrier_max_count = 1, is_pic_supported = 1, …
    • Memory: node_id = 0, Id = 0, supported_segment_type_mask = 111111, virtual_address_base = 0x0001, size_in_bytes = 2048MB, peak_bandwidth_mbps = 6553.6
    • Cache: node_id = 0, Id = 0, Levels = 1, with NULL-terminated lists *associativity, *cache_size, *cache_line_size, *is_inclusive (values shown in the figure: 1, 64KB, 1, 4)
    • Numeric labels shown alongside the lists in the figure: 0, 45, 165, 285, 325
  • EXAMPLE – RUNTIME SHUT DOWN
    [Code figure; annotation:]
    • If ref_count < 1, then free the list; otherwise decrease the ref_count by 1.
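The code on the example slides did not survive in this transcript. As a stand-in, here is a minimal C sketch of the reference-counting behavior described for hsa_init and hsa_shut_down. Every name in it (demo_runtime_t, demo_init, demo_shut_down, demo_ref_count) is hypothetical and is not part of the HSA API. The sketch is not thread-safe, and agent_count only stands in for real topology data.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical runtime instance; a real one would also hold topology,
   queues, signals, and so on. */
typedef struct {
    int ref_count;    /* init calls not yet matched by a shut down */
    int agent_count;  /* stand-in for the topology information */
} demo_runtime_t;

static demo_runtime_t *g_runtime = NULL;  /* at most one instance per process */

/* Models hsa_init: returns 0 on success, 1 if allocation fails. */
int demo_init(void) {
    if (g_runtime != NULL) {          /* not the first call: only count it */
        g_runtime->ref_count++;
        return 0;
    }
    g_runtime = calloc(1, sizeof *g_runtime);
    if (g_runtime == NULL) return 1;
    g_runtime->ref_count = 1;
    g_runtime->agent_count = 2;       /* e.g. one CPU agent and one GPU agent */
    return 0;
}

/* Models hsa_shut_down: returns 0 on success, 1 if nothing is initialized. */
int demo_shut_down(void) {
    if (g_runtime == NULL) return 1;
    assert(g_runtime->ref_count >= 1);
    g_runtime->ref_count--;
    if (g_runtime->ref_count < 1) {   /* last reference: release everything */
        free(g_runtime);
        g_runtime = NULL;
    }
    return 0;
}

/* Inspection helper: current reference count, 0 when shut down. */
int demo_ref_count(void) {
    return g_runtime != NULL ? g_runtime->ref_count : 0;
}
```

Calling demo_init twice and demo_shut_down twice leaves no instance behind. A third demo_shut_down reports an error here, whereas the slides only say that using released resources is undefined.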
  • NOTIFICATIONS (SYNCHRONOUS/ASYNCHRONOUS)
  • OUTLINE
    • Synchronous Notifications
      • hsa_status_t
      • hsa_status_string
    • Asynchronous Notifications
    • Example
  • SYNCHRONOUS NOTIFICATIONS
    • Notifications (errors, events, etc.) reported by the runtime can be synchronous or asynchronous.
    • The HSA runtime uses the return values of API functions to pass notifications synchronously.
    • A status code is defined as an enumeration, hsa_status_t, to capture the return value of any API function that has been executed, except accessors/mutators.
    • The notification is a status code that indicates success or error.
      • Success is represented by HSA_STATUS_SUCCESS, which is equivalent to zero.
      • An error status is assigned a positive integer and its identifier starts with the HSA_STATUS_ERROR prefix.
    • The status code can help to determine the cause of an unsuccessful execution.
  • STATUS CODE QUERY
    • Query additional information on a status code (hsa_status_string)
    • Parameters
      • status (input): Status code that the user is seeking more information on
      • status_string (output): An ISO/IEC 646 encoded English language string that potentially describes the error status
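To make the status-code convention concrete, the sketch below models it in C: success is zero, errors are positive, and a query function writes a descriptive string through an output parameter. The enumeration, its values, the strings, and demo_status_string are all invented for illustration. They only mirror the shape the slides describe and should not be read as the real hsa_status_t or hsa_status_string.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status codes: success is zero, errors are positive. */
typedef enum {
    DEMO_STATUS_SUCCESS = 0,
    DEMO_STATUS_ERROR = 1,
    DEMO_STATUS_ERROR_INVALID_ARGUMENT = 2
} demo_status_t;

/* Writes a description of `status` through `status_string`.
   Returns an error if the output pointer is NULL or the code is unknown. */
demo_status_t demo_status_string(demo_status_t status,
                                 const char **status_string) {
    if (status_string == NULL) return DEMO_STATUS_ERROR_INVALID_ARGUMENT;
    switch (status) {
    case DEMO_STATUS_SUCCESS:
        *status_string = "success";
        break;
    case DEMO_STATUS_ERROR:
        *status_string = "generic error";
        break;
    case DEMO_STATUS_ERROR_INVALID_ARGUMENT:
        *status_string = "invalid argument";
        break;
    default:                       /* unknown code: nothing to describe */
        *status_string = NULL;
        return DEMO_STATUS_ERROR_INVALID_ARGUMENT;
    }
    assert(*status_string != NULL);
    return DEMO_STATUS_SUCCESS;
}
```

Because the query itself returns a status code, a caller checks its return value before reading the string.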
  • ASYNCHRONOUS NOTIFICATIONS
    • The runtime passes asynchronous notifications by calling user-defined callbacks.
      • For instance, queues are a common source of asynchronous events because the tasks queued by an application are asynchronously consumed by the packet processor. Callbacks are associated with queues when they are created. When the runtime detects an error in a queue, it invokes the callback associated with that queue and passes it an error flag (indicating what happened) and a pointer to the erroneous queue.
    • The HSA runtime does not implement any default callbacks.
    • Take care when using blocking functions within a callback implementation: a callback that does not return can leave the runtime state undefined.
  • EXAMPLE - CALLBACK
    [Code figure; annotations:]
    • Pass the callback function when creating the queue
    • If the queue is empty, set the event and invoke the callback
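The callback flow can be sketched as follows: a callback is attached when the queue is created, and the runtime invokes it with an error flag and the queue when something goes wrong. All types and functions below (demo_queue_t, demo_queue_create, demo_report_queue_error, demo_record_error) are hypothetical illustrations, not the HSA queue API.

```c
#include <assert.h>
#include <stddef.h>

typedef struct demo_queue demo_queue_t;

/* Application callback: receives an error flag and the affected queue. */
typedef void (*demo_queue_callback_t)(int error_flag, demo_queue_t *queue);

struct demo_queue {
    int id;
    demo_queue_callback_t on_error;  /* set at creation; NULL means none */
    int last_error;                  /* written by the example callback */
};

/* The callback is associated with the queue when the queue is created. */
void demo_queue_create(demo_queue_t *queue, int id,
                       demo_queue_callback_t callback) {
    assert(queue != NULL);
    queue->id = id;
    queue->on_error = callback;
    queue->last_error = 0;
}

/* What a runtime might do on detecting a queue error. There is no
   default callback: without a registered one, nothing is called. */
void demo_report_queue_error(demo_queue_t *queue, int error_flag) {
    assert(queue != NULL);
    if (queue->on_error != NULL) {
        queue->on_error(error_flag, queue);
    }
}

/* Example callback: records the error and returns at once, with no
   blocking calls inside. */
void demo_record_error(int error_flag, demo_queue_t *queue) {
    queue->last_error = error_flag;
}
```

Keeping the callback short and non-blocking follows the warning above about callbacks that never return.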
  • AGENT INFORMATION
  • OUTLINE
    • Agent information
      • hsa_node_t
      • hsa_agent_t
      • hsa_agent_info_t
      • hsa_component_feature_t
    • Agent Information manipulation APIs
      • hsa_iterate_agents
      • hsa_agent_get_info
    • Example
  • INTRODUCTION
    • The runtime exposes a list of agents that are available in the system.
      • An HSA agent is a hardware component that participates in the HSA memory model.
      • An HSA agent can submit AQL packets for execution.
      • An HSA agent may be, but is not required to be, an HSA component. A system can include HSA agents that are neither an HSA component nor a host CPU.
    • HSA agents are defined as opaque handles of type hsa_agent_t.
    • The HSA runtime provides APIs for applications to traverse the list of available agents and query attributes of a particular agent.
  • AGENT INFORMATION (1)
    • Opaque agent handle (hsa_agent_t)
    • Opaque NUMA node handle (hsa_node_t)
      • An HSA memory node is a node that delineates a set of system components (host CPUs and HSA components) with "local" access to a set of memory resources attached to the node's memory controller and appropriate HSA-compliant access attributes.
  • AGENT INFORMATION (2)
    • Component features
      • An HSA component is a hardware or software component that can be a target of the AQL queries and conforms to the memory model of the HSA.
    • Values
      • HSA_COMPONENT_FEATURE_NONE = 0: No component capabilities. The device is an agent, but not a component.
      • HSA_COMPONENT_FEATURE_BASIC = 1: The component supports the HSAIL instruction set and all the AQL packet types except Agent dispatch.
      • HSA_COMPONENT_FEATURE_ALL = 2: The component supports the HSAIL instruction set and all the AQL packet types.
  • AGENT INFORMATION (3)
    • Agent attributes
    • Values
      • HSA_AGENT_INFO_MAX_GRID_DIM
      • HSA_AGENT_INFO_MAX_WORKGROUP_DIM
      • HSA_AGENT_INFO_QUEUE_MAX_PACKETS
      • HSA_AGENT_INFO_CLOCK
      • HSA_AGENT_INFO_CLOCK_FREQUENCY
      • HSA_AGENT_INFO_MAX_SIGNAL_WAIT
      • HSA_AGENT_INFO_NAME
      • HSA_AGENT_INFO_NODE
      • HSA_AGENT_INFO_COMPONENT_FEATURES
      • HSA_AGENT_INFO_VENDOR_NAME
      • HSA_AGENT_INFO_WAVEFRONT_SIZE
      • HSA_AGENT_INFO_CACHE_SIZE
  • AGENT INFORMATION MANIPULATION (1)
    • Iterate over the available agents, and invoke an application-defined callback on every iteration (hsa_iterate_agents)
      • If callback returns a status other than HSA_STATUS_SUCCESS for a particular iteration, the traversal stops and the function returns that status value.
    • Parameters
      • callback (input): Callback to be invoked once per agent
      • data (input): Application data that is passed to callback on every iteration. Can be NULL.
  • AGENT INFORMATION MANIPULATION (2)
    • Get the current value of an attribute for a given agent (hsa_agent_get_info)
    • Parameters
      • agent (input): A valid agent
      • attribute (input): Attribute to query
      • value (output): Pointer to a user-allocated buffer in which to store the value of the attribute. If the buffer passed by the application is not large enough to hold the value of attribute, the behavior is undefined.
  • EXAMPLE - AGENT ATTRIBUTE QUERY
    [Code figure; annotations:]
    • Copy agent attribute information
    • Get the agent handle of Agent 0
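The iterate-with-callback pattern can be sketched in C as below. The traversal stops as soon as the callback returns a non-zero status and passes that status back to the caller. The agent table reuses the CPU and GPU wavefront sizes from the runtime instance example. The types, functions, and the DEMO_STOP_FOUND value are hypothetical and do not reproduce the real hsa_iterate_agents signature.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int is_gpu;               /* 0 = CPU agent, 1 = GPU agent */
    unsigned wavefront_size;
} demo_agent_t;

/* Returns 0 to continue; any other value stops the traversal. */
typedef int (*demo_agent_callback_t)(const demo_agent_t *agent, void *data);

/* Two agents, mirroring the runtime instance example in this deck. */
static const demo_agent_t demo_agents[] = { {0, 0u}, {1, 64u} };

int demo_iterate_agents(demo_agent_callback_t callback, void *data) {
    assert(callback != NULL);
    size_t count = sizeof demo_agents / sizeof demo_agents[0];
    for (size_t i = 0; i < count; i++) {
        int status = callback(&demo_agents[i], data);
        if (status != 0) return status;   /* stop and propagate */
    }
    return 0;
}

/* Callback that counts agents; `data` points to an int counter. */
int demo_count_agent(const demo_agent_t *agent, void *data) {
    (void)agent;
    *(int *)data += 1;
    return 0;
}

/* Hypothetical non-zero status used to end the traversal early. */
enum { DEMO_STOP_FOUND = 100 };

/* Callback that stops at the first GPU agent; `data` points to an
   unsigned that receives its wavefront size. */
int demo_find_gpu(const demo_agent_t *agent, void *data) {
    if (!agent->is_gpu) return 0;
    *(unsigned *)data = agent->wavefront_size;
    return DEMO_STOP_FOUND;
}
```

The `data` pointer lets the callback return results without global state, which is presumably why the slide allows it to be NULL when no result is needed.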
  • SIGNALS AND SYNCHRONIZATION (MEMORY-BASED)
  • OUTLINE
    • Signal
    • Signal manipulation API
      • Create/Destroy
      • Query
      • Send
      • Atomic Operations
    • Signal wait
      • Get time out
      • Signal Condition
    • Example
  • SIGNAL (1)
    • HSA agents can communicate with each other by using coherent global memory, or by using signals.
    • A signal is represented by an opaque signal handle.
    • A signal carries a value, which can be updated or conditionally waited upon via an API call or HSAIL instruction.
      • The value occupies four or eight bytes depending on the machine model in use.
  • SIGNAL (2)
    • Updating the value of a signal is equivalent to sending the signal.
    • In addition to the update (store) of signals, the API for sending a signal must support other atomic operations with specific memory order semantics.
      • Atomic operations: AND, OR, XOR, Add, Subtract, Exchange, and CAS
      • Memory order semantics: Release and Relaxed
  • SIGNAL CREATE/DESTROY
    • Create a signal
    • Parameters
      • initial_value (input): Initial value of the signal.
      • signal_handle (output): Signal handle.
    • Destroy a signal previously created by hsa_signal_create
    • Parameter
      • signal_handle (input): Signal handle.
  • SIGNAL LOAD/STORE
    • Atomically read the current signal value with acquire semantics
    • Atomically read the current signal value with relaxed semantics
    • Send and atomically set the value of a signal with release semantics
    • Send and atomically set the value of a signal with relaxed semantics
  • SIGNAL ADD/SUBTRACT
    • Send and atomically increment the value of a signal by a given amount with release semantics
    • Send and atomically decrement the value of a signal by a given amount with release semantics
    • Send and atomically increment the value of a signal by a given amount with relaxed semantics
    • Send and atomically decrement the value of a signal by a given amount with relaxed semantics
  • SIGNAL AND (OR, XOR)/EXCHANGE
    • Send and atomically perform a logical AND operation on the value of a signal and a given value with release semantics
    • Send and atomically set the value of a signal and return its previous value with release semantics
    • Send and atomically perform a logical AND operation on the value of a signal and a given value with relaxed semantics
    • Send and atomically set the value of a signal and return its previous value with relaxed semantics
  • SIGNAL WAIT (1)
    • The application may wait on a signal, with a condition specifying the terms of the wait.
    • Signal wait condition operator
    • Values
      • HSA_EQ: The two operands are equal.
      • HSA_NE: The two operands are not equal.
      • HSA_LT: The first operand is less than the second operand.
      • HSA_GTE: The first operand is greater than or equal to the second operand.
  • SIGNAL WAIT (2)
    • The wait can be done either in the HSA component via an HSAIL wait instruction or via a runtime API defined here.
    • Waiting on a signal returns the current value of the opaque signal object.
    • The wait may have a runtime-defined timeout, which indicates the maximum amount of time that an implementation can spend waiting.
    • The signal infrastructure allows for multiple senders/waiters on a single signal.
    • Wait reads the value, hence acquire synchronizations may be applied.
  • SIGNAL WAIT (3)
    • Signal wait
    • Parameters
      • signal_handle (input): A signal handle
      • condition (input): Condition used to compare the passed and signal values
      • compare_value (input): Value to compare with
      • return_value (output): A pointer where the current signal value must be read into
  • SIGNAL WAIT (4)
    • Signal wait with timeout
    • Parameters
      • signal_handle (input): A signal handle
      • timeout (input): Maximum wait duration (a value of zero indicates no maximum)
      • long_wait (input): Hint indicating that the signal value is not expected to meet the given condition in a short period of time. The HSA runtime may use this hint to optimize the wait implementation.
      • condition (input): Condition used to compare the passed and signal values
      • compare_value (input): Value to compare with
      • return_value (output): A pointer where the current signal value must be read into
  • EXAMPLE – SIGNAL WAIT (1)
    [Figure: timelines of thread_1 and thread_2 operating on the same signal.]
    • value = 0: thread_1 calls hsa_signal_wait_timeout_acquire (value == 2) and is blocked.
    • thread_2 calls hsa_signal_add_relaxed (value = value + 3): value = 3.
    • thread_2 calls hsa_signal_subtract_relaxed (value = value - 1): value = 2.
    • Condition satisfied: the wait returns the signal value and the execution of thread_1 continues.
  • EXAMPLE – SIGNAL WAIT (2)
    [Code figure; annotations:]
    • If signal_handle is invalid, then return the signal invalid status
    • Compare tmp->value with compare_value to see if the condition is satisfied
    • If timeout = 0, then return the signal time out status
    • Signal wait condition function
    • If the condition is satisfied, then return the signal value and status
  • QUEUES AND ARCHITECTED DISPATCH
  • OUTLINE
    • Queues
      • Queue Types and Structure
      • HSA runtime API for Queue Manipulations
    • Architected Queuing Language (AQL) Support
      • Packet type
      • Packet header
    • Examples
      • Enqueue Packet
      • Packet Processor
  • INTRODUCTION (1)
    • An HSA-compliant platform supports the allocation of multiple user-level command queues.
      • A user-level command queue is characterized as runtime-allocated, user-level accessible virtual memory of a certain size, containing packets defined in the Architected Queuing Language (AQL packets).
    • Queues are allocated by HSA applications through the HSA runtime.
    • HSA software receives memory-based structures to configure the hardware queues, allowing efficient software management of the hardware queues of the HSA agents.
    • This queue memory shall be processed by the HSA Packet Processor as a ring buffer.
    • Queues are read-only data structures.
      • Writing values directly to a queue structure results in undefined behavior.
      • However, HSA agents can directly modify the contents of the buffer pointed to by base_address, or use runtime APIs to access the doorbell signal or the service queue.
  • INTRODUCTION (2)
    • Two queue types, AQL and Service Queues, are supported.
      • An AQL Queue consumes AQL packets, which specify the kernel functions to be executed on the HSA component.
      • A Service Queue consumes agent dispatch packets, which specify runtime-defined or user-registered functions to be executed on the agent (typically, the host CPU).
  • INTRODUCTION (3)
    • AQL queue structure
  • INTRODUCTION (4)
    • In addition to the data held in the queue structure, the queue also defines two properties (readIndex and writeIndex) that define the location of the "head" and "tail" of the queue.
      • readIndex: The read index is a 64-bit unsigned integer that specifies the packetID of the next AQL packet to be consumed by the packet processor.
      • writeIndex: The write index is a 64-bit unsigned integer that specifies the packetID of the next AQL packet slot to be allocated.
    • Neither index is directly exposed to the user, who can only access them by using dedicated HSA core runtime APIs.
    • The available index functions differ in the index of interest (read or write), the action to be performed (addition, compare and swap, etc.), and the memory consistency model (relaxed, release, etc.).
  • INTRODUCTION (5)
    • The read index is automatically advanced when a packet is read by the packet processor.
    • When the packet processor observes that
      • the read index matches the write index, the queue can be considered empty;
      • the write index is greater than or equal to the sum of the read index and the size of the queue, the queue is full.
    • The doorbell_signal field of a queue contains a signal that is used by the agent to inform the packet processor to process the packets it writes.
    • The value written to the doorbell signal is equal to the ID of the packet that is ready to be launched.
  • INTRODUCTION (6)
    • The new task might be consumed by the packet processor even before the doorbell signal has been signaled by the agent.
      • This is because the packet processor might already be processing some other packets and observe that there is new work available, so it processes the new packets.
    • In any case, the agent must ring the doorbell for every batch of packets it writes.
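The empty and full conditions stated for the read and write indices translate directly into arithmetic on the two 64-bit counters. In this C sketch the helper names are hypothetical. Mapping a packet ID to a ring-buffer slot with a modulo is an assumption about how the ring buffer is addressed, which the slides do not spell out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The queue is empty when the read index matches the write index. */
bool demo_queue_is_empty(uint64_t read_index, uint64_t write_index) {
    return read_index == write_index;
}

/* The queue is full when the write index is greater than or equal to
   the read index plus the queue size (in packets). */
bool demo_queue_is_full(uint64_t read_index, uint64_t write_index,
                        uint32_t queue_size) {
    return write_index >= read_index + queue_size;
}

/* Assumed mapping from a monotonically increasing packet ID to a slot
   in the ring buffer. */
uint32_t demo_slot_for_packet(uint64_t packet_id, uint32_t queue_size) {
    assert(queue_size > 0);
    return (uint32_t)(packet_id % queue_size);
}
```

For a queue of four packets with read index 2 and write index 5, three packets are outstanding, so the queue is neither empty nor full. Packet 5 lands in slot 1.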
  • QUEUE CREATE/DESTROY
    • Create a user mode queue
      • When a queue is created, the runtime also allocates the packet buffer and the completion signal.
      • The application should rely only on the returned status code to determine whether the queue is valid.
    • Destroy a user mode queue
      • A queue must not be accessed after it has been destroyed.
      • When a queue is destroyed, the state of the AQL packets that have not yet been fully processed becomes undefined.
  • GET READ/WRITE INDEX
    • Atomically retrieve the read index of a queue with acquire semantics
    • Atomically retrieve the write index of a queue with acquire semantics
    • Atomically retrieve the read index of a queue with relaxed semantics
    • Atomically retrieve the write index of a queue with relaxed semantics
  • SET READ/WRITE INDEX
    • Atomically set the read index of a queue with release semantics
    • Atomically set the read index of a queue with relaxed semantics
    • Atomically set the write index of a queue with release semantics
    • Atomically set the write index of a queue with relaxed semantics
  • COMPARE AND SWAP WRITE INDEX
    • Atomically compare and set the write index of a queue with acquire/release/relaxed/acquire-release semantics
    • Parameters
      • queue (input): A queue
      • expected (input): The expected index value
      • val (input): Value to copy to the write index if expected matches the observed write index
    • Return value
      • Previous value of the write index
  • ADD WRITE INDEX
    • Atomically increment the write index of a queue by an offset with release/acquire/relaxed/acquire-release semantics
    • Parameters
      • queue (input): A queue
      • val (input): The value to add to the write index
    • Return value
      • Previous value of the write index
  • ARCHITECTED QUEUING LANGUAGE (AQL)
    • An HSA-compliant system provides a command interface for the dispatch of HSA agent commands.
      • This command interface is provided by the Architected Queuing Language (AQL).
    • AQL allows HSA agents to build and enqueue their own command packets, enabling fast and low-power dispatch.
    • AQL also provides support for HSA component queue submissions.
      • The HSA component kernel can write commands in AQL format.
  • AQL PACKET (1)
    • AQL packet format
    • Values
      • Always reserved packet (0): Packet format is set to always reserved when the queue is initialized.
      • Invalid packet (1): Packet format is set to invalid when the readIndex is incremented, making the packet slot available to the HSA agents.
      • Dispatch packet (2): Dispatch packets contain jobs for the HSA component and are created by HSA agents.
      • Barrier packet (3): Barrier packets can be inserted by HSA agents to delay processing subsequent packets. All queues support barrier packets.
      • Agent dispatch packet (4): Agent dispatch packets contain jobs for the HSA agent and are created by HSA agents.
  • AQL PACKET (2)
    [Figure: AQL packet layout; annotation:]
    • HSA signaling object handle used to indicate completion of the job
  • EXAMPLE - ENQUEUE AQL PACKET (1)
    • An HSA agent submits a task to a queue by performing the following steps:
      • Allocate a packet slot (by incrementing the writeIndex)
      • Initialize the packet and copy the packet to a queue associated with the Packet Processor
      • Mark the packet as valid
      • Notify the Packet Processor of the packet (with the doorbell signal)
  • EXAMPLE - ENQUEUE AQL PACKET (2)
    [Code figure with a Dispatch Queue diagram showing WriteIndex and ReadIndex; annotations:]
    • Allocate an AQL packet slot
    • Copy the packet into the queue. Note that a lock can be used here to prevent a race condition in a multithreaded environment.
    • Initialize packet
    • Send doorbell signal
  • EXAMPLE - PACKET PROCESSOR
    [Code figure with a Dispatch Queue diagram showing WriteIndex and ReadIndex; annotations:]
    • Receive doorbell
    • If there is any packet in the queue, process the packet.
    • Get packet content
    • Check if barrier packet
    • Update readIndex, change packet state to invalid, and send completion signal.
  • MEMORY MANAGEMENT
  • OUTLINE
    • Memory registration and deregistration
    • Memory region and memory segment
    • APIs for memory region manipulation
    • APIs for memory registration and deregistration
  • INTRODUCTION
    • One of the key features of HSA is its ability to share global pointers between the host application and code executing on the HSA component.
      • This ability means that an application can directly pass a pointer to memory allocated on the host to a kernel function dispatched to a component, without an intermediate copy.
    • When a buffer created in the host is also accessed by a component, programmers are encouraged to register the corresponding address range beforehand.
      • Registering memory expresses an intention to access (read or write) the passed buffer from a component other than the host. This is a performance hint that allows the runtime implementation to know ahead of time which buffers will be accessed by some of the components.
    • When an HSA program no longer needs to access a registered buffer in a device, the user should deregister that virtual address range.
  • MEMORY REGION/SEGMENT
    • A memory region represents a virtual memory interval that is visible to a particular agent, and contains properties about how memory is accessed or allocated from that agent.
    • Memory segments
    • Values
      • HSA_SEGMENT_GLOBAL = 1
      • HSA_SEGMENT_PRIVATE = 2
      • HSA_SEGMENT_GROUP = 4
      • HSA_SEGMENT_KERNARG = 8
      • HSA_SEGMENT_READONLY = 16
      • HSA_SEGMENT_IMAGE = 32
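Because the segment values are distinct powers of two, they can be combined into a bitmask and tested with a bitwise AND. The C sketch below uses the values from the slide under hypothetical DEMO_ names. It also assumes that the supported_segment_type_mask of 111111 in the runtime instance example is a binary mask, which would mean all six segments (decimal 63).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Segment values as listed on the slide, under hypothetical names. */
enum {
    DEMO_SEGMENT_GLOBAL   = 1,
    DEMO_SEGMENT_PRIVATE  = 2,
    DEMO_SEGMENT_GROUP    = 4,
    DEMO_SEGMENT_KERNARG  = 8,
    DEMO_SEGMENT_READONLY = 16,
    DEMO_SEGMENT_IMAGE    = 32
};

/* True if `mask` includes `segment`; `segment` must be a single bit. */
bool demo_supports_segment(uint32_t mask, uint32_t segment) {
    assert(segment != 0 && (segment & (segment - 1)) == 0);
    return (mask & segment) != 0;
}

/* Mask with all six segments set: binary 111111, decimal 63. */
uint32_t demo_all_segments(void) {
    return DEMO_SEGMENT_GLOBAL | DEMO_SEGMENT_PRIVATE | DEMO_SEGMENT_GROUP |
           DEMO_SEGMENT_KERNARG | DEMO_SEGMENT_READONLY | DEMO_SEGMENT_IMAGE;
}
```

A region that supports only the global and kernarg segments would carry the mask 1 | 8 = 9.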
  • MEMORY REGION INFORMATION
    • Attributes of a memory region
    • Values
      • HSA_REGION_INFO_BASE_ADDRESS
      • HSA_REGION_INFO_SIZE
      • HSA_REGION_INFO_NODE
      • HSA_REGION_INFO_MAX_ALLOCATION_SIZE
      • HSA_REGION_INFO_SEGMENT
      • HSA_REGION_INFO_BANDWIDTH
      • HSA_REGION_INFO_CACHED
  • MEMORY REGION MANIPULATION (1)
    • Get the current value of an attribute of a region
    • Iterate over the memory regions that are visible to an agent, and invoke an application-defined callback on every iteration
      • If callback returns a status other than HSA_STATUS_SUCCESS for a particular iteration, the traversal stops and the function returns that status value.
  • MEMORY REGION MANIPULATION (2)
    • Allocate a block of memory
    • Deallocate a block of memory previously allocated using hsa_memory_allocate
    • Copy a block of memory
      • Copying a number of bytes larger than the size of the memory regions pointed to by dst or src results in undefined behavior.
  • MEMORY REGISTRATION/DEREGISTRATION
    • Register memory
    • Parameters
      • address (input): A pointer to the base of the memory region to be registered. If a NULL pointer is passed, no operation is performed.
      • size (input): Requested registration size in bytes. A size of zero is only allowed if address is NULL.
    • Deregister memory previously registered using hsa_memory_register
    • Parameter
      • address (input): A pointer to the base of the memory region to be deregistered. If a NULL pointer is passed, no operation is performed.
  • EXAMPLE
    [Code figure; annotations:]
    • Allocate a memory space
    • Use hsa_region_get_info to get the size in bytes of this memory space
    • Register this memory space as a performance hint
    • Finish operation, deregister and free this memory space
  • SUMMARY
  • SUMMARY
    • Covered
      • HSA Core Runtime API (Pre-release 1.0 provisional)
        • Runtime Initialization and Shutdown (Open/Close)
        • Notifications (Synchronous/Asynchronous)
        • Agent Information
        • Signals and Synchronization (Memory-Based)
        • Queues and Architected Dispatch
        • Memory Management
    • Not covered
      • Extension of Core Runtime
      • HSAIL Finalization, Linking, and Debugging
      • Images and Samplers
  • QUESTIONS?