SYCL 1.2.1 Reference Card

The Official Khronos™ Group SYCL 1.2.1 Reference Card

Published in: Technology

  1. 1. www.khronos.org/sycl ©2018 Khronos Group - Rev. 0818 SYCL 1.2.1 API Reference Guide Page 1

Runtime: Device class [4.6.4]
device();
explicit device(const device_selector &deviceSelector);
explicit device(cl_device_id deviceId);
cl_device_id get() const;
platform get_platform() const;
bool is_host() const;
bool is_cpu() const;
bool is_gpu() const;
bool is_accelerator() const;
template <info::device param> typename info::param_traits<info::device, param>::return_type get_info() const;
bool has_extension(const string_class &extension) const;
template <info::partition_property prop> vector_class<device> create_sub_devices(size_t nbSubDev) const;
template <info::partition_property prop> vector_class<device> create_sub_devices(const vector_class<size_t> &counts) const;
template <info::partition_property prop> vector_class<device> create_sub_devices(info::affinity_domain affinityDomain) const;
static vector_class<device> get_devices(info::device_type deviceType = info::device_type::all);
(Continued on next page)

SYCL™ is a C++ programming model for OpenCL which builds on the underlying concepts, portability, and efficiency of OpenCL while adding much of the ease of use and flexibility of C++. [n.n] refers to sections in the SYCL 1.2.1 specification available at khronos.org/registry/sycl

SYCL example [3.1.3]
Below is an example of a typical SYCL application which schedules a job to run in parallel on any OpenCL accelerator.

    #include <CL/sycl.hpp>
    #include <iostream>

    int main() {
      using namespace cl::sycl;

      int data[1024];  // Allocates data to be worked on

      // Include all the SYCL work in a {} block to ensure all
      // SYCL tasks are completed before exiting the block.
      {
        // Create a queue to enqueue work to
        queue myQueue;

        // Wrap the data array variable in a buffer.
        // The buffer element type must match the host array (int).
        buffer<int, 1> resultBuf { data, range<1> { 1024 } };

        // Create a command group to
        // issue commands to the queue.
        myQueue.submit([&](handler & cgh) {
          // Request access to the buffer
          auto writeResult = resultBuf.get_access<access::mode::write>(cgh);

          // Enqueue a parallel_for task.
          cgh.parallel_for<class simple_test>(range<1>{ 1024 }, [=](id<1> idx) {
            writeResult[idx] = static_cast<int>(idx[0]);
          });  // End of the kernel function
        });  // End of the queue commands
      }  // End of scope, so wait for the queued work to complete

      // Print result
      for (int i = 0; i < 1024; i++) {
        std::cout << "data[" << i << "] = " << data[i] << std::endl;
      }
      return 0;
    }

Runtime: Context class [4.6.3]
explicit context(async_handler asyncHandler = {});
context(const device &dev, async_handler asyncHandler = {});
context(const platform &plt, async_handler asyncHandler = {});
context(const vector_class<device> &deviceList, async_handler asyncHandler = {});
context(cl_context clContext, async_handler asyncHandler = {});
cl_context get() const;
bool is_host() const;
template <info::context param> typename info::param_traits<info::context, param>::return_type get_info() const;
platform get_platform() const;
vector_class<device> get_devices() const;

Context queries using get_info():
Descriptor | Return type
info::context::reference_count | cl_uint
info::context::platform | platform
info::context::devices | vector_class<device>

Runtime: Platform class [4.6.2]
platform();
explicit platform(cl_platform_id platformID);
explicit platform(const device_selector &deviceSelector);
cl_platform_id get() const;
static vector_class<platform> get_platforms();
vector_class<device> get_devices(info::device_type = info::device_type::all) const;
template <info::platform param> typename info::param_traits<info::platform, param>::return_type get_info() const;
bool has_extension(const string_class &extension) const;
bool is_host() const;

Platform information descriptors:
Platform descriptors | Return type
info::platform::profile | string_class
info::platform::version | string_class
info::platform::name | string_class
info::platform::vendor | string_class
info::platform::extensions | vector_class<string_class>

Header file
SYCL programs must include the <CL/sycl.hpp> header file to provide all of the SYCL features.

Namespace
All SYCL names are defined in the cl::sycl namespace.

Queue
See queue class functions [4.6.5] on page 2 of this reference guide.

Buffer
See buffer class functions [4.7.2] on page 2 of this reference guide.

Accessor
See accessor class functions [4.7.6] on page 3 of this reference guide.

Handler
See handler class functions [4.8.3] on page 5 of this reference guide.

Scopes
The kernel scope specifies a single kernel function that will be, or has been, compiled by a device compiler and executed on a device. See Invoking Kernels [4.8.5] in the spec.
The command group scope specifies a unit of work composed of a kernel function and accessors.
The application scope specifies all other code outside of a command group scope.

Runtime: Device selection class [4.6.1]
device_selector();
device_selector(const device_selector &rhs);
device_selector &operator=(const device_selector &rhs);
virtual ~device_selector();
device select_device() const;
virtual int operator()(const device &device) const;

SYCL device selectors:
Device selectors | Description
default_selector | Devices selected by heuristics of the system
gpu_selector | Select devices according to device type info::device::device_type::gpu
cpu_selector | Select devices according to device type info::device::device_type::cpu
host_selector | Selects the SYCL host

Runtime: Common interface [4.3.2 - 4.3.4]

Member functions for reference semantics
T may be device, context, queue, program, kernel, event, buffer, image, sampler, accessor, or stream.
T(const T &rhs);
T(T &&rhs);
T &operator=(const T &rhs);
T &operator=(T &&rhs);
~T();
bool operator==(const T &rhs) const;
bool operator!=(const T &rhs) const;

Member functions for by-value semantics
In the following functions, T may be id, range, item, nd_item, h_item, group, or nd_range.
T(const T &rhs) = default;
T(T &&rhs) = default;
T &operator=(const T &rhs) = default;
T &operator=(T &&rhs) = default;
~T() = default;
bool operator==(const T &rhs) const;
bool operator!=(const T &rhs) const;

Properties interface [4.3.4.1]
Some runtime classes provide this properties interface.
class T {
  . . .
  template <typename propertyT> bool has_property() const;
  template <typename propertyT> propertyT get_property() const;
  . . .
};
class property_list {
 public:
  template <typename... propertyTN> property_list(propertyTN... props);
};
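The device selection class above works by scoring: `operator()` returns an integer per device, and `select_device()` returns the device with the highest score, never choosing a device whose score is negative. The following is a minimal sketch of that rule in plain standard C++, so it can be compiled without a SYCL implementation. `FakeDevice`, `score_device`, and `select_device_index` are hypothetical names invented for this illustration and are not part of the SYCL API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for cl::sycl::device; only the fields this sketch needs.
struct FakeDevice {
    std::string name;
    bool is_gpu;
    bool is_cpu;
};

// Plays the role of device_selector::operator(): one score per device.
// A negative score means "never select this device".
int score_device(const FakeDevice &dev) {
    if (dev.is_gpu) return 100;  // prefer GPUs
    if (dev.is_cpu) return 10;   // fall back to CPUs
    return -1;                   // reject everything else
}

// Plays the role of device_selector::select_device(): highest score wins.
// Returns the index of the chosen device, or -1 if every device was rejected.
int select_device_index(const std::vector<FakeDevice> &devices) {
    int best_index = -1;
    int best_score = -1;
    for (std::size_t i = 0; i < devices.size(); ++i) {
        const int score = score_device(devices[i]);
        if (score >= 0 && score > best_score) {
            best_score = score;
            best_index = static_cast<int>(i);
        }
    }
    return best_index;
}
```

In a real SYCL program the same logic would presumably live in a class derived from `device_selector` that overrides `operator()`; here a failed selection is reported as -1 purely to keep the sketch self-contained.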
  2. 2. www.khronos.org/sycl©2018 Khronos Group - Rev. 0818 SYCL 1.2.1 API Reference Guide Page 2 Runtime: Device class (continued) Device queries using get_info(): Descriptor in info::device Return type device_type info::device_type vendor_id cl_uint max_compute_units cl_uint max_work_item_dimensions cl_uint max_work_item_sizes id<3> max_work_group_size size_t preferred_vector_width_char cl_uint preferred_vector_width_short preferred_vector_width_int preferred_vector_width_long preferred_vector_width_float preferred_vector_width_double preferred_vector_width_half native_vector_width_char cl_uint native_vector_width_short native_vector_width_int native_vector_width_long native_vector_width_float native_vector_width_double native_vector_width_half max_clock_frequency cl_uint address_bits cl_uint max_mem_alloc_size cl_ulong image_support bool Descriptor in info::device Return type max_read_image_args cl_uint max_write_image_args cl_uint image2d_max_width size_t image2d_max_height size_t image3d_max_width size_t image3d_max_height size_t image3d_max_depth size_t image_max_buffer_size size_t image_max_array_size size_t max_samplers cl_uint max_parameter_size size_t mem_base_addr_align cl_uint half_fp_config vector_class<info::fp_config> single_fp_config vector_class<info::fp_config> double_fp_config vector_class<info::fp_config> global_mem_cache_type info::global_mem_cache_type global_mem_cache_line_size cl_uint global_mem_cache_size cl_ulong global_mem_size cl_ulong max_constant_buffer_size cl_ulong max_constant_args cl_uint local_mem_type info::local_mem_type local_mem_size cl_ulong error_correction_support bool host_unified_memory bool profiling_timer_resolution size_t is_endian_little bool Descriptor in info::device Return type is_available bool is_compiler_available bool is_linker_available bool execution_capabilities vector_class <info::execution_capability> queue_profiling bool built_in_kernels vector_class<string_class> platform platform name string_class 
vendor string_class driver_version string_class profile string_class version string_class opencl_c_version string_class extensions vector_class<string_class> printf_buffer_size size_t preferred_interop_user_sync bool parent_device device partition_max_sub_devices cl_uint partition_properties vector_class <info:: partition_property> partition_affinity_domains vector_class <info::partition_affinity_ domain> partition_type_property info::partition_property partition_type_affinity_domain info::partition_affinity_domain reference_count cl_uint Runtime: Queue class [4.6.5] Property class members: property::queue::enable_profiling::enable_profiling(); explicit queue(const property_list &propList = {}); queue(const async_handler &asyncHandler, const property_list &propList = {}); queue(const device_selector &deviceSelector, const property_list &propList = {}); queue(const device_selector &deviceSelector, const async_handler &asyncHandler, const property_list &propList = {}); queue(const device &syclDevice, const property_list &propList = {}); queue(const device &syclDevice, const async_handler &asyncHandler, const property_list &propList = {}); queue(const context &syclContext, const device_selector &deviceSelector, const property_list &propList = {}); queue(const context &syclContext, const device_selector &deviceSelector, const async_handler &asyncHandler, const property_list &propList = {}); queue(cl_command_queue clQueue, const context &syclContext, const async_handler &asyncHandler = {}); cl_command_queue get() const; context get_context() const; device get_device() const; bool is_host() const; void wait(); void wait_and_throw(); void throw_asynchronous(); template <info::queue param> typename info::param_traits< info::queue, param>::return_type get_info() const; template <typename T> event submit(T cgf); template <typename T> event submit(T cgf, const queue &secondaryQueue); Queue queries using get_info(): Descriptor Return type info::queue::context context 
info::queue::device | device
info::queue::reference_count | cl_uint

Buffer class [4.7.2]
Property class members:
property::buffer::use_host_ptr::use_host_ptr();
property::buffer::use_mutex::use_mutex(mutex_class &mutexRef);
property::buffer::context_bound::context_bound(context boundContext);
mutex_class *property::buffer::use_mutex::get_mutex_ptr() const;
context property::buffer::context_bound::get_context() const;

Class Declaration
template <typename T, int dimensions = 1, typename AllocatorT = cl::sycl::buffer_allocator> class buffer;

Member Functions
buffer(const range<dimensions> &bufferRange, const property_list &propList = {});
buffer(const range<dimensions> &bufferRange, AllocatorT allocator, const property_list &propList = {});
buffer(T *hostData, const range<dimensions> &bufferRange, const property_list &propList = {});
buffer(T *hostData, const range<dimensions> &bufferRange, AllocatorT allocator, const property_list &propList = {});
buffer(const T *hostData, const range<dimensions> &bufferRange, const property_list &propList = {});
buffer(const T *hostData, const range<dimensions> &bufferRange, AllocatorT allocator, const property_list &propList = {});
buffer(shared_ptr_class<T> &hostData, const range<dimensions> &bufferRange, AllocatorT allocator, const property_list &propList = {});
buffer(shared_ptr_class<T> &hostData, const range<dimensions> &bufferRange, const property_list &propList = {});
template <class InputIterator> buffer(InputIterator first, InputIterator last, AllocatorT allocator, const property_list &propList = {});
template <class InputIterator> buffer(InputIterator first, InputIterator last, const property_list &propList = {});
buffer(buffer<T, dimensions, AllocatorT> &b, const id<dimensions> &baseIndex, const range<dimensions> &subRange);
buffer(cl_mem clMemObject, const context &syclContext, event availableEvent = {});
range<dimensions> get_range() const;
size_t get_count() const;
size_t get_size() const;
AllocatorT get_allocator() const;
template <access::mode mode, access::target target = access::target::global_buffer> accessor<T, dimensions, mode, target> get_access(handler &commandGroupHandler);
template <access::mode mode> accessor<T, dimensions, mode, access::target::host_buffer> get_access();
template <access::mode mode, access::target target = access::target::global_buffer> accessor<T, dimensions, mode, target> get_access(handler &commandGroupHandler, range<dimensions> accessRange, id<dimensions> accessOffset = {});
template <access::mode mode> accessor<T, dimensions, mode, access::target::host_buffer> get_access(range<dimensions> accessRange, id<dimensions> accessOffset = {});
template <typename Destination = std::nullptr_t> void set_final_data(Destination finalData = nullptr);
void set_write_back(bool flag = true);
bool is_sub_buffer() const;
template <typename reinterpretT, int reinterpretDim> buffer<reinterpretT, reinterpretDim, AllocatorT> reinterpret(range<reinterpretDim> reinterpretRange) const;

Runtime: Event class [4.6.6]
event();
event(cl_event clEvent, const context &syclContext);
cl_event get();
vector_class<event> get_wait_list();
void wait();
void wait_and_throw();
static void wait(const vector_class<event> &eventList);
static void wait_and_throw(const vector_class<event> &eventList);
template <info::event param> typename info::param_traits<info::event, param>::return_type get_info() const;
template <info::event_profiling param> typename info::param_traits<info::event_profiling, param>::return_type get_profiling_info() const;

Event queries using get_info() and get_profiling_info()
Descriptor | Return type
info::event::command_execution_status | info::event_command_status
info::event::reference_count | cl_uint
info::event_profiling::command_submit | cl_ulong
info::event_profiling::command_start | cl_ulong
info::event_profiling::command_end | cl_ulong
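The three `info::event_profiling` descriptors above are most useful in combination: the difference between two timestamps gives a duration. The sketch below shows that arithmetic in plain standard C++, assuming (as in OpenCL profiling) that the `cl_ulong` values are nanosecond timestamps taken from the same clock. `ProfilingTimestamps`, `queue_delay_ns`, `execution_time_ns`, and `to_milliseconds` are hypothetical helpers for illustration, not SYCL functions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical holder for the three profiling values an event can report.
// Assumption: all three are nanosecond timestamps from one clock.
struct ProfilingTimestamps {
    std::uint64_t command_submit;  // command group submitted to the queue
    std::uint64_t command_start;   // command started executing on the device
    std::uint64_t command_end;     // command finished executing
};

// Time the command spent waiting between submission and start.
std::uint64_t queue_delay_ns(const ProfilingTimestamps &t) {
    assert(t.command_start >= t.command_submit);
    return t.command_start - t.command_submit;
}

// Time the command spent executing.
std::uint64_t execution_time_ns(const ProfilingTimestamps &t) {
    assert(t.command_end >= t.command_start);
    return t.command_end - t.command_start;
}

// Convert a nanosecond duration to milliseconds for reporting.
double to_milliseconds(std::uint64_t ns) {
    return static_cast<double>(ns) / 1.0e6;
}
```

In a real program the struct would be filled from `get_profiling_info<...>()` on an event returned by `submit`, on a queue constructed with the `enable_profiling` property listed in the queue class.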
  3. 3. www.khronos.org/sycl ©2018 Khronos Group - Rev. 0818 SYCL 1.2.1 API Reference Guide Page 3

Accessor class [4.7.6]
Enums
access::mode: read, write, read_write, discard_write, discard_read_write, atomic
access::target: global_buffer, constant_buffer, local, image, host_buffer, host_image, image_array

Class Declaration
template <typename dataT, int dimensions, access::mode accessmode, access::target accessTarget = access::target::global_buffer, access::placeholder isPlaceholder = access::placeholder::false_t> class accessor;

Members of class accessor for buffers
accessor(buffer<dataT, 1> &bufferRef);
accessor(buffer<dataT, 1> &bufferRef, handler &commandGroupHandlerRef);
accessor(buffer<dataT, dimensions> &bufferRef);
accessor(buffer<dataT, dimensions> &bufferRef, handler &commandGroupHandlerRef);
accessor(buffer<dataT, dimensions> &bufferRef, range<dimensions> accessRange, id<dimensions> accessOffset = {});
accessor(buffer<dataT, dimensions> &bufferRef, handler &commandGroupHandlerRef, range<dimensions> accessRange, id<dimensions> accessOffset = {});
constexpr bool is_placeholder() const;
size_t get_size() const;
size_t get_count() const;
range<dimensions> get_range() const;
id<dimensions> get_offset() const;
operator dataT &() const;
dataT &operator[](id<dimensions> index) const;
dataT &operator[](size_t index) const;
operator dataT() const;
dataT operator[](id<dimensions> index) const;
dataT operator[](size_t index) const;
operator atomic<dataT, access::address_space::global_space>() const;
atomic<dataT, access::address_space::global_space> operator[](id<dimensions> index) const;
atomic<dataT, access::address_space::global_space> operator[](size_t index) const;
__unspecified__ &operator[](size_t index) const;
dataT *get_pointer() const;
global_ptr<dataT> get_pointer() const;
constant_ptr<dataT> get_pointer() const;

Members of class accessor for local access
accessor(handler &commandGroupHandlerRef);
accessor(range<dimensions> allocationSize, handler &commandGroupHandlerRef);
size_t get_size() const;
size_t get_count() const;
operator dataT &() const;
dataT &operator[](id<dimensions> index) const;
dataT &operator[](size_t index) const;
operator atomic<dataT, access::address_space::local_space>() const;
atomic<dataT, access::address_space::local_space> operator[](id<dimensions> index) const;
atomic<dataT, access::address_space::local_space> operator[](size_t index) const;
__unspecified__ &operator[](size_t index) const;
local_ptr<dataT> get_pointer() const;

Members of class accessor for images
template <typename AllocatorT> accessor(image<dimensions, AllocatorT> &imageRef);
template <typename AllocatorT> accessor(image<dimensions, AllocatorT> &imageRef, handler &commandGroupHandlerRef);
template <typename AllocatorT> accessor(image<dimensions + 1, AllocatorT> &imageRef, handler &commandGroupHandlerRef);
accessor<dataT, dimensions, mode, image> operator[](size_t index) const;
size_t get_size() const;
size_t get_count() const;
template <typename coordT> dataT read(const coordT &coords) const;
template <typename coordT> dataT read(const coordT &coords, const sampler &smpl) const;
template <typename coordT> void write(const coordT &coords, const dataT &color) const;

Accessor Capabilities

Buffer [Table 4.44]
The data type must match that of the SYCL buffer. The dimensionality is 0, 1, 2, or 3.
Access Target | Accessor Type | Access Mode | Placeholder modes
global_buffer | Device | read, write, read_write, discard_write, discard_read_write, atomic | false_t, true_t
constant_buffer | Device | read | false_t, true_t
host_buffer | Host | read, write, read_write, discard_write, discard_read_write | false_t

Local [Table 4.47]
The data type must match that of the SYCL buffer. The dimensionality is 0, 1, 2, or 3.
Access Target | Accessor Type | Access Mode | Placeholder modes
local | Device | read_write, atomic | false_t
(Continued on next page)

Image class [4.7.3]
Property class members:
property::image::use_host_ptr::use_host_ptr();
property::image::use_mutex::use_mutex(mutex_class &mutexRef);
property::image::context_bound::context_bound(context boundContext);
mutex_class *property::image::use_mutex::get_mutex_ptr() const;
context property::image::context_bound::get_context() const;

Enums
image_channel_order: a, r, rx, rg, rgx, ra, rgb, rgbx, rgba, argb, bgra, intensity, luminance, abgr
image_channel_type: snorm_int8, snorm_int16, unorm_int8, unorm_int16, unorm_short_565, unorm_short_555, unorm_int_101010, signed_int8, signed_int16, signed_int32, unsigned_int8, unsigned_int16, unsigned_int32, fp16, fp32

Class Declaration
template <int dimensions = 1, typename AllocatorT = cl::sycl::image_allocator> class image;

Member Functions
image(image_channel_order order, image_channel_type type, const range<dimensions> &range, const property_list &propList = {});
image(image_channel_order order, image_channel_type type, const range<dimensions> &range, AllocatorT allocator, const property_list &propList = {});
image(image_channel_order order, image_channel_type type, const range<dimensions> &range, const range<dimensions - 1> &pitch, const property_list &propList = {});
image(void *hostPointer, image_channel_order order, image_channel_type type, const range<dimensions> &range, const range<dimensions - 1> &pitch, AllocatorT allocator, const property_list &propList = {});
image(void *hostPointer, image_channel_order order, image_channel_type type, const range<dimensions> &range, const property_list &propList = {});
image(void *hostPointer, image_channel_order order, image_channel_type type, const range<dimensions> &range, AllocatorT allocator, const property_list &propList = {});
image(void *hostPointer, image_channel_order order, image_channel_type type, const range<dimensions> &range, const range<dimensions - 1> &pitch, const property_list &propList = {});
image(void *hostPointer, image_channel_order order, image_channel_type type, const range<dimensions> &range, const range<dimensions - 1> &pitch, AllocatorT allocator, const property_list &propList = {});
image(void *hostPointer, image_channel_order order, image_channel_type type, const range<dimensions> &range, const property_list &propList = {});
image(void *hostPointer, image_channel_order order, image_channel_type type, const range<dimensions> &range, AllocatorT allocator, const property_list &propList = {});
image(shared_ptr_class<void> &hostPointer, image_channel_order order, image_channel_type type, const range<dimensions> &range, const property_list &propList = {});
image(shared_ptr_class<void> &hostPointer, image_channel_order order, image_channel_type type, const range<dimensions> &range, AllocatorT allocator, const property_list &propList = {});
image(shared_ptr_class<void> &hostPointer, image_channel_order order, image_channel_type type, const range<dimensions> &range, const range<dimensions - 1> &pitch, const property_list &propList = {});
image(shared_ptr_class<void> &hostPointer, image_channel_order order, image_channel_type type, const range<dimensions> &range, const range<dimensions - 1> &pitch, AllocatorT allocator, const property_list &propList = {});
image(cl_mem clMemObject, const context &syclContext, event availableEvent = {});
range<dimensions> get_range() const;
range<dimensions - 1> get_pitch() const;
size_t get_count() const;
size_t get_size() const;
AllocatorT get_allocator() const;
template <typename dataT, access::mode accessMode> accessor<dataT, dimensions, accessMode, access::target::image> get_access(handler &commandGroupHandler);
template <typename dataT, access::mode accessMode> accessor<dataT, dimensions, accessMode, access::target::host_image> get_access();
template <typename Destination = std::nullptr_t> void set_final_data(Destination finalData = nullptr);
void set_write_back(bool flag = true);

Data management: Host allocation [4.7.1]
The default allocator for memory objects is implementation defined, but users can supply their own allocator class.
buffer<int, 1, UserDefinedAllocator<int> > b(d);

Default allocators
Allocators | Description
buffer_allocator | Default buffer allocator used by the runtime, when no allocator is defined by the user.
image_allocator | Default image allocator used by the runtime for the image class, when no allocator is defined by the user. Must be a byte-sized allocator.
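The host allocation entry above names a `UserDefinedAllocator<int>` without showing what such a class might contain. The following is a minimal sketch of one, written against the standard C++ Allocator requirements and exercised with `std::vector` as a stand-in container. Whether a given SYCL implementation accepts exactly this shape for `AllocatorT` is an assumption, and the allocation counter is a hypothetical addition that only makes the allocator's activity observable.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical user-defined allocator: forwards to operator new/delete and
// counts how many blocks handed out for element type T are still live.
template <typename T>
struct UserDefinedAllocator {
    using value_type = T;

    UserDefinedAllocator() = default;

    // Converting constructor so containers can rebind the allocator.
    template <typename U>
    UserDefinedAllocator(const UserDefinedAllocator<U> &) noexcept {}

    T *allocate(std::size_t n) {
        ++live_allocations();
        return static_cast<T *>(::operator new(n * sizeof(T)));
    }

    void deallocate(T *p, std::size_t) noexcept {
        --live_allocations();
        ::operator delete(p);
    }

    // One counter per element type T (a function-local static).
    static int &live_allocations() {
        static int count = 0;
        return count;
    }
};

// The allocator is stateless, so any two instances are interchangeable.
template <typename T, typename U>
bool operator==(const UserDefinedAllocator<T> &, const UserDefinedAllocator<U> &) {
    return true;
}

template <typename T, typename U>
bool operator!=(const UserDefinedAllocator<T> &, const UserDefinedAllocator<U> &) {
    return false;
}
```

A container such as `std::vector<int, UserDefinedAllocator<int>>` then routes its storage through `allocate` and `deallocate`, which is the same role the allocator template parameter plays for `buffer`.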
  4. 4. www.khronos.org/sycl ©2018 Khronos Group - Rev. 0818 SYCL 1.2.1 API Reference Guide Page 4

Multi_ptr class [4.7.7]
Class Declaration
template <typename elementType, access::address_space addressSpace> class multi_ptr;

Member Functions
multi_ptr();
multi_ptr(const multi_ptr&);
multi_ptr(multi_ptr&&);
multi_ptr(pointer_t);
multi_ptr(elementType*);
multi_ptr(void*);
multi_ptr(std::nullptr_t);
~multi_ptr();
multi_ptr &operator=(const multi_ptr&);
multi_ptr &operator=(multi_ptr&&);
multi_ptr &operator=(pointer_t);
multi_ptr &operator=(elementType*);
multi_ptr &operator=(void*);
multi_ptr &operator=(std::nullptr_t);
elementType &operator*() const;
elementType *operator->() const;
template <int dimensions, access::mode addressMode> multi_ptr(accessor<elementType, dimensions, addressMode, access::target::global_buffer>);
template <int dimensions, access::mode addressMode> multi_ptr(accessor<elementType, dimensions, addressMode, access::target::local>);
template <int dimensions, access::mode addressMode> multi_ptr(accessor<elementType, dimensions, addressMode, access::target::constant_buffer>);
template <typename elementType, int dimensions, access::mode addressMode> multi_ptr(accessor<elementType, dimensions, addressMode, access::target::global_buffer>);
template <typename elementType, int dimensions, access::mode addressMode> multi_ptr(accessor<elementType, dimensions, addressMode, access::target::local>);
template <typename elementType, int dimensions, access::mode addressMode> multi_ptr(accessor<elementType, dimensions, addressMode, access::target::constant_buffer>);
pointer_t get() const;
operator void*() const;
template <typename elementType> explicit operator multi_ptr<elementType, addressSpace>() const;

Non-member Functions
template <typename elementType, access::address_space addressSpace> multi_ptr<elementType, addressSpace> make_ptr(multi_ptr<elementType, addressSpace>::pointer_t);
template <typename elementType, access::address_space addressSpace> multi_ptr<elementType, addressSpace> make_ptr(elementType*);
template <typename elementType, access::address_space addressSpace> bool operatorOP(const multi_ptr<elementType, addressSpace> &lhs, const multi_ptr<elementType, addressSpace> &rhs);
  OP is one of ==, !=, <, >, <=, >=
template <typename elementType, access::address_space addressSpace> bool operatorOP(const multi_ptr<elementType, addressSpace> &lhs, std::nullptr_t rhs);
  OP is one of ==, !=, <, >, <=, >=
template <typename elementType, access::address_space addressSpace> bool operatorOP(std::nullptr_t lhs, const multi_ptr<elementType, addressSpace> &rhs);
  OP is one of ==, !=, <, >, <=, >=

Template specialization aliases [4.7.7.2]
template <typename elementType> using global_ptr = multi_ptr<elementType, access::address_space::global_space>;
template <typename elementType> using local_ptr = multi_ptr<elementType, access::address_space::local_space>;
template <typename elementType> using constant_ptr = multi_ptr<elementType, access::address_space::constant_space>;
template <typename elementType> using private_ptr = multi_ptr<elementType, access::address_space::private_space>;
void prefetch(size_t numElements) const;

Ranges and index space identifiers [4.8.1]

Class range [4.8.1.1]
Class declaration
template <size_t dimensions = 1> class range;
Member functions
range(size_t dim0);
range(size_t dim0, size_t dim1);
range(size_t dim0, size_t dim1, size_t dim2);
size_t get(int dimension) const;
size_t &operator[](int dimension);
size_t size() const;
range<dimensions> operatorOP(const range<dimensions> &rhs) const;
range<dimensions> operatorOP(const size_t &rhs) const;
range<dimensions> &operatorOP(const range<dimensions> &rhs);
range<dimensions> &operatorOP(const size_t &rhs);
Non-member function
template <int dimensions> range<dimensions> operatorOP(const size_t &lhs, const range<dimensions> &rhs);

Class nd_range [4.8.1.2]
Class declaration
template <size_t dimensions = 1> struct nd_range;
Member functions
nd_range(range<dimensions> globalSize, range<dimensions> localSize, id<dimensions> offset = id<dimensions>());
range<dimensions> get_global_range() const;
range<dimensions> get_local_range() const;
range<dimensions> get_group_range() const;
id<dimensions> get_offset() const;

Class id [4.8.1.3]
Class declaration
template <size_t dimensions = 1> struct id;
Member functions
id();
id(size_t dim0);
id(size_t dim0, size_t dim1);
id(size_t dim0, size_t dim1, size_t dim2);
id(const range<dimensions> &range);
id(const item<dimensions> &item);
size_t get(int dimension) const;
size_t &operator[](int dimension) const;
operator size_t();
id<dimensions> operatorOP(const id<dimensions> &rhs) const;
  where OP is one of +, -, *, /, %, <<, >>, &, |, ^, &&, ||, <, >, <=, >=
id<dimensions> operatorOP(const size_t &rhs) const;
  where OP is one of +, -, *, /, %, <<, >>, &, |, ^, &&, ||, <, >, <=, >=
id<dimensions> &operatorOP(const id<dimensions> &rhs);
  where OP is one of +=, -=, *=, /=, %=, <<=, >>=, &=, |=, ^=
id<dimensions> &operatorOP(const size_t &rhs);
  where OP is one of +=, -=, *=, /=, %=, <<=, >>=, &=, |=, ^=
Non-member functions
template <int dimensions> id<dimensions> operatorOP(const size_t &lhs, const id<dimensions> &rhs);
  where OP is one of +, -, *, /, %, <<, >>, &, |, ^

Class item [4.8.1.4-5]
Class declaration
template <int dimensions = 1, bool with_offset = true> struct item;
Member functions
id<dimensions> get_id() const;
size_t get_id(int dimension) const;
size_t &operator[](int dimension);
range<dimensions> get_range() const;
id<dimensions> get_offset() const;
operator item<dimensions, true>() const;
size_t get_linear_id() const;

Class nd_item [4.8.1.6]
Class declaration
template <size_t dimensions = 1> struct nd_item;
Member functions
id<dimensions> get_global_id() const;
size_t get_global_id(int dimension) const;
size_t get_global_linear_id() const;
id<dimensions> get_local_id() const;
size_t get_local_id(int dimension) const;
size_t get_local_linear_id() const;
group<dimensions> get_group() const;
size_t get_group(int dimension) const;
size_t get_group_linear_id() const;
size_t get_num_range(int dimension) const;
id<dimensions> get_num_range() const;
range<dimensions> get_global_range() const;
range<dimensions> get_local_range() const;
id<dimensions> get_offset() const;
nd_range<dimensions> get_nd_range() const;
void barrier(access::fence_space accessSpace = access::fence_space::global_and_local) const;
template <access::mode accessMode = access::mode::read_write> void mem_fence(access::fence_space accessSpace = access::fence_space::global_and_local) const;
template <typename dataT> device_event async_work_group_copy(local_ptr<dataT> dest, global_ptr<dataT> src, size_t numElements) const;
template <typename dataT> device_event async_work_group_copy(global_ptr<dataT> dest, local_ptr<dataT> src, size_t numElements) const;
(Continued on next page)

Sampler class [4.7.8]
Enums
addressing_mode: mirrored_repeat, repeat, clamp_to_edge, clamp, none
filtering_mode: nearest, linear
coordinate_normalization_mode: normalized, unnormalized
Class sampler members
sampler(coordinate_normalization_mode normalizationMode, addressing_mode addressingMode, filtering_mode filteringMode);
sampler(cl_sampler clSampler, const context &syclContext);
addressing_mode get_addressing_mode() const;
filtering_mode get_filtering_mode() const;
coordinate_normalization_mode get_coordinate_normalization_mode() const;

Accessor class (continued)
Image [Table 4.50]
The data types are cl_int4, cl_uint4, cl_float4, and cl_half4, and the dimensionality is between 1 and 3 (inclusive).
Access Target | Accessor Type | Dimensionality | Access Mode | Placeholder
image | Device | 1, 2, or 3 | read, write, discard_write | false_t
image_array | Device | 1 or 2 | read, write, discard_write | false_t
host_image | Host | 1, 2, or 3 | read, write, discard_write | false_t
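The `nd_range` and `nd_item` members above are related by simple arithmetic: the group range is the global range divided by the local range, and a global id splits into a work-group id and a local id. The following one-dimensional sketch shows those relations in plain standard C++, assuming a zero offset and a global size that is an exact multiple of the local size. `WorkItem1D`, `group_range`, `decompose`, and `recompose` are hypothetical names for illustration only.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical 1-D view of what nd_item reports for one work-item.
struct WorkItem1D {
    std::size_t group_id;  // index of the work-group
    std::size_t local_id;  // index of the work-item inside its work-group
};

// Number of work-groups, mirroring nd_range::get_group_range() in one dimension.
// Assumption: the global size is an exact multiple of the local size.
std::size_t group_range(std::size_t global_size, std::size_t local_size) {
    assert(local_size > 0 && global_size % local_size == 0);
    return global_size / local_size;
}

// Split a global id into (group id, local id), assuming a zero offset.
WorkItem1D decompose(std::size_t global_id, std::size_t local_size) {
    assert(local_size > 0);
    return WorkItem1D{global_id / local_size, global_id % local_size};
}

// Inverse of decompose: global id = group id * local size + local id.
std::size_t recompose(const WorkItem1D &w, std::size_t local_size) {
    return w.group_id * local_size + w.local_id;
}
```

For example, with a global range of 1024 and a local range of 64 there are 16 work-groups, and global id 130 is local id 2 in work-group 2.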
  5. 5. www.khronos.org/sycl©2018 Khronos Group - Rev. 0818 SYCL 1.2.1 API Reference Guide Page 5 Kernel class [4.8.7] kernel(cl_kernel clKernel, const context &syclContext); cl_kernel get() const; bool is_host() const; context get_context() const; program get_program() const; template <info::kernel param> typename info::param_traits< info::kernel, param>::return_type get_info() const; template <info::kernel_work_group param> typename info::param_traits<info::kernel_work_group, param>::return_type get_work_group_info(const device &dev) const; Kernel queries using get_info() Descriptor Return type info::kernel::function_name string_class info::kernel::num_args cl_uint info::kernel::context context info::kernel::program program info::kernel::reference_count cl_uint info::kernel::attributes string_class Ranges, index space identifiers (cont.) template <typename dataT> device_event async_work_group_copy( local_ptr<dataT> dest, global_ptr<dataT> src, size_t numElements, size_t srcStride) const; template <typename dataT> device_event async_work_group_copy( global_ptr<dataT> dest, local_ptr<dataT> src, size_t numElements, size_t destStride) const; template <typename... eventTN> void wait_for(eventTN... 
events) const; Class h_item [4.8.1.7] Class declaration template <int dimensions> struct h_item; Member functions item<dimensions, false> get_global() const; item<dimensions, false> get_local() const; item<dimensions, false> get_logical_local() const; item<dimensions, false> get_physical_local() const; range<dimensions> get_global_range() const; size_t get_global_range(int dimension) const; id<dimensions> get_global_id() const; size_t get_global_id(int dimension) const; range<dimensions> get_local_range() const; size_t get_local_range(int dimension) const; id<dimensions> get_local_id() const; size_t get_local_id(int dimension) const; range<dimensions> get_logical_local_range() const; size_t get_logical_local_range(int dimension) const; id<dimensions> get_logical_local_id() const; size_t get_logical_local_id(int dimension) const; range<dimensions> get_physical_local_range() const; size_t get_physical_local_range(int dimension) const; id<dimensions> get_physical_local_id() const; size_t get_physical_local_id(int dimension) const; Class group [4.8.1.8] Class declaration template <int dimensions = 1> struct group; Member functions id<dimensions> get_id() const; size_t get_id(int dimension) const; range<dimensions> get_global_range() const; size_t get_global_range(int dimension) const; range<dimensions> get_group_range() const; range<dimensions> get_local_range() const; size_t get_local_range(int dimension) const; size_t get_group_range(int dimension) const; size_t operator[](int dimension) const; size_t get_linear_id() const; template<typename workItemFunctionT> void parallel_for_work_item(workItemFunctionT func) const; template<typename workItemFunctionT> void parallel_for_work_item(range<dimensions> flexibleRange, workItemFunctionT func) const; template <access::mode accessMode = access::mode::read_write> void mem_fence(access::fence_space accessSpace = access::fence_space::global_and_local) const; template <typename dataT> device_event async_work_group_copy( 
local_ptr<dataT> dest, global_ptr<dataT> src, size_t numElements) const;
template <typename dataT> device_event async_work_group_copy(global_ptr<dataT> dest, local_ptr<dataT> src, size_t numElements) const;
template <typename dataT> device_event async_work_group_copy(local_ptr<dataT> dest, global_ptr<dataT> src, size_t numElements, size_t srcStride) const;
template <typename dataT> device_event async_work_group_copy(global_ptr<dataT> dest, local_ptr<dataT> src, size_t numElements, size_t destStride) const;
template <typename... eventTN> void wait_for(eventTN... events) const;

class device_event member [4.8.1.9 - 4.8.1.10]
void wait() const;

Program class [4.8.8]
Enums
program_state: none, compiled, linked
Member functions
explicit program(const context &context);
program(const context &context, vector_class<device> deviceList);
program(vector_class<program> programList, string_class linkOptions = "");
program(const context &context, cl_program clProgram);
cl_program get() const;
bool is_host() const;
template <typename kernelT> void compile_with_kernel_type(string_class compileOptions = "");
template <typename kernelT> void build_with_kernel_type(string_class buildOptions = "");
void compile_with_source(string_class kernelSource, string_class compileOptions = "");
void build_with_source(string_class kernelSource, string_class buildOptions = "");
void link(string_class linkOptions = "");
template <typename kernelT> bool has_kernel<kernelT>() const;
bool has_kernel(string_class kernelName) const;
template <typename kernelT> kernel get_kernel<kernelT>() const;
kernel get_kernel(string_class kernelName) const;
template <info::program param> typename info::param_traits<info::program, param>::return_type get_info() const;
vector_class<vector_class<char>> get_binaries() const;
context get_context() const;
vector_class<device> get_devices() const;
string_class get_compile_options() const;
string_class get_link_options() const;
string_class get_build_options()
const; program_state get_state() const; Information descriptors: Descriptor Return type info::program::reference_count cl_uint info::program::context cl_context info::program::devices vector_class<device> Command group class handler [4.8.3 - 6] template <typename dataT, int dimensions, access::mode accessMode, access::target accessTarget> void require(accessor<dataT, dimensions, accessMode, accessTarget, placeholder::true_t> acc); template <typename T> void set_arg(int argIndex, T && arg); template <typename... Ts> void set_args(Ts &&... args); template <typename KernelName, typename KernelType> void single_task(KernelType kernelFunc); template <typename KernelName, typename KernelType, int dimensions> void parallel_for(range<dimensions> numWorkItems, KernelType kernelFunc); template <typename KernelName, typename KernelType, int dimensions> void parallel_for(range<dimensions> numWorkItems, id<dimensions> workItemOffset, KernelType kernelFunc); template <typename KernelName, typename KernelType, int dimensions> void parallel_for(nd_range<dimensions> executionRange, KernelType kernelFunc); template <typename KernelName, typename WorkgroupFunctionType, int dimensions> void parallel_for_work_group( range<dimensions> numWorkGroups, WorkgroupFunctionType kernelFunc); template <typename KernelName, typename WorkgroupFunctionType, int dimensions> void parallel_for_work_group( range<dimensions> numWorkGroups, range<dimensions> workGroupSize, WorkgroupFunctionType kernelFunc); void single_task(kernel syclKernel); template <int dimensions> void parallel_for(range<dimensions> numWorkItems, kernel syclKernel); template <int dimensions> void parallel_for(range<dimensions> numWorkItems, id<dimensions> workItemOffset, kernel syclKernel); template <int dimensions> void parallel_for( nd_range<dimensions> ndRange, kernel syclKernel); template <typename T, int dim, access::mode mode, access::target tgt> void copy(accessor<T, dim, mode, tgt> src, shared_ptr_class<T> dest); template 
<typename T, int dim, access::mode mode, access::target tgt> void copy(shared_ptr_class<T> src, accessor<T, dim, mode, tgt> dest); template <typename T, int dim, access::mode mode, access::target tgt> void copy( accessor<T, dim, mode, tgt> src, T * dest); template <typename T, int dim, access::mode mode, access::target tgt> void copy( const T * src, accessor<T, dim, mode, tgt> dest); template <typename T, int dim, access::mode mode, access::target tgt> void copy( accessor<T, dim, mode, tgt> src, accessor<T, dim, mode, tgt> dest); template <typename T, int dim, access::mode mode, access::target tgt> void update_host( accessor<T, dim, mode, tgt> acc); template<typename T, int dim, access::mode mode, access::target tgt> void fill( accessor<T, dim, mode, tgt> dest, const T& src); class private_memory members [4.8.5.3] class private_memory { public: private_memory(const group<Dimensions> &); T &operator()(const h_item<Dimensions> &id); };
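The program class above lists a program_state enum (none, compiled, linked) alongside compile, link, and build member functions. The following is a minimal plain C++ sketch of how those states are commonly understood to relate, assuming that compiling moves a program from none to compiled, linking moves it from compiled to linked, and building goes from none straight to linked. The names program_action and next_state are hypothetical and are not part of SYCL; the real runtime reports invalid transitions through its own exception types.

```cpp
#include <stdexcept>

// Mirrors the enumerators listed on the card; this is a local model, not cl::sycl::program_state.
enum class program_state { none, compiled, linked };

// Hypothetical grouping of the compile_with_*, link, and build_with_* member functions.
enum class program_action { compile, link, build };

// Returns the state after applying an action, or throws if the action is
// not allowed in the current state (an assumption of this sketch).
constexpr program_state next_state(program_state s, program_action a) {
    return (a == program_action::compile && s == program_state::none) ? program_state::compiled
         : (a == program_action::link && s == program_state::compiled) ? program_state::linked
         : (a == program_action::build && s == program_state::none) ? program_state::linked
         : throw std::logic_error("invalid program state transition");
}
```

Under these assumptions, a compile followed by a link ends in the same state as a single build, which is why the card offers both routes.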
  6. Scalar data types [4.10.1, 6.5]
SYCL supports the C++11 ISO standard data types: char, signed char, unsigned char, short int, unsigned short int, bool, int, unsigned int, long int, unsigned long int, long long int, unsigned long long int, size_t, float, double.
The following data type aliases are defined for OpenCL interoperability in the cl::sycl namespace: cl_bool, cl_char, cl_uchar, cl_short, cl_ushort, cl_int, cl_uint, cl_long, cl_ulong, cl_float, cl_double, cl_half, half, byte.

Synchronization & atomics [4.11]
Enums
fence_space : char      local_space, global_space, global_and_local
memory_order : int      relaxed

Class atomic
Class declaration
template <typename T, access::address_space addressSpace = access::address_space::global_space> class atomic;
Member functions
template <typename pointerT> atomic(multi_ptr<pointerT, addressSpace> ptr);
void store(T operand, memory_order memoryOrder = memory_order::relaxed) volatile;
T load(memory_order memoryOrder = memory_order::relaxed) const volatile;
T exchange(T operand, memory_order memoryOrder = memory_order::relaxed) volatile;
bool compare_exchange_strong(T& expected, T desired, memory_order successMemoryOrder = memory_order::relaxed, memory_order failMemoryOrder = memory_order::relaxed) volatile;
T fetch_X(T operand, memory_order memoryOrder = memory_order::relaxed) volatile;
where X is one of add, sub, and, or, xor, min, max. Available only when T != float.
Non-member functions
template <typename T, access::address_space addressSpace> T atomic_load(atomic<T, addressSpace> object, memory_order memoryOrder = memory_order::relaxed) volatile;
template <typename T, access::address_space addressSpace> void atomic_store(atomic<T, addressSpace> object, T operand, memory_order memoryOrder = memory_order::relaxed) volatile;
template <typename T, access::address_space addressSpace> T atomic_exchange(atomic<T, addressSpace> object, T operand,
memory_order memoryOrder = memory_order::relaxed) volatile;
template <typename T, access::address_space addressSpace> bool atomic_compare_exchange_strong(atomic<T, addressSpace> object, T &expected, T desired, memory_order successMemoryOrder = memory_order::relaxed, memory_order failMemoryOrder = memory_order::relaxed) volatile;
template <typename T, access::address_space addressSpace> T atomic_fetch_X(atomic<T, addressSpace> object, T operand, memory_order memoryOrder = memory_order::relaxed) volatile;
where X is one of add, sub, and, or, xor, min, max

Exception classes [4.9.2]
Class: exception
const char *what() const;
context get_context() const;
bool has_context() const;
cl_int get_cl_code() const;
Class: exception_list
size_t size() const;
iterator begin() const;
iterator end() const;
Exception types
Derived from cl::sycl::runtime_error class: kernel_error, nd_range_error, accessor_error, event_error, invalid_parameter_error
Derived from cl::sycl::device_error class: compile_program_error, link_program_error, invalid_object_error, memory_allocation_error, platform_error, profiling_error, feature_not_supported

Stream class [4.12]
Enums
stream_manipulator: scientific, hex, oct, noshowbase, showbase, showpos, endl, noshowpos, fixed, hexfloat, defaultfloat, dec
Member functions
stream(size_t bufferSize, size_t maxStatementSize, handler& cgh);
size_t get_size() const;
size_t get_max_statement_size() const;
Non-member functions
template <typename T> const stream& operator<<(const stream& os, const T &rhs);
__precision_manipulator__ setprecision(int precision);
__width_manipulator__ setw(int width);

Vector data types [4.10.2]
RET's dataT template parameter depends on vec's dataT template parameter:
dataT                               RET
cl_char, cl_uchar                   cl_char
cl_short, cl_ushort or cl_half      cl_short
cl_int, cl_uint or cl_float         cl_int
cl_long, cl_ulong or cl_double      cl_long
Class declaration
template <typename dataT, int numElements> class vec;
Members
using element_type = dataT;
vec();
explicit vec(const
dataT &arg); template <typename... argTN> vec(const argTN&... args); vec(const vec<dataT, numElements> &rhs); #ifdef __SYCL_DEVICE_ONLY__ vec(vector_t openclVector); operator vector_t() const; #endif operator dataT() const; // Available only if numElements == 1 size_t get_count() const; size_t get_size() const; template <typename convertT, rounding_mode roundingMode> vec<convertT, numElements> convert() const; template <typename asT> asT as() const; The following XYZW members are available only when numElements <= 4. RGBA members are available only when numElements == 4. template<int... swizzleIndexes> __swizzled_vec__ swizzle() const; __swizzled_vec__ XYZW_ACCESS() const; __swizzled_vec__ RGBA_ACCESS() const; __swizzled_vec__ INDEX_ACCESS() const; #ifdef SYCL_SIMPLE_SWIZZLES __swizzled_vec__ XYZW_SWIZZLE() const; __swizzled_vec__ RGBA_SWIZZLE() const; #endif The following lo, hi, odd, and even members are available only when numElements > 1. __swizzled_vec__ lo() const; __swizzled_vec__ hi() const; __swizzled_vec__ odd() const; __swizzled_vec__ even() const; template <access::address_space addressSpace> void load(size_t offset, multi_ptr<dataT, addressSpace> ptr); template <access::address_space addressSpace> void store(size_t offset, multi_ptr<dataT, addressSpace> ptr) const; vec<dataT, numElements> &operator=(const vec<dataT, numElements> &rhs); vec<dataT, numElements> &operator=(const dataT &rhs); vec<RET, numElements> operator!(); vec<dataT, numElements> operator~(); // Not available for floating point types Member functions with OP variable For all types, OP may be: For non floating-point types, OP may be: vec<dataT, numElements> operatorOP(const vec<dataT, numElements> &rhs) const; +, -, *, / %, &, |, ^, <<, >> vec<dataT, numElements> operatorOP(const dataT &rhs) const; vec<dataT, numElements> &operatorOP(const vec<dataT, numElements> &rhs); +=, -=, *=, /= %=, &=, |=, ^=, <<=, >>=vec<dataT, numElements> &operatorOP(const dataT &rhs); vec<RET, numElements> 
operatorOP(const vec<dataT, numElements> &rhs) const;
vec<RET, numElements> operatorOP(const dataT &rhs) const;
OP may be: &&, ||, ==, !=, <, >, <=, >=
vec<dataT, numElements> &operatorOP();
vec<dataT, numElements> operatorOP(int);
OP may be: ++, --

Non-member functions
Non-member functions with OP variable
template <typename dataT, int numElements> vec<dataT, numElements> operatorOP(const dataT &lhs, const vec<dataT, numElements> &rhs);
For all types, OP may be: +, -, *, /
For non floating-point types, OP may be: %, &, |, ^, <<, >>
template <typename dataT, int numElements> vec<RET, numElements> operatorOP(const dataT &lhs, const vec<dataT, numElements> &rhs);
OP may be: &&, ||, ==, !=, <, >, <=, >=
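The RET table in the vector data types section maps each element type to the signed integer type of the same width, which is the element type of the vec returned by the logical and relational operators. A minimal compile-time sketch of that mapping in plain C++ follows. The aliases inside namespace sketch are hypothetical stand-ins built on <cstdint> fixed-width types, not the real cl::sycl aliases, and cl_half is left out because standard C++ has no 16-bit floating-point type to stand in for it.

```cpp
#include <cstdint>
#include <type_traits>

namespace sketch {
// Stand-in aliases (assumptions of this sketch; a SYCL implementation defines the real ones).
using cl_char = std::int8_t;
using cl_uchar = std::uint8_t;
using cl_short = std::int16_t;
using cl_ushort = std::uint16_t;
using cl_int = std::int32_t;
using cl_uint = std::uint32_t;
using cl_long = std::int64_t;
using cl_ulong = std::uint64_t;
using cl_float = float;
using cl_double = double;

// Picks the signed integer stand-in whose size matches dataT,
// following the rows of the RET table.
template <typename dataT>
struct ret_type {
    using type = std::conditional_t<sizeof(dataT) == 1, cl_char,
                 std::conditional_t<sizeof(dataT) == 2, cl_short,
                 std::conditional_t<sizeof(dataT) == 4, cl_int, cl_long>>>;
};

template <typename dataT>
using ret_type_t = typename ret_type<dataT>::type;
}  // namespace sketch
```

For example, comparing two float vectors would yield a vector whose element type corresponds to sketch::ret_type_t<sketch::cl_float>, that is, the 32-bit signed integer stand-in.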
  7. Math functions [4.13.3]
Math functions are available in the namespace cl::sycl. In all cases below, n may be one of 2, 3, 4, 8, or 16.
Tf (genfloat in the spec) is type float[n], double[n], or half[n].
Tff (genfloatf) is type float[n].
Tfd (genfloatd) is type double[n].
Th (genfloath) is type half[n].
sTf (sgenfloat) is type float, double, or half.
Ti (genint) is type int[n].
uTi (ugenint) is type unsigned int or uintn.
uTli (ugenlonginteger) is unsigned long int, ulonglongn, ulongn, unsigned long long int.
N indicates native variants, available in cl::sycl::native.
H indicates half variants, available in cl::sycl::half_precision, implemented with a minimum of 10 bits of accuracy.

Tf acos (Tf x): Arc cosine
Tf acosh (Tf x): Inverse hyperbolic cosine
Tf acospi (Tf x): acos (x) / π
Tf asin (Tf x): Arc sine
Tf asinh (Tf x): Inverse hyperbolic sine
Tf asinpi (Tf x): asin (x) / π
Tf atan (Tf y_over_x): Arc tangent
Tf atan2 (Tf y, Tf x): Arc tangent of y / x
Tf atanh (Tf x): Hyperbolic arc tangent
Tf atanpi (Tf x): atan (x) / π
Tf atan2pi (Tf y, Tf x): atan2 (y, x) / π
Tf cbrt (Tf x): Cube root
Tf ceil (Tf x): Round to integer toward +infinity
Tf copysign (Tf x, Tf y): x with sign changed to sign of y
Tf cos (Tf x); Tff cos (Tff x) N H: Cosine
Tf cosh (Tf x): Hyperbolic cosine
Tf cospi (Tf x): cos (π x)
Tff divide (Tff x, Tff y) N H: x / y (Not available in cl::sycl.)
Tf erfc (Tf x): Complementary error function
Tf erf (Tf x): Calculates error function
Tf exp (Tf x); Tff exp (Tff x) N H: Exponential base e
Tf exp2 (Tf x); Tff exp2 (Tff x) N H: Exponential base 2
Tf exp10 (Tf x); Tff exp10 (Tff x) N H: Exponential base 10
Tf expm1 (Tf x): e^x - 1.0
Tf fabs (Tf x): Absolute value
Tf fdim (Tf x, Tf y): Positive difference between x and y
Tf floor (Tf x): Round to integer toward -infinity
Tf fma (Tf a, Tf b, Tf c): Multiply and add, then round
Tf fmax (Tf x, Tf y); Tf fmax (Tf x, sTf y): Return y if x < y, otherwise it returns x
Tf fmin (Tf x, Tf y); Tf fmin (Tf x, sTf y): Return y if y < x, otherwise it returns x
Tf fmod (Tf x, Tf y): Modulus. Returns x - y * trunc (x/y)
Tf fract (Tf x, Tf *iptr): Fractional value in x
Tf frexp (Tf x, Ti *exp): Extract mantissa and exponent
Tf hypot (Tf x, Tf y): Square root of x^2 + y^2
Ti ilogb (Tf x): Return exponent as an integer value
Tf ldexp (Tf x, Ti k); doublen ldexp (doublen x, int k): x * 2^k
Tf lgamma (Tf x): Log gamma function
Tf lgamma_r (Tf x, Ti *signp): Log gamma function
Tf log (Tf x); Tff log (Tff x) N H: Natural logarithm
Tf log2 (Tf x); Tff log2 (Tff x) N H: Base 2 logarithm
Tf log10 (Tf x); Tff log10 (Tff x) N H: Base 10 logarithm
Tf log1p (Tf x): ln (1.0 + x)
Tf logb (Tf x): Return exponent as an integer value
Tf mad (Tf a, Tf b, Tf c): Approximates a * b + c
Tf maxmag (Tf x, Tf y): Maximum magnitude of x and y
Tf minmag (Tf x, Tf y): Minimum magnitude of x and y
Tf modf (Tf x, Tf *iptr): Decompose floating-point number
Tff nan (uTi nancode); Tfd nan (uTli nancode): Quiet NaN (Return is scalar when nancode is scalar)
Tf nextafter (Tf x, Tf y): Next representable floating-point value after x in the direction of y
Tf pow (Tf x, Tf y): Compute x to the power of y
Tf pown (Tf x, Ti y): Compute x^y, where y is an integer
Tf powr (Tf x, Tf y); Tff powr (Tff x, Tff y) N; Tff powr (Tff x, Th y) H: Compute x^y, where x is >= 0
Tff recip (Tff x) N H: 1 / x (Not available in cl::sycl.)
Tf remainder (Tf x, Tf y): Floating point remainder
Tf remquo (Tf x, Tf y, Ti *quo): Remainder and quotient
Tf rint (Tf x): Round to nearest even integer
Tf rootn (Tf x, Ti y): Compute x to the power of 1/y
Tf round (Tf x): Integral value nearest to x, rounding halfway cases away from zero
Tf rsqrt (Tf x); Tff rsqrt (Tff x) N H: Inverse square root
Tf sin (Tf x); Tff sin (Tff x) N H: Sine
Tf sincos (Tf x, Tf *cosval): Sine and cosine of x
Tf sinh (Tf x): Hyperbolic sine
Tf sinpi (Tf x): sin (π x)
Tf sqrt (Tf x); Tff sqrt (Tff x) N H: Square root
Tf tan (Tf x); Tff tan (Tff x) N H: Tangent
Tf tanh (Tf x): Hyperbolic tangent
Tf tanpi (Tf x): tan (π x)
Tf tgamma (Tf x): Gamma function
Tf trunc (Tf x): Round to integer toward zero

Integer functions [4.13.4]
Integer functions are available in the namespace cl::sycl. In all cases below, n may be one of 2, 3, 4, 8, or 16. If a type in the functions below is shown with [xbit] in its name, this indicates that the type is x bits in size.
Tint (geninteger in the spec) is type int[n], uint[n], unsigned int, char, char[n], signed char, scharn, ucharn, unsigned short[n], unsigned short, ushort[n], longn, ulongn, long int, unsigned long int, long long int, longlongn, ulonglongn, unsigned long long int.
uTint (ugeninteger) is type unsigned char, ucharn, unsigned short, ushortn, unsigned int, uintn, unsigned long int, ulongn, ulonglongn, unsigned long long int.
iTint (igeninteger) is type signed char, scharn, short[n], int[n], long int, longn, long long int, longlongn.
sTint (sgeninteger) is type char, signed char, unsigned char, short, unsigned short, int, unsigned int, long int, unsigned long int, long long int, unsigned long long int.
uTint abs (Tint x): | x |
uTint abs_diff (Tint x, Tint y): | x - y | without modulo overflow
Tint add_sat (Tint x, Tint y): x + y and saturates the result
Tint hadd (Tint x, Tint y): (x + y) >> 1 without modulo overflow
Tint rhadd (Tint x, Tint y): (x + y + 1) >> 1
Tint clamp (Tint x, Tint min, Tint max); Tint clamp (Tint x, sTint min, sTint max): min(max(x, min), max)
Tint clz (Tint x): Number of leading 0-bits in x
Tint mad_hi (Tint a, Tint b, Tint c): mul_hi(a, b) + c
Tint mad_sat (Tint a, Tint b, Tint c): a * b + c and saturates the result
Tint max (Tint x, Tint y); Tint max (Tint x, sTint y): y if x < y, otherwise it returns x
Tint min (Tint x, Tint y); Tint min (Tint x, sTint y): y if y < x, otherwise it returns x
Tint mul_hi (Tint x, Tint y): High half of the product of x and y
Tint popcount (Tint x): Number of non-zero bits in x
Tint rotate (Tint v, Tint i): result[indx] = v[indx] << i[indx], with bits shifted off the left shifted back in from the right
Tint sub_sat (Tint x, Tint y): x - y and saturates the result
uTint16bit upsample (uTint8bit hi, uTint8bit lo): result[i] = ((ushort)hi[i] << 8) | lo[i]
iTint16bit upsample (iTint8bit hi, uTint8bit lo): result[i] = ((short)hi[i] << 8) | lo[i]
uTint32bit upsample (uTint16bit hi, uTint16bit lo): result[i] = ((uint)hi[i] << 16) | lo[i]
iTint32bit upsample (iTint16bit hi, uTint16bit lo): result[i] = ((int)hi[i] << 16) | lo[i]
uTint64bit upsample (uTint32bit hi, uTint32bit lo): result[i] = ((ulonglong)hi[i] << 32) | lo[i]
iTint64bit upsample (iTint32bit hi, uTint32bit lo): result[i] = ((longlong)hi[i] << 32) | lo[i]
Tint32bit mad24 (Tint32bit x, Tint32bit y, Tint32bit z): Multiply 24-bit integer values x, y, add 32-bit integer result to 32-bit integer z
Tint32bit mul24 (Tint32bit x, Tint32bit y): Multiply 24-bit integer values x and y
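Several of the integer functions above are defined by short formulas with an overflow caveat. The following plain C++ sketch shows scalar reference versions of a few of them for fixed widths. The function names with width suffixes (hadd_u8, add_sat_i8, and so on) are hypothetical helpers for illustration, not SYCL API, and rotate_u32 assumes the OpenCL C meaning of rotate as a left rotation.

```cpp
#include <cstdint>

// hadd: (x + y) >> 1, computed so that the intermediate sum cannot overflow.
constexpr std::uint8_t hadd_u8(std::uint8_t x, std::uint8_t y) {
    return static_cast<std::uint8_t>((x & y) + ((x ^ y) >> 1));
}

// rhadd: (x + y + 1) >> 1, again without overflowing the intermediate sum.
constexpr std::uint8_t rhadd_u8(std::uint8_t x, std::uint8_t y) {
    return static_cast<std::uint8_t>((x | y) - ((x ^ y) >> 1));
}

// add_sat: x + y, clamped to the representable range of the type.
constexpr std::int8_t add_sat_i8(std::int8_t x, std::int8_t y) {
    return static_cast<std::int8_t>(
        (x + y > INT8_MAX) ? INT8_MAX : (x + y < INT8_MIN) ? INT8_MIN : x + y);
}

// upsample: ((ushort)hi << 8) | lo
constexpr std::uint16_t upsample_u8(std::uint8_t hi, std::uint8_t lo) {
    return static_cast<std::uint16_t>((static_cast<std::uint16_t>(hi) << 8) | lo);
}

// mul_hi: high half of the full-width product of x and y.
constexpr std::uint32_t mul_hi_u32(std::uint32_t x, std::uint32_t y) {
    return static_cast<std::uint32_t>((static_cast<std::uint64_t>(x) * y) >> 32);
}

// rotate: left rotation by i bits (i taken modulo the bit width).
constexpr std::uint32_t rotate_u32(std::uint32_t v, std::uint32_t i) {
    return (i % 32 == 0) ? v : (v << (i % 32)) | (v >> (32 - i % 32));
}
```

The bit tricks in hadd_u8 and rhadd_u8 are one common way to avoid widening the operands; widening to a larger type and shifting would give the same results for these widths.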
  8. © 2018 Khronos Group. All rights reserved. SYCL is a trademark of the Khronos Group. The Khronos Group is an industry consortium creating open standards for the authoring and acceleration of parallel computing, graphics, dynamic media, and more on a wide variety of platforms and devices. See www.khronos.org to learn more about the Khronos Group. See www.khronos.org/sycl to learn more about SYCL.

Relational built-in functions [4.13.7]
Relational functions are available in the namespace cl::sycl. In all cases below, n may be one of 2, 3, 4, 8, or 16. If a type in the functions below is shown with [xbit] in its name, this indicates that the type is x bits in size.
Tint (geninteger in the spec) is type int[n], uint[n], unsigned int, char, char[n], signed char, scharn, ucharn, unsigned short[n], unsigned short, ushort[n], longn, ulongn, long int, unsigned long int, long long int, longlongn, ulonglongn, unsigned long long int.
iTint (igeninteger) is type signed char, scharn, short[n], int[n], long int, longn, long long int, longlongn.
uTint (ugeninteger) is type unsigned char, ucharn, unsigned short, ushortn, unsigned int, uintn, unsigned long int, ulongn, ulonglongn, unsigned long long int.
Ti (genint) is type int[n].
uTi (ugenint) is type unsigned int or uintn.
Tff (genfloatf) is type float[n].
Tfd (genfloatd) is type double[n].
T (gentype) is type float[n], double[n], or half[n], or any type listed above for Tint.
int any (iTint x): 1 if MSB in any component of x is set; else 0
int all (iTint x): 1 if MSB in all components of x is set; else 0
T bitselect (T a, T b, T c): Each bit of result is the corresponding bit of a if the corresponding bit of c is 0; otherwise it is the corresponding bit of b
Tint select (Tint a, Tint b, iTint c); Tint select (Tint a, Tint b, uTint c); Tff select (Tff a, Tff b, Ti c); Tff select (Tff a, Tff b, uTi c); Tfd select (Tfd a, Tfd b, iTint64bit c); Tfd select (Tfd a, Tfd b, uTint64bit c): For each component of a vector type, result[i] = (MSB of c[i] is set) ? b[i] : a[i]. For scalar type, result = c ? b : a
iTint32bit function (Tff x, Tff y); iTint64bit function (Tfd x, Tfd y): function may be one of isequal, isnotequal, isgreater, isgreaterequal, isless, islessequal, islessgreater, isordered, isunordered. This format is used for many relational functions. Replace function with the function name.
iTint32bit function (Tff x); iTint64bit function (Tfd x): function may be one of isfinite, isinf, isnan, isnormal, signbit.

Geometric functions [4.13.6]
Geometric functions are available in the namespace cl::sycl.
Tgf (gengeofloat in the spec) is type float, float2, float3, float4.
Tgd (gengeodouble) is type double, double2, double3, double4.
float4 cross (float4 p0, float4 p1); float3 cross (float3 p0, float3 p1); double4 cross (double4 p0, double4 p1); double3 cross (double3 p0, double3 p1): Cross product
float distance (Tgf p0, Tgf p1); double distance (Tgd p0, Tgd p1): Vector distance
float dot (Tgf p0, Tgf p1); double dot (Tgd p0, Tgd p1): Dot product
float length (Tgf p); double length (Tgd p): Vector length
Tgf normalize (Tgf p); Tgd normalize (Tgd p): Normal vector length 1
float fast_distance (Tgf p0, Tgf p1): Vector distance
float fast_length (Tgf p): Vector length
Tgf fast_normalize (Tgf p): Normal vector length 1

Common functions [4.13.5]
Common functions are available in the namespace cl::sycl. On the host the vector types use the vec class and on an OpenCL device use the corresponding OpenCL vector types.
In all cases below, n may be one of 2, 3, 4, 8, or 16.
Tf (genfloat in the spec) is type float[n], double[n], or half[n].
Tff (genfloatf) is type float[n].
Tfd (genfloatd) is type double[n].

Tf clamp (Tf x, Tf minval, Tf maxval); Tff clamp (Tff x, float minval, float maxval); Tfd clamp (Tfd x, double minval, double maxval): Clamp x to range given by minval, maxval
Tf degrees (Tf radians): radians to degrees
Tf abs (Tf x, Tf y); Tff abs (Tff x, float y); Tfd abs (Tfd x, double y): Max of x and y
Tf max (Tf x, Tf y); Tff max (Tff x, float y); Tfd max (Tfd x, double y): Max of x and y
Tf min (Tf x, Tf y); Tff min (Tff x, float y); Tfd min (Tfd x, double y): Min of x and y
Tf mix (Tf x, Tf y, Tf a); Tff mix (Tff x, Tff y, float a); Tfd mix (Tfd x, Tfd y, double a): Linear blend of x and y
Tf radians (Tf degrees): degrees to radians
Tf step (Tf edge, Tf x); Tff step (float edge, Tff x); Tfd step (double edge, Tfd x): 0.0 if x < edge, else 1.0
Tf smoothstep (Tf edge0, Tf edge1, Tf x); Tff smoothstep (float edge0, float edge1, Tff x); Tfd smoothstep (double edge0, double edge1, Tfd x): Step and interpolate
Tf sign (Tf x): Sign of x

Preprocessor directives and macros [6.6]
CL_SYCL_LANGUAGE_VERSION: Integer version, e.g.: 121
__FAST_RELAXED_MATH__: -cl-fast-relaxed-math
__SYCL_DEVICE_ONLY__: defined when in device compilation
__SYCL_SINGLE_SOURCE__: produce host and device binary
SYCL_EXTERNAL: allow kernel external linkage (optional)

Examples of how to invoke kernels
Example: single_task invoke [4.8.5.1]
SYCL provides a simple interface to enqueue a kernel that will be sequentially executed on an OpenCL device.
myQueue.submit([&](handler & cgh) {
  cgh.single_task<class kernel_name>([=] () {
    // [kernel code]
  });
});

Examples: parallel_for invoke [4.8.5.2]
Example #1 Using a lambda function for a kernel invocation.
myQueue.submit([&](handler & cgh) {
  auto acc = myBuffer.get_access<access::mode::write>(cgh);
  cgh.parallel_for<class myKernel>(range<1>(numWorkItems), [=] (id<1> index) {
    acc[index] = 42.0f;
  });
});

Example #2 Allowing the runtime to choose the index space best matching the range provided, using the information given by the scheduled interface instead of the general interface.
class MyKernel;
myQueue.submit([&](handler & cgh) {
  auto acc = myBuffer.get_access<access::mode::write>(cgh);
  cgh.parallel_for<MyKernel>(range<1>(numWorkItems), [=] (item<1> item) {
    size_t index = item.get_linear_id();
    acc[index] = 42.0f;
  });
});

Example #3 Two examples of launching a kernel functor over a 3D grid. In the first example, work_item IDs range from 0 to 2.
myQueue.submit([&](handler & cgh) {
  cgh.parallel_for<class example_kernel1>(
    range<3>(3,3,3), // global range
    [=] (item<3> it) {
      //[kernel code]
    });
});

Example #4 Launching sixty-four work-items in a three-dimensional grid with four in each dimension and divided into eight work-groups.
myQueue.submit([&](handler & cgh) {
  cgh.parallel_for<class example_kernel>(
    nd_range<3>(range<3>(4, 4, 4), range<3>(2, 2, 2)), [=] (nd_item<3> item) {
      //[kernel code]
      // Work-group synchronization
      item.barrier(access::fence_space::global_space);
      //[kernel code]
    });
});

Parallel For hierarchical invoke [4.8.5.3]
myQueue.submit([&](handler & cgh) {
  // Issue 8 work-groups of 8 work-items each
  cgh.parallel_for_work_group<class example_kernel>(
    range<3>(2, 2, 2), range<3>(2, 2, 2), [=](group<3> myGroup) {
      // [workgroup code]
      int myLocal; // this variable is shared between workitems
      // This variable will be instantiated for each work-item separately
      private_memory<int> myPrivate(myGroup);
      // Issue parallel sets of work-items each sized using the runtime default
      myGroup.parallel_for_work_item([&](h_item<3> myItem) {
        // [work-item code]
        myPrivate(myItem) = 0;
      });
      // Carry private value across loops
      myGroup.parallel_for_work_item([&](h_item<3> myItem) {
        //[work-item code]
        output[myItem.get_global()] = myPrivate(myItem);
      });
      //[workgroup code]
    });
});
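The counts quoted in Example #4 and in the hierarchical example (sixty-four work-items, eight work-groups, eight work-items per group) follow from simple arithmetic on the global and local ranges. The sketch below reproduces that arithmetic in plain C++. Extent3 and the helper functions are hypothetical stand-ins for range<3> and id<3>, not SYCL classes, and linear_id assumes a row-major linearization in which dimension 2 varies fastest and there is no offset.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Plain C++ stand-in for range<3> / id<3> (an assumption of this sketch).
using Extent3 = std::array<std::size_t, 3>;

// Number of elements covered by a 3D extent.
inline std::size_t total(const Extent3& e) { return e[0] * e[1] * e[2]; }

// Work-groups per dimension when a global range is split by a local range.
// The local range is expected to divide the global range evenly.
inline Extent3 group_range(const Extent3& global, const Extent3& local) {
    Extent3 groups{};
    for (std::size_t d = 0; d < 3; ++d) {
        assert(local[d] != 0 && global[d] % local[d] == 0);
        groups[d] = global[d] / local[d];
    }
    return groups;
}

// Global id of a work-item from its group id and local id (no offset).
inline Extent3 global_id(const Extent3& group, const Extent3& local_range,
                         const Extent3& local_id) {
    Extent3 id{};
    for (std::size_t d = 0; d < 3; ++d) {
        id[d] = group[d] * local_range[d] + local_id[d];
    }
    return id;
}

// Row-major linearization: dimension 0 is slowest, dimension 2 is fastest.
inline std::size_t linear_id(const Extent3& id, const Extent3& range) {
    return (id[0] * range[1] + id[1]) * range[2] + id[2];
}
```

With a global range of (4, 4, 4) and a local range of (2, 2, 2), total gives 64 work-items and group_range gives (2, 2, 2), that is, eight work-groups, matching the description of Example #4.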
