20101030 opencl intro
Published in: Technology
Transcript

  • 1. Brief Introduction to OpenCL Hu Zi Ming hzmangel@gmail.com 2010-10-30 Brief Introduction to OpenCL 1 / 24
  • 2. Outline
      1. Some Background about OpenCL
           - CPU vs. GPU
           - What is OpenCL
           - Advantages & Disadvantages
      2. Programming with OpenCL
      3. Demo about OpenCL
  • 4. CPU vs. GPU
      - CPU: makes a single thread fast; hides latency through large caches
      - GPU: improves throughput; hides latency through parallelism
  • 7. Before OpenCL...
      - Nvidia CUDA
      - ATI Stream
      - Microsoft DirectCompute
      - ......
      Apple said, "Let there be a standard." And there was OpenCL.
  • 8. What is OpenCL
      - Open Computing Language
      - Kernel language based on C99; similar to C for CUDA but slightly lower-level
      - Originally developed by Apple
      - Now maintained by the Khronos Group
      - Used for parallel computing
  • 9. Advantages
      - Supports heterogeneous platforms
      - Offers both task-based (CPU) and data-based (GPU) parallelism
      - Gives access to much higher memory bandwidth and compute throughput
      - Exposes GPU power without locking you in to one manufacturer
      - Supports extensions, as OpenGL does
      - Supports an ES (embedded) mode for mobile devices
  • 10. Disadvantages
      - Tuning is hardware-specific
      - Algorithms are bound to the shape of the data
      - Recursion is not available yet
      - Function pointers are not supported yet
  • 11. Outline
      1. Some Background about OpenCL
      2. Programming with OpenCL
           - Prerequisite
           - Main Flow of Host Code
           - Four Models
      3. Demo about OpenCL
  • 12. Prerequisite
      - A driver that supports OpenCL
      - ATI Stream SDK / NVIDIA CUDA Toolkit / . . .
      - Host code: controls the kernel code
      - OpenCL kernel code: written in OpenCL C and run on devices
  • 13. Main Flow of Host Code
      1. Get information about the platform and devices
      2. Select devices to be used in execution
      3. Create an OpenCL context
      4. Create a command queue
      5. Create memory buffer objects
      6. Create program object
      7. Load the kernel source code and compile it
      8. Create kernel object
      9. Set kernel arguments
      10. Execute the kernel
      11. Copy memory from GPU to CPU
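As a rough orientation aid, the host-code steps on this slide can be paired with the OpenCL 1.x entry points usually associated with them. The table below is plain C and does not call into OpenCL; the struct, the array name `HOST_FLOW`, and the helper `host_flow_find` are hypothetical names used only for this sketch, and the step-to-function pairing is approximate (for example, the kernel source text is actually handed over in `clCreateProgramWithSource`, while `clBuildProgram` compiles it).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One row per host-code step from the slide, paired with the OpenCL 1.x
 * function commonly used for it. Lookup table only; nothing here talks
 * to an OpenCL runtime. */
struct host_step {
    const char *step;
    const char *api;
};

static const struct host_step HOST_FLOW[] = {
    { "Get information about the platform and devices", "clGetPlatformIDs" },
    { "Select devices to be used in execution",          "clGetDeviceIDs" },
    { "Create an OpenCL context",                        "clCreateContext" },
    { "Create a command queue",                          "clCreateCommandQueue" },
    { "Create memory buffer objects",                    "clCreateBuffer" },
    { "Create program object",                           "clCreateProgramWithSource" },
    { "Load the kernel source code and compile it",      "clBuildProgram" },
    { "Create kernel object",                            "clCreateKernel" },
    { "Set kernel arguments",                            "clSetKernelArg" },
    { "Execute the kernel",                              "clEnqueueNDRangeKernel" },
    { "Copy memory from GPU to CPU",                     "clEnqueueReadBuffer" },
};

#define HOST_FLOW_LEN (sizeof HOST_FLOW / sizeof HOST_FLOW[0])

/* Returns the 0-based position of an API name in the flow, or -1 if absent. */
int host_flow_find(const char *api) {
    assert(api != NULL);
    for (size_t i = 0; i < HOST_FLOW_LEN; i++) {
        if (strcmp(HOST_FLOW[i].api, api) == 0) {
            return (int)i;
        }
    }
    return -1;
}
```

Calling `host_flow_find("clCreateKernel")` gives the position of that step in the flow, which makes the ordering constraint visible: a kernel object can only be created after the program has been built.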
  • 14. OpenCL Summary
  • 15. Four Models
      - Platform model
      - Execution model
      - Memory model
      - Programming model
  • 16. Platform Model
      - A host is connected to one or more OpenCL devices
      - A device is divided into one or more compute units (CUs)
      - A compute unit is further divided into one or more processing elements (PEs)
      - The application sends commands from the host to the PEs
      - PEs within a CU execute instructions as SIMD/SPMD units
  • 17. Platform Model (Cont.)
  • 21. Execution Model
      - A work-item is the basic unit of work
      - A kernel is the code for a work-item: executed on OpenCL devices, basically a C function
      - The host program is executed on the host
      - It creates an index space based on an NDRange
      - Work-items are organized into work-groups
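The relationship between the NDRange index space, work-groups, and work-items comes down to simple index arithmetic. The sketch below shows it in plain C for one dimension, assuming a zero global offset and a global size that is a multiple of the work-group size (as OpenCL 1.x requires). The function names are hypothetical helpers, not OpenCL API; on a device the same values come from `get_global_id`, `get_group_id`, and `get_local_id`.

```c
#include <assert.h>
#include <stddef.h>

/* Number of work-groups along one dimension of the NDRange. */
size_t num_groups(size_t global_size, size_t local_size) {
    assert(local_size > 0 && global_size % local_size == 0);
    return global_size / local_size;
}

/* Global ID of a work-item, built from its work-group ID and its
 * local ID inside that group (zero global offset assumed). */
size_t global_id(size_t group_id, size_t local_size, size_t local_id) {
    assert(local_id < local_size);
    return group_id * local_size + local_id;
}

/* Inverse mapping: which group a global ID falls into ... */
size_t group_id_of(size_t gid, size_t local_size) {
    assert(local_size > 0);
    return gid / local_size;
}

/* ... and its position inside that group. */
size_t local_id_of(size_t gid, size_t local_size) {
    assert(local_size > 0);
    return gid % local_size;
}
```

For example, with 16 work-items in groups of 4, the work-item with local ID 3 in group 2 has global ID 2 * 4 + 3 = 11.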
  • 22. Execution Model (Cont.)
  • 24. Memory Model
      - Global memory: readable and writable by all work-items in all work-groups
      - Constant memory: a region of global memory that remains constant during kernel execution
      - Local memory: local to a work-group
      - Private memory: private to a work-item
      - Data movement path: host -> global -> local and back
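The data path host -> global -> local and back can be imitated on a CPU to make the roles of the memory regions concrete. This is only a serial sketch under stated assumptions: `GROUP_SIZE` is a hypothetical work-group size, ordinary C arrays stand in for global and local memory, and a local variable stands in for private memory. It is not OpenCL code.

```c
#include <assert.h>
#include <stddef.h>

#define GROUP_SIZE 4  /* hypothetical work-group size */

/* Emulates one work-group: stage its slice of "global" memory into a
 * "local" scratch array, sum it there, and write one result back. */
static void group_sum(const int *global_in, int *global_out, size_t group) {
    int local_buf[GROUP_SIZE];  /* stands in for __local memory */
    int acc = 0;                /* stands in for private memory */

    /* global -> local */
    for (size_t lid = 0; lid < GROUP_SIZE; lid++) {
        local_buf[lid] = global_in[group * GROUP_SIZE + lid];
    }
    /* work on the local copy */
    for (size_t lid = 0; lid < GROUP_SIZE; lid++) {
        acc += local_buf[lid];
    }
    /* result back to global */
    global_out[group] = acc;
}

/* "Host" side: in has n elements (n a multiple of GROUP_SIZE),
 * out has n / GROUP_SIZE slots, one per work-group. */
void sum_by_group(const int *in, size_t n, int *out) {
    assert(in != NULL && out != NULL && n % GROUP_SIZE == 0);
    for (size_t g = 0; g < n / GROUP_SIZE; g++) {
        group_sum(in, out, g);
    }
}
```

On a real device the staging step pays off when each element in local memory is read several times by the work-items of the group.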
  • 25. Memory Model
  • 26. Programming Model
      - Data parallel programming model
      - Task parallel programming model
      - Synchronization
  • 27. Outline
      1. Some Background about OpenCL
      2. Programming with OpenCL
      3. Demo about OpenCL
           - Matrix Add
           - Matrix Multiply
  • 28. Kernel Code: normal add
        __kernel void add(__global int *a, __global int *b, __global int *c) {
            int i = get_global_id(0);  /* one work-item per element */
            c[i] = a[i] + b[i];
        }
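To see what the add kernel computes without an OpenCL runtime, its body can be run serially on the CPU, one call per work-item. The sketch below is plain C; `add_work_item` and `add_ndrange` are hypothetical names, and the loop index plays the role of `get_global_id(0)`.

```c
#include <assert.h>
#include <stddef.h>

/* Body of the add kernel for a single work-item. */
static void add_work_item(size_t gid, const int *a, const int *b, int *c) {
    c[gid] = a[gid] + b[gid];
}

/* Emulates a 1-D NDRange of n work-items by running them one after another.
 * A device would run many of these calls in parallel. */
void add_ndrange(const int *a, const int *b, int *c, size_t n) {
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t gid = 0; gid < n; gid++) {
        add_work_item(gid, a, b, c);
    }
}
```

Because no work-item reads what another one writes, the order of execution does not matter, which is what makes this kernel a natural fit for the data parallel model.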
  • 29. Normal Kernel Code: normal multiply
        /* WA, WB, WC are the row widths of a, b and c. They are assumed to be
           compile-time macros with WC == WB; WA is also the inner dimension. */
        __kernel void mul(__global int *a, __global int *b, __global int *c) {
            int x = get_global_id(1);  /* column of c */
            int y = get_global_id(0);  /* row of c */
            int i = 0;
            c[y * WC + x] = 0;
            for (; i < WA; i++) {
                c[y * WC + x] += a[y * WA + i] * b[i * WB + x];
            }
        }
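A CPU reference is useful for checking the output of the multiply kernel. The sketch below is plain C with the matrix sizes passed as arguments instead of macros; `matmul_ref` is a hypothetical name, and the two outer loops take the place of the 2-D NDRange (one iteration per work-item).

```c
#include <assert.h>
#include <stddef.h>

/* c (ha x wb) = a (ha x wa) * b (wa x wb), all stored row-major. */
void matmul_ref(const int *a, const int *b, int *c,
                size_t ha, size_t wa, size_t wb) {
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t y = 0; y < ha; y++) {          /* row of c */
        for (size_t x = 0; x < wb; x++) {      /* column of c */
            int sum = 0;                       /* accumulate privately */
            for (size_t i = 0; i < wa; i++) {
                sum += a[y * wa + i] * b[i * wb + x];
            }
            c[y * wb + x] = sum;               /* single write to the result */
        }
    }
}
```

Accumulating in a local variable and storing once mirrors a common kernel optimization: it avoids repeated read-modify-write traffic to global memory inside the inner loop.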
  • 30. Kernel Code with Block Support: multiply with block support
        /* Assumes work-groups of BLOCK_SIZE x BLOCK_SIZE work-items and
           WA divisible by BLOCK_SIZE; WA and WB are compile-time macros. */
        __kernel void mul(__global float *a, __global float *b, __global float *c,
                          __local float *as, __local float *bs) {
            int x = get_global_id(1);  /* column of c */
            int y = get_global_id(0);  /* row of c */
            int tx = get_local_id(1);
            int ty = get_local_id(0);
            float tmp_val = 0.0f;
            for (int i = 0; i < WA / BLOCK_SIZE; i++) {
                /* each work-item loads one element of each tile */
                as[ty * BLOCK_SIZE + tx] = a[y * WA + i * BLOCK_SIZE + tx];
                bs[ty * BLOCK_SIZE + tx] = b[(i * BLOCK_SIZE + ty) * WB + x];
                barrier(CLK_LOCAL_MEM_FENCE);  /* tiles fully loaded */
                for (int j = 0; j < BLOCK_SIZE; j++) {
                    tmp_val += as[ty * BLOCK_SIZE + j] * bs[j * BLOCK_SIZE + tx];
                }
                barrier(CLK_LOCAL_MEM_FENCE);  /* done before tiles are overwritten */
            }
            c[y * WB + x] = tmp_val;
        }
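The tiling idea behind the block version can be checked on a CPU. The sketch below is a serial emulation in plain C, not the slide's kernel: each call to `tile_group` stands for one work-group, small arrays stand in for `__local` memory, and the boundary between the load loop and the compute loop plays the role of the barrier. `BLOCK_SIZE` is set to a hypothetical value of 2, and all matrix dimensions are assumed to be multiples of it.

```c
#include <assert.h>
#include <stddef.h>

#define BLOCK_SIZE 2  /* hypothetical tile edge */

/* Computes one BLOCK_SIZE x BLOCK_SIZE tile of c, emulating one work-group
 * at block row `by`, block column `bx`. */
static void tile_group(const float *a, const float *b, float *c,
                       size_t wa, size_t wb, size_t by, size_t bx) {
    float as[BLOCK_SIZE][BLOCK_SIZE];          /* stand-ins for __local tiles */
    float bs[BLOCK_SIZE][BLOCK_SIZE];
    float acc[BLOCK_SIZE][BLOCK_SIZE] = {{0}}; /* one private sum per work-item */

    for (size_t t = 0; t < wa / BLOCK_SIZE; t++) {
        /* Load phase: work-item (ty, tx) copies one element of each tile. */
        for (size_t ty = 0; ty < BLOCK_SIZE; ty++) {
            for (size_t tx = 0; tx < BLOCK_SIZE; tx++) {
                as[ty][tx] = a[(by * BLOCK_SIZE + ty) * wa + t * BLOCK_SIZE + tx];
                bs[ty][tx] = b[(t * BLOCK_SIZE + ty) * wb + bx * BLOCK_SIZE + tx];
            }
        }
        /* Compute phase: runs only after both tiles are fully loaded. */
        for (size_t ty = 0; ty < BLOCK_SIZE; ty++) {
            for (size_t tx = 0; tx < BLOCK_SIZE; tx++) {
                for (size_t j = 0; j < BLOCK_SIZE; j++) {
                    acc[ty][tx] += as[ty][j] * bs[j][tx];
                }
            }
        }
    }
    /* Each work-item writes its finished sum once. */
    for (size_t ty = 0; ty < BLOCK_SIZE; ty++) {
        for (size_t tx = 0; tx < BLOCK_SIZE; tx++) {
            c[(by * BLOCK_SIZE + ty) * wb + bx * BLOCK_SIZE + tx] = acc[ty][tx];
        }
    }
}

/* c (ha x wb) = a (ha x wa) * b (wa x wb), row-major, computed tile by tile. */
void matmul_tiled(const float *a, const float *b, float *c,
                  size_t ha, size_t wa, size_t wb) {
    assert(ha % BLOCK_SIZE == 0 && wa % BLOCK_SIZE == 0 && wb % BLOCK_SIZE == 0);
    for (size_t by = 0; by < ha / BLOCK_SIZE; by++) {
        for (size_t bx = 0; bx < wb / BLOCK_SIZE; bx++) {
            tile_group(a, b, c, wa, wb, by, bx);
        }
    }
}
```

Each tile element is loaded from the large arrays once and then read BLOCK_SIZE times from the small ones, which is the reuse that local memory is meant to exploit.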
  • 31. Q AND A
