SDAccel Design Contest: Xilinx SDAccel

  1. 1. Courses @ NECST Lorenzo Di Tucci <lorenzo.ditucci@polimi.it> Emanuele Del Sozzo <emanuele.delsozzo@polimi.it> Marco Rabozzi <marco.rabozzi@polimi.it> Marco D. Santambrogio <marco.santambrogio@polimi.it> Xilinx SDAccel 15/02/2018 DEIB Seminar Room
  2. 2. 2 Agenda - Recall on Hardware Design Flow - Introduction to SDAccel Framework - OpenCL - computational model - platform - memory model - SDAccel Design Flow - Kernel Specification - Examples
  3. 3. 3 Did you register? Use this Google Doc to provide your data https://goo.gl/FRCG6y First, install the VPN we have provided you. (Mac: Tunnelblick - Windows/Linux: OpenVPN) To SSH to the machine: ssh <name>.<surname>@nags31.local.necst.it password: user
  4. 4. 4 Installation Party You can change your password here: http://changepassword.local.necst.it/ You can also RDP to the instance using • Microsoft Remote Desktop (Windows/Mac OS) • Remmina (Linux) To connect to the machine or change your password, you must first start the VPN.
  5. 5. 5 Hardware Design Flow for HPC • Hardware Design Flow (HDF): the process of realizing a hardware module • The HDF for FPGAs can be seen as a two-step process: High Level Synthesis (from high-level code to a Hardware Description Language, HDL) and System Level Design (implementation on the board)
  6. 6. 6 The Hardware Design Flow
  7. 7. 7 The Hardware Design Flow
  8. 8. 8 The Hardware Design Flow
  9. 9. 9 The Hardware Design Flow System integration, driver generation and runtime management
  10. 10. 10 The Hardware Design Flow • Complete automation of the 2 steps of the hardware design flow
  11. 11. 11 Xilinx SDAccel - Given high-level code, completely automates the steps of the hardware design flow - Respects the OpenCL memory and computational models
  12. 12. 12 OpenCL (Open Computing Language) • Open, cross-platform parallel programming language for heterogeneous architectures • Standard for the development and acceleration of data-parallel applications • Allows writing portable accelerated code across different devices and architectures (FPGAs, GPGPUs, DSPs, …)
  13. 13. • Work item: – The basic unit of work within an OpenCL device • Global size: – Declares an N-dimensional size of the total number of work-items – Size of the computational problem size_t global[N] • Local size – Declares an N-dimensional work-group size – The number of work-items that will execute within a workgroup size_t local[N] OpenCL Computational model
  14. 14. • global and local can be 1D, 2D, or 3D, corresponding to the dimensionality of the data to be processed 1D 2D 3D N-Dimensional kernel range
  15. 15. • global and local can be 1D, 2D, or 3D, corresponding to the dimensionality of the data to be processed 1D 2D 3D N-Dimensional kernel range
  16. 16. size_t global[1]; size_t local[1]; global[0] = 10; local[0] = 1; err = clEnqueueNDRangeKernel( commands, kernel, 1, NULL, (size_t*)&global, (size_t*) &local, 0, NULL, NULL ); 1-Dimensional kernel range (host code) Global and local size of dimension 1 1-Dimensional Kernel → work-group size of 1 work-item → 10 total work items
  17. 17. 1-Dimensional kernel range size_t global[1]; size_t local[1]; global[0] = 10; local[0] = 1; OpenCL deviceHOST Communication System Compute Unit PE PE = Processing Element
  18. 18. 1-Dimensional kernel range size_t global[1]; size_t local[1]; global[0] = 10; local[0] = 1; OpenCL deviceHOST Communication System Compute Unit PE PE = Processing Element Work item: maps to a PE Work group: mapped to a compute unit
  19. 19. 1-Dimensional kernel range size_t global[1]; size_t local[1]; global[0] = 10; local[0] = 1; OpenCL deviceHOST Communication System Compute Unit PE PE = Processing Element
  20. 20. 1-Dimensional kernel range size_t global[1]; size_t local[1]; global[0] = 10; local[0] = 1; OpenCL deviceHOST Communication System Compute Unit PE PE = Processing Element
  21. 21. 1-Dimensional kernel range size_t global[1]; size_t local[1]; global[0] = 10; local[0] = 1; OpenCL deviceHOST Communication System Compute Unit PE PE = Processing Element
  22. 22. 1-Dimensional kernel range size_t global[1]; size_t local[1]; global[0] = 10; local[0] = 2; OpenCL deviceHOST Communication System Compute Unit PE PE = Processing Element PE
  23. 23. 1-Dimensional kernel range size_t global[1]; size_t local[1]; global[0] = 10; local[0] = 2; OpenCL deviceHOST Communication System Compute Unit PE PE = Processing Element PE Work item: maps to PEs Work group: mapped to a compute unit
  24. 24. 1-Dimensional kernel range size_t global[1]; size_t local[1]; global[0] = 10; local[0] = 2; OpenCL deviceHOST Communication System Compute Unit PE PE = Processing Element PE
  25. 25. 1-Dimensional kernel range size_t global[1]; size_t local[1]; global[0] = 10; local[0] = 2; OpenCL deviceHOST Communication System Compute Unit PE PE = Processing Element PE Compute Unit PE PE
  26. 26. 1-Dimensional kernel range size_t global[1]; size_t local[1]; global[0] = 10; local[0] = 2; OpenCL deviceHOST Communication System Compute Unit PE PE = Processing Element PE Compute Unit PE PE Work items Work groups Increased parallelism: 2 compute units working in parallel on different work items
  27. 27. • global and local can be 1D, 2D, or 3D, corresponding to the dimensionality of the data to be processed 1D 2D 3D 2-Dimensional kernel range
  28. 28. size_t global[2]; size_t local[2]; global[0] = 10; global[1] = 10; local[0] = 2; local[1] = 2; err = clEnqueueNDRangeKernel( commands, kernel, 2, NULL, (size_t*)&global, (size_t*) &local, 0, NULL, NULL ); 2-Dimensional kernel range (host code) Global and local size of dimension 2 2-Dimensional Kernel → work-group size of 2x2 work-item → 10x10 total work items
  29. 29. 29 2-Dimensional kernel range Problem Size Dim 2 (10) ProblemSizeDim1(10) Work group Work item OpenCL device HOST Compute Unit PE PE = Processing Element PE PE PE
  30. 30. 30 OpenCL Platform & Memory Model The host's responsibilities include: - managing the operating system and enabling drivers for all devices - picking the correct device for computation - executing the application host program - creating and managing memory buffers - launching and managing kernel execution
  31. 31. 31 OpenCL Platform & Memory Model The Device: - exchanges data via memory buffers - is reconfigured at runtime to execute our kernel - is divided into multiple compute units - each compute unit executes a work-group - each work-group contains multiple work-items - a compute unit is further divided into processing elements (PEs) - a PE is responsible for executing a single work-item
  32. 32. 32 OpenCL Platform & Memory Model
  33. 33. 33 OpenCL Platform & Memory Model Three layers of Memory: 1) Global - shared between host and device (DRAM - the host accesses it via PCIe) 2) Local - accessible by all the work-items inside a compute unit (BRAM) 3) Private - accessible only to a single processing element/work-item (registers) The OpenCL memory abstraction does not allow the host to write directly into the device's compute units; data must pass through Global Memory
  34. 34. 34 OpenCL Platform & Memory Model Processing Element
  35. 35. 35 Design Flow
  36. 36. 36 Design Flow: System Build GUI Makefile
  37. 37. 37 Design Flow: Makefile Specify source files, host and kernel optimizations, emulation type or system build via Makefile
  38. 38. 38 Design Flow: Makefile - compile the host - generate a Xilinx Object (.xo) for each kernel - link the .xo file(s) into an .xclbin to be executed - emulate or build your application
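These steps map onto xocc invocations roughly as in the fragment below (a sketch: the platform name, kernel name, and host include/library paths for the Xilinx OpenCL runtime are placeholders, and exact flags depend on the SDAccel version):

```make
# TARGET is one of: sw_emu, hw_emu, hw
TARGET   ?= sw_emu
PLATFORM ?= <your_platform>

# compile the host (OpenCL runtime include/library paths omitted)
host: host.cpp
	g++ -o host host.cpp -lOpenCL

# generate a Xilinx Object (.xo) for the kernel
vadd.xo: vadd.cl
	xocc -c -t $(TARGET) --platform $(PLATFORM) -k vadd -o vadd.xo vadd.cl

# link the .xo file(s) into the .xclbin loaded by the host
vadd.xclbin: vadd.xo
	xocc -l -t $(TARGET) --platform $(PLATFORM) -o vadd.xclbin vadd.xo
```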
  39. 39. 39 Design Flow: GUI Use the Eclipse-based GUI to perform each step of the flow
  40. 40. 40 Kernel Specification As seen before, Kernels can be specified in: - OpenCL - C/C++ - RTL
  41. 41. 41 OpenCL Kernel • Simply define the OpenCL kernel and the associated work-group size (in the following example, 10 work-items per work-group) • Must be called from the host as an NDRange kernel
  42. 42. 42 C/C++ Kernel • Use standard AXI Master and AXI Lite interface as for Vivado HLS • All memory ports must be mapped to the same bundle • Include your kernel code within an extern “C” block • Must be called from the host as a simple task
  43. 43. 43 RTL Kernel 1) write your code using an HDL (Verilog/VHDL/Chisel HDL, etc.) 2) integrate your HDL into SDAccel and generate a Xilinx Object (.xo)
  44. 44. 44 RTL Kernel
  45. 45. 45 RTL Kernel
  46. 46. 46 RTL Kernel
  47. 47. 47 RTL Kernel
  48. 48. 48 RTL Kernel
  49. 49. 49 RTL Kernel
  50. 50. 50 RTL Kernel
  51. 51. 51 RTL Kernel
  52. 52. 52 RTL Kernel 1) write your code using an HDL (Verilog/VHDL/Chisel HDL, etc.) 2) integrate your HDL into SDAccel and generate a Xilinx Object (.xo) 3) perform Hardware Emulation to check correctness 4) build for the FPGA
  53. 53. 53 Examples - Let’s start with the Vector Addition code presented by Emanuele last time. - Let’s produce a C/C++ version and an OpenCL one Example code is available on NAGS31 @ /sdaccel_contest/
  54. 54. 54 Launch SDx
  55. 55. 55 Create new SDx Project
  56. 56. 56 Select target FPGA board
  57. 57. 57 New Hardware Function
  58. 58. 58 Add Hardware Function
  59. 59. 59 Specify data width for ports
  60. 60. 60 Software Emulation
  61. 61. 61 Software Emulation
  62. 62. 62 Automatically include binary
  63. 63. 63 Software Emulation
  64. 64. 64 Launch Vivado HLS from SDx
  65. 65. 65 Launch Vivado HLS from SDx
  66. 66. 66 Interface Specification
  67. 67. 67 Wrap your C++ code
  68. 68. 68 Port Mapping - external pointers
  69. 69. 69 Port Mapping - external pointers
  70. 70. 70 Port Mapping
  71. 71. 71 Port Mapping - the compiler
  72. 72. 72 Hardware Emulation
  73. 73. 73 Hardware Emulation
  74. 74. 74 Reports
  75. 75. 75 Reports
  76. 76. 76 Reports
  77. 77. 77 Reports
  78. 78. 78 Build for the FPGA
  79. 79. 79 Execute on Board
  80. 80. 80 Build for the FPGA
  81. 81. 81 Build for the FPGA
  82. 82. 82 Vector Addition - OCL Kernel
  83. 83. 83 Choose Target board
  84. 84. 84 Create kernel file
  85. 85. 85 Create kernel file
  86. 86. 86 Define kernel behavior
  87. 87. 87 Create Host File
  88. 88. 88 clEnqueueNDRangeKernel
  89. 89. 89 Define global and local
  90. 90. 90 Wait for Kernel Completion
  91. 91. 91 Wait for Kernel Completion
  92. 92. 92 Add hardware function
  93. 93. 93 Kernel features
  94. 94. 94 Software Emulation
  95. 95. 95 Vector Addition - OCL Kernel
  96. 96. 96 Automatically include binaries
  97. 97. 97 Hardware Emulation
  98. 98. 98 Emulation Log
  99. 99. 99 Kernel Report
  100. 100. 100 Report
  101. 101. 101 Increase compute units to 2
  102. 102. 102 Change local size
  103. 103. 103 Reports
  104. 104. 104 Increase Data Locality
  105. 105. 105 Execution log
  106. 106. 106 Report
  107. 107. 107 Report
  108. 108. 108 This is only the beginning! For more information, read the SDAccel manual(s): https://www.xilinx.com/support/documentation-navigation/development-tools/software-development/sdaccel.html
  109. 109. 109 Feedback • We are working on improving this course; would you share your feedback on this lesson? https://goo.gl/forms/mcmtcojJEqFTpg8j1
  110. 110. Thank You for Your Attention! 110 Lorenzo Di Tucci lorenzo.ditucci@polimi.it Emanuele Del Sozzo emanuele.delsozzo@polimi.it Marco Rabozzi marco.rabozzi@polimi.it Marco D. Santambrogio marco.santambrogio@polimi.it