Aca2 08 new

Multivector and SIMD Computers

  1. CSE539: Advanced Computer Architecture
     Chapter 8: Multivector and SIMD Computers
     Book: “Advanced Computer Architecture – Parallelism, Scalability, Programmability”, Hwang & Jotwani
     Sumit Mittu, Assistant Professor, CSE/IT, Lovely Professional University, sumit.12735@lpu.co.in
  2. In this chapter…
     • Vector Processing Principles
     • Compound Vector Operations
     • Vector Loops and Chaining
     • SIMD Computer Implementation Models
  3. VECTOR PROCESSING PRINCIPLES
     • Vector Processing Definitions
       o Vector
       o Stride
       o Vector Processor
       o Vector Processing
       o Vectorization
       o Vectorizing Compiler or Vectorizer
     • Vector Instruction Types
       o Vector-vector instructions
       o Vector-scalar instructions
       o Vector-memory instructions
  4. VECTOR PROCESSING PRINCIPLES
  5. VECTOR PROCESSING PRINCIPLES
     • Vector-Vector Instructions
       o F1: Vi → Vj
       o F2: Vi × Vj → Vk
       o Examples: V1 = sin(V2), V3 = V1 + V2
     • Vector-Scalar Instructions
       o F3: s × Vi → Vj
       o Example: V2 = 6 + V1
     • Vector-Memory Instructions
       o F4: M → V (Vector Load)
       o F5: V → M (Vector Store)
       o Examples: X = V1, V2 = Y
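The five instruction types on this slide can be sketched in plain Python, with lists standing in for vector registers and for memory. This is only an illustrative model: the function names (`f1_unary`, `f2_binary`, `f3_scalar`, `vector_load`, `vector_store`) are hypothetical and do not correspond to any real instruction set.

```python
import math
import operator


def f1_unary(op, vi):
    """F1: Vi -> Vj. Apply a unary operation to every element."""
    return [op(x) for x in vi]


def f2_binary(op, vi, vj):
    """F2: Vi x Vj -> Vk. Combine two equal-length vectors element by element."""
    if len(vi) != len(vj):
        raise ValueError("vector operands must have the same length")
    return [op(x, y) for x, y in zip(vi, vj)]


def f3_scalar(op, s, vi):
    """F3: s x Vi -> Vj. Combine one scalar with every element of a vector."""
    return [op(s, x) for x in vi]


def vector_load(memory, base, length):
    """F4: M -> V. Copy `length` consecutive words from memory into a register."""
    return memory[base:base + length]


def vector_store(memory, base, v):
    """F5: V -> M. Write a register back to consecutive memory words."""
    memory[base:base + len(v)] = v


# The slide's examples, with small sample values (the values are assumptions).
v2 = [0.0, math.pi / 2]
v1 = f1_unary(math.sin, v2)            # V1 = sin(V2)
v3 = f2_binary(operator.add, v1, v2)   # V3 = V1 + V2
v4 = f3_scalar(operator.add, 6, v1)    # V2 = 6 + V1 (kept in a separate name here)
```

Each function returns a new list rather than overwriting a register, which keeps the sketch easy to test; a real vector unit would write into a destination register.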
  6. VECTOR PROCESSING PRINCIPLES
     • Vector Reduction Instructions
       o F6: Vi → s
       o F7: Vi × Vj → s
     • Gather and Scatter Instructions
       o F8: M → Vi × Vj (Gather)
       o F9: Vi × Vj → M (Scatter)
     • Masking
       o F10: Vi × Vm → Vj (Vm is a binary vector)
     • Examples…
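A minimal Python sketch of reduction, gather, scatter and masking follows. It assumes that gather and scatter use an index vector of offsets from a base address, and that masking keeps the elements whose mask bit is 1; exact semantics differ between machines, and all function names here are hypothetical.

```python
def reduce_sum(vi):
    """F6: Vi -> s. Reduce a vector to a single scalar (here, a sum)."""
    total = 0
    for x in vi:
        total += x
    return total


def dot_product(vi, vj):
    """F7: Vi x Vj -> s. Reduce two vectors to one scalar."""
    return sum(x * y for x, y in zip(vi, vj))


def gather(memory, base, index_vector):
    """F8: fetch scattered memory words into a dense vector register."""
    return [memory[base + i] for i in index_vector]


def scatter(memory, base, index_vector, values):
    """F9: write a dense vector register to scattered memory words."""
    for i, v in zip(index_vector, values):
        memory[base + i] = v


def mask_compress(vi, vm):
    """F10: keep the elements of Vi whose bit in the binary vector Vm is 1."""
    return [x for x, bit in zip(vi, vm) if bit]


# Gather the non-zero entries of a sparse vector, then scatter them back.
sparse = [10, 0, 0, 30, 0, 50]
indices = [0, 3, 5]
dense = gather(sparse, 0, indices)
restored = [0] * len(sparse)
scatter(restored, 0, indices, dense)
```

Gather followed by scatter with the same index vector round-trips the sparse data, which is the usual reason the two instructions are presented as a pair.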
  7. VECTOR PROCESSING PRINCIPLES
  8. VECTOR PROCESSING PRINCIPLES
     • Vector-Access Memory Schemes
       o Vector-operand Specifications
         • Base address, stride and length
       o C-Access Memory Organization
         • Low-order m-way interleaved memory
       o S-Access Memory Organization
         • High-order m-way interleaved memory
       o C/S-Access Memory Organization
     • Early Supercomputers (Vector Processors)
       o Cray Series
       o CDC Cyber
       o ETA 10E
       o Fujitsu VP2600
       o NEC SX-X 44
       o Hitachi 820/80
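A vector operand is specified by its base address, stride and length, and a low-order m-way interleaved memory maps each address to a module using the low-order address bits. The sketch below illustrates both ideas under simple assumptions (word addressing, `m` modules, module number = address mod m); the function names are hypothetical.

```python
def element_addresses(base, stride, length):
    """Addresses of a vector operand given its base address, stride and length."""
    return [base + i * stride for i in range(length)]


def low_order_interleave(address, m):
    """Map a word address to (module, offset) in low-order m-way interleaving."""
    return address % m, address // m


def modules_used(base, stride, length, m):
    """Set of distinct memory modules touched by one vector access."""
    return {low_order_interleave(a, m)[0]
            for a in element_addresses(base, stride, length)}


# With 8 modules, a stride-1 access spreads over all modules, while a
# stride-8 access keeps hitting the same module.
unit_stride = modules_used(base=0, stride=1, length=8, m=8)
bad_stride = modules_used(base=0, stride=8, length=8, m=8)
```

The comparison shows why stride matters: when the stride is a multiple of the number of modules, every element lands in one module and the interleaving gives no overlap between accesses.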
  9. VECTOR PROCESSING PRINCIPLES
     • Relative Vector/Scalar Performance
       o Vector/scalar speed ratio r
       o Vectorization ratio in program f
       o Relative performance P is given by:
         • $P = \frac{1}{(1-f) + f/r} = \frac{r}{(1-f)\,r + f}$
       o When f is low, the speedup cannot be high even with a very high r
       o Limiting case:
         • P → 1 as f → 0
       o Maximum case:
         • P → r as f → 1
       o Powerful single-chip processors and multicore systems-on-a-chip provide High-Performance Computing (HPC) using MIMD and/or SPMD configurations with a large number of processors.
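The relative performance formula can be checked numerically. The sketch below evaluates P = 1 / ((1 - f) + f / r) for a few sample values; the function name and the sample values of f and r are assumptions chosen for illustration.

```python
def relative_performance(f, r):
    """Relative performance P for vectorization ratio f and speed ratio r."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must lie between 0 and 1")
    if r <= 0:
        raise ValueError("r must be positive")
    return 1.0 / ((1.0 - f) + f / r)


# Even with a very fast vector unit (r = 100), a half-vectorized program
# stays below a speedup of 2.
for f in (0.0, 0.5, 0.9, 1.0):
    print(f"f = {f:.1f}, r = 100: P = {relative_performance(f, 100.0):.2f}")
```

The loop makes the slide's point concrete: P rises slowly until f gets close to 1, so raising the vectorization ratio matters more than raising r once r is already large.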
  10. COMPOUND VECTOR PROCESSING
     • Compound Vector Operations
       o Compound Vector Functions (CVFs)
         • Composite function of vector operations converted from a looping structure of linked scalar operations
       o CVF Example: The SAXPY (Single-precision A multiply X Plus Y) Code
         • For I = 1 to N
           o Load R1, X(I)
           o Load R2, Y(I)
           o Multiply R1, A
           o Add R2, R1
           o Store Y(I), R2
         • (End of Loop)
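The SAXPY loop above can be written out in Python next to its compound vector form, Y(I) = A × X(I) + Y(I). This is a sketch for illustration: the variable names `r1` and `r2` mirror the registers in the slide's loop, and the function names are hypothetical.

```python
def saxpy_scalar_loop(a, x, y):
    """SAXPY as a loop of linked scalar operations, one element per pass."""
    y = list(y)  # work on a copy so the caller's data is untouched
    for i in range(len(x)):
        r1 = x[i]        # Load R1, X(I)
        r2 = y[i]        # Load R2, Y(I)
        r1 = r1 * a      # Multiply R1, A
        r2 = r2 + r1     # Add R2, R1
        y[i] = r2        # Store Y(I), R2
    return y


def saxpy_cvf(a, x, y):
    """The same computation expressed as one compound vector function."""
    return [a * xi + yi for xi, yi in zip(x, y)]


result_loop = saxpy_scalar_loop(2, [1, 2, 3], [10, 20, 30])
result_cvf = saxpy_cvf(2, [1, 2, 3], [10, 20, 30])
```

Both functions compute the same values; the difference a vector machine exploits is that the second form exposes the whole operation at once instead of one element per loop pass.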
  11. COMPOUND VECTOR PROCESSING
     • One-dimensional CVF Examples
       o V1(I) = V2(I) + V3(I) x V4(I)
       o V1(I) = B(I) + C(I)
       o A(I) = V1(I) x S + B(I)
       o A(I) = V1(I) + B(I) + C(I)
       o A(I) = Q x V1(I) x (R x B(I) + C(I)), etc.
     • Legend:
       o Vi(I) are vector registers
       o A(I), B(I), C(I) are vectors in memory
       o Q, R, S are scalars available from scalar registers or memory
  12. COMPOUND VECTOR PROCESSING
     • Vector Loops
       o Vector segmentation or strip-mining approach
       o Example
     • Vector Chaining
       o Example: SAXPY code
         • Limited chaining using only one memory-access pipe in the Cray 1
         • Complete chaining using three memory-access pipes in the Cray X-MP
     • Functional Unit Independence
     • Vector Recurrence
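Strip-mining splits a long vector into segments that each fit in a vector register. The sketch below applies it to SAXPY, assuming a register length of 64 elements as an illustrative default and processing any shorter remainder strip last; the function name and the `max_vector_length` parameter are hypothetical.

```python
def strip_mined_saxpy(a, x, y, max_vector_length=64):
    """Compute a*x + y in strips no longer than the vector register length."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    result = [0] * n
    strip_lengths = []
    for start in range(0, n, max_vector_length):
        end = min(start + max_vector_length, n)
        vx = x[start:end]   # vector load of one strip of X
        vy = y[start:end]   # vector load of one strip of Y
        result[start:end] = [a * xi + yi for xi, yi in zip(vx, vy)]
        strip_lengths.append(end - start)
    return result, strip_lengths


x = list(range(150))
y = [1] * 150
result, strips = strip_mined_saxpy(3, x, y)
```

For 150 elements and a register length of 64, the loop runs three times, with strips of 64, 64 and 22 elements.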
  13. COMPOUND VECTOR PROCESSING
  14. COMPOUND VECTOR PROCESSING
  15. SIMD COMPUTER ORGANIZATIONS
     • SIMD Computer Variants
       o Array Processor
       o Associative Processor
     • SIMD Processor vs. SISD vs. Vector Processor Operation
       o Illustration: for (i = 0; i < 5; i++) a[i] = a[i] + 2;
       o Lockstep mode of operation in SIMD processor
       o Relative performance comparison
     • SIMD Implementation Models
       o Distributed Memory Model
         • E.g. Illiac IV
       o Shared Memory Model
         • E.g. BSP (Burroughs Scientific Processor)
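The loop illustration on this slide can be modelled in Python to contrast SISD and SIMD execution. The sketch counts instruction steps under a simplifying assumption: one step per element on the SISD machine, and one broadcast step per group of processing elements (PEs) working in lockstep on the SIMD machine. The function names and step counting are assumptions for illustration, not a timing model.

```python
def sisd_add(a, constant):
    """SISD: one element is updated per instruction step."""
    a = list(a)
    steps = 0
    for i in range(len(a)):
        a[i] = a[i] + constant
        steps += 1
    return a, steps


def simd_add(a, constant, num_pes):
    """SIMD: one broadcast instruction updates up to num_pes elements in lockstep."""
    a = list(a)
    steps = 0
    for start in range(0, len(a), num_pes):
        # Every active PE executes the same add on its own element.
        for pe in range(min(num_pes, len(a) - start)):
            a[start + pe] += constant
        steps += 1
    return a, steps


data = [1, 2, 3, 4, 5]
sisd_result, sisd_steps = sisd_add(data, 2)
simd_result, simd_steps = simd_add(data, 2, num_pes=5)
```

With five PEs, the five-element loop finishes in a single lockstep step, against five steps on the SISD model; with fewer PEs the SIMD version needs several passes.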
  16. SIMD COMPUTER ORGANIZATIONS
  17. SIMD COMPUTER ORGANIZATIONS
  18. SIMD COMPUTER ORGANIZATIONS
     • SIMD Instructions
       o Scalar Operations
         • Arithmetic/Logical
       o Vector Operations
         • Arithmetic/Logical
       o Data Routing Operations
         • Permutations, broadcasts, multicasts, rotation and shifting
       o Masking Operations
         • Enable/Disable PEs
     • Host and I/O
     • Bit-slice and Word-slice Processing
       o WSBS (word-serial, bit-serial), WSBP (word-serial, bit-parallel), WPBS (word-parallel, bit-serial), WPBP (word-parallel, bit-parallel)
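Masking and data-routing operations can be sketched with a list in which position i holds the value stored in PE i. The functions below are hypothetical and only illustrate the idea: a mask bit enables or disables each PE, a rotation moves every value to a neighbouring PE, and a broadcast sends one value to all PEs.

```python
def masked_add(pe_values, operand, enable_mask):
    """Add `operand` only in PEs whose mask bit is 1; disabled PEs keep their value."""
    return [v + operand if enabled else v
            for v, enabled in zip(pe_values, enable_mask)]


def rotate(pe_values, k):
    """Rotate data by k positions: PE i receives the value from PE (i - k) mod n."""
    n = len(pe_values)
    return [pe_values[(i - k) % n] for i in range(n)]


def broadcast(value, num_pes):
    """Send one value from the control unit to every PE."""
    return [value] * num_pes


pes = [1, 2, 3, 4]
after_mask = masked_add(pes, 10, [1, 0, 1, 0])  # only PEs 0 and 2 are enabled
after_rotate = rotate(pes, 1)                   # each value moves one PE to the right
```

In this model, masking leaves disabled PEs untouched rather than skipping the instruction, which matches the lockstep idea that every PE sees the same instruction stream.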
