# Aca2 08 new

Multivector and SIMD Computers

### Aca2 08 new

1. CSE539: Advanced Computer Architecture. Chapter 8: Multivector and SIMD Computers. Book: “Advanced Computer Architecture – Parallelism, Scalability, Programmability”, Hwang & Jotwani. Sumit Mittu, Assistant Professor, CSE/IT, Lovely Professional University, sumit.12735@lpu.co.in
2. In this chapter…
   - Vector Processing Principles
   - Compound Vector Operations
   - Vector Loops and Chaining
   - SIMD Computer Implementation Models
3. VECTOR PROCESSING PRINCIPLES
   - Vector Processing Definitions
     - Vector
     - Stride
     - Vector Processor
     - Vector Processing
     - Vectorization
     - Vectorizing Compiler or Vectorizer
   - Vector Instruction Types
     - Vector-vector instructions
     - Vector-scalar instructions
     - Vector-memory instructions
4. VECTOR PROCESSING PRINCIPLES
5. VECTOR PROCESSING PRINCIPLES
   - Vector-Vector Instructions
     - F1: $V_i \rightarrow V_j$
     - F2: $V_i \times V_j \rightarrow V_k$
     - Examples: V1 = sin(V2), V3 = V1 + V2
   - Vector-Scalar Instructions
     - F3: $s \times V_i \rightarrow V_j$
     - Examples: V2 = 6 + V1
   - Vector-Memory Instructions
     - F4: $M \rightarrow V$ (Vector Load)
     - F5: $V \rightarrow M$ (Vector Store)
     - Examples: X = V1, V2 = Y
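The three instruction types above can be mimicked on plain Python lists. This is only an illustrative sketch: the helper names (`vector_unary`, `vector_vector`, `vector_scalar`, `vector_load`, `vector_store`) and the list standing in for memory are assumptions made for the example, not part of the slides or of any real vector instruction set.

```python
import math
import operator


def vector_unary(f, vj):
    """F1-style operation: apply f to every element of one vector."""
    return [f(x) for x in vj]


def vector_vector(op, vi, vj):
    """F2-style operation: combine two equal-length vectors elementwise."""
    if len(vi) != len(vj):
        raise ValueError("vector operands must have the same length")
    return [op(a, b) for a, b in zip(vi, vj)]


def vector_scalar(op, s, vi):
    """F3-style operation: combine a scalar with every element of a vector."""
    return [op(s, x) for x in vi]


def vector_load(memory, base, stride, length):
    """F4-style operation: read `length` elements from memory into a vector."""
    return [memory[base + i * stride] for i in range(length)]


def vector_store(memory, base, stride, v):
    """F5-style operation: write a vector back to memory with the given stride."""
    for i, x in enumerate(v):
        memory[base + i * stride] = x


v2 = [0.0, math.pi / 2, math.pi]
v1 = vector_unary(math.sin, v2)            # V1 = sin(V2)
v3 = vector_vector(operator.add, v1, v2)   # V3 = V1 + V2
v4 = vector_scalar(operator.add, 6, v1)    # same shape as V2 = 6 + V1
```

Each helper returns a whole vector from a single call, which is the point of a vector instruction: one operation is specified once and applied to every element.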
6. VECTOR PROCESSING PRINCIPLES
   - Vector Reduction Instructions
     - F6: $V_i \rightarrow s$
     - F7: $V_i \times V_j \rightarrow s$
   - Gather and Scatter Instructions
     - F8: $M \rightarrow V_i \times V_j$ (Gather)
     - F9: $V_i \times V_j \rightarrow M$ (Scatter)
   - Masking
     - F10: $V_i \times V_m \rightarrow V_j$ ($V_m$ is a binary vector)
   - Examples…
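A small Python sketch may help to picture F6 through F10. All function names are hypothetical, memory is modelled as a list, and the masking function shows only one possible reading of F10 (keep the elements whose mask bit is 1); the slides do not fix these details.

```python
def reduce_sum(vi):
    """F6-style reduction: a vector collapses to one scalar."""
    total = 0
    for x in vi:
        total += x
    return total


def dot_product(vi, vj):
    """F7-style reduction: two vectors collapse to one scalar."""
    return sum(a * b for a, b in zip(vi, vj))


def gather(memory, base, index_vector):
    """F8-style gather: fetch memory[base + k] for every k in the index vector."""
    return [memory[base + k] for k in index_vector]


def scatter(memory, base, index_vector, values):
    """F9-style scatter: store each value at memory[base + k]."""
    for k, x in zip(index_vector, values):
        memory[base + k] = x


def compress(vi, vm):
    """F10-style masking (assumed reading): keep elements whose mask bit is 1."""
    return [x for x, bit in zip(vi, vm) if bit]


memory = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
gathered = gather(memory, 2, [0, 3, 5])    # reads addresses 2, 5 and 7
sparse = [0] * 8
scatter(sparse, 1, [0, 2, 4], [7, 8, 9])   # writes addresses 1, 3 and 5
```

Gather and scatter use one vector for the data and a second vector for the indices, which is how sparse data can be packed into and unpacked from a dense vector register.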
7. VECTOR PROCESSING PRINCIPLES
8. VECTOR PROCESSING PRINCIPLES
   - Vector-Access Memory Schemes
     - Vector-operand Specifications: base address, stride and length
     - C-Access Memory Organization: low-order m-way interleaved memory with overlapped (concurrent) module access
     - S-Access Memory Organization: low-order m-way interleaved memory with all modules accessed simultaneously (the high-order address bits select the same offset in every module)
     - C/S-Access Memory Organization
   - Early Supercomputers (Vector Processors)
     - Cray Series
     - CDC Cyber
     - ETA 10E
     - Fujitsu VP2600
     - NEC SX-X 44
     - Hitachi 820/80
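The interaction between a vector-operand specification (base address, stride, length) and a low-order m-way interleaved memory can be sketched in a few lines of Python. The function names and the example sizes are illustrative assumptions; the slides give no concrete numbers.

```python
from math import gcd


def vector_addresses(base, stride, length):
    """Addresses named by a (base, stride, length) vector operand."""
    return [base + i * stride for i in range(length)]


def low_order_module(address, m):
    """Low-order interleaving: the low bits pick the module, the rest the offset."""
    return address % m, address // m


def modules_touched(base, stride, length, m):
    """Sorted list of the distinct modules hit by one vector access."""
    addresses = vector_addresses(base, stride, length)
    return sorted({low_order_module(a, m)[0] for a in addresses})


unit_stride = modules_touched(0, 1, 8, 8)   # spreads over all 8 modules
even_stride = modules_touched(0, 2, 8, 8)   # only the even-numbered modules
worst_case = modules_touched(0, 8, 8, 8)    # every access lands in module 0
```

With stride s on m modules, the accesses cycle through m / gcd(m, s) distinct modules, so a stride that shares a factor with m leaves some modules idle and lowers the usable memory bandwidth.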
9. VECTOR PROCESSING PRINCIPLES
   - Relative Vector/Scalar Performance
     - Vector/scalar speed ratio, r
     - Vectorization ratio in program, f
     - Relative performance P is given by:
       $$P = \frac{1}{(1 - f) + f/r} = \frac{r}{(1 - f)\,r + f}$$
     - When f is low, the speedup cannot be high even with very high r
     - Limiting case: $P \rightarrow 1$ as $f \rightarrow 0$
     - Maximum case: $P \rightarrow r$ as $f \rightarrow 1$
     - Powerful single-chip processors and multicore systems-on-a-chip provide High-Performance Computing (HPC) using MIMD and/or SPMD configurations with large numbers of processors.
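The relative-performance formula is easy to evaluate numerically. The sketch below assumes f is the vectorization ratio (between 0 and 1) and r the vector/scalar speed ratio; the function name and the sample values are chosen only for illustration.

```python
def relative_performance(f, r):
    """P = 1 / ((1 - f) + f / r) for vectorization ratio f and speed ratio r."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must lie between 0 and 1")
    if r <= 0:
        raise ValueError("r must be positive")
    return 1.0 / ((1.0 - f) + f / r)


# Print P for a few vectorization ratios at a fixed speed ratio of 10.
for f in (0.0, 0.5, 0.9, 1.0):
    print(f"f = {f:.1f}, r = 10: P = {relative_performance(f, 10.0):.3f}")
```

Even with an arbitrarily large r, a program that is only half vectorized (f = 0.5) cannot reach P = 2, which is the point of the "f is low" bullet.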
10. COMPOUND VECTOR PROCESSING
    - Compound Vector Operations
      - Compound Vector Functions (CVFs)
        - Composite function of vector operations converted from a looping structure of linked scalar operations
      - CVF Example: The SAXPY (Single-precision A multiply X Plus Y) Code
        - For I = 1 to N
          - Load R1, X(I)
          - Load R2, Y(I)
          - Multiply R1, A
          - Add R2, R1
          - Store Y(I), R2
        - (End of Loop)
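The SAXPY loop above can be written out in Python next to its compound-vector form. This is a sketch on plain lists: the variables `r1` and `r2` stand in for the scalar registers R1 and R2, and the function names are made up for the example.

```python
def saxpy_scalar(a, x, y):
    """Scalar loop from the slide: one element per iteration, Y updated in place."""
    for i in range(len(x)):
        r1 = x[i]       # Load R1, X(I)
        r2 = y[i]       # Load R2, Y(I)
        r1 = r1 * a     # Multiply R1, A
        r2 = r2 + r1    # Add R2, R1
        y[i] = r2       # Store Y(I), R2
    return y


def saxpy_cvf(a, x, y):
    """The same computation stated once for whole vectors: Y = A * X + Y."""
    return [a * xi + yi for xi, yi in zip(x, y)]


x = [1.0, 2.0, 3.0]
y = [10.0, 20.0, 30.0]
vector_result = saxpy_cvf(2.0, x, y)
scalar_result = saxpy_scalar(2.0, x, list(y))   # copy so y itself is untouched
```

Both functions compute the same values; the compound form simply states the five linked scalar operations once for the whole vector, which is what a vectorizing compiler aims to recover from the loop.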
11. COMPOUND VECTOR PROCESSING
    - One-dimensional CVF Examples
      - V1(I) = V2(I) + V3(I) × V4(I)
      - V1(I) = B(I) + C(I)
      - A(I) = V1(I) × S + B(I)
      - A(I) = V1(I) + B(I) + C(I)
      - A(I) = Q × V1(I) × (R × B(I) + C(I)), etc.
    - Legend:
      - Vi(I) are vector registers
      - A(I), B(I), C(I) are vectors in memory
      - Q, S are scalars available from scalar registers or in memory
12. COMPOUND VECTOR PROCESSING
    - Vector Loops
      - Vector segmentation or strip-mining approach
      - Example
    - Vector Chaining
      - Example: SAXPY code
        - Limited chaining using only one memory-access pipe in the Cray-1
        - Complete chaining using three memory-access pipes in the Cray X-MP
    - Functional Unit Independence
    - Vector Recurrence
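Strip-mining splits a long vector into segments that each fit in a vector register. The Python sketch below assumes a maximum vector length of 64 elements (the `mvl` parameter, a value picked for illustration) and uses hypothetical function names; it applies the SAXPY operation segment by segment.

```python
def strip_mine(n, mvl):
    """Yield (start, length) segments covering 0..n-1, each at most mvl long."""
    if mvl <= 0:
        raise ValueError("maximum vector length must be positive")
    start = 0
    while start < n:
        length = min(mvl, n - start)
        yield start, length
        start += length


def saxpy_strip_mined(a, x, y, mvl=64):
    """Compute a * x + y one register-sized segment at a time."""
    result = list(y)
    for start, length in strip_mine(len(x), mvl):
        seg = slice(start, start + length)
        # One "vector instruction" worth of work on this segment.
        result[seg] = [a * xi + yi for xi, yi in zip(x[seg], result[seg])]
    return result


segments = list(strip_mine(150, 64))
```

For a 150-element vector and 64-element registers this gives two full segments and one shorter remainder segment; the outer loop over segments is the "vector loop" the slide refers to.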
13. COMPOUND VECTOR PROCESSING
14. COMPOUND VECTOR PROCESSING
15. SIMD COMPUTER ORGANIZATIONS
    - SIMD Computer Variants
      - Array Processor
      - Associative Processor
    - SIMD Processor vs. SISD vs. Vector Processor Operation
      - Illustration: `for (i = 0; i < 5; i++) a[i] = a[i] + 2;`
      - Lockstep mode of operation in SIMD processor
      - Relative performance comparison
    - SIMD Implementation Models
      - Distributed Memory Model: e.g. Illiac IV
      - Shared Memory Model: e.g. BSP (Burroughs Scientific Processor)
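The loop used as the illustration above can be traced in both styles with a toy Python model. The step counting is a simplifying assumption (one step per broadcast instruction, one element per processing element), and the function names are invented for this sketch.

```python
def sisd_add(a, k):
    """SISD: one element is updated per step."""
    out = list(a)
    steps = 0
    for i in range(len(out)):
        out[i] = out[i] + k
        steps += 1
    return out, steps


def simd_add(a, k, num_pes):
    """SIMD: each step broadcasts one add that all PEs execute in lockstep."""
    out = list(a)
    steps = 0
    for start in range(0, len(out), num_pes):
        block = slice(start, start + num_pes)
        # Every PE applies the same instruction to its own element.
        out[block] = [x + k for x in out[block]]
        steps += 1
    return out, steps


a = [1, 2, 3, 4, 5]
sisd_result, sisd_steps = sisd_add(a, 2)
simd_result, simd_steps = simd_add(a, 2, num_pes=5)
```

With five processing elements the SIMD version finishes the five-element loop in a single step, while the SISD version needs five; with fewer PEs than elements the array is processed in several lockstep passes.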
16. SIMD COMPUTER ORGANIZATIONS
17. SIMD COMPUTER ORGANIZATIONS
18. SIMD COMPUTER ORGANIZATIONS
    - SIMD Instructions
      - Scalar Operations: arithmetic/logical
      - Vector Operations: arithmetic/logical
      - Data Routing Operations: permutations, broadcasts, multicasts, rotation and shifting
      - Masking Operations: enable/disable PEs
    - Host and I/O
    - Bit-slice and Word-slice Processing
      - WSBS, WSBP, WPBS, WPBP
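The data-routing and masking operations listed above can be pictured on a list in which position i holds the value owned by processing element i. The Python sketch below is a toy model under that assumption; the function names and the direction conventions (PE i sends to PE i + k) are choices made for the example.

```python
def broadcast(value, num_pes):
    """One value is delivered to every PE."""
    return [value] * num_pes


def permute(data, perm):
    """PE i sends its value to PE perm[i]; perm must be a permutation."""
    if sorted(perm) != list(range(len(data))):
        raise ValueError("perm must be a permutation of the PE indices")
    out = [None] * len(data)
    for i, x in enumerate(data):
        out[perm[i]] = x
    return out


def rotate(data, k):
    """PE i sends its value to PE (i + k) mod n, wrapping around."""
    n = len(data)
    out = [None] * n
    for i, x in enumerate(data):
        out[(i + k) % n] = x
    return out


def shift(data, k, fill=0):
    """PE i sends its value to PE i + k; values leaving the array are dropped."""
    n = len(data)
    out = [fill] * n
    for i, x in enumerate(data):
        if 0 <= i + k < n:
            out[i + k] = x
    return out


def masked_add(data, k, mask):
    """Only PEs whose mask bit is 1 execute the add; the rest keep their value."""
    return [x + k if bit else x for x, bit in zip(data, mask)]
```

A rotation is a special case of a permutation, and masking is what lets a single broadcast instruction behave like a conditional: disabled PEs simply sit out the step.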