
# Aca2 08 new

## on Nov 26, 2013


Multivector and SIMD Computers



## Aca2 08 new Presentation Transcript

• CSE539: Advanced Computer Architecture
  Chapter 8: Multivector and SIMD Computers
  Book: “Advanced Computer Architecture – Parallelism, Scalability, Programmability”, Hwang & Jotwani
  Sumit Mittu, Assistant Professor, CSE/IT, Lovely Professional University
  sumit.12735@lpu.co.in
• In this chapter…
  • Vector Processing Principles
  • Compound Vector Operations
  • Vector Loops and Chaining
  • SIMD Computer Implementation Models
• VECTOR PROCESSING PRINCIPLES
  • Vector Processing Definitions
    o Vector
    o Stride
    o Vector Processor
    o Vector Processing
    o Vectorization
    o Vectorizing Compiler or Vectorizer
  • Vector Instruction Types
    o Vector-vector instructions
    o Vector-scalar instructions
    o Vector-memory instructions
• VECTOR PROCESSING PRINCIPLES
  • Vector-Vector Instructions
    o F1: Vi → Vj
    o F2: Vi × Vj → Vk
    o Examples: V1 = sin(V2), V3 = V1 + V2
  • Vector-Scalar Instructions
    o F3: s × Vi → Vj
    o Example: V2 = 6 + V1
  • Vector-Memory Instructions
    o F4: M → V (Vector Load)
    o F5: V → M (Vector Store)
    o Examples: X = V1, V2 = Y
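The instruction types F2 to F5 can be pictured as plain C loops over arrays. This is only a sketch: the function names (`vv_add`, `vs_add`, `v_load`, `v_store`) and the choice of `float` elements are assumptions made for illustration, and a real vector processor would execute each loop as a single pipelined instruction.

```c
#include <stddef.h>

/* F2: Vi x Vj -> Vk, e.g. V3 = V1 + V2 (element-wise add). */
void vv_add(const float *v1, const float *v2, float *v3, size_t n) {
    for (size_t i = 0; i < n; i++)
        v3[i] = v1[i] + v2[i];
}

/* F3: s x Vi -> Vj, e.g. V2 = 6 + V1 (scalar combined with every element). */
void vs_add(float s, const float *v1, float *v2, size_t n) {
    for (size_t i = 0; i < n; i++)
        v2[i] = s + v1[i];
}

/* F4: M -> V, vector load of n elements from memory,
 * starting at index `base` and stepping by `stride`. */
void v_load(const float *mem, size_t base, size_t stride, float *v, size_t n) {
    for (size_t i = 0; i < n; i++)
        v[i] = mem[base + i * stride];
}

/* F5: V -> M, vector store with the same addressing pattern. */
void v_store(float *mem, size_t base, size_t stride, const float *v, size_t n) {
    for (size_t i = 0; i < n; i++)
        mem[base + i * stride] = v[i];
}
```

For instance, loading three elements from base 1 with stride 2 reads memory indices 1, 3 and 5.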
• VECTOR PROCESSING PRINCIPLES
  • Vector Reduction Instructions
    o F6: Vi → s
    o F7: Vi × Vj → s
  • Gather and Scatter Instructions
    o F8: M → Vi × Vj (Gather)
    o F9: Vi × Vj → M (Scatter)
  • Masking
    o F10: Vi × Vm → Vj (Vm is a binary vector)
  • Examples…
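A scalar C sketch of F7 to F10 may help fix the semantics. All function names are hypothetical, and the masking routine shows just one common reading of F10 (compressing the elements selected by the binary vector Vm); the slides do not spell out the exact semantics.

```c
#include <stddef.h>

/* F7: Vi x Vj -> s, here a dot product reducing two vectors to one scalar. */
float v_dot(const float *vi, const float *vj, size_t n) {
    float s = 0.0f;
    for (size_t i = 0; i < n; i++)
        s += vi[i] * vj[i];
    return s;
}

/* F8 (gather): fetch mem[base + idx[i]] into v[i]; idx is the index vector. */
void v_gather(const float *mem, size_t base, const size_t *idx,
              float *v, size_t n) {
    for (size_t i = 0; i < n; i++)
        v[i] = mem[base + idx[i]];
}

/* F9 (scatter): write v[i] to mem[base + idx[i]]. */
void v_scatter(float *mem, size_t base, const size_t *idx,
               const float *v, size_t n) {
    for (size_t i = 0; i < n; i++)
        mem[base + idx[i]] = v[i];
}

/* F10 (masking, assumed here to mean compression): copy the elements of vi
 * whose mask entry in vm is nonzero into vj and return how many were kept. */
size_t v_compress(const float *vi, const unsigned char *vm,
                  float *vj, size_t n) {
    size_t kept = 0;
    for (size_t i = 0; i < n; i++)
        if (vm[i])
            vj[kept++] = vi[i];
    return kept;
}
```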
• VECTOR PROCESSING PRINCIPLES
  • Vector-Access Memory Schemes
    o Vector-operand Specifications
      • Base address, stride and length
    o C-Access Memory Organization
      • Low-order m-way interleaved memory
    o S-Access Memory Organization
      • High-order m-way interleaved memory
    o C/S-Access Memory Organization
  • Early Supercomputers (Vector Processors)
    o Cray Series
    o CDC Cyber
    o ETA 10E
    o Fujitsu VP2600
    o NEC SX-X 44
    o Hitachi 820/80
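To see why the stride matters for interleaved memory, the address mappings can be sketched as below. The struct and function names are assumptions made for this example. `modules_touched` counts how many distinct modules a vector operand (base, stride, length) reaches under low-order interleaving; the fewer modules it reaches, the less the accesses can overlap.

```c
#include <stddef.h>

/* Location of a word inside an interleaved memory system. */
typedef struct {
    size_t module;  /* which memory module holds the word */
    size_t offset;  /* word position inside that module   */
} mem_loc;

/* Low-order m-way interleaving: consecutive addresses fall in
 * consecutive modules (module = low-order part of the address). */
mem_loc low_order_map(size_t addr, size_t m) {
    mem_loc loc;
    loc.module = addr % m;
    loc.offset = addr / m;
    return loc;
}

/* High-order interleaving: consecutive addresses stay in one module
 * until it is full (module = high-order part of the address). */
mem_loc high_order_map(size_t addr, size_t words_per_module) {
    mem_loc loc;
    loc.module = addr / words_per_module;
    loc.offset = addr % words_per_module;
    return loc;
}

/* Number of distinct modules reached by a vector operand under
 * low-order m-way interleaving. */
size_t modules_touched(size_t base, size_t stride, size_t length, size_t m) {
    size_t count = 0;
    for (size_t mod = 0; mod < m; mod++) {
        for (size_t i = 0; i < length; i++) {
            if ((base + i * stride) % m == mod) {
                count++;
                break;
            }
        }
    }
    return count;
}
```

With m = 8, a stride of 1 spreads eight elements over all eight modules, while a stride of 8 sends every element to the same module.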
• VECTOR PROCESSING PRINCIPLES
  • Relative Vector/Scalar Performance
    o Vector/scalar speed ratio, r
    o Vectorization ratio in program, f
    o Relative performance P is given by:
      • $P = \dfrac{1}{(1 - f) + f/r} = \dfrac{r}{(1 - f)\,r + f}$
    o When f is low, the speedup cannot be high even with very high r
    o Limiting case:
      • P → 1 if f → 0
    o Maximum case:
      • P → r if f → 1
    o Powerful single-chip processors and multicore systems-on-a-chip provide High-Performance Computing (HPC) using MIMD and/or SPMD configurations with a large number of processors.
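The relative-performance formula is easy to evaluate directly. The helper name `relative_performance` is an assumption for this sketch; f is the vectorization ratio (0 to 1) and r the vector/scalar speed ratio.

```c
/* Relative performance P = 1 / ((1 - f) + f / r).
 * f: fraction of the program that is vectorized (0 <= f <= 1)
 * r: vector/scalar speed ratio (r > 0) */
double relative_performance(double f, double r) {
    return 1.0 / ((1.0 - f) + f / r);
}
```

For example, with f = 0.5 the result stays below 2 even for r = 1000, since 1 / (0.5 + 0.0005) is about 1.998; with f = 0.9 and r = 10 it is 1 / 0.19, about 5.26.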
• COMPOUND VECTOR PROCESSING
  • Compound Vector Operations
    o Compound Vector Functions (CVFs)
      • Composite function of vector operations converted from a looping structure of linked scalar operations
    o CVF Example: The SAXPY (Single-precision A multiply X Plus Y) Code
      • For I = 1 to N
        o Load R1, X(I)
        o Load R2, Y(I)
        o Multiply R1, A
        o Add R2, R1
        o Store Y(I), R2
      • (End of Loop)
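In C, the same SAXPY loop might be written as follows. The signature is an assumption chosen for this sketch (it is not the BLAS interface); the comments map each step back to the scalar register sequence on the slide.

```c
#include <stddef.h>

/* SAXPY: Y(I) = A * X(I) + Y(I) for I = 1..N (0-based indices here). */
void saxpy(size_t n, float a, const float *x, float *y) {
    for (size_t i = 0; i < n; i++) {
        float r1 = x[i];   /* Load R1, X(I)  */
        float r2 = y[i];   /* Load R2, Y(I)  */
        r1 = r1 * a;       /* Multiply R1, A */
        r2 = r2 + r1;      /* Add R2, R1     */
        y[i] = r2;         /* Store Y(I), R2 */
    }
}
```

A vectorizing compiler would aim to turn this loop of linked scalar operations into one compound vector function.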
• COMPOUND VECTOR PROCESSING
  • One-dimensional CVF Examples
    o V1(I) = V2(I) + V3(I) × V4(I)
    o V1(I) = B(I) + C(I)
    o A(I) = V1(I) × S + B(I)
    o A(I) = V1(I) + B(I) + C(I)
    o A(I) = Q × V1(I) × (R × B(I) + C(I)), etc.
  • Legend:
    o Vi(I) are vector registers
    o A(I), B(I), C(I) are vectors in memory
    o Q, R, S are scalars available from scalar registers or memory
• COMPOUND VECTOR PROCESSING
  • Vector Loops
    o Vector segmentation or strip-mining approach
    o Example
  • Vector Chaining
    o Example: SAXPY code
      • Limited chaining using only one memory-access pipe in the Cray-1
      • Complete chaining using three memory-access pipes in the Cray X-MP
  • Functional Unit Independence
  • Vector Recurrence
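Strip-mining can be sketched in C by cutting a long vector into segments that each fit a vector register. The maximum vector length `MVL` of 64 is an assumption for this example (it matches the 64-element vector registers of the Cray-1), and the function name is hypothetical. The inner loop stands in for what the hardware would run as vector instructions on one segment.

```c
#include <stddef.h>

#define MVL 64  /* assumed maximum vector length (elements per vector register) */

/* SAXPY over n elements, processed in strips of at most MVL elements.
 * Returns the number of strips executed. */
size_t saxpy_strip_mined(size_t n, float a, const float *x, float *y) {
    size_t strips = 0;
    for (size_t start = 0; start < n; start += MVL) {
        /* The last strip may be shorter than MVL. */
        size_t len = (n - start < MVL) ? (n - start) : MVL;
        for (size_t i = 0; i < len; i++)
            y[start + i] = a * x[start + i] + y[start + i];
        strips++;
    }
    return strips;
}
```

A vector of 150 elements, for example, is handled as strips of 64, 64 and 22 elements.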
• SIMD COMPUTER ORGANIZATIONS
  • SIMD Computer Variants
    o Array Processor
    o Associative Processor
  • SIMD Processor vs. SISD vs. Vector Processor Operation
    o Illustration: for (i = 0; i < 5; i++) a[i] = a[i] + 2;
    o Lockstep mode of operation in SIMD processor
    o Relative performance comparison
  • SIMD Implementation Models
    o Distributed Memory Model
      • E.g. Illiac IV
    o Shared Memory Model
      • E.g. BSP (Burroughs Scientific Processor)
• SIMD COMPUTER ORGANIZATIONS
  • SIMD Instructions
    o Scalar Operations
      • Arithmetic/Logical
    o Vector Operations
      • Arithmetic/Logical
    o Data Routing Operations
      • Permutations, broadcasts, multicasts, rotation and shifting
    o Masking Operations
      • Enable/Disable PEs
  • Host and I/O
  • Bit-slice and Word-slice Processing
    o WSBS, WSBP, WPBS, WPBP
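Masking and data routing can be simulated on a small array of processing elements (PEs). This is a sequential sketch only: the PE count of 8 and the function names are assumptions, and on real SIMD hardware every enabled PE would perform its step at the same time, in lockstep.

```c
#include <stddef.h>

#define NUM_PES 8  /* assumed number of processing elements */

/* Vector operation under a mask: the control unit broadcasts `operand`,
 * and only PEs whose enable flag is set add it to their local data. */
void masked_broadcast_add(int data[NUM_PES],
                          const unsigned char enabled[NUM_PES],
                          int operand) {
    for (size_t pe = 0; pe < NUM_PES; pe++)
        if (enabled[pe])
            data[pe] += operand;
}

/* Data routing (rotation): PE i sends its value to PE (i + 1) mod NUM_PES. */
void rotate_right(int data[NUM_PES]) {
    int last = data[NUM_PES - 1];
    for (size_t pe = NUM_PES - 1; pe > 0; pe--)
        data[pe] = data[pe - 1];
    data[0] = last;
}
```

With every PE enabled and an operand of 2, `masked_broadcast_add` performs the same work as the loop `a[i] = a[i] + 2` in a single broadcast step.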