Multivector and SIMD Computers


    • CSE539: Advanced Computer Architecture
      Chapter 8: Multivector and SIMD Computers
      Book: "Advanced Computer Architecture: Parallelism, Scalability, Programmability", Hwang & Jotwani
      Sumit Mittu, Assistant Professor, CSE/IT, Lovely Professional University (sumit.12735@lpu.co.in)
    • In this chapter…
      o Vector Processing Principles
      o Compound Vector Operations
      o Vector Loops and Chaining
      o SIMD Computer Implementation Models
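    • VECTOR PROCESSING PRINCIPLES
      • Vector Processing Definitions
        o Vector: a set of scalar data items, all of the same type, stored in memory
        o Stride: the fixed address increment between successive elements of a vector
        o Vector Processor: hardware (vector registers, functional pipelines, processing elements, counters) for performing vector operations
        o Vector Processing: applying arithmetic or logical operations to vectors
        o Vectorization: converting scalar code into vector code
        o Vectorizing Compiler or Vectorizer: a compiler capable of vectorization
      • Vector Instruction Types
        o Vector-vector instructions
        o Vector-scalar instructions
        o Vector-memory instructions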
    • VECTOR PROCESSING PRINCIPLES
      • Vector-Vector Instructions
        o F1: Vi → Vj
        o F2: Vi × Vj → Vk
        o Examples: V1 = sin(V2), V3 = V1 + V2
      • Vector-Scalar Instructions
        o F3: s × Vi → Vj
        o Example: V2 = 6 + V1
      • Vector-Memory Instructions
        o F4: M → V (Vector Load)
        o F5: V → M (Vector Store)
        o Examples: X = V1 (vector store), V2 = Y (vector load)
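The instruction types F1 to F5 can be imitated in a few lines of Python. This is a minimal sketch and does not model any real machine. Vector registers are plain lists and memory is a flat list of words. The function names (`f1_unary`, `f4_load`, and so on) and the 16-word memory are hypothetical. The load and store take a base address and a stride, which matches how vector operands are specified.

```python
import math

# Hypothetical model: vector registers are Python lists, memory is a flat list.

def f1_unary(vi, op):
    """F1: Vi -> Vj, e.g. V1 = sin(V2)."""
    return [op(x) for x in vi]

def f2_binary(vi, vj, op):
    """F2: Vi x Vj -> Vk, e.g. V3 = V1 + V2."""
    return [op(a, b) for a, b in zip(vi, vj)]

def f3_scalar(s, vi, op):
    """F3: s x Vi -> Vj, e.g. V2 = 6 + V1."""
    return [op(s, x) for x in vi]

def f4_load(memory, base, stride, length):
    """F4: M -> V (vector load) of `length` words starting at `base`."""
    return [memory[base + k * stride] for k in range(length)]

def f5_store(memory, vi, base, stride):
    """F5: V -> M (vector store) using the same base/stride addressing."""
    for k, x in enumerate(vi):
        memory[base + k * stride] = x

memory = list(range(16))                      # hypothetical 16-word memory
v1 = f4_load(memory, base=0, stride=1, length=4)
v2 = f3_scalar(6, v1, lambda s, x: s + x)     # V2 = 6 + V1
v3 = f2_binary(v1, v2, lambda a, b: a + b)    # V3 = V1 + V2
v4 = f1_unary(v1, math.sin)                   # F1 in the style of V1 = sin(V2)
f5_store(memory, v3, base=8, stride=2)        # writes words 8, 10, 12, 14
```

After these steps, V1 holds [0, 1, 2, 3], V2 holds [6, 7, 8, 9] and V3 holds [6, 8, 10, 12]. The store with stride 2 leaves the odd-numbered words untouched.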
    • VECTOR PROCESSING PRINCIPLES
      • Vector Reduction Instructions
        o F6: Vi → s
        o F7: Vi × Vj → s
      • Gather and Scatter Instructions
        o F8: M → Vi × Vj (Gather)
        o F9: Vi × Vj → M (Scatter)
      • Masking
        o F10: Vi × Vm → Vj (Vm is a binary vector)
      • Examples…
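A similar sketch covers F6 to F10. As before, the names are hypothetical. Gather and scatter assume that one vector register (`idx` here) holds element indices relative to a base address. Masking is shown in its compress form: elements whose mask bit is 0 are dropped, so the result vector is shorter.

```python
# Hypothetical model of reduction, gather/scatter and masking on Python lists.

def f6_reduce(vi):
    """F6: Vi -> s, here the sum of all elements."""
    total = 0
    for x in vi:
        total += x
    return total

def f7_dot(vi, vj):
    """F7: Vi x Vj -> s, here the dot product."""
    return sum(a * b for a, b in zip(vi, vj))

def f8_gather(memory, base, idx):
    """F8 (gather): fetch M[base + idx(k)] into element k of the result."""
    return [memory[base + i] for i in idx]

def f9_scatter(memory, base, vi, idx):
    """F9 (scatter): write vi(k) to M[base + idx(k)]."""
    for x, i in zip(vi, idx):
        memory[base + i] = x

def f10_mask(vi, vm):
    """F10 (compress form): keep vi(k) only where the binary mask vm(k) is 1."""
    return [x for x, bit in zip(vi, vm) if bit == 1]

mem = [100, 200, 300, 400, 500, 600, 700, 800]
gathered = f8_gather(mem, 0, [3, 0, 6])       # [400, 100, 700]
f9_scatter(mem, 0, [1, 2, 3], [3, 0, 6])      # mem[3]=1, mem[0]=2, mem[6]=3
```

Gather and scatter let a vector machine work on sparse data: only the indexed elements move between memory and registers.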
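    • VECTOR PROCESSING PRINCIPLES
      • Vector-Access Memory Schemes
        o Vector-operand Specification
          • Base address, stride and length
        o C-Access Memory Organization
          • Low-order m-way interleaved memory; m words are accessed concurrently in an overlapped manner
        o S-Access Memory Organization
          • Low-order interleaved modules accessed simultaneously in a synchronized manner; the high-order address bits select the same offset word in every module, and the low-order bits multiplex the m latched words out
        o C/S-Access Memory Organization
          • C-access and S-access combined: several access buses, each with m interleaved modules attached
      • Early Supercomputers (Vector Processors)
        o Cray Series
        o CDC Cyber/ETA 10E
        o Fujitsu VP2600
        o NEC SX-X 44
        o Hitachi 820/80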
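The effect of stride on an interleaved memory can be checked numerically. The sketch below generates a vector operand's addresses from its base address, stride and length, and then maps each address to a memory module. It assumes m = 8 modules and, for the high-order case, 1024 words per module. Both figures are illustrative, and the function names are hypothetical. With low-order interleaving, unit stride touches every module once. A stride equal to m sends every access to the same module, so the accesses are serialized.

```python
# Illustrative parameters (assumptions, not taken from any specific machine).
M = 8                   # number of memory modules
WORDS_PER_MODULE = 1024

def operand_addresses(base, stride, length):
    """Addresses of a vector operand given base address, stride and length."""
    return [base + k * stride for k in range(length)]

def low_order_module(addr, m=M):
    """Low-order interleaving: consecutive addresses go to consecutive modules."""
    return addr % m

def high_order_module(addr, words_per_module=WORDS_PER_MODULE):
    """High-order interleaving: each module holds one contiguous address block."""
    return addr // words_per_module

def modules_touched(stride, length, m=M):
    """How many distinct low-order modules a strided access stream hits."""
    return len({low_order_module(a, m) for a in operand_addresses(0, stride, length)})

unit_modules = [low_order_module(a) for a in operand_addresses(0, 1, 8)]
stride8_modules = [low_order_module(a) for a in operand_addresses(0, 8, 8)]
contig_high = [high_order_module(a) for a in operand_addresses(0, 1, 8)]
```

In this model, a stride that is relatively prime to m (for example 3) still spreads the accesses over all 8 modules, whereas a stride of 2 uses only 4 of them.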
    • VECTOR PROCESSING PRINCIPLES
      • Relative Vector/Scalar Performance
        o r: vector/scalar speed ratio
        o f: vectorization ratio in the program
        o The relative performance P is given by:
          $P = \frac{1}{(1 - f) + f/r} = \frac{r}{(1 - f)\,r + f}$
        o When f is low, the speedup cannot be high even with a very high r: as r → ∞, P approaches only 1/(1 - f)
        o Limiting case: P → 1 as f → 0
        o Maximum case: P → r as f → 1
        o Powerful single-chip processors and multicore systems-on-a-chip provide high-performance computing (HPC) using MIMD and/or SPMD configurations with large numbers of processors.
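The performance formula is easy to tabulate. This sketch only evaluates $P = 1/((1 - f) + f/r)$. The sample values of f and r are arbitrary.

```python
def relative_performance(f, r):
    """Relative vector/scalar performance P = 1 / ((1 - f) + f / r).

    f: vectorization ratio (fraction of the work executed in vector mode)
    r: vector/scalar speed ratio
    """
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must lie in [0, 1]")
    if r <= 0:
        raise ValueError("r must be positive")
    return 1.0 / ((1.0 - f) + f / r)

for f in (0.0, 0.5, 0.9, 1.0):
    print(f"f = {f:.1f}, r = 10: P = {relative_performance(f, 10):.3f}")
```

With f = 0.9 and r = 10, P is about 5.26. With f = 0.5, P stays below 2 no matter how large r becomes, because P is bounded by 1/(1 - f).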
    • COMPOUND VECTOR PROCESSING
      • Compound Vector Operations
        o Compound Vector Functions (CVFs)
          • A CVF is a composite function of vector operations, converted from a looping structure of linked scalar operations
        o CVF Example: the SAXPY (Single-precision A multiply X Plus Y) code, which computes Y(I) = A × X(I) + Y(I)
          For I = 1 to N
              Load R1, X(I)
              Load R2, Y(I)
              Multiply R1, A
              Add R2, R1
              Store Y(I), R2
          (End of Loop)
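To show what the CVF view adds, the sketch below runs the SAXPY loop two ways in Python. The first version follows the slide's scalar sequence of load, load, multiply, add and store for each element. The second expresses the same work as whole-vector steps: load X, load Y, a vector-scalar multiply and a vector-vector add. Python lists stand in for registers, and the function names are hypothetical.

```python
def saxpy_scalar(a, x, y):
    """Scalar loop mirroring the slide, one element per iteration (updates y in place)."""
    for i in range(len(x)):
        r1 = x[i]          # Load R1, X(I)
        r2 = y[i]          # Load R2, Y(I)
        r1 = r1 * a        # Multiply R1, A
        r2 = r2 + r1       # Add R2, R1
        y[i] = r2          # Store Y(I), R2
    return y

def saxpy_cvf(a, x, y):
    """The same computation as one compound vector function Y = A*X + Y."""
    v1 = list(x)                          # vector load X
    v2 = list(y)                          # vector load Y
    v3 = [a * e for e in v1]              # vector-scalar multiply
    v4 = [p + q for p, q in zip(v2, v3)]  # vector-vector add
    return v4                             # result to be stored back into Y
```

Both versions produce the same Y. The CVF form exposes four whole-vector operations that a vector machine can issue and chain, instead of 5N scalar instructions.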
    • COMPOUND VECTOR PROCESSING
      • One-dimensional CVF Examples
        o V(I) = V2(I) + V3(I) × V4(I)
        o V1(I) = B(I) + C(I)
        o A(I) = V(I) × S + B(I)
        o A(I) = V(I) + B(I) + C(I)
        o A(I) = Q × V1(I) × (R × B(I) + C(I)), etc.
      • Legend:
        o Vi(I) are vector registers
        o A(I), B(I), C(I) are vectors in memory
        o Q, R, S are scalars available from scalar registers or memory
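Every one-dimensional CVF above is an element-wise composition, so a single generic helper can evaluate any of them. In this sketch, the helper `cvf` and the sample values of Q, R, S and the vectors are hypothetical. A vector machine would map each operator onto a functional pipeline, possibly chained, rather than call a Python function for each element.

```python
def cvf(expr, *vectors):
    """Evaluate a compound vector function element by element.

    expr receives one element from each input vector, in order.
    """
    n = len(vectors[0])
    if any(len(v) != n for v in vectors):
        raise ValueError("all vector operands must have the same length")
    return [expr(*elems) for elems in zip(*vectors)]

Q, R, S = 2.0, 3.0, 0.5          # hypothetical scalar register values
V = [1.0, 2.0, 3.0]
V1 = [1.0, 1.0, 2.0]
B = [4.0, 5.0, 6.0]
C = [1.0, 0.0, 1.0]

a1 = cvf(lambda v, b: v * S + b, V, B)                     # A(I) = V(I) x S + B(I)
a2 = cvf(lambda v1, b, c: Q * v1 * (R * b + c), V1, B, C)  # A(I) = Q x V1(I) x (R x B(I) + C(I))
```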
    • COMPOUND VECTOR PROCESSING
      • Vector Loops
        o Vector segmentation or strip-mining approach: a vector longer than the vector registers is split into register-length segments that are processed one at a time
        o Example
      • Vector Chaining: results from one functional pipeline are fed directly as operands to the next, without waiting for the whole vector to finish
        o Example: SAXPY code
          • Limited chaining using only one memory-access pipe in the Cray-1
          • Complete chaining using three memory-access pipes in the Cray X-MP
        o Functional Unit Independence
        o Vector Recurrence
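Strip-mining can be sketched as follows. The sketch assumes a maximum vector length of 64 elements, since each Cray-1 vector register holds 64 elements. It processes the odd-sized remainder first and then full 64-element strips; putting the short strip last works equally well. The helper names are hypothetical. Each strip's SAXPY is written as a list comprehension that stands in for one chained load-multiply-add-store sequence.

```python
MVL = 64  # assumed maximum vector length (elements per vector register)

def strip_segments(n, mvl=MVL):
    """Return (start, length) pairs: the odd-sized remainder first, then full strips."""
    segments = []
    start = 0
    first = n % mvl
    if first:
        segments.append((0, first))
        start = first
    while start < n:
        segments.append((start, mvl))
        start += mvl
    return segments

def saxpy_strip_mined(a, x, y, mvl=MVL):
    """SAXPY executed one register-length strip at a time (updates y in place)."""
    for start, length in strip_segments(len(x), mvl):
        xs = x[start:start + length]      # vector load of one strip of X
        ys = y[start:start + length]      # vector load of one strip of Y
        y[start:start + length] = [a * p + q for p, q in zip(xs, ys)]
    return y
```

With N = 150, the loop runs three strips of 22, 64 and 64 elements.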
    • SIMD COMPUTER ORGANIZATIONS
      • SIMD Computer Variants
        o Array Processor
        o Associative Processor
      • SIMD Processor vs. SISD vs. Vector Processor Operation
        o Illustration: for (i = 0; i < 5; i++) a[i] = a[i] + 2;
        o Lockstep mode of operation in the SIMD processor
        o Relative performance comparison
      • SIMD Implementation Models
        o Distributed Memory Model
          • E.g. Illiac IV
        o Shared Memory Model
          • E.g. BSP (Burroughs Scientific Processor)
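One way to make the a[i] = a[i] + 2 illustration concrete is to count how many instructions the control unit must issue. This is a simplified model assumed for illustration. The SISD machine issues one add per element. The vector processor issues one vector instruction, which its pipeline streams over all elements. The SIMD machine broadcasts one add per group of PEs, and the PEs in each group execute it in lockstep. Loop overhead, pipeline start-up and memory latency are ignored, and the function names are hypothetical.

```python
def sisd(a):
    """SISD: one scalar add issued per element."""
    issued = 0
    for i in range(len(a)):
        a[i] = a[i] + 2
        issued += 1
    return a, issued

def vector_processor(a):
    """Vector: a single vector instruction V = V + 2, pipelined over all elements."""
    a[:] = [x + 2 for x in a]
    return a, 1

def simd_lockstep(a, num_pes):
    """SIMD: each broadcast add is executed by up to num_pes PEs in the same step."""
    issued = 0
    for base in range(0, len(a), num_pes):
        for i in range(base, min(base + num_pes, len(a))):
            a[i] = a[i] + 2      # conceptually simultaneous across the PEs
        issued += 1
    return a, issued
```

With five elements, the SISD model issues 5 adds, while the vector model and a 5-PE SIMD array each issue 1. A 2-PE array needs 3 broadcasts.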
    • SIMD COMPUTER ORGANIZATIONS
      • SIMD Instructions
        o Scalar Operations
          • Arithmetic/Logical
        o Vector Operations
          • Arithmetic/Logical
        o Data Routing Operations
          • Permutations, broadcasts, multicasts, rotation and shifting
        o Masking Operations
          • Enable/Disable PEs
      • Host and I/O
      • Bit-slice and Word-slice Processing
        o WSBS (word-serial, bit-serial), WSBP (word-serial, bit-parallel), WPBS (word-parallel, bit-serial), WPBP (word-parallel, bit-parallel)
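Masking and data routing can be modelled with a small PE array. This sketch is an illustrative toy, not the instruction set of any particular machine. Each PE holds one word, and a mask register enables or disables PEs. `broadcast` and `rotate` stand in for two of the data routing operations listed above. The class and method names are hypothetical.

```python
class PEArray:
    """Toy N-PE SIMD array: one word per PE, a mask register, and data routing."""

    def __init__(self, values):
        self.data = list(values)
        self.mask = [True] * len(self.data)   # all PEs enabled initially

    def set_mask(self, predicate):
        """Masking: enable only the PEs whose local word satisfies predicate."""
        self.mask = [predicate(x) for x in self.data]

    def vector_op(self, op):
        """Vector operation: every enabled PE applies op to its own word."""
        self.data = [op(x) if on else x for x, on in zip(self.data, self.mask)]

    def broadcast(self, value):
        """Data routing: the control unit sends one value to every PE."""
        self.data = [value] * len(self.data)

    def rotate(self, k):
        """Data routing: PE i receives the word held by PE (i - k) mod N."""
        n = len(self.data)
        self.data = [self.data[(i - k) % n] for i in range(n)]

pes = PEArray([1, -2, 3, -4])
pes.set_mask(lambda x: x < 0)     # enable only PEs holding negative words
pes.vector_op(lambda x: -x)       # data becomes [1, 2, 3, 4]
pes.rotate(1)                     # data becomes [4, 1, 2, 3]
```

The masked negation leaves the disabled PEs unchanged, which is how an SIMD array handles data-dependent conditionals under a single instruction stream.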