Dedicated Fully Parallel Architecture
Group members:
Ghufran Hasan
 CMS: 11432
Wali Khan
 CMS: 11225
Introduction
Definition: “A parallel computer is a collection of processing elements
that cooperate and communicate to solve large problems fast.”
Such machines are also called “Parallel Computers” or “Parallel Processors”
The dream of computer architects since 1960: replicate processors to
add performance vs. design a faster processor
Led to innovative organizations tied to particular programming models, since
“uniprocessors can’t keep going”
• e.g., predictions that uniprocessors must stop getting faster due to the limit of the speed of light: 1972, … , 1989
• Borders religious fervor: you must believe!
• Fervor was dampened somewhat when 1990s companies went out of business: Thinking Machines,
Kendall Square, ...
Argument instead is the “pull” of opportunity of scalable performance,
not the “push” of uniprocessor performance plateau
Opportunities:
Scientific Computing
Nearly Unlimited Demand (Grand Challenge):
App                      Perf (GFLOPS)   Memory (GB)
48-hour weather               0.1            0.1
72-hour weather               3              1
Pharmaceutical design       100             10
Global Change, Genome      1000           1000
Opportunities:
Scientific Computing (Cont.)
Successes in some real industries:
Petroleum: reservoir modeling
Automotive: crash simulation, drag analysis, engine
Aeronautics: airflow analysis, engine, structural mechanics
Pharmaceuticals: molecular modeling
Entertainment: full-length movies (“Toy Story”)
Opportunities:
Commercial Computing
Throughput (transactions per minute), 1996:

Processors:           1      4      8     16     32     64    112
IBM RS6000          735   1438   3119
  Speedup          1.00   1.96   4.24
Tandem Himalaya                        3043   6067  12021  20918
  Speedup                              1.00   1.99   3.95   6.87

IBM takes a scaling hit from 1 => 4 processors (1.96x on 4x the hardware), then scales well from 4 => 8 (2.2x on 2x)
Tandem scales well: 6.87x speedup on 112/16 = 7.0x more processors (the speedup arithmetic is sketched after this slide)
Others: file servers, electronic CAD simulation (multiple processes),
WWW search engines
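The scaling figures above are just throughput ratios: speedup = throughput(p) / throughput(base). A minimal sketch in C of that arithmetic, using only the throughput numbers quoted in the table (the program itself is illustrative, not part of the original study):

```c
#include <stdio.h>

int main(void) {
    /* Transactions per minute, from the table above. */
    double ibm[]    = {735.0, 1438.0, 3119.0};             /* 1, 4, 8 procs    */
    double tandem[] = {3043.0, 6067.0, 12021.0, 20918.0};  /* 16, 32, 64, 112  */
    int ibm_procs[]    = {1, 4, 8};
    int tandem_procs[] = {16, 32, 64, 112};

    /* Speedup is throughput relative to the smallest configuration. */
    for (int i = 0; i < 3; i++)
        printf("IBM    %3d procs: speedup %.2f\n", ibm_procs[i], ibm[i] / ibm[0]);
    for (int i = 0; i < 4; i++)
        printf("Tandem %3d procs: speedup %.2f\n", tandem_procs[i], tandem[i] / tandem[0]);

    /* Tandem's scaling efficiency: 6.87x speedup on 112/16 = 7.0x the hardware. */
    printf("Tandem efficiency at 112 procs: %.0f%%\n",
           100.0 * (tandem[3] / tandem[0]) / (112.0 / 16.0));
    return 0;
}
```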
What Level of Parallelism?
Bit-level parallelism: 1970 to 1985
4-bit, 8-bit, 16-bit, 32-bit microprocessors
Instruction-level parallelism (ILP):
1985 through today
Pipelining
Superscalar
VLIW
Out-of-Order execution
Limits to the benefits of ILP? (see the sketch after this slide)
Process-level or thread-level parallelism: the mainstream for general-purpose
computing?
Servers are parallel
High-end desktops: dual-processor PCs
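To make the ILP bullets concrete: a pipelined, superscalar, out-of-order core can overlap independent operations but must serialize a dependence chain. A minimal sketch in C (the split-accumulator trick is a standard textbook illustration, not something specific to this deck):

```c
#include <stddef.h>

/* Serial dependence chain: each add must wait for the previous one,
 * so pipelining and superscalar issue help very little here. */
double sum_serial(const double *a, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Two independent accumulators expose ILP: the hardware can keep
 * two add chains in flight at once. */
double sum_ilp(const double *a, size_t n) {
    double s0 = 0.0, s1 = 0.0;
    size_t i;
    for (i = 0; i + 1 < n; i += 2) {
        s0 += a[i];
        s1 += a[i + 1];
    }
    if (i < n) s0 += a[i];  /* odd leftover element */
    return s0 + s1;
}
```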
Parallel Architecture
Parallel Architecture extends traditional computer architecture with a
communication architecture
abstractions (HW/SW interface)
organizational structure to realize abstraction efficiently
Fundamental Issues
3 Issues to characterize parallel machines
1) Naming
2) Synchronization
3) Latency and Bandwidth
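Of the three issues, synchronization is the easiest to show in code. A minimal sketch, assuming C11 atomics and POSIX threads, of a spinlock built from an atomic swap (the “atomic swap” primitive named on the next slide is exactly this test-and-set idea):

```c
#include <stdatomic.h>
#include <pthread.h>
#include <stdio.h>

atomic_flag lock = ATOMIC_FLAG_INIT;
long counter = 0;  /* shared state the lock protects */

void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        /* atomic swap: set the flag and get its old value in one step */
        while (atomic_flag_test_and_set(&lock))
            ;                       /* spin until the old value was "unlocked" */
        counter++;                  /* critical section */
        atomic_flag_clear(&lock);   /* release */
    }
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("counter = %ld (expect 200000)\n", counter);
    return 0;
}
```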
Parallel Framework
Layers:
Programming Model:
• Multiprogramming : lots of jobs, no communication
• Shared address space: communicate via memory
• Message passing: send and receive messages
• Data Parallel: several agents operate on several data sets simultaneously and then
exchange information globally and simultaneously (shared or message passing)
Communication Abstraction:
• Shared address space: e.g., load, store, atomic swap
• Message passing: e.g., send, receive library calls
• Debate over this topic (ease of programming, scaling)
=> many hardware designs map 1:1 to a programming model
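The shared-address-space primitives (load, store, atomic swap) are sketched after the Fundamental Issues slide above. For the message-passing abstraction, a minimal sketch of the send/receive library calls using MPI (MPI is one concrete instance of such a library, not something this deck prescribes; assumes an MPI installation, e.g. compile with mpicc and run with mpirun -np 2):

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int value;
    if (rank == 0) {
        value = 42;
        /* explicit communication: data moves only when sent */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 1 received %d from rank 0\n", value);
    }

    MPI_Finalize();
    return 0;
}
```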
Multiple Processor Organization
Single instruction, single data stream – SISD
Single instruction, multiple data stream – SIMD
Multiple instruction, single data stream – MISD
Multiple instruction, multiple data stream – MIMD
Single Instruction, Single Data Stream – SISD
Single processor
Single instruction stream
Data stored in single memory
Uni-processor
Single Instruction, Multiple Data Stream – SIMD
A single machine instruction controls the simultaneous execution of a
number of processing elements on a lockstep basis
Each processing element has an associated data memory
Each instruction is executed on a different set of data by the different
processors
Vector and array processors
Examples:
Illiac-IV, CM-2
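The SIMD idea lives on in vector instruction sets. A minimal sketch using x86 SSE intrinsics in C, where one instruction adds four floats in lockstep (assumes an x86 compiler; SSE is a modern descendant of the idea, not one of the machines listed above):

```c
#include <xmmintrin.h>  /* SSE intrinsics */
#include <stdio.h>

int main(void) {
    float a[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    float b[8] = {10, 20, 30, 40, 50, 60, 70, 80};
    float c[8];

    /* One _mm_add_ps instruction operates on four floats at once:
     * single instruction, multiple data. */
    for (int i = 0; i < 8; i += 4) {
        __m128 va = _mm_loadu_ps(&a[i]);
        __m128 vb = _mm_loadu_ps(&b[i]);
        _mm_storeu_ps(&c[i], _mm_add_ps(va, vb));
    }

    for (int i = 0; i < 8; i++)
        printf("%g ", c[i]);
    printf("\n");
    return 0;
}
```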
Multiple Instruction, Single Data Stream – MISD
A sequence of data is transmitted to a set of processors
Each processor executes a different instruction sequence on it
Never been implemented
Multiple Instruction, Multiple Data Stream – MIMD
A set of processors simultaneously executes different instruction
sequences on different sets of data
SMPs, clusters and NUMA systems
Examples:
Sun Enterprise 5000, Cray T3D, SGI Origin
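Threads on an SMP are the everyday case of MIMD: each processor runs its own instruction stream on its own data. A minimal POSIX-threads sketch in C with two genuinely different instruction streams (the worker functions are made up for illustration):

```c
#include <pthread.h>
#include <stdio.h>

/* Two different instruction streams operating on different data. */
void *sum_worker(void *arg) {
    int *data = arg;
    long sum = 0;
    for (int i = 0; i < 4; i++) sum += data[i];
    printf("sum worker: %ld\n", sum);
    return NULL;
}

void *max_worker(void *arg) {
    int *data = arg;
    int max = data[0];
    for (int i = 1; i < 4; i++)
        if (data[i] > max) max = data[i];
    printf("max worker: %d\n", max);
    return NULL;
}

int main(void) {
    int d1[4] = {1, 2, 3, 4}, d2[4] = {7, 3, 9, 5};
    pthread_t t1, t2;
    /* multiple instructions, multiple data: different code, different input */
    pthread_create(&t1, NULL, sum_worker, d1);
    pthread_create(&t2, NULL, max_worker, d2);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}
```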
Thank You
