Connection Machine


  1. Connection Machine Architecture
     Greg Faust, Mike Gibson, Sal Valente
     CS-6354 Computer Architecture
     Fall 2009
  2. Historic Timeline
     1981: MIT AI Lab technical memo on the CM
     1982: Thinking Machines Inc. founded
     1985: Danny Hillis wins ACM “Best PhD” Award
     1986: CM-1 ships
     1987: CM-2 ships
     1991: CM-5 announced
     1991: CM-5 ships
     1994: TMI Chapter 11 – Sun and Oracle pick over the bones
     Heavily DARPA funded/backed:
     $16M+ in direct contracts plus subsidized CM sales
  3. Involved Notables
     Danny Hillis – CM inventor and TMI founder
     Charles Leiserson – fat tree inventor
     Richard Feynman – Nobel Prize winning physicist
     Marvin Minsky – MIT AI Lab “Visionary”
     Guy Steele – Common Lisp, Grace Hopper Award
     Stephen Wolfram – Mathematica inventor
     Doug Lenat – Mind/Body problem philosopher
     Greg Papadopoulos – MIT Media Lab, Sun CTO
     Various others
  4. CM-1 and CM-2 Architecture
     Original design goal: support neuron-like simulations
     Up to 64K single-bit processors (actually 3 bits in and 2 out)
     16 processors/chip, 32 chips/PCB, 16 PCBs/cube, 8 cubes/hypercube
     Hypercube architecture – each 16-processor chip is a hyper-node
     Each processor has 4K bits of bit-addressable RAM
     Distributed physical memory
     Global memory addresses
     Up to 4 front-end computers talk to sequencers via a 4x4 crossbar
     “Sequencers” issue SIMD instructions over a broadcast network
     Bit processors communicate via local 2D HW grid connections (“NEWS”)
     Bit processors communicate via the hypercube network using message passing
     Lots of Twinkling Lights!!
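The hypercube wiring above can be made concrete: with 64K bit processors at 16 per chip, the 4096 router nodes form a 12-dimensional Boolean hypercube, where two nodes are linked exactly when their addresses differ in one bit. A minimal C sketch of that addressing (the node addresses below are illustrative, not from the slides):

```c
/* Minimal sketch of Boolean n-cube addressing, assuming the CM-1/CM-2
 * arrangement described above: 64K bit processors grouped 16 per chip,
 * giving 4096 router nodes wired as a 12-dimensional hypercube.
 * The sample addresses are illustrative. */
#include <stdio.h>

#define CUBE_DIMS 12              /* 2^12 = 4096 router nodes */

/* The neighbor along dimension d differs from `node` in exactly one bit. */
unsigned hypercube_neighbor(unsigned node, unsigned d)
{
    return node ^ (1u << d);
}

/* Hops needed by a message = Hamming distance between the two addresses. */
unsigned route_hops(unsigned src, unsigned dst)
{
    unsigned diff = src ^ dst, hops = 0;
    while (diff) { hops += diff & 1u; diff >>= 1; }
    return hops;
}

int main(void)
{
    unsigned src = 0x0A5, dst = 0xF3C;   /* arbitrary 12-bit addresses */
    printf("neighbor of %03X along dim 3: %03X\n", src, hypercube_neighbor(src, 3));
    printf("hops from %03X to %03X: %u (max is %d)\n",
           src, dst, route_hops(src, dst), CUBE_DIMS);
    return 0;
}
```

A message moves one dimension per hop, so the worst case between opposite corners is 12 hops.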
  5. CM-1 / CM-2 Architecture
  6. CM-1 and CM-2 Programming
     ISA supports:
       Bit-oriented operations
       Arbitrary-precision multi-bit scalar ops using a bit-serial implementation on the bit processors
       Full multi-dimensional vector ops
     “Virtual Processor” idea similar to CUDA threads, but they are statically allocated
     OS and programming tools run on the front-ends
     *Lisp as the initial programming language
     Later C* and CM-Fortran
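The "arbitrary-precision multi-bit scalar ops" above are built by running one single-bit operation per step on the bit processors. A plain-C illustration of that bit-serial idea (not actual *Lisp or C* code; the 8-bit width is arbitrary):

```c
/* Sketch of a bit-serial add, illustrating how multi-bit scalar ops can be
 * built from single-bit operations as on the CM-1/CM-2 bit processors.
 * This is an ordinary C illustration of the idea, not CM code. */
#include <stdio.h>

#define WIDTH 8   /* any width works: one 1-bit full add per step */

/* a, b, sum are arrays of 0/1 values, least-significant bit first. */
void bit_serial_add(const int a[WIDTH], const int b[WIDTH], int sum[WIDTH])
{
    int carry = 0;
    for (int i = 0; i < WIDTH; i++) {          /* one 1-bit full adder per step */
        int s = a[i] ^ b[i] ^ carry;
        carry = (a[i] & b[i]) | (carry & (a[i] ^ b[i]));
        sum[i] = s;
    }
}

int main(void)
{
    int a[WIDTH] = {1,0,1,0,0,0,0,0};          /* 5, LSB first */
    int b[WIDTH] = {1,1,0,0,0,0,0,0};          /* 3, LSB first */
    int s[WIDTH];
    bit_serial_add(a, b, s);
    for (int i = WIDTH - 1; i >= 0; i--)       /* prints 00001000, i.e. 8 */
        printf("%d", s[i]);
    printf("\n");
    return 0;
}
```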
  7. CM-2 Improvements
     1 Weitek IEEE FP coprocessor per 32 1-bit processors
     Up to 256K bits of memory per processor
     Added ECC to memory
     Implemented the I/O subsystem
     Up to 80 GByte RAID array called the “Data Vault”: 39 striped disks with ECC, plus spare disks on standby
     High-speed graphics output
     En-route message combining in the hypercube router
     New implementation of multi-dimensional NEWS on top of the hypercube (special addressing mode)
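The "special addressing mode" that rebuilds multi-dimensional NEWS on top of the hypercube depends on embedding a grid so that grid neighbors are also hypercube neighbors; the textbook way to get that property is a binary-reflected Gray code. A minimal sketch under that assumption (the slides do not spell out the exact encoding TMI used):

```c
/* Sketch of embedding a 1-D NEWS axis into hypercube addresses with a
 * binary-reflected Gray code, so that grid neighbors become hypercube
 * neighbors (addresses differing in exactly one bit). Shown as an
 * assumption; the slide only says a "special addressing mode" is used. */
#include <stdio.h>

unsigned gray(unsigned i)              /* i-th binary-reflected Gray code */
{
    return i ^ (i >> 1);
}

int main(void)
{
    /* Walk 8 consecutive grid positions; each step flips exactly one
     * address bit, i.e. one hop on the hypercube. */
    for (unsigned i = 0; i + 1 < 8; i++) {
        unsigned a = gray(i), b = gray(i + 1);
        printf("grid %u -> %u : cube %u -> %u (differ in bit mask %u)\n",
               i, i + 1, a, b, a ^ b);
    }
    return 0;
}
```

A multi-dimensional grid applies one such code per axis on a disjoint field of address bits.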
  8. CM-1 Photo
  9. CM-5 vs CM-1 and CM-2
     Significant departure from the CM-1 and CM-2
     Targeted at more scientific and business applications
     More commercial off-the-shelf (“COTS”) components
     Large array of SPARC processing nodes
     1-bit processors are abandoned
     Abandoned the “NEWS” grid and hypercube networks
     Delivered a 1024-node machine, with claims that 16K nodes are possible
     Even More Twinkling Lights!
  10. CM-5 Photo – Watch it Blink
  11. CM-5 Overall Architecture
      “Coordinated Homogeneous Array of RISC Processors”, or “CHARM”
      Asymmetric coprocessor model
      Large array of processor nodes
      Small collection of control nodes
      2 separate scalable networks:
        One for data
        One for control and synchronization
      Still uses striped RAID for high disk bandwidth
  12. Division of Labor
      Processor nodes can be assigned to a “partition”
      One control node per partition
      The control node runs scalar code, then broadcasts parallel work to the processor nodes
      Processor nodes receive a program, not an instruction stream, and have their own program counters
      Processor nodes can access another node's memory by reading or writing a global memory address
      Processor nodes also communicate via message passing
      Processor nodes cannot issue system calls
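A host-side analogy of this division of labor, using plain POSIX threads rather than any CM-5 system interface: a "control" thread does the scalar setup and hands out a parallel job, each "processor node" thread runs its own program over its slice of the data, and everyone meets at a barrier. Thread count, array size, and names are all illustrative:

```c
/* Analogy (not CM-5 system code) of the control/processor-node split:
 * the main thread plays the control node, worker threads play processor
 * nodes with their own control flow, and a barrier stands in for the
 * partition-wide synchronization provided by the control network. */
#include <pthread.h>
#include <stdio.h>

#define NODES 4
#define N     16

static double data[N];
static double partial[NODES];
static pthread_barrier_t barrier;

static void *processor_node(void *arg)
{
    int id = (int)(long)arg;              /* each node has its own program counter */
    double sum = 0.0;
    for (int i = id; i < N; i += NODES)   /* work on this node's slice */
        sum += data[i] * data[i];
    partial[id] = sum;
    pthread_barrier_wait(&barrier);       /* partition-wide synchronization */
    return NULL;
}

int main(void)
{
    pthread_t nodes[NODES];
    for (int i = 0; i < N; i++) data[i] = i;          /* "control node" scalar setup */
    pthread_barrier_init(&barrier, NULL, NODES + 1);  /* nodes + control thread */

    for (long i = 0; i < NODES; i++)                  /* broadcast the parallel job */
        pthread_create(&nodes[i], NULL, processor_node, (void *)i);

    pthread_barrier_wait(&barrier);                   /* wait for the partition */
    double total = 0.0;
    for (int i = 0; i < NODES; i++) total += partial[i];
    for (int i = 0; i < NODES; i++) pthread_join(nodes[i], NULL);
    printf("sum of squares = %g\n", total);           /* scalar code resumes here */
    return 0;
}
```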
  13. Control Nodes
      Full Sun workstations
      Running UNIX
      Connected to the “outside world”
      Handle partition time sharing
      Connected to both the data and control networks
      Perform system diagnostics
  14. Processor Nodes
      Each node is a 5-chip microprocessor
      Off-the-shelf SPARC processor @ 40 MHz
      32 MBytes of local node memory
      Multi-port memory controller for added bandwidth
      “Caching techniques do not perform as well on large parallel machines”
      Proprietary 4-FPU vector coprocessor
      Proprietary network controller
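For a rough sense of scale, peak arithmetic rate can be estimated from the slide's figures plus one assumption: the commonly quoted 32 MFLOPS peak per vector unit, which is not stated on the slide.

```c
/* Back-of-the-envelope peak-FLOPS figures for the node described above.
 * The slide gives 4 vector FPUs per node and a 1024-node delivered machine;
 * the 32 MFLOPS-per-vector-unit peak is an assumed, commonly quoted figure. */
#include <stdio.h>

int main(void)
{
    const double mflops_per_vu = 32.0;       /* assumed peak per vector unit */
    const int    vus_per_node  = 4;          /* from the slide */
    const int    nodes         = 1024;       /* delivered configuration */

    double node_peak    = mflops_per_vu * vus_per_node;      /* 128 MFLOPS */
    double machine_peak = node_peak * nodes / 1000.0;        /* in GFLOPS  */

    printf("peak per node:     %.0f MFLOPS\n", node_peak);
    printf("peak for %d nodes: %.0f GFLOPS\n", nodes, machine_peak);
    return 0;
}
```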
  15. CM-5 Processor Node Diagram
  16. Data Network Architecture
      Point-to-point inter-node communication and I/O
      Implemented as a fat tree
      Fat trees invented by TMI employee Charles Leiserson
      Claim: bandwidth is expandable on site
      Delivers 5 GB/sec bisection bandwidth on a 1024-node machine
      The data router chip is an 8x8 crossbar switch
      Faulty nodes are mapped out of the network
      Programs cannot assume a network topology
      The network can be flushed when time-share swaps occur
      The network, not the processors, guarantees end-to-end delivery
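One way to read the 5 GB/sec figure: divide the bisection bandwidth by the 512 nodes on one side of the cut to get the worst-case per-node rate for traffic that must cross the middle of the machine. This is arithmetic on the slide's numbers, not a measured result:

```c
/* Per-node reading of the bisection-bandwidth claim above.
 * Both input figures come from the slide; the division is the only step. */
#include <stdio.h>

int main(void)
{
    const double bisection_gb_per_s = 5.0;    /* from the slide */
    const int    nodes              = 1024;   /* from the slide */

    /* Worst case: every node on one half sends across the cut. */
    double per_node_mb_per_s = bisection_gb_per_s * 1000.0 / (nodes / 2);
    printf("~%.1f MB/s per node across the bisection\n", per_node_mb_per_s);
    return 0;
}
```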
  17. Fat Tree Structure
  18. Separate Control Network
      Synchronization and control network
      Complete binary tree organization
      Provides broadcast capability
      Implements barrier operations
      Implements interrupts for time sharing
      Performs reduction operations (Sum, Max, AND, OR, Count, etc.)
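The reductions listed above are exactly what a combining tree is good at: values are merged pairwise on the way up, so a sum or max over N nodes takes on the order of log2(N) combining steps rather than N. A small software simulation of that idea (the node count is illustrative):

```c
/* Sketch of the tree-combining idea behind the control network's reductions:
 * node values are combined pairwise up a complete binary tree, so the full
 * reduction over N nodes finishes in about log2(N) rounds. */
#include <stdio.h>

#define NODES 16     /* power of two keeps this sketch simple */

int main(void)
{
    int value[NODES];
    for (int i = 0; i < NODES; i++) value[i] = i + 1;   /* each node contributes one value */

    /* Each round, "parents" combine the values of their two children.
     * After log2(NODES) rounds, value[0] holds the full reduction. */
    int rounds = 0;
    for (int stride = 1; stride < NODES; stride *= 2) {
        for (int i = 0; i < NODES; i += 2 * stride)
            value[i] = value[i] + value[i + stride];    /* swap '+' for max, &, |, ... */
        rounds++;
    }
    printf("sum over %d nodes = %d after %d combining rounds\n",
           NODES, value[0], rounds);                    /* 136 after 4 rounds */
    return 0;
}
```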
  19. CM-5 Programming
      Supports multiple parallel high-level languages and programming styles
      Including the data-parallel model from the CM-1 and CM-2
      Goal: hide many decisions from programmers
        CM-1/CM-2 vs CM-5 ISA changes
        Use of the processor-node CPU vs the vector coprocessors
        Partition-wide synchronizations generated by the compiler
      Is it MIMD, SPMD, SIMD?
      “Globally Synchronized MIMD”
  20. Sample CM Apps
      Machine learning
        Neural nets, concept clustering, genetic algorithms
      VLSI design
      Geophysics (oil exploration), plate tectonics
      Particle simulation
      Fluid-flow simulation
      Computer vision
      Computer graphics, animation
      Protein sequence matching
      Global climate model simulation
  21. References
      Danny Hillis, PhD thesis: The Connection Machine
      Inc. Magazine: The Rise and Fall of Thinking Machines
      Wikipedia: Connection Machine
      ACM: The CM-5 Connection Machine
      ACM: The Network Architecture of the CM-5
      IEEE: Architecture and Applications of the Connection Machine
      IEEE: Fat-trees: Universal Networks for Hardware-Efficient Supercomputing
      Encyclopedia of Computer Science and Technology
