
Connection Machine



  1. Connection Machine Architecture
     Greg Faust, Mike Gibson, Sal Valente
     CS-6354 Computer Architecture
     Fall 2009
  2. Historic Timeline
     1981: MIT AI Lab technical memo on the CM
     1982: Thinking Machines (TMI) founded
     1985: Danny Hillis wins the ACM Doctoral Dissertation (“Best PhD”) Award
     1986: CM-1 ships
     1987: CM-2 ships
     1991: CM-5 announced
     1991: CM-5 ships
     1994: TMI files for Chapter 11 bankruptcy; Sun and Oracle pick over the bones
     Heavily DARPA funded/backed
       $16M+ in direct contracts, plus subsidized CM sales
  3. Involved Notables
     Danny Hillis – CM inventor and TMI founder
     Charles Leiserson – fat tree inventor
     Richard Feynman – Nobel Prize-winning physicist
     Marvin Minsky – MIT AI Lab “visionary”
     Guy Steele – Common Lisp, Grace Hopper Award
     Stephen Wolfram – creator of Mathematica
     Doug Lenat – AI researcher (Cyc project)
     Greg Papadopoulos – MIT Media Lab, Sun CTO
     Various others
  4. CM-1 and CM-2 Architecture
     Original design goal: support neuron-like simulations
     Up to 64K single-bit processors (each ALU actually takes 3 bits in and puts 2 bits out)
     16 processors/chip, 32 chips/PCB, 16 PCBs/cube, 8 cubes/hypercube
     Hypercube architecture – each 16-processor chip is a hypercube node (4,096 chips = 2^12, a 12-dimensional hypercube)
     Each processor has 4K bits of bit-addressable RAM
     Distributed physical memory
     Global memory addresses
     Up to 4 front-end computers talk to the sequencers via a 4x4 crossbar
     “Sequencers” issue SIMD instructions over a broadcast network
     Bit procs communicate with grid neighbors via 2D local hardware connections (“NEWS”: North, East, West, South)
     Bit procs communicate via the hypercube network using message passing
     Lots of twinkling lights!!
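The slide's counts fix the size of the hypercube: 16 × 32 × 16 × 8 = 65,536 processors on 4,096 chips, and 4,096 = 2^12, so every chip has 12 hypercube neighbors. The sketch below checks that arithmetic and routes a message by correcting one differing address bit per hop (dimension-order, or "e-cube", routing). It assumes chips are numbered so that neighbors differ in exactly one address bit, the usual hypercube numbering; whether the CM router visited the dimensions in exactly this order is also an assumption, and `route` and `hops` are hypothetical helper names.

```python
# Sketch: CM-1 hypercube arithmetic and dimension-order routing.
# Assumption: chip addresses are integers 0..4095, and two chips are
# hypercube neighbors when their addresses differ in exactly one bit.

PROCS_PER_CHIP = 16
CHIPS = 32 * 16 * 8                   # chips/PCB * PCBs/cube * cubes
TOTAL_PROCS = PROCS_PER_CHIP * CHIPS  # 65,536 bit processors
DIMENSIONS = CHIPS.bit_length() - 1   # 4096 = 2**12 -> 12 dimensions


def hops(src: int, dst: int) -> int:
    """Minimum hop count = number of address bits that differ."""
    return bin(src ^ dst).count("1")


def route(src: int, dst: int) -> list[int]:
    """Chips visited from src to dst, correcting the lowest differing
    dimension first (dimension-order / e-cube routing)."""
    path = [src]
    node = src
    for d in range(DIMENSIONS):
        if ((node ^ dst) >> d) & 1:
            node ^= 1 << d  # cross the wire along dimension d
            path.append(node)
    return path


if __name__ == "__main__":
    print(TOTAL_PROCS, DIMENSIONS)  # 65536 12
    print(route(0, 5))              # [0, 1, 5]
    print(hops(0, CHIPS - 1))       # 12, the network diameter
```

Because each hop fixes one bit, no message needs more than 12 hops, whichever pair of chips it connects.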
  5. CM-1 CM-2 Architecture
  6. CM-1 and CM-2 Programming
     ISA supports:
       Bit-oriented operations
       Arbitrary-precision multi-bit scalar ops, using a bit-serial implementation on the bit procs
       Full multi-dimensional vector ops
     “Virtual Processor” idea similar to CUDA threads, but virtual processors are statically allocated
     OS and programming tools run on the front-ends
     *Lisp as the initial programming language
     Later C* and CM-Fortran
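Bit-serial arithmetic is how a processor whose ALU takes 3 bits in and puts 2 bits out can still add numbers of any width: a full adder reads two operand bits plus a carry and writes a sum bit and a new carry, one bit position per instruction. The sketch below models this under an illustrative assumption: bit k of every processor's operand is packed into one integer "bit plane" (bit p of the plane belongs to processor p), so each loop iteration plays the role of one SIMD instruction broadcast to all processors at once. The layout and all helper names are hypothetical.

```python
# Sketch: SIMD bit-serial addition on 1-bit processors.
# Assumption (illustrative layout): planes[k] is an int whose bit p is
# bit k of processor p's operand, least significant bit first.


def full_adder(a: int, b: int, carry: int) -> tuple[int, int]:
    """3 bits in (a, b, carry) -> 2 bits out (sum, carry), applied to
    every processor at once via bitwise ops on whole planes."""
    s = a ^ b ^ carry
    carry_out = (a & b) | (carry & (a ^ b))
    return s, carry_out


def bit_serial_add(x_planes: list[int], y_planes: list[int]) -> list[int]:
    """Add two equal-width fields on all processors, one bit per step.
    Returns len(x_planes) + 1 planes (the last one is the final carry)."""
    carry = 0
    out = []
    for a, b in zip(x_planes, y_planes):  # one broadcast step per bit
        s, carry = full_adder(a, b, carry)
        out.append(s)
    out.append(carry)
    return out


def to_planes(values: list[int], nbits: int) -> list[int]:
    """Pack per-processor values into nbits bit planes."""
    return [
        sum(((v >> k) & 1) << p for p, v in enumerate(values))
        for k in range(nbits)
    ]


def from_planes(planes: list[int], nprocs: int) -> list[int]:
    """Unpack bit planes back into one value per processor."""
    return [
        sum(((planes[k] >> p) & 1) << k for k in range(len(planes)))
        for p in range(nprocs)
    ]


if __name__ == "__main__":
    xs, ys = [3, 5, 7, 0], [1, 6, 9, 15]
    result = bit_serial_add(to_planes(xs, 4), to_planes(ys, 4))
    print(from_planes(result, 4))  # [4, 11, 16, 15]
```

An n-bit add costs n broadcast steps regardless of how many processors take part, which is why precision is "arbitrary" but wider operands are proportionally slower.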
  7. CM-2 Improvements
     1 Weitek IEEE FP coprocessor per 32 1-bit procs
     Up to 256K bits of memory per processor
     Added ECC to memory
     Implemented the I/O subsystem
     Up to 80 GByte RAID array called the “Data Vault”: 39 striped disks with ECC, plus spare disks on standby
     High-speed graphics output
     En-route message combining in the hypercube router
     New implementation of multi-dimensional NEWS on top of the hypercube (special addressing mode)
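The 39-disk figure is consistent with a classic single-error-correcting, double-error-detecting (SECDED) Hamming code over a 32-bit word: 6 Hamming check bits satisfy 2^6 ≥ 32 + 6 + 1, and one extra overall-parity bit brings the total to 32 + 7 = 39. The slide does not state this split, so treat it as an assumption. The sketch writes bit i of each codeword to disk i; because a failed disk corrupts at most one bit of any word, SECDED can repair every word after a single-disk failure. All names are hypothetical.

```python
# Sketch: SECDED Hamming code over 32-bit words, one codeword bit per disk.
# Assumption: the 39 striped disks hold 32 data bits + 7 check bits.

N = 39  # index 0: overall parity; 1..38: Hamming positions
PARITY_POS = [1, 2, 4, 8, 16, 32]
DATA_POS = [i for i in range(1, N) if (i & (i - 1)) != 0]  # non-powers of 2


def check_bits(data_bits: int) -> int:
    """Smallest r with 2**r >= m + r + 1, plus 1 overall parity bit."""
    r = 0
    while (1 << r) < data_bits + r + 1:
        r += 1
    return r + 1


def encode(word: int) -> list[int]:
    code = [0] * N
    for k, pos in enumerate(DATA_POS):
        code[pos] = (word >> k) & 1
    for p in PARITY_POS:  # each check bit makes its group's parity even
        code[p] = sum(code[i] for i in range(1, N) if i & p) % 2
    code[0] = sum(code[1:]) % 2  # overall parity over the other 38 bits
    return code


def decode(code: list[int]) -> int:
    code = list(code)  # don't modify the caller's stripe
    syndrome = 0
    for i in range(1, N):
        if code[i]:
            syndrome ^= i
    overall = sum(code) % 2  # 0 when total parity is still even
    if syndrome and not overall:
        raise ValueError("double-bit error detected")
    if syndrome:  # single error at position `syndrome`
        code[syndrome] ^= 1
    return sum(code[pos] << k for k, pos in enumerate(DATA_POS))


if __name__ == "__main__":
    print(check_bits(32), 32 + check_bits(32))  # 7 39
    stripe = encode(0xDEADBEEF)
    stripe[17] ^= 1                # disk 17 returns a bad bit
    print(hex(decode(stripe)))     # 0xdeadbeef
```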
  8. CM-1 Photo
  9. CM-5 vs CM-1 and CM-2
     Significant departure from CM-1 and CM-2
     Targeted at more scientific and business applications
     More commercial off-the-shelf (“COTS”) components
     Large array of SPARC processing nodes
     1-bit processors are abandoned
     “NEWS” grid and hypercube networks are abandoned
     Delivered a 1024-node machine, with claims that 16K nodes are possible
     Even more twinkling lights!
  10. CM-5 Photo – Watch It Blink
  11. CM-5 Overall Architecture
      “Coordinated Homogeneous Array of RISC Processors” or “CHARM”
      Asymmetric coprocessor model
        Large array of processor nodes
        Small collection of control nodes
      2 separate scalable networks
        One for data
        One for control and synchronization
      Still uses striped RAID for high disk bandwidth
  12. Division of Labor
      Processor nodes can be assigned to a “partition”
      One control node per partition
      The control node runs the scalar code, then broadcasts parallel work to the processor nodes
      Processor nodes receive a program rather than an instruction stream, and each has its own program counter
      Processor nodes can access another node's memory by reading or writing a global memory address
      Processor nodes also communicate via message passing
      Processor nodes cannot issue system calls
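This division of labor is closer to SPMD than to CM-2-style SIMD: every node receives the same program but runs it with its own program counter. The sketch below models one partition with Python threads (which interleave rather than run truly in parallel, but the logic is the same). `run_partition` and `neighbor_exchange` are hypothetical names; the shared list of per-node dictionaries stands in for global memory addresses, and `threading.Barrier` stands in for a partition-wide synchronization.

```python
import threading
from typing import Callable

# Hypothetical model of one CM-5 partition: a "control node" hands the
# same program to every processor node, and each node runs it in its own
# thread, i.e. with its own program counter.

Program = Callable[[int, int, list, threading.Barrier], None]


def run_partition(program: Program, num_nodes: int) -> list[dict]:
    memory = [{} for _ in range(num_nodes)]  # one local memory per node
    barrier = threading.Barrier(num_nodes)
    threads = [
        threading.Thread(target=program, args=(node, num_nodes, memory, barrier))
        for node in range(num_nodes)
    ]
    for t in threads:  # "broadcast" the program to every node
        t.start()
    for t in threads:  # control node waits for the parallel work to finish
        t.join()
    return memory


def neighbor_exchange(node: int, num_nodes: int, memory: list,
                      barrier: threading.Barrier) -> None:
    memory[node]["x"] = node * node  # phase 1: purely local work
    barrier.wait()                   # every node has finished phase 1
    right = (node + 1) % num_nodes
    # phase 2: read another node's memory, as a global address would allow
    memory[node]["y"] = memory[node]["x"] + memory[right]["x"]


if __name__ == "__main__":
    mem = run_partition(neighbor_exchange, 4)
    print([m["y"] for m in mem])  # [1, 5, 13, 9]
```

Without the barrier, a fast node could read `x` from a neighbor that has not written it yet; the barrier is what makes the result deterministic.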
  13. Control Nodes
      Full Sun workstations
      Running UNIX
      Connected to the “outside world”
      Handle partition time sharing
      Connected to both the data and control networks
      Perform system diagnostics
  14. Processor Nodes
      Each node is a 5-chip microprocessor
      Off-the-shelf SPARC processor @ 40 MHz
      32 MBytes of local node memory
      Multi-port memory controller for added bandwidth
        “Caching techniques do not perform as well on large parallel machines”
      Proprietary vector coprocessor with 4 FPUs
      Proprietary network controller
  15. CM-5 Processor Node Diagram
  16. Data Network Architecture
      Point-to-point inter-node communication and I/O
      Implemented as a fat tree
        Fat trees invented by TMI employee Charles Leiserson
      Claim: bandwidth expandable on site
      Delivers 5 GB/sec bisection bandwidth on a 1024-node machine
      Data router chip is an 8x8 crossbar switch
      Faulty nodes are mapped out of the network
      Programs cannot assume a network topology
      Network can be flushed when time-share swaps occur
      The network, not the processors, guarantees end-to-end delivery
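In a fat tree, a message climbs only as far as the lowest common ancestor of its source and destination and then descends; the upward path can use any parent link, while the downward path is fixed by the destination address. The sketch below assumes each router has 4 child links (half the ports of an 8x8 crossbar) and that leaves are numbered so that children of the same router differ only in their last base-4 digit. It picks upward links at random as one plausible load-balancing policy; `lca_level`, `route`, and the `parents` parameter are hypothetical.

```python
import random

# Sketch of fat-tree routing. Assumptions: 4 children per router, and
# leaves numbered 0..n-1 so that leaves sharing a router differ only in
# their last base-4 digit.

ARITY = 4


def lca_level(src: int, dst: int) -> int:
    """How many levels a message must climb before it can turn down."""
    level = 0
    while src != dst:
        src //= ARITY
        dst //= ARITY
        level += 1
    return level


def route(src: int, dst: int, parents: int = 2, rng=None):
    """Return (up_links, down_ports): a parent-link choice for each level
    climbed, then the child port taken at each level on the way down."""
    if rng is None:
        rng = random.Random()
    up = lca_level(src, dst)
    up_links = [rng.randrange(parents) for _ in range(up)]  # any parent works
    down_ports = [(dst // ARITY**k) % ARITY for k in reversed(range(up))]
    return up_links, down_ports


if __name__ == "__main__":
    # 1024 leaves = 4**5, so no message climbs more than 5 levels.
    print(lca_level(0, 1), lca_level(0, 4), lca_level(0, 1023))  # 1 2 5
    print(route(6, 1023, rng=random.Random(0))[1])                # [3, 3, 3, 3, 3]
```

Nearby nodes turn around low in the tree and never touch the upper levels, which is why traffic between neighbors does not compete with long-distance traffic for the bandwidth near the root.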
  17. Fat Tree Structure
  18. Separate Control Network
      Synchronization & control network
      Complete binary tree organization
      Provides broadcast capability
      Implements barrier operations
      Implements interrupts for timesharing
      Performs reduction operators (Sum, Max, AND, OR, Count, etc.)
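A complete binary tree lets the control network combine one value from every node in a number of steps that grows with log2 of the node count: each internal switch applies the operator to its two children, and the root's result can be sent back down the same tree as a broadcast. The sketch below simulates that with a hypothetical `tree_reduce`; a barrier then amounts to an AND-reduction of "I have arrived" flags, and Count is a Sum of 0/1 flags.

```python
import operator
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")

# Sketch of a control-network style reduction over a complete binary
# tree. Each loop iteration is one tree level: pairs of children are
# combined by the switch above them.


def tree_reduce(values: Sequence[T], op: Callable[[T, T], T]) -> tuple[T, int]:
    """Return (combined value, number of tree levels); values must be non-empty."""
    level = list(values)
    depth = 0
    while len(level) > 1:
        nxt = [op(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:  # odd node out passes straight up
            nxt.append(level[-1])
        level = nxt
        depth += 1
    return level[0], depth


REDUCTIONS = {
    "sum": operator.add,
    "max": max,
    "and": operator.and_,
    "or": operator.or_,
}

if __name__ == "__main__":
    node_values = list(range(1024))
    print(tree_reduce(node_values, REDUCTIONS["sum"]))  # (523776, 10)
    arrived = [True] * 1024
    print(tree_reduce(arrived, REDUCTIONS["and"])[0])   # True: barrier released
    flags = [int(v % 3 == 0) for v in node_values]
    print(tree_reduce(flags, REDUCTIONS["sum"])[0])     # Count: 342
```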
  19. CM-5 Programming
      Supports multiple parallel high-level languages and programming styles
        Including the data parallel model from CM-1 and CM-2
      Goal: hide many decisions from programmers
        CM-1, CM-2 vs CM-5 ISA changes
        Use of the processor node CPU vs the vector coprocessors
        Partition-wide synchronizations generated by the compiler
      Is it MIMD, SPMD, SIMD?
        “Globally Synchronized MIMD”
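One decision the compilers hide is how a data-parallel array is spread across a partition's nodes. The slides do not say which layout CM Fortran or C* used; the sketch below assumes a simple block distribution, in which each node owns one contiguous slice and the first `n % num_nodes` nodes own one extra element. `block_range`, `owner`, and `spmd_add` are hypothetical names. A data-parallel statement such as `c = a + b` then becomes a loop over each node's own slice, followed by a compiler-generated partition-wide synchronization.

```python
# Sketch: block distribution of an n-element data-parallel array over
# num_nodes processor nodes (an assumed layout, for illustration only).


def block_range(n: int, num_nodes: int, node: int) -> tuple[int, int]:
    """Half-open [start, stop) index range owned by `node`."""
    base, extra = divmod(n, num_nodes)
    start = node * base + min(node, extra)
    stop = start + base + (1 if node < extra else 0)
    return start, stop


def owner(i: int, n: int, num_nodes: int) -> int:
    """Which node holds element i under the same layout."""
    base, extra = divmod(n, num_nodes)
    cutoff = extra * (base + 1)  # elements held by the nodes with an extra one
    if i < cutoff:
        return i // (base + 1)
    return extra + (i - cutoff) // base


def spmd_add(a: list[float], b: list[float], num_nodes: int) -> list[float]:
    """c = a + b, executed as each node's local loop (simulated serially)."""
    c = [0.0] * len(a)
    for node in range(num_nodes):  # on a real partition these loops run concurrently
        start, stop = block_range(len(a), num_nodes, node)
        for i in range(start, stop):
            c[i] = a[i] + b[i]
    # here a compiler would insert a partition-wide barrier
    return c


if __name__ == "__main__":
    print([block_range(10, 4, k) for k in range(4)])          # [(0, 3), (3, 6), (6, 8), (8, 10)]
    print(spmd_add([1.0, 2.0, 3.0], [10.0, 20.0, 30.0], 2))   # [11.0, 22.0, 33.0]
```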
  20. Sample CM Apps
      Machine learning
        Neural nets, concept clustering, genetic algorithms
      VLSI design
      Geophysics (oil exploration), plate tectonics
      Particle simulation
      Fluid flow simulation
      Computer vision
      Computer graphics, animation
      Protein sequence matching
      Global climate model simulation