1b.3 Conventional Computer
Consists of a processor executing a program stored in a (main) memory:
[Figure: main memory connected to processor; instructions go to the processor, data goes to or from the processor]
Each main memory location located by its address. Addresses start at 0 and extend to 2^b - 1 when there are b bits (binary digits) in address.
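The address range above can be checked with a small C sketch. The helper name `max_address` is hypothetical (it does not come from the slides), and the sketch assumes addresses fit in 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Highest address reachable with b address bits: 2^b - 1.
   Hypothetical helper for illustration; valid for 1 <= b <= 64. */
uint64_t max_address(unsigned b)
{
    assert(b >= 1 && b <= 64);
    if (b == 64)
        return UINT64_MAX;          /* avoid shifting by the full word width */
    return (UINT64_C(1) << b) - 1;  /* 2^b - 1 */
}
```

For example, `max_address(16)` gives 65535, so a 16-bit address reaches locations 0 through 65535.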
1b.4 Shared Memory Multiprocessor System
Natural way to extend single processor model - have multiple processors connected to multiple memory modules, such that each processor can access any memory module:
[Figure: processors connected through processor-memory interconnections to memory modules, which together form one address space]
1b.5 Simplistic view of a small shared memory multiprocessor
[Figure: processors connected by a bus to shared memory]
Examples:
• Dual Pentiums
• Quad Pentiums
1b.6 Real computer systems have cache memory between the main memory and processors: Level 1 (L1) cache and Level 2 (L2) cache.
Example: Quad Shared Memory Multiprocessor
[Figure: four processors, each with an L1 cache, an L2 cache, and a bus interface, attached to a processor/memory bus; a memory controller connects the bus to the shared memory]
1b.7 “Recent” innovation
• Dual-core and multi-core processors
• Two or more independent processors in one package
• Actually an old idea but not put into wide practice until recently.
• Since L1 cache is usually inside package and L2 cache outside package, dual-/multi-core processors usually share L2 cache.
1b.9 Examples
• Intel:
– Core Duo processors -- Two processors in one package sharing a common L2 cache. 2005-2006
– Intel Core 2 family dual cores, with quad core from Nov 2006 onwards
– Core i7 processors replacing Core 2 family - Quad core Nov 2008
– Intel Teraflops Research Chip (Polaris), a 3.16 GHz, 80-core processor prototype.
• Xbox 360 game console -- triple core PowerPC microprocessor.
• PlayStation 3 Cell processor -- 9 core design.
References and more information -- Wikipedia
1b.11 Programming Shared Memory Multiprocessors
Several possible ways:
1. Thread libraries - programmer decomposes program into individual parallel sequences (threads), each being able to access shared variables declared outside threads. Example: Pthreads
2. Higher level library functions and preprocessor compiler directives to declare shared variables and specify parallelism. Uses threads. Example: OpenMP - industry standard. Consists of library functions, compiler directives, and environment variables - needs OpenMP compiler.
1b.12
3. Use a modified sequential programming language -- added syntax to declare shared variables and specify parallelism. Example: UPC (Unified Parallel C) - needs a UPC compiler.
4. Use a specially designed parallel programming language -- with syntax to express parallelism. Compiler automatically creates executable code for each processor (not now common).
5. Use a regular sequential programming language such as C and ask parallelizing compiler to convert it into parallel executable code. Also not now common.
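As an illustration of the first approach in the list (a thread library), here is a minimal Pthreads sketch in C. The names `counter`, `worker`, and `run_counter_demo`, the thread count, and the amount of work per thread are all assumptions made for the example. It shows a shared variable declared outside the thread function and a mutex protecting updates to it.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NUM_THREADS 4        /* assumed thread count */
#define INCREMENTS  100000   /* assumed work per thread */

/* Shared variables: declared outside the thread function,
   so every thread accesses the same copies. */
static long counter = 0;
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

/* Each thread adds INCREMENTS to the shared counter, one step at a time. */
static void *worker(void *arg)
{
    (void)arg;  /* unused */
    for (int i = 0; i < INCREMENTS; i++) {
        pthread_mutex_lock(&counter_lock);    /* one thread at a time */
        counter++;
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Creates the threads, waits for all of them, returns the final count. */
long run_counter_demo(void)
{
    pthread_t threads[NUM_THREADS];

    counter = 0;
    for (int t = 0; t < NUM_THREADS; t++) {
        int rc = pthread_create(&threads[t], NULL, worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int t = 0; t < NUM_THREADS; t++)
        pthread_join(threads[t], NULL);

    return counter;
}
```

Without the mutex, the increments from different threads could interleave and some updates would be lost. On many systems the program must be compiled with the `-pthread` option.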
1b.13 Message-Passing Multicomputer
Complete computers connected through an interconnection network:
[Figure: computers, each with a processor and local memory, exchanging messages through an interconnection network]
1b.14 Interconnection Networks
Many explored in the 1970s and 1980s
• Limited and exhaustive interconnections
• 2- and 3-dimensional meshes
• Hypercube
• Using Switches:
– Crossbar
– Trees
– Multistage interconnection networks
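To make one of the listed topologies concrete: in a d-dimensional hypercube the 2^d nodes are numbered in binary, and two nodes are directly linked when their numbers differ in exactly one bit. The C sketch below assumes that standard numbering; the function names are hypothetical.

```c
#include <assert.h>

/* Neighbor of `node` along dimension `dim_index` in a d-dimensional
   hypercube: flip that one bit of the node number. */
unsigned hypercube_neighbor(unsigned node, unsigned dim_index, unsigned d)
{
    assert(d < 32 && dim_index < d && node < (1u << d));
    return node ^ (1u << dim_index);
}

/* Minimum number of links between nodes a and b: the number of bit
   positions in which their node numbers differ (Hamming distance). */
unsigned hypercube_distance(unsigned a, unsigned b)
{
    unsigned diff = a ^ b;
    unsigned hops = 0;

    while (diff != 0) {
        hops += diff & 1u;   /* count each differing bit */
        diff >>= 1;
    }
    return hops;
}
```

In a 3-dimensional hypercube, node 5 (binary 101) has the neighbors 4, 7, and 1, and a message from node 0 to node 7 crosses three links.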
1b.15 Networked Computers as a Computing Platform
• A network of computers became a very attractive alternative to expensive supercomputers and parallel computer systems for high-performance computing in the early 1990s.
• Several early projects. Notable:
– Berkeley NOW (network of workstations) project.
– NASA Beowulf project.
1b.16 Key advantages:
• Very high performance workstations and PCs readily available at low cost.
• The latest processors can easily be incorporated into the system as they become available.
• Existing software can be used or modified.
1b.17 Beowulf Clusters*
• A group of interconnected “commodity” computers achieving high performance with low cost.
• Typically using commodity interconnects (high speed Ethernet) and Linux OS.
* Beowulf comes from the name given by the NASA Goddard Space Flight Center cluster project.
1b.18 Cluster Interconnects
• Originally fast Ethernet on low cost clusters
• Gigabit Ethernet - easy upgrade path
More Specialized/Higher Performance
• Myrinet - 2.4 Gbits/sec - disadvantage: single vendor
• cLan
• SCI (Scalable Coherent Interface)
• QNet
• Infiniband - may be important as Infiniband interfaces may be integrated on next generation PCs
1b.19 Dedicated cluster with a master node and compute nodes
[Figure: dedicated cluster. Labels: user, external network, master node, Ethernet interface, switch, local network, compute nodes (computers)]
1b.20 Software Tools for Clusters
• Based upon message passing programming model
• User-level libraries provided for explicitly specifying messages to be sent between executing processes on each computer.
• Use with regular programming languages (C, C++, ...).
• Can be quite difficult to program correctly as we shall see.
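The idea of explicitly specified messages between processes can be sketched without a cluster. The C example below is not MPI and does not use a message-passing library: as a stand-in, it uses POSIX `fork` and `pipe` on a single machine, so the two processes share no variables and communicate only by sending and receiving. The function name `exchange_with_child` and the doubling step are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Parent sends `value` to a child process; the child doubles it and
   sends the result back. Returns the reply received by the parent. */
int exchange_with_child(int value)
{
    int to_child[2], to_parent[2];   /* [0] = read end, [1] = write end */
    int rc;

    rc = pipe(to_child);
    assert(rc == 0);
    rc = pipe(to_parent);
    assert(rc == 0);
    (void)rc;

    pid_t pid = fork();
    assert(pid >= 0);

    if (pid == 0) {
        /* Child process: receive a message, compute, send a reply. */
        int msg = 0;
        close(to_child[1]);
        close(to_parent[0]);
        if (read(to_child[0], &msg, sizeof msg) != (ssize_t)sizeof msg)
            _exit(1);
        msg *= 2;
        if (write(to_parent[1], &msg, sizeof msg) != (ssize_t)sizeof msg)
            _exit(1);
        _exit(0);
    }

    /* Parent process: send the message, then wait for the reply. */
    int reply = 0;
    close(to_child[0]);
    close(to_parent[1]);

    ssize_t n = write(to_child[1], &value, sizeof value);
    assert(n == (ssize_t)sizeof value);
    n = read(to_parent[0], &reply, sizeof reply);   /* blocks until reply */
    assert(n == (ssize_t)sizeof reply);
    (void)n;

    close(to_child[1]);
    close(to_parent[0]);
    waitpid(pid, NULL, 0);
    return reply;
}
```

Even in this two-process case the programmer must match every send with a receive and get the order right. If both sides waited to receive first, neither would proceed, which hints at why message-passing programs can be difficult to write correctly.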
1b.21 Next step
• Learn the message passing programming model, some MPI routines, write a message-passing program and test on the cluster.