
Aca 2



  1. System Attributes to Performance (22/9/2012)
  2. The CPU/processor is driven by a clock with a constant cycle time τ (in nanoseconds); the clock rate is f = 1/τ (in megahertz). Ic, the instruction count, is the size of the program, i.e. the number of machine instructions to be executed. Different machine instructions need different numbers of clock cycles to execute, so the CPI (cycles per instruction) is the number of cycles needed to execute an instruction; an average CPI is used for a given instruction set.
  3. Performance factors: the CPU time T is the time needed to execute a program, in seconds per program: T = Ic × CPI × τ. The execution of an instruction goes through a cycle of events: instruction fetch, decode, operand(s) fetch, execution, and storing the results.
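As a quick worked check of the formula above (a minimal sketch; the instruction count, CPI, and cycle time are illustrative values, not taken from the slides):

```python
# CPU time: T = Ic * CPI * tau, in seconds per program.
def cpu_time(ic, cpi, tau_ns):
    """ic = instruction count, cpi = average cycles per instruction,
    tau_ns = processor cycle time in nanoseconds."""
    return ic * cpi * tau_ns * 1e-9  # nanoseconds -> seconds

# Illustrative: 200 million instructions, average CPI of 1.5,
# 2 ns cycle time (i.e. a 500 MHz clock).
t = cpu_time(200_000_000, 1.5, 2.0)
print(t)  # ≈ 0.6 seconds
```

Note how any of the three factors, taken alone, says nothing about performance; only their product does.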
  4. Of the events above, instruction decode and the execution phases are carried out inside the CPU; the remaining three require access to the memory. Memory cycle: the time needed to complete one memory reference. Note: the memory cycle is k times the processor cycle τ, where k depends on the speed of the memory technology.
  5. Influence of the system attributes on the performance factors (Ic, p, m, k, τ): 1. Instruction-set architecture affects the program length (Ic) and the processor cycles needed per instruction (p). 2. Compiler technology affects the values of Ic, p, and m. 3. CPU implementation and control determine the total processor time (p·τ). 4. Cache and memory hierarchy affect the memory access latency (k·τ).
  6. System attributes vs. performance factors (an X marks an influence; Ic = instruction count, p = processor cycles per instruction, m = memory references per instruction, k = memory access latency in processor cycles, τ = processor cycle time):
     System Attribute                   | Ic | p | m | k | τ
     -----------------------------------+----+---+---+---+---
     Instruction-Set Architecture       | X  | X |   |   |
     Compiler Technology                | X  | X | X |   |
     Processor Implementation & Control |    | X |   |   | X
     Cache & Memory Hierarchy           |    |   |   | X | X
  7. MIPS rate (million instructions per second): let C be the total number of clock cycles needed to execute a program. Then T = C × τ = C/f and CPI = C/Ic, so T = Ic × CPI × τ = (Ic × CPI)/f, and the MIPS rate is f/(CPI × 10⁶) = Ic/(T × 10⁶).
  8. Throughput rate (Ws): the number of programs a system can execute per unit time, in programs/second. The CPU throughput is Wp = f/(Ic × CPI) = 1/(Ic × CPI × τ) = 1 program per T seconds. Note: in a multiprogrammed system the system throughput Ws is often lower than the CPU throughput Wp; Ws = Wp only if the CPU is kept busy in a perfect program-interleaving fashion.
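The MIPS-rate and throughput formulas can be checked numerically (a minimal sketch; the clock rate, instruction count, and CPI below are illustrative, not from the slides):

```python
# CPU throughput: Wp = f / (Ic * CPI) = 1/T programs per second.
# MIPS rate: f / (CPI * 10**6) = Ic / (T * 10**6).
def cpu_throughput(f_hz, ic, cpi):
    return f_hz / (ic * cpi)      # programs per second

def mips_rate(f_hz, cpi):
    return f_hz / (cpi * 1e6)     # million instructions per second

# Illustrative: 500 MHz clock, Ic = 200 million, average CPI = 1.5.
wp = cpu_throughput(500e6, 200e6, 1.5)   # ≈ 1.67 programs/second
mips = mips_rate(500e6, 1.5)             # ≈ 333 MIPS
```

Both numbers describe the same machine state: Wp × Ic equals the MIPS rate × 10⁶ instructions per second.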
  9. Two approaches to parallel programming. First approach: the programmer writes a sequentially coded source program, and the compiler detects the parallelism and assigns the target machine resources. Note: this compiler approach is applied in programming shared-memory multiprocessors.
  10. Second approach: parallel dialects of C and similar languages, in which the parallelism is specified explicitly in the user program. Note: this approach is applied in multicomputers.
  11. Parallel computer architectural/physical models are distinguished by having: 1. shared common memory, with three shared-memory multiprocessor models: (i) UMA (uniform memory access), (ii) NUMA (non-uniform memory access), (iii) COMA (cache-only memory architecture); or 2. unshared distributed memory. A further variant: CC-NUMA (cache-coherent NUMA).
  12. UMA Multiprocessor Model
  13. Physical memory is uniformly shared by all the processors. All processors have equal access time to all memory words, which is why the model is called uniform memory access. Peripherals are also shared in some fashion. UMA machines are also called tightly coupled systems because of the high degree of resource sharing.
  14. Symmetric vs. asymmetric multiprocessors. Symmetric multiprocessor: all processors have equal access to all peripheral devices. Asymmetric multiprocessor: only one, or a subset, of the processors is executive-capable: (i) the MP (executive or master processor) can execute the OS and handle I/O; (ii) an AP (attached processor) has no I/O capability and executes user code under the supervision of the MP.
  15. NUMA multiprocessor model: a shared-memory system in which the access time varies with the location of the memory word. Local memories (LM): the shared memory is physically distributed to all processors. Global address space: formed by the collection of all local memories and accessible by all processors. Access to a local memory by its local processor is fast; access to remote memory attached to other processors is slower because of the added delay through the interconnection network.
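The cost of non-uniform access can be made concrete with a weighted average of local and remote latencies (a toy calculation; the latencies and locality fractions are hypothetical, chosen only to show the effect):

```python
# Effective access time on a NUMA machine: a fraction of references hit
# the processor's own local memory, the rest cross the interconnection
# network to a remote local memory.
def effective_access_time(local_ns, remote_ns, local_fraction):
    return local_fraction * local_ns + (1.0 - local_fraction) * remote_ns

# Hypothetical latencies: 100 ns local, 600 ns remote.
good_locality = effective_access_time(100, 600, 0.9)  # most references local
poor_locality = effective_access_time(100, 600, 0.5)  # half go remote
```

With these hypothetical numbers the effective latency grows from about 150 ns to 350 ns as locality drops, which is why data placement matters on NUMA machines.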
  16. LM – local memory; P – local processor.
  17. P – processor; CSM – cluster shared memory; CIN – cluster interconnection network; GSM – global shared memory. The model may be UMA or NUMA (with access to remote memory).
  18. Three memory-access patterns arise when globally shared memory (GSM) is added to a multiprocessor system: (i) the fastest is local memory (LM) access; (ii) next is global memory (GSM) access; (iii) the slowest is access to remote memory, i.e. an LM attached to another processor. Note: all clusters have equal access to the GSM, and access rights among intercluster memories can be specified.
  19. COMA multiprocessor model: the distributed main memories are converted into caches; together the caches form a global address space; remote cache access is assisted by distributed cache directories. C – cache; P – processor; D – directory.
  20. Multiprocessor systems are suitable for general-purpose multiuser applications where programmability is the major concern. Shortcomings of multiprocessor systems: lack of scalability, and limited latency tolerance for remote memory accesses.
  21. Classes: mini-supercomputer, near-supercomputer, and the MPP class.
  22. Distributed-memory multicomputers: the system consists of multiple computers (nodes) interconnected by a message-passing network. A node is an autonomous computer consisting of a processor, local memory, and sometimes attached disks or I/O peripherals. The message-passing network provides point-to-point static connections among the nodes. Local memories (LM) are private, accessible only by the local processor; for this reason traditional multicomputers are called NORMA (no-remote-memory-access) machines.
  23. Fig.: generic model of a message-passing multicomputer. M – local memory; P – processor node.
  24. Parallel computers come in SIMD or MIMD configurations. SIMD machines are for special-purpose applications; the CM-2 (Connection Machine) was built on an SIMD architecture, while the CM-5 uses an MIMD architecture with a globally shared virtual address space. Scalable multiprocessors and multicomputers use distributed shared memory; unscalable multiprocessors use centrally shared memory.
  25. Fig.: Gordon Bell's taxonomy of MIMD computers.
  26. Supercomputer classification: (a) pipelined vector machines (vector supercomputers), using a few powerful processors equipped with vector hardware for vector processing; (b) SIMD computers (parallel processors), emphasizing massive data parallelism.
  27. Vector supercomputer architecture (steps 1–6 in the figure).
  28. Steps 1–2: the program and data are first loaded into the main memory through a host computer. Step 3: all instructions are first decoded by the scalar control unit. Step 4: if the decoded instruction is a scalar operation or a program-control operation, it is executed directly by the scalar processor using the scalar functional pipelines. Step 5: if the instruction is decoded as a vector operation, it is sent to the vector control unit. Step 6: the vector control unit supervises the flow of vector data between the main memory and the vector functional pipelines. Note: a number of vector functional pipelines may be built into a vector processor.
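The decode-and-dispatch flow in steps 3–5 can be sketched as a toy loop (the opcode encoding and the "v"-prefix convention for vector operations are invented purely for illustration):

```python
# Toy model of steps 3-5: every instruction is decoded by the scalar
# control unit; scalar and program-control ops go to the scalar
# functional pipelines, vector ops are handed to the vector control unit.
def dispatch(program):
    to_scalar_pipes, to_vector_unit = [], []
    for opcode, operands in program:
        if opcode.startswith("v"):                 # decoded as a vector op
            to_vector_unit.append((opcode, operands))
        else:                                      # scalar / program control
            to_scalar_pipes.append((opcode, operands))
    return to_scalar_pipes, to_vector_unit

program = [("add", (1, 2)), ("vadd", ([1, 2], [3, 4])), ("branch", (8,))]
scalar_ops, vector_ops = dispatch(program)
```

The key point of the architecture survives even in this sketch: there is a single decode path, and only after decoding does an instruction fan out to scalar or vector hardware.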
  29. SIMD supercomputers. CU – control unit; PE – processing element; LM – local memory; IS – instruction stream; DS – data stream. (Abstract model of an SIMD computer.)
  30. (Operational model of an SIMD computer.)
  31. SIMD machine model: an operational model of an SIMD computer is specified by a 5-tuple M = <N, C, I, M, R>, where: (1) N is the number of processing elements (PEs) in the machine; (2) C is the set of instructions directly executed by the control unit (CU), including the scalar and program-flow-control instructions; (3) I is the set of instructions broadcast by the CU to all PEs for parallel execution, including arithmetic, logic, data-routing, masking, and other local operations executed by each active PE over the data within that PE.
  32. (4) M is the set of masking schemes, where each mask partitions the set of PEs into enabled and disabled subsets; (5) R is the set of data-routing functions, specifying the various patterns to be set up in the interconnection network for inter-PE communications.
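A toy simulation of this 5-tuple model (the 4-PE configuration, the mask, and the circular-shift routing function are all invented for illustration):

```python
# Toy SIMD machine: N PEs, each holding one value in its local memory.
# A mask enables/disables PEs; a routing function moves data between
# PEs through the interconnection network.
N = 4
data = [10, 20, 30, 40]                # one value per PE local memory

def broadcast(op, mask, data):
    """The CU broadcasts one instruction; only enabled PEs execute it."""
    return [op(x) if enabled else x for x, enabled in zip(data, mask)]

def route_shift(data, step=1):
    """Circular routing: PE i receives the datum of PE (i - step) mod N."""
    return [data[(i - step) % N] for i in range(N)]

mask = [True, False, True, False]      # enable PEs 0 and 2 only
data = broadcast(lambda x: x + 1, mask, data)   # -> [11, 20, 31, 40]
data = route_shift(data)                        # -> [40, 11, 20, 31]
```

The mask plays the role of M (partitioning PEs into enabled and disabled subsets) and the shift plays the role of one function in R; a real machine would offer a whole family of each.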