Published in: Technology
  1. Reg. No. :
     M.E. DEGREE EXAMINATION, JUNE 2010
     Second Semester
     Applied Electronics
     AP9222: COMPUTER ARCHITECTURE AND PARALLEL PROCESSING
     (Common to M.E-Computer and Communication, M.E-VLSI Design and M.E-Embedded System Technologies)
     (Regulation 2009)
     Time : Three hours    Maximum : 100 Marks
     Answer ALL Questions

     PART A (10 × 2 = 20 Marks)

     1. Define Bernstein conditions related to parallelism and dependence relations.
     2. A workstation uses a 15-MHz processor with a claimed 10-MIPS rating to execute a given program mix. What is the effective CPI of this computer assuming a one-cycle delay for each memory access?
     3. List the parameters used for evaluating parallel computations.
     4. Topologically equivalent networks are those whose graph representations are isomorphic with the same interconnection capabilities. Prove that the Omega network is topologically equivalent to the Baseline network.
     5. A two-level memory system has eight virtual pages on a disk to be mapped into four page frames in the main memory. A certain program generated the following page trace:
        1, 0, 2, 2, 1, 7, 6, 7, 0, 1, 2, 0, 3, 0, 4, 5, 1, 5, 2, 4, 5, 6, 7, 6, 7, 2, 4, 2, 7, 3, 3, 2, 3
        Show the successive virtual pages residing in the four page frames with respect to the above trace using the LRU replacement policy. Compute the hit ratio in the main memory. Assume the page frames are initially empty.
     6. State the two sufficient conditions to achieve sequential consistency in shared memory access.
     7. Why are MIMD, MPMD or SPMD control preferred over SIMD data parallelism?

     Question Paper Code: J7605
  2. 8. Compare the advantages and disadvantages of chained directories for cache coherence control in large-scale multiprocessor systems.
     9. Bring out the differences in the message passing OS models.
     10. Distinguish between spin locks and suspended locks for sole access to a critical section.

     PART B (5 × 16 = 80 Marks)

     11. (a) (i) Analyze the data dependencies among the following statements in the given program:
                 s1: Load R1, 1024
                 s2: Load R2, M(10)
                 s3: Add R1, R2
                 s4: Store M(1024), R1
                 s5: Store M((R2)), 1024
                 where (Ri) means the content of register Ri and M(10) contains 64 initially.
                 (1) Draw a dependence graph to show all the dependencies.
                 (2) Are there any resource dependencies if only one copy of each functional unit is available in the CPU? (8)
             (ii) Explain the theoretical models of parallel computers used by algorithm designers and chip developers. (8)
         Or
         (b) Characterize the architectural operations of SIMD and MIMD computers. Distinguish between multiprocessors and multicomputers based on their structures, resource sharing and interprocessor communications. Also explain the differences among UMA, NUMA, COMA and NORMA computers. (16)
     12. (a) Explain the applicability and restrictions involved in using Amdahl's law, Gustafson's law, and Sun and Ni's law to estimate the speedup performance of an n-processor system compared with that of a single-processor system, ignoring all communication overheads. (16)
         Or
         (b) (i) Compare control flow, data flow and reduction computers in terms of the program flow mechanism used. Comment on the advantages and disadvantages of the above computer models. (8)
             (ii) Explain the steps involved in calculating the grain size and communication latency for multiplying two 2 × 2 matrices. (8)
  3. 13. (a) (i) Explain the difference between superscalar and VLIW architectures in terms of hardware and software requirements. (8)
             (ii) Consider a two-level memory hierarchy M1 and M2. Denote the hit ratio of M1 as h. Let c1 and c2 be the costs per kilobyte, s1 and s2 the memory capacities, and t1 and t2 the access times respectively. (8)
                 (1) Under what conditions will the average cost of the entire memory approach c2?
                 (2) What is the effective memory access time ta of this hierarchy?
                 (3) Let r = t2/t1 be the speed ratio of the two memories. Let E = t1/ta be the access efficiency of the memory system. Express E in terms of r and h.
                 (4) What is the required hit ratio h to make E > 0.95 if r = 100?
         Or
         (b) (i) Describe the daisy chaining and the distributed arbiter for bus arbitration on a multiprocessor system. State the advantages and shortcomings of each from both the implementational and operational points of view. (8)
             (ii) Consider the following three interleaved memory designs for a main memory system with 16 memory modules. Each module is assumed to have a capacity of 1 Mbyte. The machine is byte-addressable.
                 Design 1: 16-way interleaving with one memory bank.
                 Design 2: 8-way interleaving with two memory banks.
                 Design 3: 4-way interleaving with four memory banks.
                 (1) Specify the address formats for each of the above memory organizations.
                 (2) Determine the maximum memory bandwidth obtained if only one memory module fails in each of the above memory organizations.
                 (3) Comment on the relative merits of the three interleaved memory organizations. (8)
     14. (a) (i) Why are fine-grain processors chosen for future multiprocessors over medium-grain processors used in the past? From a scalability point of view, why is fine-grain parallelism more appealing than medium-grain or coarse-grain parallelism for building MPP systems? (8)
             (ii) Compare the Connection Machines CM-2 and CM-5 in their architectures, operation modes, functional capabilities and potential performance. Comment on the improvement made in CM-5 over CM-2 from the viewpoints of a computer architect and a machine programmer. (8)
         Or
  4.     (b) (i) Prove that the greedy algorithm for multicast routing on a wormhole-routed hypercube network always yields the minimum network traffic and minimum distance from the source to any of the destinations. (6)
             (ii) Consider the following reservation table for a four-stage pipeline with a clock cycle τ = 20 ns.
                 Time: 1 2 3 4 5 6
                 S1: X X
                 S2: X X
                 S3: X
                 S4: X X
                 One noncompute delay stage can be inserted into the pipeline to make a latency of 1 permissible in the shortest greedy cycle. The purpose is to yield a new reservation table leading to an optimal latency equal to the lower bound. (10)
                 (1) Show the modified reservation table with five rows and seven columns.
                 (2) Draw the state transition diagram for the optimal cycle.
                 (3) List all the simple and greedy cycles from the state diagram.
                 (4) Prove that the new MAL equals the lower bound.
                 (5) What is the optimal throughput of this pipeline?
     15. (a) (i) What is perfect decomposition? Discuss the differences in program replication techniques on multicomputers as opposed to program partitioning on multiprocessors. (8)
             (ii) Explain the multiprocessor UNIX design goals in the areas of compatibility, portability, address space, load balancing, parallel I/O and network services. (8)
         Or
         (b) Explain loop transformation theory and discuss how it can be applied for loop vectorization or parallelization.
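
For checking a hand-worked answer to Part A, Question 5, the LRU page trace can be simulated with a few lines of Python. This is a minimal sketch and not part of the question paper: the function name simulate_lru and the variable names are illustrative assumptions, and the model assumes pure demand paging into four initially empty frames, with every reference counted as either a hit or a miss.

```python
from collections import OrderedDict


def simulate_lru(trace, num_frames):
    """Simulate LRU replacement (hypothetical helper, not from the paper).

    Returns the hit count and, for each reference, the resident pages
    ordered from least recently used to most recently used.
    """
    frames = OrderedDict()  # keys kept in LRU -> MRU order
    hits = 0
    history = []
    for page in trace:
        if page in frames:
            hits += 1
            frames.move_to_end(page)  # page becomes most recently used
        else:
            if len(frames) == num_frames:
                frames.popitem(last=False)  # evict the least recently used page
            frames[page] = True
        history.append(list(frames))
    return hits, history


# Page trace copied from Question 5.
TRACE = [1, 0, 2, 2, 1, 7, 6, 7, 0, 1, 2, 0, 3, 0, 4, 5, 1,
         5, 2, 4, 5, 6, 7, 6, 7, 2, 4, 2, 7, 3, 3, 2, 3]

hits, history = simulate_lru(TRACE, num_frames=4)
for page, resident in zip(TRACE, history):
    print(f"ref {page}: frames (LRU -> MRU) = {resident}")
print(f"hits = {hits}, references = {len(TRACE)}, "
      f"hit ratio = {hits / len(TRACE):.3f}")
```

The printed frame contents list pages by recency rather than by physical frame number, so they show which pages are resident after each reference but not which frame holds each page.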