Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Ceg4131 models

697 views

Published on

Published in: Automotive, Technology, Business
  • Be the first to comment

  • Be the first to like this

Ceg4131 models

  1. 1. Parallel Computer Models CEG 4131 Computer Architecture III Miodrag Bolic 1
  2. 2. Overview• Flynn’s taxonomy• Classification based on the memory arrangement• Classification based on communication• Classification based on the kind of parallelism – Data-parallel – Function-parallel 2
  3. 3. Flynn’s Taxonomy– The most universally excepted method of classifying computer systems– Published in the Proceedings of the IEEE in 1966– Any computer can be placed in one of 4 broad categories» SISD: Single instruction stream, single data stream» SIMD: Single instruction stream, multiple data streams» MIMD: Multiple instruction streams, multiple data streams» MISD: Multiple instruction streams, single data stream 3
  4. 4. SISD Instructions Processing Main memoryelement (PE) (M) Data IS IS DS Control Unit PE Memory 4
  5. 5. SIMD Applications: • Image processing • Matrix manipulations • Sorting 5
  6. 6. SIMD Architectures• Fine-grained – Image processing application – Large number of PEs – Minimum complexity PEs – Programming language is a simple extension of a sequential language• Coarse-grained – Each PE is of higher complexity and it is usually built with commercial devices – Each PE has local memory 6
  7. 7. MIMD 7
  8. 8. MISD Applications: • Classification • Robot vision 8
  9. 9. Flynn taxonomy– Advantages of Flynn» Universally accepted» Compact Notation» Easy to classify a system (?)– Disadvantages of Flynn» Very coarse-grain differentiation among machine systems» Comparison of different systems is limited» Interconnections, I/O, memory not considered in the scheme 9
  10. 10. Classification based on memory arrangement Shared memory Interconnection I/O1 network Interconnection network I/On PE1 PEn PE1 PEn M1 Mn Processors P1 PnShared memory - multiprocessors Message passing - multicomputers 10
  11. 11. Shared-memory multiprocessors• Uniform Memory Access (UMA)• Non-Uniform Memory Access (NUMA)• Cache-only Memory Architecture (COMA)• Memory is common to all the processors.• Processors easily communicate by means of shared variables. 11
  12. 12. The UMA Model• Tightly-coupled systems (high degree of resource sharing)• Suitable for general-purpose and time-sharing applications by multiple users. P1 Pn $ $ Interconnection network Mem Mem 12
  13. 13. Symmetric and asymmetric multiprocessors• Symmetric: - all processors have equal access to all peripheral devices. - all processors are identical.• Asymmetric: - one processor (master) executes the operating system - other processors may be of different types and may be dedicated to special tasks. 13
  14. 14. The NUMA Model• The access time varies with the location of the memory word.• Shared memory is distributed to local memories.• All local memories form a global address space accessible by all processors Access time: Cache, Local memory, Remote memory COMA - Cache-only Memory Architecture P1 Pn $ $ Mem Mem Interconnection network Distributed Memory (NUMA) 14
  15. 15. Distributed memory multicomputers• Multiple computers- nodes• Message-passing network• Local memories are private with its own program and data M M M• No memory contention so that the PE PE PE number of processors is very large• The processors are connected by Interconnection communication lines, and the precise network way in which the lines are connected is called the topology of the multicomputer. PE PE PE• A typical program consists of M M M subtasks residing in all the memories. 15
  16. 16. Classification based on type of interconnections• Static networks• Dynamic networks 16
  17. 17. Interconnection Network [1]• Mode of Operation (Synchronous vs. Asynchronous)• Control Strategy (Centralized vs. Decentralized)• Switching Techniques (Packet switching vs. Circuit switching)• Topology (Static Vs. Dynamic) 17
  18. 18. Classification based on the kind of parallelism[3] Parallel architectures PAs Data-parallel architectures Function-parallel architectures Instruction-level Thread-level Process-level PAs PAs PAs DPs ILPS MIMDs Vector Associative SIMDs Systolic Pipelined VLIWs Superscalar Ditributed Shared and neural architecture processors processors memory memoryarchitecture architecture MIMD (multi- (multi-computer) Processors) 18
  19. 19. References• Advanced Computer Architecture and Parallel Processing, by Hesham El-Rewini and Mostafa Abd-El- Barr, John Wiley and Sons, 2005.• Advanced Computer Architecture Parallelism, Scalability, Programmability, by K. Hwang, McGraw-Hill 1993.• Advanced Computer Architectures – A Design Space Approach by Desco Sima, Terence Fountain and Peter Kascuk, Pearson, 1997. 19
  20. 20. Speedup• S = Speed(new) / Speed(old)• S = Work/time(new) / Work/time(old)• S = time(old) / time(new)• S = time(before improvement) / time(after improvement) 20
  21. 21. Speedup• Time (one CPU): T(1)• Time (n CPUs): T(n)• Speedup: S• S = T(1)/T(n) 21
  22. 22. Amdahl’s LawThe performance improvement to be gained from using some faster mode of execution is limited by the fraction of the time the faster mode can be used 22
  23. 23. Example 20 hours A B must walk 200 milesWalk 4 miles /hourBike 10 miles / hourCar-1 50 miles / hourCar-2 120 miles / hourCar-3 600 miles /hour 23
  24. 24. Example 20 hours A B must walk 200 milesWalk 4 miles /hour  50 + 20 = 70 hours S=1Bike 10 miles / hour  20 + 20 = 40 hours S = 1.8Car-1 50 miles / hour  4 + 20 = 24 hours S = 2.9Car-2 120 miles / hour  1.67 + 20 = 21.67 hours S = 3.2Car-3 600 miles /hour  0.33 + 20 = 20.33 hours S = 3.4 24
  25. 25. Amdahl’s Law (1967)∀ β : The fraction of the program that is naturally serial• (1- β): The fraction of the program that is naturally parallel 25
  26. 26. S = T(1)/T(N) T(1)(1- β )T(N) = T(1)β + N 1 NS= β+ (1- β ) = βN + (1- β ) N 26
  27. 27. Amdahl’s Law 27

×