Computer architecture


  1. COMPUTER ARCHITECTURE
  2. MICROPROCESSOR
     • IT IS ONE OF THE GREATEST ACHIEVEMENTS OF THE 20TH CENTURY.
     • IT USHERED IN THE ERA OF WIDESPREAD COMPUTERIZATION.
  3. EARLY ARCHITECTURE
     • VON NEUMANN ARCHITECTURE, 1940s
     • PROGRAM IS STORED IN MEMORY.
     • SEQUENTIAL OPERATION
     • ONE INST IS RETRIEVED AT A TIME, DECODED AND EXECUTED
     • LIMITED SPEED
  4. Conventional 32 bit Microprocessors
     • HIGHER DATA THROUGHPUT WITH A 32 BIT WIDE DATA BUS
     • LARGER DIRECT ADDRESSING RANGE
     • HIGHER CLOCK FREQUENCIES AND OPERATING SPEEDS AS A RESULT OF IMPROVEMENTS IN SEMICONDUCTOR TECHNOLOGY
     • HIGHER PROCESSING SPEEDS BECAUSE LARGER REGISTERS REQUIRE FEWER CALLS TO MEMORY, AND REG-TO-REG TRANSFERS ARE 5 TIMES FASTER THAN REG-TO-MEMORY TRANSFERS
  5. Conventional 32 bit Microprocessors
     • MORE INSTS AND ADDRESSING MODES TO IMPROVE SOFTWARE EFFICIENCY
     • MORE REGISTERS TO SUPPORT HIGH-LEVEL LANGUAGES
     • MORE EXTENSIVE MEMORY MANAGEMENT & COPROCESSOR CAPABILITIES
     • CACHE MEMORIES AND INST PIPELINES TO INCREASE PROCESSING SPEED AND REDUCE PEAK BUS LOADS
  6. Conventional 32 bit Microprocessors
     • TO CONSTRUCT A COMPLETE GENERAL PURPOSE 32 BIT MICROPROCESSOR, FIVE BASIC FUNCTIONS ARE NECESSARY: ALU, MMU, FPU, INTERRUPT CONTROLLER, TIMING CONTROL
  7. Conventional Architecture
     • KNOWN AS VON NEUMANN ARCHITECTURE
     • ITS MAIN FEATURES ARE:
       - A SINGLE COMPUTING ELEMENT INCORPORATING A PROCESSOR, COMM. PATH AND MEMORY
       - A LINEAR ORGANIZATION OF FIXED SIZE MEMORY CELLS
       - A LOW-LEVEL MACHINE LANGUAGE WITH INSTS PERFORMING SIMPLE OPERATIONS ON ELEMENTARY OPERANDS
       - SEQUENTIAL, CENTRALIZED CONTROL OF COMPUTATION
  8. Conventional Architecture
     • SINGLE PROCESSOR CONFIGURATION:
       [Diagram: PROCESSOR connected to MEMORY and INPUT-OUTPUT]
  9. Conventional Architecture
     • MULTIPLE PROCESSOR CONFIGURATION WITH A GLOBAL BUS:
       [Diagram: PROCESSORS WITH LOCAL MEM & I/O share a global bus with GLOBAL MEMORY and SYSTEM INPUT-OUTPUT]
  10. Conventional Architecture
     • THE EARLY COMP ARCHITECTURES WERE DESIGNED FOR SCIENTIFIC AND COMMERCIAL CALCULATIONS AND WERE DEVELOPED TO INCREASE THE SPEED OF EXECUTION OF SIMPLE INSTRUCTIONS.
     • TO MAKE THE COMPUTERS PERFORM MORE COMPLEX PROCESSES, MUCH MORE COMPLEX SOFTWARE WAS REQUIRED.
  11. Conventional Architecture
     • ADVANCEMENT OF TECHNOLOGY ENHANCED THE SPEED OF EXECUTION OF PROCESSORS, BUT A SINGLE COMM PATH HAD TO BE USED TO TRANSFER INSTS AND DATA BETWEEN THE PROCESSOR AND THE MEMORY.
     • MEMORY SIZE INCREASED. THE RESULT WAS THAT THE DATA TRANSFER RATE ON THE MEMORY INTERFACE ACTED AS A SEVERE CONSTRAINT ON THE PROCESSING SPEED.
  12. Conventional Architecture
     • AS HIGHER SPEED MEMORY BECAME AVAILABLE, THE DELAYS INTRODUCED BY THE CAPACITANCE AND THE TRANSMISSION LINE DELAYS ON THE MEMORY BUS, AND THE PROPAGATION DELAYS IN THE BUFFER AND ADDRESS DECODING CIRCUITRY, BECAME MORE SIGNIFICANT AND PLACED AN UPPER LIMIT ON THE PROCESSING SPEED.
  13. Conventional Architecture
     • THE USE OF MULTIPROCESSOR BUSES, WITH THE NEED FOR ARBITRATION BETWEEN COMPUTERS REQUESTING CONTROL OF THE BUS, REDUCED THE PROBLEM BUT INTRODUCED SEVERAL WAIT CYCLES WHILE THE DATA OR INST WERE FETCHED FROM MEMORY.
     • ONE METHOD OF INCREASING PROCESSING SPEED AND DATA THROUGHPUT ON THE MEMORY BUS WAS TO INCREASE THE NUMBER OF PARALLEL BITS TRANSFERRED ON THE BUS.
  14. Conventional Architecture
     • THE GLOBAL BUS AND THE GLOBAL MEMORY CAN ONLY SERVE ONE PROCESSOR AT A TIME.
     • AS MORE PROCESSORS ARE ADDED TO INCREASE THE PROCESSING SPEED, THE GLOBAL BUS BOTTLENECK BECOMES WORSE.
     • IF THE PROCESSING CONSISTS OF SEVERAL INDEPENDENT TASKS, EACH PROC WILL COMPETE FOR GLOBAL MEMORY ACCESS AND GLOBAL BUS TRANSFER TIME.
  15. Conventional Architecture
     • TYPICALLY, ONLY 3 OR 4 TIMES THE SPEED OF A SINGLE PROCESSOR CAN BE ACHIEVED IN MULTIPROCESSOR SYSTEMS WITH GLOBAL MEMORY AND A GLOBAL BUS.
     • TO REDUCE THE EFFECT OF THE GLOBAL MEMORY BUS AS A BOTTLENECK, (1) THE LENGTH OF THE PIPE WAS INCREASED SO THAT THE INST COULD BE BROKEN DOWN INTO BASIC ELEMENTS TO BE MANIPULATED SIMULTANEOUSLY, AND (2) CACHE MEM WAS INTRODUCED SO THAT INSTS AND/OR DATA COULD BE PREFETCHED FROM GLOBAL MEMORY AND STORED IN HIGH SPEED LOCAL MEMORY.
  16. PIPELINING
     • THE PROC. SEPARATES EACH INST INTO ITS BASIC OPERATIONS AND USES DEDICATED EXECUTION UNITS FOR EACH TYPE OF OPERATION.
     • THE MOST BASIC FORM OF PIPELINING IS TO PREFETCH THE NEXT INSTRUCTION WHILE SIMULTANEOUSLY EXECUTING THE PREVIOUS INSTRUCTION.
     • THIS MAKES USE OF THE BUS TIME WHICH WOULD OTHERWISE BE WASTED AND REDUCES INSTRUCTION EXECUTION TIME.
  17. PIPELINING
     • TO SHOW THE USE OF A PIPELINE, CONSIDER THE MULTIPLICATION OF 2 DECIMAL NOS: 3.8 x 10^2 AND 9.6 x 10^3. THE PROC. PERFORMS 3 OPERATIONS:
       - A: MULTIPLIES THE MANTISSAS
       - B: ADDS THE EXPONENTS
       - C: NORMALISES THE RESULT TO PLACE THE DECIMAL POINT IN THE CORRECT POSITION.
     • IF 3 EXECUTION UNITS PERFORMED THESE OPERATIONS, UNITS A & B WOULD DO NOTHING WHILE C IS BEING PERFORMED.
     • IF A PIPELINE WERE IMPLEMENTED, THE NEXT NUMBER COULD BE PROCESSED IN EXECUTION UNITS A AND B WHILE C WAS BEING PERFORMED. (A SKETCH OF THESE THREE STAGES FOLLOWS BELOW.)
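As an illustration of the three stages above, here is a minimal Python sketch (not from the slides; the (mantissa, exponent) pair representation and the function names are assumptions for illustration):

```python
# Three-stage floating-point multiply as described in the slide:
# stage A multiplies mantissas, stage B adds exponents, stage C normalises.

def stage_a(x, y):                  # A: multiply the mantissas
    return x[0] * y[0]

def stage_b(x, y):                  # B: add the exponents
    return x[1] + y[1]

def stage_c(mantissa, exponent):    # C: normalise (one digit before the point)
    while abs(mantissa) >= 10:
        mantissa /= 10
        exponent += 1
    return mantissa, exponent

# 3.8 x 10^2 times 9.6 x 10^3, as (mantissa, exponent) pairs
a, b = (3.8, 2), (9.6, 3)
print(stage_c(stage_a(a, b), stage_b(a, b)))   # -> approx (3.648, 6)
```

In a pipelined implementation, stages A and B would already be working on the next operand pair while C normalises the current result.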
  18. PIPELINING
     • TO GET A ROUGH INDICATION OF THE PERFORMANCE INCREASE THROUGH PIPELINING, THE STAGE EXECUTION INTERVAL MAY BE TAKEN TO BE THE EXECUTION TIME OF THE SLOWEST PIPELINE STAGE.
     • THE PERFORMANCE INCREASE FROM PIPELINING IS ROUGHLY EQUAL TO THE SUM OF THE AVERAGE EXECUTION TIMES FOR ALL STAGES OF THE PIPELINE, DIVIDED BY THE AVERAGE EXECUTION TIME OF THE SLOWEST PIPELINE STAGE FOR THE INST MIX CONSIDERED.
  19. PIPELINING
     • NON-SEQUENTIAL INSTS CAUSE THE INSTRUCTIONS BEHIND IN THE PIPELINE TO BE EMPTIED AND FILLING TO BE RESTARTED.
     • NON-SEQUENTIAL INSTS TYPICALLY COMPRISE 15 TO 30% OF INSTRUCTIONS, AND THEY REDUCE PIPELINE PERFORMANCE BY A GREATER PERCENTAGE THAN THEIR PROBABILITY OF OCCURRENCE.
  20. CACHE MEMORY
     • VON-NEUMANN SYSTEM PERFORMANCE IS CONSIDERABLY AFFECTED BY MEMORY ACCESS TIME AND MEMORY BW (MAXIMUM MEMORY TRANSFER RATE).
     • THESE LIMITATIONS ARE ESPECIALLY TIGHT FOR 32 BIT PROCESSORS WITH HIGH CLOCK SPEEDS.
     • WHILE STATIC RAMS WITH 25ns ACCESS TIMES ARE CAPABLE OF KEEPING PACE WITH PROC SPEED, THEY MUST BE LOCATED ON THE SAME BOARD TO MINIMISE DELAYS, THUS LIMITING THE AMOUNT OF HIGH SPEED MEMORY AVAILABLE.
  21. CACHE MEMORY
     • DRAM HAS A GREATER CAPACITY PER CHIP AND A LOWER COST, BUT EVEN THE FASTEST DRAM CAN'T KEEP PACE WITH THE PROCESSOR, PARTICULARLY WHEN IT IS LOCATED ON A SEPARATE BOARD ATTACHED TO A MEMORY BUS.
     • WHEN A PROC REQUIRES INST/DATA FROM/TO MEMORY, IT ENTERS A WAIT STATE UNTIL IT IS AVAILABLE. THIS REDUCES PROCESSOR PERFORMANCE.
  22. CACHE MEMORY
     • CACHE ACTS AS A FAST LOCAL STORAGE BUFFER BETWEEN THE PROC AND THE MAIN MEMORY.
     • OFF-CHIP BUT ON-BOARD CACHE MAY REQUIRE SEVERAL MEMORY CYCLES WHEREAS ON-CHIP CACHE MAY ONLY REQUIRE ONE MEMORY CYCLE, BUT ON-BOARD CACHE CAN PREVENT THE EXCESSIVE NO. OF WAIT STATES IMPOSED BY MEMORY ON THE SYSTEM BUS, AND IT REDUCES THE SYSTEM BUS LOAD.
  23. CACHE MEMORY
     • THE COST OF IMPLEMENTING AN ON-BOARD CACHE IS MUCH LOWER THAN THE COST OF THE FASTER SYSTEM MEMORY REQUIRED TO ACHIEVE THE SAME MEMORY PERFORMANCE.
     • CACHE PERFORMANCE DEPENDS ON ACCESS TIME AND HIT RATIO, WHICH IS DEPENDENT ON THE SIZE OF THE CACHE AND THE NO. OF BYTES BROUGHT INTO CACHE ON ANY FETCH FROM THE MAIN MEMORY (THE LINE SIZE).
  24. CACHE MEMORY
     • INCREASING THE LINE SIZE INCREASES THE CHANCE THAT THERE WILL BE A CACHE HIT ON THE NEXT MEMORY REFERENCE.
     • IF A 4K BYTE CACHE WITH A 4 BYTE LINE SIZE HAS A HIT RATIO OF 80%, DOUBLING THE LINE SIZE MIGHT INCREASE THE HIT RATIO TO 85%, BUT DOUBLING THE LINE SIZE AGAIN MIGHT ONLY INCREASE THE HIT RATIO TO 87%.
  25. CACHE MEMORY
     • OVERALL MEMORY PERFORMANCE IS A FUNCTION OF CACHE ACCESS TIME, CACHE HIT RATIO AND MAIN MEMORY ACCESS TIME FOR CACHE MISSES.
     • A SYSTEM WITH AN 80% CACHE HIT RATIO AND 120ns CACHE ACCESS TIME ACCESSES MAIN MEMORY 20% OF THE TIME WITH AN ACCESS TIME OF 600ns. THE AV ACCESS TIME IN ns WILL BE (0.8 x 120) + [0.2 x (600 + 120)] = 240
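The average access time arithmetic above can be captured in a one-function Python sketch (an assumed helper, not from the slides; it charges the cache access time on misses as well, matching the worked example):

```python
# Effective memory access time for a cache in front of main memory.
def effective_access_time(hit_ratio, cache_ns, main_ns):
    # On a hit we pay the cache access time; on a miss we pay the main
    # memory access time plus the cache time spent detecting the miss.
    return hit_ratio * cache_ns + (1 - hit_ratio) * (main_ns + cache_ns)

print(effective_access_time(0.8, 120, 600))   # -> 240.0 ns
```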
  26. CACHE DESIGN
     • PROCESSORS WITH DEMAND PAGED VIRTUAL MEMORY SYSTEMS REQUIRE AN ASSOCIATIVE CACHE.
     • VIRTUAL MEM SYSTEMS ORGANIZE ADDRESSES BY THE START ADDRESSES FOR EACH PAGE AND AN OFFSET WHICH LOCATES THE DATA WITHIN THE PAGE.
     • AN ASSOCIATIVE CACHE ASSOCIATES THE OFFSET WITH THE PAGE ADDRESS TO FIND THE DATA NEEDED.
  27. CACHE DESIGN
     • WHEN ACCESSED, THE CACHE CHECKS TO SEE IF IT CONTAINS THE PAGE ADDRESS (OR TAG FIELD); IF SO, IT ADDS THE OFFSET AND, IF A CACHE HIT IS DETECTED, THE DATA IS FETCHED IMMEDIATELY FROM THE CACHE.
     • PROBLEMS CAN OCCUR IN A SINGLE SET-ASSOCIATIVE CACHE IF WORDS WITHIN DIFFERENT PAGES HAVE THE SAME OFFSET.
     • TO MINIMISE THIS PROBLEM A 2-WAY SET-ASSOCIATIVE CACHE IS USED. THIS IS ABLE TO ASSOCIATE MORE THAN ONE SET OF TAGS AT A TIME, ALLOWING THE CACHE TO STORE THE SAME OFFSET FROM TWO DIFFERENT PAGES.
  28. CACHE DESIGN
     • A FULLY ASSOCIATIVE CACHE ALLOWS ANY NUMBER OF PAGES TO USE THE CACHE SIMULTANEOUSLY.
     • A CACHE REQUIRES A REPLACEMENT ALGORITHM TO FIND REPLACEMENT CACHE LINES WHEN A MISS OCCURS.
     • PROCESSORS THAT DO NOT USE DEMAND PAGED VIRTUAL MEMORY CAN EMPLOY A DIRECT MAPPED CACHE, WHICH CORRESPONDS EXACTLY TO THE PAGE SIZE AND ALLOWS DATA FROM ONLY ONE PAGE TO BE STORED AT A TIME.
  29. MEMORY ARCHITECTURES
     • 32 BIT PROCESSORS HAVE INTRODUCED 3 NEW CONCEPTS IN THE WAY THE MEMORY IS INTERFACED:
       1. LOCAL MEMORY BUS EXTENSIONS
       2. MEMORY INTERLEAVING
       3. VIRTUAL MEMORY MANAGEMENT
  30. LOCAL MEM BUS EXTENSIONS
     • IT PERMITS LARGER LOCAL MEMORIES TO BE CONNECTED WITHOUT THE DELAYS CAUSED BY BUS REQUESTS AND BUS ARBITRATION FOUND ON MULTIPROCESSOR BUSES.
     • IT HAS BEEN PROVIDED TO INCREASE THE SIZE OF THE LOCAL MEMORY ABOVE THAT WHICH CAN BE ACCOMMODATED ON THE PROCESSOR BOARD.
     • BY OVERLAPPING THE LOCAL MEM BUS AND THE SYSTEM BUS CYCLES, IT IS POSSIBLE TO ACHIEVE HIGHER MEM ACCESS RATES FROM PROCESSORS WITH PIPELINES WHICH PERMIT THE ADDRESS OF THE NEXT MEMORY REFERENCE TO BE GENERATED WHILE THE PREVIOUS DATA WORD IS BEING FETCHED.
  31. MEMORY INTERLEAVING
     • PIPELINED PROCESSORS WITH THE ABILITY TO GENERATE THE ADDRESS OF THE NEXT MEMORY REFERENCE WHILE FETCHING THE PREVIOUS DATA WORD WOULD BE SLOWED DOWN IF THE MEMORY WERE UNABLE TO BEGIN THE NEXT MEMORY ACCESS UNTIL THE PREVIOUS MEM CYCLE HAD BEEN COMPLETED.
     • THE SOLUTION IS TO USE TWO-WAY MEMORY INTERLEAVING. IT USES 2 MEM BOARDS: 1 FOR ODD ADDRESSES AND 1 FOR EVEN ADDRESSES.
  32. MEMORY INTERLEAVING
     • ONE BOARD CAN BEGIN THE NEXT MEM CYCLE WHILE THE OTHER BOARD COMPLETES THE PREVIOUS CYCLE.
     • THE SPEED ADVANTAGE IS GREATEST WHEN MULTIPLE SEQUENTIAL MEM ACCESSES ARE REQUIRED FOR BURST I/O TRANSFERS BY DMA.
     • DMA DEFINES A BLOCK TRANSFER IN TERMS OF A STARTING ADDRESS AND A WORD COUNT FOR SEQUENTIAL MEM ACCESSES.
  33. MEMORY INTERLEAVING
     • TWO-WAY INTERLEAVING MAY NOT PREVENT MEM WAIT STATES FOR SOME FAST SIGNAL PROCESSING APPLICATIONS, AND SYSTEMS HAVE BEEN DESIGNED WITH 4 OR MORE WAY INTERLEAVING IN WHICH THE MEM BOARDS ARE ASSIGNED CONSECUTIVE ADDRESSES BY A MEMORY CONTROLLER. (A SKETCH OF THIS ADDRESS-TO-BOARD ASSIGNMENT FOLLOWS BELOW.)
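A minimal Python sketch of the address-to-board assignment (assumed for illustration; real controllers do this mapping in hardware):

```python
# n-way interleaving: consecutive word addresses rotate across boards,
# so sequential accesses can overlap memory cycles on different boards.
def board_for(address, n_boards=2):
    # Two-way case: even addresses on board 0, odd addresses on board 1.
    return address % n_boards

for addr in range(8):
    print(addr, "-> board", board_for(addr))   # 0, 1, 0, 1, ...
```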
  34. Conventional Architecture
     • EVEN WITH THESE ENHANCEMENTS, THE SEQUENTIAL VON NEUMANN ARCHITECTURE REACHED ITS LIMITS IN PROCESSING SPEED BECAUSE THE SEQUENTIAL FETCHING OF INSTS AND DATA THROUGH A COMMON MEMORY INTERFACE FORMED THE BOTTLENECK.
     • THUS, PARALLEL PROC ARCHITECTURES CAME INTO BEING, WHICH PERMIT A LARGE NUMBER OF COMPUTING ELEMENTS TO BE PROGRAMMED TO WORK TOGETHER SIMULTANEOUSLY. THE USEFULNESS OF A PARALLEL PROCESSOR DEPENDS UPON THE AVAILABILITY OF SUITABLE PARALLEL ALGORITHMS.
  35. HOW TO INCREASE THE SYSTEM SPEED?
     1. USING FASTER COMPONENTS. THEY COST MORE AND DISSIPATE CONSIDERABLE HEAT. THE RATE OF GROWTH OF SPEED USING BETTER TECHNOLOGY IS VERY SLOW. E.G., IN THE 80's THE BASIC CLOCK RATE WAS 50 MHz AND TODAY IT IS AROUND 2 GHz; DURING THIS PERIOD THE SPEED OF COMPUTERS IN SOLVING INTENSIVE PROBLEMS HAS GONE UP BY A FACTOR OF 100,000. THE DIFFERENCE IS DUE TO IMPROVED ARCHITECTURE.
  36. HOW TO INCREASE THE SYSTEM SPEED?
     2. ARCHITECTURAL METHODS:
        A. USE PARALLELISM IN A SINGLE PROCESSOR [OVERLAPPING EXECUTION OF A NO OF INSTS (PIPELINING)]
        B. OVERLAPPING OPERATION OF DIFFERENT UNITS
        C. INCREASE SPEED OF ALU BY EXPLOITING DATA/TEMPORAL PARALLELISM
        D. USING A NO OF INTERCONNECTED PROCESSORS TO WORK TOGETHER
  37. PARALLEL COMPUTERS
     • THE IDEA EMERGED AT CALTECH (CIT) IN 1981
     • A GROUP HEADED BY CHARLES SEITZ AND GEOFFREY FOX BUILT A PARALLEL COMPUTER IN 1982
     • 16 NOS OF 8085 WERE CONNECTED IN A HYPERCUBE CONFIGURATION
     • THE ADVANTAGE WAS LOW COST PER MEGAFLOP
  38. PARALLEL COMPUTERS
     • BESIDES HIGHER SPEED, OTHER FEATURES OF PARALLEL COMPUTERS ARE:
       - BETTER SOLUTION QUALITY: WHEN ARITHMETIC OPS ARE DISTRIBUTED, EACH PE DOES A SMALLER NO OF OPS, THUS ROUNDING ERRORS ARE REDUCED
       - BETTER ALGORITHMS
       - BETTER AND FASTER STORAGE
       - GREATER RELIABILITY
  39. CLASSIFICATION OF COMPUTER ARCHITECTURE
     • FLYNN'S TAXONOMY: IT IS BASED UPON HOW THE COMPUTER RELATES ITS INSTRUCTIONS TO THE DATA BEING PROCESSED:
       - SISD
       - SIMD
       - MISD
       - MIMD
  40. FLYNN'S TAXONOMY
     • SISD: CONVENTIONAL VON-NEUMANN SYSTEM.
       [Diagram: CONTROL UNIT sends a single INST STREAM to one PROCESSOR, which operates on a single DATA STREAM]
  41. FLYNN'S TAXONOMY
     • SIMD: IT HAS A SINGLE STREAM OF VECTOR INSTS THAT INITIATE MANY OPERATIONS. EACH ELEMENT OF A VECTOR IS REGARDED AS A MEMBER OF A SEPARATE DATA STREAM, GIVING MULTIPLE DATA STREAMS.
       [Diagram: one CONTROL UNIT broadcasts the INST STREAM to several PROCESSORS, each with its own DATA STREAM (synchronous multiprocessor)]
  42. FLYNN'S TAXONOMY
     • MISD: NOT POSSIBLE.
       [Diagram: control units CU1-CU3 each issue an INST STREAM to processing units PU1-PU3, all operating on the same DATA STREAM]
  43. FLYNN'S TAXONOMY
     • MIMD: MULTIPROCESSOR CONFIGURATION AND ARRAY OF PROCESSORS.
       [Diagram: control units CU1-CU3 each issue their own INST STREAM (IS1-IS3), operating on separate DATA STREAMS (DS1-DS3)]
  44. FLYNN'S TAXONOMY
     • MIMD COMPUTERS COMPRISE INDEPENDENT COMPUTERS, EACH WITH ITS OWN MEMORY, CAPABLE OF PERFORMING SEVERAL OPERATIONS SIMULTANEOUSLY.
     • MIMD COMPS MAY COMPRISE A NUMBER OF SLAVE PROCESSORS WHICH MAY BE INDIVIDUALLY CONNECTED TO A MULTI-ACCESS GLOBAL MEMORY BY A SWITCHING MATRIX UNDER THE CONTROL OF A MASTER PROCESSOR.
  45. FLYNN'S TAXONOMY
     • THIS CLASSIFICATION IS TOO BROAD.
     • IT PUTS EVERYTHING EXCEPT MULTIPROCESSORS IN ONE CLASS.
     • IT DOES NOT REFLECT THE CONCURRENCY AVAILABLE THROUGH PIPELINE PROCESSING AND THUS PUTS VECTOR COMPUTERS IN THE SISD CLASS.
  46. SHORE'S CLASSIFICATION
     • SHORE CLASSIFIED THE COMPUTERS ON THE BASIS OF THE ORGANIZATION OF THE CONSTITUENT ELEMENTS OF THE COMPUTER.
     • SIX DIFFERENT KINDS OF MACHINES WERE RECOGNIZED:
     1. CONVENTIONAL VON NEUMANN ARCHITECTURE WITH 1 CU, 1 PU, IM AND DM. A SINGLE DM READ PRODUCES ALL BITS FOR PROCESSING BY THE PU. THE PU MAY CONTAIN MULTIPLE FUNCTIONAL UNITS WHICH MAY OR MAY NOT BE PIPELINED. SO, IT INCLUDES BOTH THE SCALAR COMPS (IBM 360/91, CDC 7600) AND PIPELINED VECTOR COMPUTERS (CRAY 1, CYBER 205)
  47. SHORE'S CLASSIFICATION
     • MACHINE 1 (TYPE I):
       [Diagram: IM -> CU -> horizontal PU -> word-slice DM]
     • NOTE THAT THE PROCESSING IS CHARACTERISED AS HORIZONTAL (A NO OF BITS IN PARALLEL AS A WORD)
  48. SHORE'S CLASSIFICATION
     • MACHINE 2: SAME AS MACHINE 1 EXCEPT THAT THE DM FETCHES A BIT SLICE FROM ALL THE WORDS IN THE MEMORY AND THE PU IS ORGANIZED TO PERFORM THE OPERATIONS IN A BIT SERIAL MANNER ON ALL THE WORDS.
     • IF THE MEMORY IS REGARDED AS A 2D ARRAY OF BITS WITH ONE WORD STORED PER ROW, THEN MACHINE 2 READS A VERTICAL SLICE OF BITS AND PROCESSES THE SAME, WHEREAS MACHINE 1 READS AND PROCESSES A HORIZONTAL SLICE OF BITS. EX. MPP, ICL DAP
  49. SHORE'S CLASSIFICATION
     • MACHINE 2:
       [Diagram: IM -> CU -> vertical PU -> bit-slice DM]
  50. SHORE'S CLASSIFICATION
     • MACHINE 3: COMBINATION OF 1 AND 2.
     • IT COULD BE CHARACTERISED AS HAVING A MEMORY AS AN ARRAY OF BITS, WITH BOTH HORIZONTAL AND VERTICAL READING AND PROCESSING POSSIBLE.
     • SO, IT WILL HAVE BOTH VERTICAL AND HORIZONTAL PROCESSING UNITS.
     • AN EXAMPLE IS THE OMEN 60 (1973)
  51. SHORE'S CLASSIFICATION
     • MACHINE 3:
       [Diagram: IM -> CU; a vertical PU and a horizontal PU both access the DM]
  52. SHORE'S CLASSIFICATION
     • MACHINE 4: IT IS OBTAINED BY REPLICATING THE PU AND DM OF MACHINE 1.
     • AN ENSEMBLE OF PU AND DM IS CALLED A PROCESSING ELEMENT (PE).
     • THE INSTS ARE ISSUED TO THE PEs BY A SINGLE CU. PEs COMMUNICATE ONLY THROUGH THE CU.
     • ABSENCE OF COMM BETWEEN PEs LIMITS ITS APPLICABILITY
     • EX: PEPE (1976)
  53. SHORE'S CLASSIFICATION
     • MACHINE 4:
       [Diagram: IM -> CU issuing to several PUs in parallel, each PU with its own DM]
  54. SHORE'S CLASSIFICATION
     • MACHINE 5: SIMILAR TO MACHINE 4 WITH THE ADDITION OF COMMUNICATION BETWEEN PEs. EXAMPLE: ILLIAC IV
       [Diagram: as machine 4, with links between neighbouring PEs]
  55. SHORE'S CLASSIFICATION
     • MACHINE 6:
     • MACHINES 1 TO 5 MAINTAIN A SEPARATION BETWEEN DM AND PU, WITH SOME DATA BUS OR CONNECTION UNIT PROVIDING THE COMMUNICATION BETWEEN THEM.
     • MACHINE 6 INCLUDES THE LOGIC IN THE MEMORY ITSELF AND IS CALLED AN ASSOCIATIVE PROCESSOR.
     • MACHINES BASED ON SUCH ARCHITECTURES SPAN A RANGE FROM SIMPLE ASSOCIATIVE MEMORIES TO COMPLEX ASSOCIATIVE PROCS.
  56. SHORE'S CLASSIFICATION
     • MACHINE 6:
       [Diagram: IM -> CU -> combined PU + DM]
  57. FENG'S CLASSIFICATION
     • FENG PROPOSED A SCHEME ON THE BASIS OF DEGREE OF PARALLELISM TO CLASSIFY COMPUTER ARCHITECTURE.
     • THE MAXIMUM NO OF BITS THAT CAN BE PROCESSED EVERY UNIT OF TIME BY THE SYSTEM IS CALLED THE "MAXIMUM DEGREE OF PARALLELISM"
  58. FENG'S CLASSIFICATION
     • BASED ON FENG'S SCHEME, WE HAVE SEQUENTIAL AND PARALLEL OPERATIONS AT BIT AND WORD LEVELS TO PRODUCE THE FOLLOWING CLASSIFICATION:
       - WSBS (WORD SERIAL, BIT SERIAL): NO CONCEIVABLE IMPLEMENTATION
       - WPBS (WORD PARALLEL, BIT SERIAL): STARAN
       - WSBP (WORD SERIAL, BIT PARALLEL): CONVENTIONAL COMPUTERS
       - WPBP (WORD PARALLEL, BIT PARALLEL): ILLIAC IV
     • THE MAX DEGREE OF PARALLELISM IS GIVEN BY THE PRODUCT OF THE NO OF BITS IN THE WORD AND THE NO OF WORDS PROCESSED IN PARALLEL
  59. HANDLER'S CLASSIFICATION
     • FENG'S SCHEME, WHILE INDICATING THE DEGREE OF PARALLELISM, DOES NOT ACCOUNT FOR THE CONCURRENCY HANDLED BY PIPELINED DESIGNS.
     • HANDLER'S SCHEME ALLOWS THE PIPELINING TO BE SPECIFIED.
     • IT ALLOWS THE IDENTIFICATION OF PARALLELISM AND THE DEGREE OF PIPELINING BUILT INTO THE HARDWARE STRUCTURE
  60. HANDLER'S CLASSIFICATION
     • HANDLER DEFINED SOME OF THE TERMS AS:
       - PCU – PROCESSOR CONTROL UNIT
       - ALU – ARITHMETIC LOGIC UNIT
       - BLC – BIT LEVEL CIRCUITS
       - PE – PROCESSING ELEMENT
     • A COMPUTING SYSTEM C CAN THEN BE CHARACTERISED BY A TRIPLE AS T(C) = (K x K', D x D', W x W')
     • WHERE K = NO OF PCUs, K' = NO OF PCUs THAT ARE PIPELINED, D = NO OF ALUs, D' = NO OF PIPELINED ALUs, W = WORD LENGTH OF THE ALU OR PE, AND W' = NO OF PIPELINE STAGES IN THE ALU OR PE
  61. COMPUTER PROGRAM ORGANIZATION
     • BROADLY, THEY MAY BE CLASSIFIED AS:
       - CONTROL FLOW PROGRAM ORGANIZATION
       - DATAFLOW PROGRAM ORGANIZATION
       - REDUCTION PROGRAM ORGANIZATION
  62. COMPUTER PROGRAM ORGANIZATION
     • CONTROL FLOW COMPUTERS USE EXPLICIT FLOWS OF CONTROL INFO TO CAUSE THE EXECUTION OF INSTS.
     • DATAFLOW COMPS USE THE AVAILABILITY OF OPERANDS TO TRIGGER THE EXECUTION OF OPERATIONS.
     • REDUCTION COMPUTERS USE THE NEED FOR A RESULT TO TRIGGER THE OPERATION WHICH WILL GENERATE THE REQUIRED RESULT.
  63. COMPUTER PROGRAM ORGANIZATION
     • THE THREE BASIC FORMS OF COMP PROGRAM ORGANIZATION MAY BE DESCRIBED IN TERMS OF THEIR DATA MECHANISM (WHICH DEFINES THE WAY A PARTICULAR ARGUMENT IS USED BY A NUMBER OF INSTRUCTIONS) AND THE CONTROL MECHANISM (WHICH DEFINES HOW ONE INST CAUSES THE EXECUTION OF ONE OR MORE OTHER INSTS AND THE RESULTING CONTROL PATTERN).
  64. COMPUTER PROGRAM ORGANIZATION
     • CONTROL FLOW PROCESSORS HAVE A "BY REFERENCE" DATA MECHANISM (WHICH USES REFERENCES EMBEDDED IN THE INSTS BEING EXECUTED TO ACCESS THE CONTENTS OF THE SHARED MEMORY) AND TYPICALLY A 'SEQUENTIAL' CONTROL MECHANISM (WHICH PASSES A SINGLE THREAD OF CONTROL FROM INSTRUCTION TO INSTRUCTION).
  65. COMPUTER PROGRAM ORGANIZATION
     • DATAFLOW COMPUTERS HAVE A "BY VALUE" DATA MECHANISM (WHICH GENERATES AN ARGUMENT AT RUN-TIME WHICH IS REPLICATED AND GIVEN TO EACH ACCESSING INSTRUCTION FOR STORAGE AS A VALUE) AND A 'PARALLEL' CONTROL MECHANISM.
     • BOTH MECHANISMS ARE SUPPORTED BY DATA TOKENS, WHICH CONVEY DATA FROM PRODUCER TO CONSUMER INSTRUCTIONS AND CONTRIBUTE TO THE ACTIVATION OF CONSUMER INSTS.
  66. COMPUTER PROGRAM ORGANIZATION
     • TWO BASIC TYPES OF REDUCTION PROGRAM ORGANIZATIONS HAVE BEEN DEVELOPED:
       A. STRING REDUCTION, WHICH HAS A 'BY VALUE' DATA MECHANISM AND HAS ADVANTAGES WHEN MANIPULATING SIMPLE EXPRESSIONS.
       B. GRAPH REDUCTION, WHICH HAS A 'BY REFERENCE' DATA MECHANISM AND HAS ADVANTAGES WHEN LARGER STRUCTURES ARE INVOLVED.
  67. COMPUTER PROGRAM ORGANIZATION
     • CONTROL-FLOW AND DATA-FLOW PROGRAMS ARE BUILT FROM FIXED SIZE PRIMITIVE INSTS, WITH HIGHER LEVEL PROGRAMS CONSTRUCTED FROM SEQUENCES OF THESE PRIMITIVE INSTRUCTIONS AND CONTROL OPERATIONS.
     • REDUCTION PROGRAMS ARE BUILT FROM HIGH LEVEL PROGRAM STRUCTURES WITHOUT THE NEED FOR CONTROL OPERATORS.
  68. COMPUTER PROGRAM ORGANIZATION
     • THE RELATIONSHIP OF THE DATA AND CONTROL MECHANISMS TO THE BASIC COMPUTER PROGRAM ORGANIZATIONS CAN BE SHOWN AS UNDER:

       CONTROL MECHANISM | BY VALUE         | BY REFERENCE
       SEQUENTIAL        | -                | VON-NEUMANN CONTROL FLOW
       PARALLEL          | DATA FLOW        | PARALLEL CONTROL FLOW
       RECURSIVE         | STRING REDUCTION | GRAPH REDUCTION
  69. MACHINE ORGANIZATION
     • MACHINE ORGANIZATION CAN BE CLASSIFIED AS FOLLOWS:
     • CENTRALIZED: CONSISTING OF A SINGLE PROCESSOR, COMM PATH AND MEMORY. A SINGLE ACTIVE INST PASSES EXECUTION TO A SPECIFIC SUCCESSOR INSTRUCTION.
     • TRADITIONAL VON-NEUMANN PROCESSORS HAVE A CENTRALIZED MACHINE ORGANIZATION AND A CONTROL FLOW PROGRAM ORGANIZATION.
  70. MACHINE ORGANIZATION
     • PACKET COMMUNICATION: USING A CIRCULAR INST EXECUTION PIPELINE IN WHICH PROCESSORS, COMMUNICATIONS AND MEMORIES ARE LINKED BY POOLS OF WORK.
     • THE NEC 7281 HAS A PACKET COMMUNICATION MACHINE ORGANIZATION AND A DATAFLOW PROGRAM ORGANIZATION.
  71. MACHINE ORGANIZATION
     • EXPRESSION MANIPULATION: USES IDENTICAL RESOURCES IN A REGULAR STRUCTURE, EACH RESOURCE CONTAINING A PROCESSOR, COMMUNICATION AND MEMORY. THE PROGRAM CONSISTS OF ONE LARGE STRUCTURE, PARTS OF WHICH ARE ACTIVE WHILE OTHER PARTS ARE TEMPORARILY SUSPENDED.
     • AN EXPRESSION MANIPULATION MACHINE MAY BE CONSTRUCTED FROM A REGULAR STRUCTURE OF T414 TRANSPUTERS, EACH CONTAINING A VON-NEUMANN PROCESSOR, MEMORY AND COMMUNICATION LINKS.
  72. MULTIPROCESSING SYSTEMS
     • THEY MAKE USE OF SEVERAL PROCESSORS, EACH OBEYING ITS OWN INSTS, USUALLY COMMUNICATING VIA A COMMON MEMORY.
     • ONE WAY OF CLASSIFYING THESE SYSTEMS IS BY THEIR DEGREE OF COUPLING.
     • TIGHTLY COUPLED SYSTEMS HAVE PROCESSORS INTERCONNECTED BY A MULTIPROCESSOR SYSTEM BUS, WHICH BECOMES A PERFORMANCE BOTTLENECK.
  73. MULTIPROCESSING SYSTEMS
     • INTERCONNECTION BY A SHARED MEMORY IS LESS TIGHTLY COUPLED, AND A MULTIPORT MEMORY MAY BE USED TO REDUCE THE BUS BOTTLENECK.
     • THE USE OF SEVERAL AUTONOMOUS SYSTEMS, EACH WITH ITS OWN OS, IN A CLUSTER IS MORE LOOSELY COUPLED.
     • THE USE OF A NETWORK TO INTERCONNECT SYSTEMS, USING COMM SOFTWARE, IS THE MOST LOOSELY COUPLED ALTERNATIVE.
  74. MULTIPROCESSING SYSTEMS
     • DEGREE OF COUPLING:
       [Diagram, from loosest to tightest coupling: systems joined by a NETWORK LINK through network software; autonomous systems, each with its own OS, joined by a CLUSTER LINK; CPUs sharing SYSTEM MEMORY on a SYSTEM BUS; CPUs on a common MULTIPROCESSOR BUS]
  75. MULTIPROCESSING SYSTEMS
     • MULTIPROCESSORS MAY ALSO BE CLASSIFIED AS AUTOCRATIC OR EGALITARIAN.
     • AUTOCRATIC CONTROL IS SHOWN WHERE A MASTER-SLAVE RELATIONSHIP EXISTS BETWEEN THE PROCESSORS.
     • EGALITARIAN CONTROL GIVES ALL PROCESSORS EQUAL CONTROL OF SHARED BUS ACCESS.
  76. MULTIPROCESSING SYSTEMS
     • MULTIPROCESSING SYSTEMS WITH SEPARATE PROCESSORS AND MEMORIES MAY BE CLASSIFIED AS 'DANCE HALL' CONFIGURATIONS, IN WHICH THE PROCESSORS ARE LINED UP ON ONE SIDE WITH THE MEMORIES FACING THEM.
     • CROSS CONNECTIONS ARE MADE BY A SWITCHING NETWORK.
  77. MULTIPROCESSING SYSTEMS
     • DANCE HALL CONFIGURATION:
       [Diagram: CPU 1-CPU 4 on one side of a SWITCHING NETWORK, MEM 1-MEM 4 on the other]
  78. MULTIPROCESSING SYSTEMS
     • ANOTHER CONFIGURATION IS THE 'BOUDOIR' CONFIG, IN WHICH EACH PROCESSOR IS CLOSELY COUPLED WITH ITS OWN MEMORY AND A NETWORK OF SWITCHES IS USED TO LINK THE PROCESSOR-MEMORY PAIRS.
       [Diagram: CPU-MEM pairs 1-4 linked by a SWITCHING NETWORK]
  79. MULTIPROCESSING SYSTEMS
     • ANOTHER TERM WHICH IS USED TO DESCRIBE A FORM OF PARALLEL COMPUTING IS CONCURRENCY.
     • IT DENOTES INDEPENDENT, ASYNCHRONOUS OPERATION OF A COLLECTION OF PARALLEL COMPUTING DEVICES, RATHER THAN THE SYNCHRONOUS OPERATION OF DEVICES IN A MULTIPROCESSOR SYSTEM.
  80. SYSTOLIC ARRAYS
     • IT MAY BE TERMED AN MISD SYSTEM.
     • IT IS A REGULAR ARRAY OF PROCESSING ELEMENTS, EACH COMMUNICATING WITH ITS NEAREST NEIGHBOURS AND OPERATING SYNCHRONOUSLY UNDER THE CONTROL OF A COMMON CLOCK WITH A RATE LIMITED BY THE SLOWEST PROCESSOR IN THE ARRAY.
     • THE TERM SYSTOLIC IS DERIVED FROM THE RHYTHMIC CONTRACTION OF THE HEART, ANALOGOUS TO THE RHYTHMIC PUMPING OF DATA THROUGH AN ARRAY OF PROCESSING ELEMENTS.
  81. WAVEFRONT ARRAY
     • IT IS A REGULAR ARRAY OF PROCESSING ELEMENTS, EACH COMMUNICATING WITH ITS NEAREST NEIGHBOURS BUT OPERATING WITH NO GLOBAL CLOCK.
     • IT EXHIBITS CONCURRENCY AND IS DATA DRIVEN.
     • THE OPERATION OF EACH PROCESSOR IS CONTROLLED LOCALLY AND IS ACTIVATED BY THE ARRIVAL OF DATA, AFTER ITS PREVIOUS OUTPUT HAS BEEN DELIVERED TO THE APPROPRIATE NEIGHBOURING PROCESSOR.
  82. WAVEFRONT ARRAY
     • PROCESSING WAVEFRONTS DEVELOP ACROSS THE ARRAY AS PROCESSORS PASS ON THE OUTPUT DATA TO THEIR NEIGHBOURS, HENCE THE NAME.
  83. GRANULARITY OF PARALLELISM
     • PARALLEL PROCESSING EMPHASIZES THE USE OF SEVERAL PROCESSING ELEMENTS WITH THE MAIN OBJECTIVE OF GAINING SPEED IN CARRYING OUT A TIME CONSUMING COMPUTING JOB
     • A MULTI-TASKING OS EXECUTES JOBS CONCURRENTLY, BUT THE OBJECTIVE IS TO EFFECT THE CONTINUED PROGRESS OF ALL THE TASKS BY SHARING THE RESOURCES IN AN ORDERLY MANNER.
  84. GRANULARITY OF PARALLELISM
     • PARALLEL PROCESSING EMPHASIZES THE EXPLOITATION OF THE CONCURRENCY AVAILABLE IN A PROBLEM FOR CARRYING OUT THE COMPUTATION BY EMPLOYING MORE THAN ONE PROCESSOR, TO ACHIEVE BETTER SPEED AND/OR THROUGHPUT.
     • THE CONCURRENCY IN THE COMPUTING PROCESS COULD BE LOOKED UPON FOR PARALLEL PROCESSING AT VARIOUS LEVELS (GRANULARITY OF PARALLELISM) IN THE SYSTEM.
  85. GRANULARITY OF PARALLELISM
     • THE FOLLOWING GRANULARITIES OF PARALLELISM MAY BE IDENTIFIED IN ANY EXISTING SYSTEM:
       1. PROGRAM LEVEL PARALLELISM
       2. PROCESS OR TASK LEVEL PARALLELISM
       3. PARALLELISM AT THE LEVEL OF A GROUP OF STATEMENTS
       4. STATEMENT LEVEL PARALLELISM
       5. PARALLELISM WITHIN A STATEMENT
       6. INSTRUCTION LEVEL PARALLELISM
       7. PARALLELISM WITHIN AN INSTRUCTION
       8. LOGIC AND CIRCUIT LEVEL PARALLELISM
  86. GRANULARITY OF PARALLELISM
     • THE GRANULARITIES ARE LISTED IN INCREASING DEGREE OF FINENESS.
     • GRANULARITIES AT LEVELS 1, 2 AND 3 CAN BE EASILY IMPLEMENTED ON A CONVENTIONAL MULTIPROCESSOR SYSTEM.
     • MOST MULTI-TASKING OSs ALLOW CREATION AND SCHEDULING OF PROCESSES ON THE AVAILABLE RESOURCES.
  87. GRANULARITY OF PARALLELISM
     • SINCE A PROCESS REPRESENTS SIZABLE CODE IN TERMS OF EXECUTION TIME, THE OVERHEADS IN EXPLOITING THE PARALLELISM AT THESE GRANULARITIES ARE NOT EXCESSIVE.
     • IF THE SAME PRINCIPLE IS APPLIED TO THE NEXT FEW LEVELS, INCREASED SCHEDULING OVERHEADS MAY NOT WARRANT PARALLEL EXECUTION
     • IT IS SO BECAUSE THE UNIT OF WORK OF A MULTI-PROCESSOR IS CURRENTLY MODELLED AT THE LEVEL OF A PROCESS OR TASK AND IS REASONABLY SUPPORTED ON THE CURRENT ARCHITECTURES.
  88. GRANULARITY OF PARALLELISM
     • THE LAST THREE LEVELS ARE BEST HANDLED BY HARDWARE. SEVERAL MACHINES HAVE BEEN BUILT TO PROVIDE FINE GRAIN PARALLELISM IN VARYING DEGREES.
     • A MACHINE HAVING INST LEVEL PARALLELISM EXECUTES SEVERAL INSTS SIMULTANEOUSLY. EXAMPLES ARE PIPELINE INST PROCESSORS, SYNCHRONOUS ARRAY PROCESSORS, ETC.
     • CIRCUIT LEVEL PARALLELISM EXISTS IN MOST MACHINES IN THE FORM OF PROCESSING MULTIPLE BITS/BYTES SIMULTANEOUSLY.
  89. PARALLEL ARCHITECTURES
     • THERE ARE NUMEROUS ARCHITECTURES THAT HAVE BEEN USED IN THE DESIGN OF HIGH SPEED COMPUTERS. THEY FALL BASICALLY INTO 2 CLASSES: GENERAL PURPOSE & SPECIAL PURPOSE
     • GENERAL PURPOSE ARCHITECTURES ARE DESIGNED TO PROVIDE THE RATED SPEEDS AND OTHER COMPUTING REQUIREMENTS FOR A VARIETY OF PROBLEMS WITH THE SAME PERFORMANCE.
  90. PARALLEL ARCHITECTURES
     • THE IMPORTANT ARCHITECTURAL IDEAS BEING USED IN DESIGNING GEN PURPOSE HIGH SPEED COMPUTERS ARE:
       - PIPELINED ARCHITECTURES
       - ASYNCHRONOUS MULTI-PROCESSORS
       - DATA-FLOW COMPUTERS
  91. PARALLEL ARCHITECTURES
     • THE SPECIAL PURPOSE MACHINES HAVE TO EXCEL AT WHAT THEY HAVE BEEN DESIGNED FOR. THEY MAY OR MAY NOT DO SO FOR OTHER APPLICATIONS. SOME OF THE IMPORTANT ARCHITECTURAL IDEAS FOR DEDICATED COMPUTERS ARE:
       - SYNCHRONOUS MULTI-PROCESSORS (ARRAY PROCESSORS)
       - SYSTOLIC ARRAYS
       - NEURAL NETWORKS
  92. ARRAY PROCESSORS
     • IT CONSISTS OF SEVERAL PEs, ALL OF WHICH EXECUTE THE SAME INST ON DIFFERENT DATA.
     • THE INSTS ARE FETCHED AND BROADCAST TO ALL THE PEs BY A COMMON CU.
     • THE PEs EXECUTE INSTS ON DATA RESIDING IN THEIR OWN MEMORY.
     • THE PEs ARE LINKED VIA AN INTERCONNECTION NETWORK TO CARRY OUT DATA COMMUNICATION BETWEEN THEM.
  93. ARRAY PROCESSORS
     • THERE ARE SEVERAL WAYS OF CONNECTING PEs
     • THESE MACHINES REQUIRE SPECIAL PROGRAMMING EFFORTS TO ACHIEVE THE SPEED ADVANTAGE
     • THE COMPUTATIONS ARE CARRIED OUT SYNCHRONOUSLY BY THE HW AND THEREFORE SYNC IS NOT AN EXPLICIT PROBLEM
  94. ARRAY PROCESSORS
     • USING AN INTERCONNECTION NETWORK:
       [Diagram: a CU AND SCALAR PROCESSOR broadcasts insts to PE1...PEn, which exchange data via an INTERCONNECTION NETWORK]
  95. ARRAY PROCESSORS
     • USING AN ALIGNMENT NETWORK:
       [Diagram: a CONTROL UNIT AND SCALAR PROCESSOR drives PE0...PEn, which reach MEM 0...MEM k through an ALIGNMENT NETWORK]
  96. CONVENTIONAL MULTI-PROCESSORS
     • ASYNCHRONOUS MULTIPROCESSORS
     • BASED ON MULTIPLE CPUs AND MEM BANKS CONNECTED THROUGH EITHER A BUS OR A CONNECTION NETWORK; THIS IS A COMMONLY USED TECHNIQUE TO PROVIDE INCREASED THROUGHPUT AND/OR RESPONSE TIME IN A GENERAL PURPOSE COMPUTING ENVIRONMENT.
  97. CONVENTIONAL MULTI-PROCESSORS
     • IN SUCH SYSTEMS, EACH CPU OPERATES INDEPENDENTLY ON THE QUANTUM OF WORK GIVEN TO IT
     • IT HAS BEEN HIGHLY SUCCESSFUL IN PROVIDING INCREASED THROUGHPUT AND/OR RESPONSE TIME IN TIME SHARED SYSTEMS.
     • EFFECTIVE REDUCTION OF THE EXECUTION TIME OF A GIVEN JOB REQUIRES THE JOB TO BE BROKEN INTO SUB-JOBS THAT ARE TO BE HANDLED SEPARATELY BY THE AVAILABLE PHYSICAL PROCESSORS.
  98. CONVENTIONAL MULTI-PROCESSORS
     • IT WORKS WELL FOR TASKS RUNNING MORE OR LESS INDEPENDENTLY, i.e., FOR TASKS HAVING LOW COMMUNICATION AND SYNCHRONIZATION REQUIREMENTS.
     • COMM AND SYNC ARE IMPLEMENTED EITHER THROUGH THE SHARED MEMORY, OR BY A MESSAGE SYSTEM, OR THROUGH A HYBRID APPROACH.
  99. CONVENTIONAL MULTI-PROCESSORS
     • SHARED MEMORY ARCHITECTURE:
       [Diagrams: COMMON BUS ARCHITECTURE (CPUs and MEMORY on one bus); SWITCH BASED MULTIPROCESSOR (CPUs connected to MEM0...MEMn through a PROCESSOR-MEMORY SWITCH)]
  100. CONVENTIONAL MULTI-PROCESSORS
     • MESSAGE BASED ARCHITECTURE:
       [Diagram: PE 1...PE n linked by a CONNECTION NETWORK]
  101. CONVENTIONAL MULTI-PROCESSORS
     • HYBRID ARCHITECTURE:
       [Diagram: PE 1...PE n and MEM 1...MEM k linked by a CONNECTION NETWORK]
  102. CONVENTIONAL MULTI-PROCESSORS
     • ON A SINGLE BUS SYSTEM, THERE IS A LIMIT ON THE NUMBER OF PROCESSORS THAT CAN BE OPERATED IN PARALLEL.
     • IT IS USUALLY OF THE ORDER OF 10.
     • A CONNECTION NETWORK HAS THE ADVANTAGE THAT THE NO OF PROCESSORS CAN GROW WITHOUT LIMIT, BUT THE CONNECTION AND COMM COST MAY DOMINATE AND THUS SATURATE THE PERFORMANCE GAIN.
     • FOR THIS REASON, A HYBRID APPROACH MAY BE FOLLOWED
     • MANY SYSTEMS USE A COMMON BUS ARCH FOR GLOBAL MEM, DISK AND I/O, WHILE THE PROC-MEM TRAFFIC IS HANDLED BY A SEPARATE BUS.
  103. DATA FLOW COMPUTERS
     • A NEW FINE GRAIN PARALLEL PROCESSING APPROACH BASED ON THE DATAFLOW COMPUTING MODEL WAS SUGGESTED BY JACK DENNIS IN 1975.
     • HERE, A NO OF DATA FLOW OPERATORS, EACH CAPABLE OF DOING AN OPERATION, ARE EMPLOYED.
     • A PROGRAM FOR SUCH A MACHINE IS A CONNECTION GRAPH OF THE OPERATORS.
  104. DATA FLOW COMPUTERS
     • THE OPERATORS FORM THE NODES OF THE GRAPH, WHILE THE ARCS REPRESENT THE DATA MOVEMENT BETWEEN NODES.
     • AN ARC IS LABELLED WITH A TOKEN TO INDICATE THAT IT CONTAINS THE DATA.
     • A TOKEN IS GENERATED ON THE OUTPUT OF A NODE WHEN IT COMPUTES THE FUNCTION BASED ON THE DATA ON ITS INPUT ARCS.
  105. DATA FLOW COMPUTERS
     • THIS IS KNOWN AS FIRING OF THE NODE.
     • A NODE CAN FIRE ONLY WHEN ALL OF ITS INPUT ARCS HAVE TOKENS AND THERE IS NO TOKEN ON THE OUTPUT ARC.
     • WHEN A NODE FIRES, IT REMOVES THE INPUT TOKENS TO SHOW THAT THE DATA HAS BEEN CONSUMED.
     • USUALLY, COMPUTATION STARTS WITH THE ARRIVAL OF DATA ON THE INPUT NODES OF THE GRAPH.
  106. DATA FLOW COMPUTERS
     • DATA FLOW GRAPH FOR THE COMPUTATION A = 5 + C - D:
       [Diagram: inputs 5 and C feed a '+' node; its output and D feed a '-' node. COMPUTATION PROGRESSES AS PER DATA AVAILABILITY]
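A small Python sketch of the firing rule for this graph (assumed, not from the slides; the arc names such as "sum" are illustrative):

```python
import operator

# Each node: (function, input arc names, output arc name), per A = 5 + C - D.
nodes = [
    (operator.add, ["five", "C"], "sum"),   # '+' fires when 5 and C arrive
    (operator.sub, ["sum", "D"], "A"),      # '-' fires when 5+C and D arrive
]

tokens = {"five": 5, "C": 12, "D": 4}       # initial tokens on the input arcs

fired = True
while fired:                                # keep firing until quiescent
    fired = False
    for fn, ins, out in nodes:
        # Firing rule from the slide: all input arcs carry tokens and
        # the output arc carries none.
        if all(i in tokens for i in ins) and out not in tokens:
            args = [tokens.pop(i) for i in ins]   # consume the input tokens
            tokens[out] = fn(*args)               # produce the output token
            fired = True

print(tokens)   # -> {'A': 13}
```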
  107. DATA FLOW COMPUTERS
     • MANY CONVENTIONAL MACHINES EMPLOYING MULTIPLE FUNCTIONAL UNITS EMPLOY THE DATA FLOW MODEL FOR SCHEDULING THE FUNCTIONAL UNITS.
     • EXAMPLE EXPERIMENTAL MACHINES ARE THE MANCHESTER MACHINE (1984) AND THE MIT MACHINE.
     • DATA FLOW COMPUTERS PROVIDE FINE GRANULARITY OF PARALLEL PROCESSING, SINCE THE DATA FLOW OPERATORS ARE TYPICALLY ELEMENTARY ARITHMETIC AND LOGIC OPERATORS.
  108. DATA FLOW COMPUTERS
     • IT MAY PROVIDE AN EFFECTIVE SOLUTION FOR USING A VERY LARGE NUMBER OF COMPUTING ELEMENTS IN PARALLEL.
     • WITH ITS ASYNCHRONOUS DATA DRIVEN CONTROL, IT HOLDS PROMISE FOR EXPLOITATION OF THE PARALLELISM AVAILABLE BOTH IN THE PROBLEM AND THE MACHINE.
     • CURRENT IMPLEMENTATIONS ARE NO BETTER THAN CONVENTIONAL PIPELINED MACHINES EMPLOYING MULTIPLE FUNCTIONAL UNITS.
  109. SYSTOLIC ARCHITECTURES
     • THE ADVENT OF VLSI HAS MADE IT POSSIBLE TO DEVELOP SPECIAL ARCHITECTURES SUITABLE FOR DIRECT IMPLEMENTATION IN VLSI.
     • SYSTOLIC ARCHITECTURES ARE BASICALLY PIPELINES OPERATING IN ONE OR MORE DIMENSIONS.
     • THE NAME SYSTOLIC HAS BEEN DERIVED FROM THE ANALOGY OF THE OPERATION OF THE BLOOD CIRCULATION SYSTEM THROUGH THE HEART.
  110. SYSTOLIC ARCHITECTURES
     • CONVENTIONAL ARCHITECTURES OPERATE ON THE DATA USING LOAD AND STORE OPERATIONS FROM THE MEMORY.
     • PROCESSING USUALLY INVOLVES SEVERAL OPERATIONS.
     • EACH OPERATION ACCESSES THE MEMORY FOR DATA, PROCESSES IT AND THEN STORES THE RESULT. THIS REQUIRES A NO OF MEM REFERENCES.
  111. SYSTOLIC ARCHITECTURES
     • CONVENTIONAL PROCESSING:
       [Diagram: each function F1...Fn loads from and stores to MEMORY separately]
     • SYSTOLIC PROCESSING:
       [Diagram: data flows from MEMORY through F1 -> F2 -> ... -> Fn and back to MEMORY]
  112. SYSTOLIC ARCHITECTURES
     • IN SYSTOLIC PROCESSING, DATA TO BE PROCESSED FLOWS THROUGH VARIOUS OPERATION STAGES AND THEN FINALLY IT IS PUT IN THE MEMORY.
     • SUCH AN ARCHITECTURE CAN PROVIDE VERY HIGH COMPUTING THROUGHPUT DUE TO REGULAR DATAFLOW AND PIPELINE OPERATION.
     • IT MAY BE USEFUL IN DESIGNING SPECIAL PROCESSORS FOR GRAPHICS, SIGNAL & IMAGE PROCESSING.
  113. PERFORMANCE OF PARALLEL COMPUTERS
     • AN IMPORTANT MEASURE OF PARALLEL ARCHITECTURE IS SPEEDUP.
     • LET n = NO. OF PROCESSORS; Ts = SINGLE PROC. EXEC TIME; Tn = n PROC. EXEC. TIME. THEN
     • SPEEDUP S = Ts/Tn
  114. AMDAHL'S LAW
     • 1967
     • BASED ON A VERY SIMPLE OBSERVATION.
     • A PROGRAM REQUIRING TOTAL TIME T FOR SEQUENTIAL EXECUTION SHALL HAVE SOME PART WHICH IS INHERENTLY SEQUENTIAL.
     • IN TERMS OF THE TOTAL TIME TAKEN TO SOLVE THE PROBLEM, THIS FRACTION OF COMPUTING TIME IS AN IMPORTANT PARAMETER.
  115. AMDAHL'S LAW
     • LET f = SEQ. FRACTION FOR A GIVEN PROGRAM.
     • AMDAHL'S LAW STATES THAT THE SPEEDUP OF A PARALLEL COMPUTER IS LIMITED BY S <= 1/[f + (1 - f)/n]
     • SO, IT SAYS THAT WHILE DESIGNING A PARALLEL COMP, IT IS BETTER TO CONNECT A SMALL NO OF EXTREMELY POWERFUL PROCS THAN A LARGE NO OF INEXPENSIVE PROCS. (A SHORT NUMERIC SKETCH OF THE BOUND FOLLOWS BELOW.)
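A short Python sketch of the bound (an assumed helper, not from the slides):

```python
# Amdahl speedup bound: S <= 1 / (f + (1 - f)/n) for sequential
# fraction f and n processors.
def amdahl_speedup(f, n):
    return 1.0 / (f + (1.0 - f) / n)

for n in (2, 10, 100, 10_000):
    print(n, round(amdahl_speedup(0.05, n), 1))
# Even with 10,000 processors, f = 0.05 caps the speedup near 1/f = 20.
```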
  116. AMDAHL'S LAW
     • CONSIDER TWO PARALLEL COMPS, Me AND Mi. Me IS BUILT USING POWERFUL PROCS, EACH CAPABLE OF EXECUTING AT A SPEED OF M MEGAFLOPS.
     • THE COMP Mi IS BUILT USING n CHEAP PROCS, AND EACH PROC OF Mi EXECUTES r.M MEGAFLOPS, WHERE 0 < r < 1
     • IF THE MACHINE Mi ATTEMPTS A COMPUTATION WHOSE INHERENTLY SEQ. FRACTION f > r, THEN Mi WILL EXECUTE THE COMPUTATION MORE SLOWLY THAN A SINGLE PROC OF Me.
  117. AMDAHL'S LAW
     • PROOF:
       LET W = TOTAL WORK; M = SPEED OF Me (IN MFLOPS); r.M = SPEED OF EACH PE OF Mi; f.W = SEQ WORK OF THE JOB; T(Me) = TIME TAKEN BY Me FOR THE WORK W; T(Mi) = TIME TAKEN BY Mi FOR THE WORK W. THEN THE TIME TAKEN BY ANY COMP = T = AMOUNT OF WORK / SPEED
  118. AMDAHL'S LAW
     • T(Mi) = TIME FOR SEQ PART + TIME FOR PARALLEL PART = ((f.W)/(r.M)) + [((1-f).W/n)/(r.M)] = (W/M).(f/r) IF n IS INFINITELY LARGE.
     • T(Me) = W/M [ASSUMING ONLY 1 PE]
     • SO IF f > r, THEN T(Mi) > T(Me)
  119. AMDAHL'S LAW
     • THE THEOREM IMPLIES THAT A SEQ COMPONENT FRACTION ACCEPTABLE FOR THE MACHINE Me MAY NOT BE ACCEPTABLE FOR THE MACHINE Mi.
     • IT IS NOT GOOD TO HAVE A LARGE PROCESSING POWER THAT GOES TO WASTE. PROCS MUST MAINTAIN SOME LEVEL OF EFFICIENCY.
  120. AMDAHL'S LAW
     • RELATION BETWEEN EFFICIENCY e AND SEQ FRACTION f:
     • S <= 1/[f + (1 - f)/n]
     • EFFICIENCY e = S/n
     • SO, e <= 1/[f.n + 1 - f]
     • IT SAYS THAT FOR CONSTANT EFFICIENCY, THE FRACTION OF SEQ COMPUTATION IN AN ALGO MUST BE INVERSELY PROPORTIONAL TO THE NO OF PROCESSORS.
     • THE IDEA OF USING A LARGE NO OF PROCS MAY THUS BE GOOD ONLY FOR THOSE APPLICATIONS FOR WHICH IT IS KNOWN THAT THE ALGOS HAVE A VERY SMALL SEQ FRACTION f. (A SKETCH OF THIS RELATION FOLLOWS BELOW.)
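A short Python sketch of the efficiency bound, and of the sequential fraction needed to hold a target efficiency (assumed helpers; the formula for f simply solves the bound for f):

```python
# Efficiency bound e <= 1 / (f*n + 1 - f).
def efficiency(f, n):
    return 1.0 / (f * n + 1.0 - f)

def f_for_efficiency(e, n):
    # From e = 1/(f*n + 1 - f):  f = (1/e - 1) / (n - 1)
    return (1.0 / e - 1.0) / (n - 1)

for n in (10, 100, 1000):
    print(n, f_for_efficiency(0.8, n))   # f shrinks roughly as 1/n
```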
  121. MINSKY'S CONJECTURE
     • 1970
     • FOR A PARALLEL COMPUTER WITH n PROCS, THE SPEEDUP S SHALL BE PROPORTIONAL TO log2(n).
     • MINSKY'S CONJECTURE WAS VERY BAD NEWS FOR THE PROPONENTS OF LARGE SCALE PARALLEL ARCHITECTURES.
     • FLYNN & HENNESSY (1980) THEN GAVE THAT THE SPEEDUP OF AN n PROCESSOR PARALLEL SYSTEM IS LIMITED BY S <= n/log2(n). (A SHORT COMPARISON OF THE TWO BOUNDS FOLLOWS BELOW.)
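A quick Python comparison of the two bounds (assumed, for illustration only):

```python
import math

# Minsky's conjecture: S ~ log2(n).
# Flynn & Hennessy limit: S <= n / log2(n).
for n in (4, 16, 64, 256, 1024):
    minsky = math.log2(n)
    flynn_hennessy = n / math.log2(n)
    print(f"n={n:5d}  Minsky~{minsky:5.1f}  Flynn-Hennessy<={flynn_hennessy:7.1f}")
```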
  122. PARALLEL ALGORITHMS
     • AN IMP MEASURE OF THE PERFORMANCE OF ANY ALGO IS ITS TIME AND SPACE COMPLEXITY. THEY ARE SPECIFIED AS SOME FUNCTION OF THE PROBLEM SIZE.
     • MANY TIMES, THEY DEPEND UPON THE DATA STRUCTURE USED.
     • SO, ANOTHER IMP MEASURE IS THE PREPROCESSING TIME COMPLEXITY TO GENERATE THE DESIRED DATA STRUCTURE.
  123. PARALLEL ALGORITHMS
     • PARALLEL ALGOS ARE THE ALGOS TO BE RUN ON A PARALLEL MACHINE.
     • SO, THE COMPLEXITY OF COMM AMONGST PROCESSORS ALSO BECOMES AN IMPORTANT MEASURE.
     • SO, AN ALGO MAY FARE BADLY ON ONE MACHINE AND MUCH BETTER ON THE OTHER.
  124. PARALLEL ALGORITHMS
     • FOR THIS REASON, MAPPING OF THE ALGO ON THE ARCHITECTURE IS AN IMP ACTIVITY IN THE STUDY OF PARALLEL ALGOS.
     • SPEEDUP AND EFFICIENCY ARE ALSO IMP PERFORMANCE MEASURES FOR A PARALLEL ALGO WHEN MAPPED ONTO A GIVEN ARCHITECTURE.
  125. PARALLEL ALGORITHMS
     • A PARALLEL ALGO FOR A GIVEN PROBLEM MAY BE DEVELOPED USING ONE OR MORE OF THE FOLLOWING:
       1. DETECT AND EXPLOIT THE INHERENT PARALLELISM AVAILABLE IN THE EXISTING SEQUENTIAL ALGORITHM
       2. INDEPENDENTLY INVENT A NEW PARALLEL ALGORITHM
       3. ADAPT AN EXISTING PARALLEL ALGO THAT SOLVES A SIMILAR PROBLEM.
  126. DISTRIBUTED PROCESSING
     • PARALLEL PROCESSING DIFFERS FROM DISTRIBUTED PROCESSING IN THE SENSE THAT IT HAS (1) CLOSE COUPLING BETWEEN THE PROCESSORS & (2) COMMUNICATION FAILURES MATTER A LOT.
     • PROBLEMS MAY ARISE IN DISTRIBUTED PROCESSING BECAUSE OF (1) TIME UNCERTAINTY DUE TO DIFFERING TIMES IN LOCAL CLOCKS, (2) INCOMPLETE INFO ABOUT OTHER NODES IN THE SYSTEM, (3) DUPLICATE INFO WHICH MAY NOT ALWAYS BE CONSISTENT.
  127. PIPELINE PROCESSING
     • A PIPELINE CAN WORK WELL WHEN:
       1. THE TIME TAKEN BY EACH STAGE IS NEARLY THE SAME.
       2. IT HAS A STEADY STREAM OF JOBS, OTHERWISE UTILIZATION WILL BE POOR.
       3. IT HONOURS THE PRECEDENCE CONSTRAINTS OF SUB-STEPS OF JOBS. THIS IS THE MOST IMP PROPERTY OF A PIPELINE. IT ALLOWS PARALLEL EXECUTION OF JOBS WHICH HAVE NO PARALLELISM WITHIN THE INDIVIDUAL JOBS THEMSELVES.
  128. PIPELINE PROCESSING
     • IN FACT, A JOB WHICH CAN BE BROKEN INTO A NO OF SEQUENTIAL STEPS IS THE BASIS OF PIPELINE PROCESSING.
     • THIS IS DONE BY INTRODUCING TEMPORAL PARALLELISM, WHICH MEANS EXECUTING DIFFERENT STEPS OF DIFFERENT JOBS INSIDE THE PIPELINE.
     • THE PERFORMANCE IN TERMS OF THROUGHPUT IS GUARANTEED IF THERE ARE ENOUGH JOBS TO BE STREAMED THROUGH THE PIPELINE, ALTHOUGH AN INDIVIDUAL JOB FINISHES WITH A DELAY EQUALLING THE TOTAL DELAY OF ALL THE STAGES.
  129. PIPELINE PROCESSING
     • THE FOURTH IMP THING IS THAT THE STAGES IN THE PIPELINE ARE SPECIALIZED TO DO PARTICULAR SUBFUNCTIONS, UNLIKE IN CONVENTIONAL PARALLEL PROCESSORS WHERE EQUIPMENT IS REPLICATED.
     • IT AMOUNTS TO SAYING THAT DUE TO SPECIALIZATION, THE STAGE PROC COULD BE DESIGNED WITH BETTER COST AND SPEED, OPTIMISED FOR THE SPECIALISED FUNCTION OF THE STAGE
  130. PERFORMANCE MEASURES OF PIPELINE
     • EFFICIENCY, SPEEDUP AND THROUGHPUT
     • EFFICIENCY: LET n BE THE LENGTH OF THE PIPE AND m BE THE NO OF TASKS RUN ON THE PIPE; THEN EFFICIENCY e CAN BE DEFINED AS
     • e = (m.n)/((m+n-1).n) = m/(m+n-1)
     • WHEN n >> m, e TENDS TO m/n (A SMALL FRACTION)
     • WHEN n << m, e TENDS TO 1
     • WHEN n = m, e IS APPROX 0.5 (m, n > 4)
  131. PERFORMANCE MEASURES OF PIPELINE
     • SPEEDUP S = ((n.ts).m)/((m+n-1).ts) = (m.n)/(m+n-1)
     • WHEN n >> m, S = m (NO. OF TASKS RUN)
     • WHEN n << m, S = n (NO OF STAGES)
     • WHEN n = m, S = n/2 (m, n > 4)
  132. PERFORMANCE MEASURES OF PIPELINE
     • THROUGHPUT Th = m/((m+n-1).ts) = e/ts, WHERE ts IS THE TIME THAT ELAPSES AT ONE STAGE.
     • WHEN n >> m, Th = m/(n.ts)
     • WHEN n << m, Th = 1/ts
     • WHEN n = m, Th = 1/(2.ts) (m, n > 4)
     • SO, SPEEDUP IS A FUNCTION OF n AND ts. FOR A GIVEN TECHNOLOGY ts IS FIXED, SO AS LONG AS ONE IS FREE TO CHOOSE n, THERE IS NO LIMIT ON THE SPEEDUP OBTAINABLE FROM A PIPELINED MECHANISM. (A SKETCH COMPUTING THESE MEASURES FOLLOWS BELOW.)
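A compact Python sketch of the three measures (an assumed helper, not from the slides):

```python
# Pipeline measures for an n-stage pipe running m tasks, ts per stage.
def pipeline_measures(n, m, ts=1.0):
    time = (m + n - 1) * ts              # total time to drain m tasks
    speedup = (m * n * ts) / time        # vs. m tasks at n*ts each, unpipelined
    efficiency = speedup / n
    throughput = m / time                # tasks completed per unit time
    return speedup, efficiency, throughput

print(pipeline_measures(n=4, m=100))     # long stream: S -> n, e -> 1
print(pipeline_measures(n=4, m=4))       # m = n: S ~ n/2, e ~ 0.5
```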
  133. OPTIMAL PIPE SEGMENTATION
     • INTO HOW MANY SUBFUNCTIONS SHOULD A FUNCTION BE DIVIDED?
     • LET n = NO OF STAGES, T = TIME FOR A NON-PIPELINED IMPLEMENTATION, D = LATCH DELAY AND c = COST OF EACH STAGE
     • STAGE COMPUTE TIME = T/n (SINCE T IS DIVIDED EQUALLY AMONG n STAGES)
     • PIPELINE COST = c.n + k, WHERE k IS A CONSTANT REFLECTING SOME COST OVERHEAD.
  134. OPTIMAL PIPE SEGMENTATION
     • SPEED (TIME PER OUTPUT) = (T/n + D)
     • ONE OF THE IMPORTANT PERFORMANCE MEASURES IS THE PRODUCT OF SPEED AND COST, DENOTED BY p.
     • p = (T/n + D).(c.n + k) = T.c + D.c.n + (k.T)/n + k.D
     • TO OBTAIN A VALUE OF n WHICH GIVES THE BEST PERFORMANCE, WE DIFFERENTIATE p W.R.T. n AND EQUATE IT TO ZERO
     • dp/dn = D.c - (k.T)/n^2 = 0
     • n = SQRT[(k.T)/(D.c)] (A NUMERIC CHECK OF THIS OPTIMUM FOLLOWS BELOW.)
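A brief Python check of the analytic optimum against brute force (the parameter values are assumed, for illustration):

```python
import math

T, D, c, k = 128.0, 1.0, 2.0, 8.0        # illustrative values (assumed)

def p(n):
    # Speed-cost product p(n) = (T/n + D) * (c*n + k).
    return (T / n + D) * (c * n + k)

n_opt = math.sqrt(k * T / (D * c))        # analytic optimum from dp/dn = 0
best = min(range(1, 65), key=p)           # brute-force integer check
print(n_opt, best)                        # -> 22.6..., 23
```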
  135. PIPELINE CONTROL
     • IN A NON-PIPELINED SYSTEM, ONE INST IS FULLY EXECUTED BEFORE THE NEXT ONE STARTS, THUS MATCHING THE ORDER OF EXECUTION.
     • IN A PIPELINED SYSTEM, INST EXECUTION IS OVERLAPPED. SO, IT CAN CAUSE PROBLEMS IF NOT CONSIDERED PROPERLY IN THE DESIGN OF CONTROL.
     • THE EXISTENCE OF SUCH DEPENDENCIES CAUSES "HAZARDS"
     • THE CONTROL STRUCTURE PLAYS AN IMP ROLE IN THE OPERATIONAL EFFICIENCY AND THROUGHPUT OF THE MACHINE.
  136. PIPELINE CONTROL
     • THERE ARE 2 TYPES OF CONTROL STRUCTURES IMPLEMENTED ON COMMERCIAL SYSTEMS.
     • THE FIRST ONE IS CHARACTERISED BY A STREAMLINED FLOW OF THE INSTS IN THE PIPE.
     • IN THIS, INSTS FOLLOW ONE AFTER ANOTHER SUCH THAT THE COMPLETION ORDERING IS THE SAME AS THE ORDER OF INITIATION.
     • THE SYSTEM IS CONCEIVED AS A SEQUENCE OF FUNCTIONAL MODULES THROUGH WHICH THE INSTS FLOW ONE AFTER ANOTHER, WITH AN "INTERLOCK" BETWEEN ADJACENT STAGES TO ALLOW THE TRANSFER OF DATA FROM ONE STAGE TO ANOTHER.
  137. PIPELINE CONTROL
     • THE INTERLOCK IS NECESSARY BECAUSE THE PIPE IS ASYNCHRONOUS DUE TO VARIATIONS IN THE SPEEDS OF DIFFERENT STAGES.
     • IN THESE SYSTEMS, BOTTLENECKS APPEAR DYNAMICALLY AT ANY STAGE AND THE INPUT TO IT IS HALTED TEMPORARILY.
     • THE SECOND TYPE OF CONTROL IS MORE FLEXIBLE AND POWERFUL, BUT EXPENSIVE.
  138. PIPELINE CONTROL
     • IN SUCH SYSTEMS, WHEN A STAGE HAS TO SUSPEND THE FLOW OF A PARTICULAR INSTRUCTION, IT ALLOWS OTHER INSTS TO PASS THROUGH THE STAGE, RESULTING IN AN OUT-OF-TURN EXECUTION OF THE INSTS.
     • THE CONTROL MECHANISM IS DESIGNED SUCH THAT EVEN THOUGH THE INSTS ARE EXECUTED OUT-OF-TURN, THE BEHAVIOUR OF THE PROGRAM IS THE SAME AS IF THEY WERE EXECUTED IN THE ORIGINAL SEQUENCE.
     • SUCH CONTROL IS DESIRABLE IN A SYSTEM HAVING MULTIPLE ARITHMETIC PIPELINES OPERATING IN PARALLEL.
  139. PIPELINE HAZARDS
     • THE HARDWARE TECHNIQUE THAT DETECTS AND RESOLVES HAZARDS IS CALLED AN INTERLOCK.
     • A HAZARD OCCURS WHENEVER AN OBJECT WITHIN THE SYSTEM (REG, FLAG, MEM LOCATION) IS ACCESSED OR MODIFIED BY 2 SEPARATE INSTS THAT ARE CLOSE ENOUGH IN THE PROGRAM THAT THEY MAY BE ACTIVE SIMULTANEOUSLY IN THE PIPELINE.
     • HAZARDS ARE OF 3 KINDS: RAW, WAR AND WAW
  140. PIPELINE HAZARDS
     • ASSUME THAT AN INST j LOGICALLY FOLLOWS AN INST i.
     • RAW HAZARD: IT OCCURS BETWEEN 2 INSTS WHEN INST j ATTEMPTS TO READ SOME OBJECT THAT IS BEING MODIFIED BY INST i.
     • WAR HAZARD: IT OCCURS BETWEEN 2 INSTS WHEN INST j ATTEMPTS TO WRITE ONTO SOME OBJECT THAT IS BEING READ BY INST i.
     • WAW HAZARD: IT OCCURS WHEN INST j ATTEMPTS TO WRITE ONTO SOME OBJECT THAT IS ALSO REQUIRED TO BE MODIFIED BY INST i.
  141. PIPELINE HAZARDS
     • THE DOMAIN (READ SET) OF AN INST k, DENOTED BY Dk, IS THE SET OF ALL OBJECTS WHOSE CONTENTS ARE ACCESSED BY INST k.
     • THE RANGE (WRITE SET) OF AN INST k, DENOTED BY Rk, IS THE SET OF ALL OBJECTS UPDATED BY INST k.
     • A HAZARD BETWEEN 2 INSTS i AND j (WHERE j FOLLOWS i) OCCURS WHENEVER ANY OF THE FOLLOWING HOLDS (WHERE ∩ IS THE INTERSECTION OPERATION AND { } IS THE EMPTY SET):
       - Ri ∩ Dj ≠ { } (RAW)
       - Di ∩ Rj ≠ { } (WAR)
       - Ri ∩ Rj ≠ { } (WAW)
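A minimal Python sketch of this set-based detection (assumed, not from the slides; the register names are illustrative):

```python
# Hazard detection between instruction i and a later instruction j,
# using the domain (read set) D and range (write set) R defined above.
def hazards(Di, Ri, Dj, Rj):
    found = []
    if Ri & Dj: found.append("RAW")   # j reads what i writes
    if Di & Rj: found.append("WAR")   # j writes what i reads
    if Ri & Rj: found.append("WAW")   # j writes what i writes
    return found

# i: R1 <- R2 + R3      j: R4 <- R1 * R5   (j reads i's result)
print(hazards(Di={"R2", "R3"}, Ri={"R1"},
              Dj={"R1", "R5"}, Rj={"R4"}))   # -> ['RAW']
```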
  142. HAZARD DETECTION & REMOVAL
     • TECHNIQUES USED FOR HAZARD DETECTION CAN BE CLASSIFIED INTO 2 CLASSES:
       - CENTRALIZE ALL THE HAZARD DETECTION IN ONE STAGE (USUALLY THE IU) AND COMPARE THE DOMAIN AND RANGE SETS WITH THOSE OF ALL THE INSTS INSIDE THE PIPELINE
       - ALLOW THE INSTS TO TRAVEL THROUGH THE PIPELINE UNTIL AN OBJECT, EITHER FROM THE DOMAIN OR THE RANGE, IS REQUIRED BY THE INST. AT THIS POINT, A CHECK IS MADE FOR A POTENTIAL HAZARD WITH ANY OTHER INST INSIDE THE PIPELINE.
  143. HAZARD DETECTION & REMOVAL
     • THE FIRST APPROACH IS SIMPLE BUT SUSPENDS THE INST FLOW IN THE IU ITSELF IF THE INST FETCHED IS IN HAZARD WITH THOSE INSIDE THE PIPELINE.
     • THE SECOND APPROACH IS MORE FLEXIBLE, BUT THE HARDWARE REQUIRED GROWS AS THE SQUARE OF THE NO OF STAGES.
  144. HAZARD DETECTION & REMOVAL
     • THERE ARE 2 APPROACHES FOR HAZARD REMOVAL:
       - SUSPEND THE PIPELINE INITIATION AT THE POINT OF HAZARD. THUS, IF AN INST j DISCOVERS THAT THERE IS A HAZARD WITH THE PREVIOUSLY INITIATED INST i, THEN ALL THE INSTS j+1, j+2, ... ARE STOPPED IN THEIR TRACKS TILL INST i HAS PASSED THE POINT OF HAZARD.
       - SUSPEND j BUT ALLOW THE INSTS j+1, j+2, ... TO FLOW.
  145. HAZARD DETECTION & REMOVAL
     • THE FIRST APPROACH IS SIMPLE BUT PENALIZES ALL THE INSTS FOLLOWING j.
     • THE SECOND APPROACH IS EXPENSIVE.
     • IF THE PIPELINE STAGES HAVE ADDITIONAL BUFFERS BESIDES A STAGING LATCH, THEN IT IS POSSIBLE TO SUSPEND AN INST BECAUSE OF A HAZARD.
     • AT EACH POINT IN THE PIPELINE WHERE DATA IS TO BE ACCESSED AS AN INPUT TO SOME STAGE AND THERE IS A RAW HAZARD, ONE CAN LOAD ONE OF THE STAGING LATCHES NOT WITH THE DATA BUT WITH THE ID OF THE STAGE THAT WILL PRODUCE IT.
  146. HAZARD DETECTION & REMOVAL
     • THE WAITING INST IS THEN FROZEN AT THIS STAGE UNTIL THE DATA IS AVAILABLE.
     • SINCE THE STAGE HAS MULTIPLE STAGING LATCHES, IT CAN ALLOW OTHER INSTS TO PASS THROUGH IT WHILE THE RAW-DEPENDENT ONE IS FROZEN.
     • ONE CAN INCLUDE LOGIC IN THE STAGE TO FORWARD THE DATA WHICH WAS IN RAW HAZARD TO THE WAITING STAGE.
     • THIS FORM OF CONTROL ALLOWS HAZARD RESOLUTION WITH THE MINIMUM PENALTY TO OTHER INSTS.
  147. HAZARD DETECTION & REMOVAL
     • THIS TECHNIQUE IS KNOWN BY THE NAME "INTERNAL FORWARDING", SINCE THE STAGES ARE DESIGNED TO CARRY OUT AUTOMATIC ROUTING OF THE DATA TO THE REQUIRED PLACE USING IDENTIFICATION CODES (IDs).
     • IN FACT, MANY OF THE DATA DEPENDENT COMPUTATIONS ARE CHAINED BY MEANS OF ID TAGS SO THAT UNNECESSARY ROUTING IS ALSO AVOIDED.
  148. MULTIPROCESSOR SYSTEMS
     • IT IS A COMPUTER SYSTEM COMPRISING TWO OR MORE PROCESSORS.
     • AN INTERCONNECTION NETWORK LINKS THESE PROCESSORS.
     • THE MAIN OBJECTIVE IS TO ENHANCE PERFORMANCE BY MEANS OF PARALLEL PROCESSING.
     • IT FALLS UNDER THE MIMD ARCHITECTURE.
     • BESIDES HIGH PERFORMANCE, IT PROVIDES THE FOLLOWING BENEFITS: FAULT TOLERANCE & GRACEFUL DEGRADATION; SCALABILITY & MODULAR GROWTH
  149. CLASSIFICATION OF MULTI-PROCESSORS
     • MULTI-PROCESSOR ARCHITECTURE:
       - TIGHTLY COUPLED: UMA, NUMA
       - LOOSELY COUPLED: NORMA (NO REMOTE MEMORY ACCESS)
     • IN A TIGHTLY COUPLED MULTI-PROCESSOR, MULTIPLE PROCS SHARE INFO VIA COMMON MEM. HENCE, IT IS ALSO KNOWN AS A SHARED MEM MULTI-PROCESSOR SYSTEM. BESIDES GLOBAL MEM, EACH PROC CAN ALSO HAVE LOCAL MEM DEDICATED TO IT.
     • A LOOSELY COUPLED (NORMA) SYSTEM IS A DISTRIBUTED MEM MULTI-PROCESSOR SYSTEM.
  150. SYMMETRIC MULTIPROCESSOR
     • IN A UMA SYSTEM, THE ACCESS TIME FOR MEM IS EQUAL FOR ALL THE PROCESSORS.
     • AN SMP SYSTEM IS A UMA SYSTEM WITH IDENTICAL PROCESSORS, EQUALLY CAPABLE OF PERFORMING SIMILAR FUNCTIONS IN AN IDENTICAL MANNER.
     • ALL THE PROCS HAVE EQUAL ACCESS TIME FOR THE MEM AND I/O RESOURCES.
     • TO THE OS, ALL PROCESSORS ARE SIMILAR AND ANY PROC CAN EXECUTE IT.
     • THE TERMS UMA AND SMP ARE USED INTERCHANGEABLY.
