Executing two or more operations at the same time is known as Parallelism.
Goals of Parallelism:
The purpose of parallel processing is to speed up the computer's processing capability; in other words, it increases computational speed.
It increases throughput, i.e. the amount of processing that can be accomplished during a given interval of time.
It improves the performance of the computer for a given clock speed.
Two or more ALUs in the CPU can work concurrently to increase throughput.
The system may have two or more processors operating concurrently.
Exploitation of Concurrency:
Techniques of Concurrency:
Overlap: execution of multiple operations by heterogeneous functional units.
Parallelism: execution of multiple operations by homogeneous functional units.
A computer’s performance is measured by the time taken for executing a program.
Program execution involves performing instruction cycles, which include two types of operations:
Internal Micro-operations: performed inside the hardware functional units such as the processor, memory, I/O etc.
Transfer of information: between different functional hardware units for instruction fetch, operand fetch, I/O operation etc.
Types of Parallelism:
Instruction Level Parallelism (ILP)
Processor Level Parallelism
Instruction Pipeline (sec-9.4)
An instruction pipeline reads consecutive instructions from memory while previous instructions are being executed in other segments.
The computer needs to process each instruction with the following sequence of steps:
Fetch the instruction from memory
Decode the instruction
Calculate the effective address
Fetch the operands from memory
Execute the instruction
Store the result in the proper place
Figure: Four-segment CPU pipeline (flowchart: Fetch instruction; Decode and calculate effective address; Branch?; Fetch operand; Execute instruction; Interrupt?; Interrupt handling; Update PC; Empty pipe)
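Timing of Instruction Pipeline
Step:          1   2   3   4   5   6   7   8   9   10  11  12  13
Instruction 1: FI  DA  FO  EX
            2:     FI  DA  FO  EX
            3:         FI  DA  FO  EX
            4:             FI  -   -   FI  DA  FO  EX
            5:                 -   -   -   FI  DA  FO  EX
            6:                                 FI  DA  FO  EX
            7:                                     FI  DA  FO  EX
(FI = fetch instruction, DA = decode instruction and calculate effective address, FO = fetch operand, EX = execute instruction; "-" = no operation while the pipeline waits for the branch in instruction 3.)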
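The timing of the instruction pipeline described above can be reproduced with a small model. This is a minimal sketch, assuming a k-segment pipeline that starts one instruction per clock step, except that the instruction following a (taken) branch is fetched only after the branch has left the last segment; the function names are hypothetical.

```python
def fetch_steps(n, k=4, branches=frozenset()):
    """Clock step at which each instruction's (kept) fetch begins.

    n        -- number of instructions, numbered 1..n
    k        -- number of pipeline segments (FI, DA, FO, EX gives k = 4)
    branches -- instruction numbers that are branches (assumed taken)
    """
    steps = []
    t = 1
    for i in range(1, n + 1):
        steps.append(t)
        # After a branch, the next useful fetch waits until the branch has
        # finished all k segments; otherwise one new instruction per step.
        t = t + k if i in branches else t + 1
    return steps


def total_steps(n, k=4, branches=frozenset()):
    """Steps until the last instruction leaves the pipeline."""
    return fetch_steps(n, k, branches)[-1] + k - 1


print(fetch_steps(7, branches={3}))  # [1, 2, 3, 7, 8, 9, 10]
print(total_steps(7))                # 10, i.e. k + n - 1 with no branch
print(total_steps(7, branches={3}))  # 13
```

With no branch, n instructions need k + n - 1 steps; the single branch in instruction 3 costs three extra steps, which is the gap visible in the timing chart.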
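In general, three major difficulties cause the instruction pipeline to deviate from its normal operation:
Resource conflicts are caused by access to memory by two segments at the same time. These may be resolved by using separate instruction and data memories.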
Data Dependency conflicts arise when an instruction depends on the result of a previous instruction, but this result is not yet available.
Branch Difficulties arise from branch and other instructions that change the value of PC.
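A data dependency conflict can be detected by comparing each instruction's source registers with the destination register of recent instructions that may still be in the pipeline. The sketch below assumes a hypothetical three-address representation (`Instr` with a destination and source registers) and a `window` parameter standing in for how many earlier instructions can still be in flight; register names and the default window are illustrative only.

```python
from typing import NamedTuple, Tuple


class Instr(NamedTuple):
    dest: str               # register written by the instruction
    srcs: Tuple[str, ...]   # registers read by the instruction


def raw_conflicts(program, window=2):
    """Return (producer, consumer, register) triples where the consumer reads
    a register whose most recent writer is at most `window` instructions
    earlier, i.e. a result that may not be available yet."""
    conflicts = []
    for j, consumer in enumerate(program):
        for reg in consumer.srcs:
            # Search backwards for the most recent instruction writing `reg`.
            for i in range(j - 1, -1, -1):
                if program[i].dest == reg:
                    if j - i <= window:
                        conflicts.append((i, j, reg))
                    break
    return conflicts


program = [
    Instr("R1", ("R2", "R3")),  # 0: R1 <- R2 + R3
    Instr("R4", ("R1", "R5")),  # 1: needs R1 from instruction 0
    Instr("R6", ("R7", "R8")),  # 2: independent
    Instr("R9", ("R1", "R4")),  # 3: R1 written 3 back, R4 written 2 back
]
print(raw_conflicts(program))  # [(0, 1, 'R1'), (1, 3, 'R4')]
```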
Instruction-level parallelism (ILP)
Instruction-level parallelism (ILP) is a measure of how many of the operations in a computer program can be performed simultaneously.
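One common way to put a number on ILP is to divide the number of operations by the length of the longest chain of dependent operations (the critical path), since no schedule can finish in fewer steps than that chain. A minimal sketch, assuming every operation takes one step and functional units are unlimited; the list-of-pairs input format is hypothetical.

```python
def ilp(ops):
    """ops: list of (name, [names it depends on]) in program order.
    Returns operations / critical-path length (unit latency assumed)."""
    depth = {}
    for name, deps in ops:
        # An operation can run one step after its latest input is ready.
        depth[name] = 1 + max((depth[d] for d in deps), default=0)
    return len(ops) / max(depth.values())


example = [
    ("e", []),          # e = a + b
    ("f", []),          # f = c + d   (independent of e)
    ("m", ["e", "f"]),  # m = e * f   (needs both e and f)
]
print(ilp(example))  # 1.5: three operations in two steps
```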
Micro-architectural techniques that are used to exploit ILP include:
Instruction pipelining, in which the execution of multiple instructions can be partially overlapped.
Superscalar execution in which multiple execution units are used to execute multiple instructions in parallel. In typical superscalar processors, the instructions executing simultaneously are adjacent in the original program order.
A superscalar CPU architecture implements a form of parallelism called instruction-level parallelism within a single processor.
It therefore allows more CPU throughput than would otherwise be possible at a given clock rate.
A superscalar processor executes more than one instruction during a clock cycle by simultaneously dispatching multiple instructions to redundant functional units on the processor.
Each functional unit is not a separate CPU core but an execution resource within a single CPU, such as an arithmetic logic unit, a bit shifter, or a multiplier.
Although a superscalar CPU is typically also pipelined, pipelining and superscalar architecture are considered different performance enhancement techniques.
The superscalar technique is associated with several identifying characteristics (within a given CPU core):
Instructions are issued from a sequential instruction stream.
CPU hardware dynamically checks for data dependencies between instructions at run time (versus software checking at compile time).
The CPU accepts multiple instructions per clock cycle.
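The run-time dependency check can be illustrated with a simplified in-order issue model: each cycle the dispatcher takes up to `width` instructions from the sequential stream, stopping early when the next instruction reads a register written by one already chosen in the same cycle. This is a sketch under assumed simplifications (unit latency, only read-after-write checks, no structural or branch limits); the function name and the (dest, srcs) format are hypothetical, and real dispatch logic is considerably more complex.

```python
def issue_groups(program, width=2):
    """program: list of (dest, srcs) tuples in program order.
    Returns the instruction indices issued in each clock cycle."""
    groups = []
    i = 0
    while i < len(program):
        group, written = [], set()
        while i < len(program) and len(group) < width:
            dest, srcs = program[i]
            if written & set(srcs):
                break  # depends on an instruction issued this cycle: wait
            group.append(i)
            written.add(dest)
            i += 1
        groups.append(group)
    return groups


prog = [
    ("R1", ("R2", "R3")),  # 0
    ("R4", ("R5", "R6")),  # 1: independent of 0, can issue with it
    ("R7", ("R1", "R4")),  # 2: needs the results of 0 and 1
    ("R8", ("R7", "R2")),  # 3: needs the result of 2
]
print(issue_groups(prog, width=2))  # [[0, 1], [2], [3]]
```

Even with two issue slots, this stream averages only 4/3 instructions per cycle, which is the "limited amount of instruction-level parallelism" constraint listed next.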
Available performance improvement from superscalar techniques is limited by three key areas:
The degree of intrinsic parallelism in the instruction stream, i.e. the limited amount of instruction-level parallelism.
The complexity and time cost of the dispatcher and the associated dependency checking logic.
The processing of branch instructions.
Processor Level Parallelism
Multiprocessing is the use of two or more central processing units (CPUs) within a single computer system.
The term also refers to the ability of a system to support more than one processor and/or the ability to allocate tasks between them.
Multiprocessing sometimes refers to the execution of multiple concurrent software processes in a system as opposed to a single process at any one instant.
The terms multitasking or multiprogramming are more appropriate to describe this concept, which is implemented mostly in software, whereas multiprocessing is more appropriate to describe the use of multiple hardware CPUs.
A system can be both multiprocessing and multiprogramming, only one of the two, or neither of the two.
In a multiprocessing system, all CPUs may be equal, or some may be reserved for special purposes.
In multiprocessing, the processors can be used to execute:
A single sequence of instructions in multiple contexts: single-instruction, multiple-data (SIMD), often used in vector processing.
Multiple sequences of instructions in a single context: multiple-instruction, single-data (MISD), used to describe pipelined processors.
Multiple sequences of instructions in multiple contexts: multiple-instruction, multiple-data (MIMD).
For comparison, in a single-instruction, single-data (SISD) system, one processor processes instructions sequentially and each instruction operates on one data item.
Amdahl's law, also known as Amdahl's argument, is named after computer architect Gene Amdahl, and is used to find the maximum expected improvement to an overall system when only part of the system is improved.
It is often used in parallel computing to predict the theoretical maximum speedup using multiple processors.
The speedup of a program using multiple processors in parallel computing is limited by the time needed for the sequential fraction of the program.
Amdahl's law is a model for the relationship between the expected speedup of parallelized implementations of an algorithm relative to the serial algorithm, under the assumption that the problem size remains the same when parallelized.
The law is concerned with the speedup achievable from an improvement to a computation that affects a proportion P of that computation where the improvement has a speedup of S. (For example, if an improvement can speed up 30% of the computation, P will be 0.3; if the improvement makes the portion affected twice as fast, S will be 2.)
Amdahl's law states that the overall speedup of applying the improvement will be:
$$\text{Speedup} = \frac{\text{Old running time}}{\text{New running time}} = \frac{1}{(1 - P) + \frac{P}{S}}$$
To see how this formula was derived, assume that the running time of the old computation was 1, for some unit of time. The running time of the new computation will be the length of time the unimproved fraction takes, (1 − P), plus the length of time the improved fraction takes.
The length of time for the improved part of the computation is the length of the improved part's former running time divided by the speedup, making the length of time of the improved part P/S. The final speedup is computed by dividing the old running time by the new running time, which is what the above formula does.
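The derivation can be checked numerically with the example from the text (P = 0.3, S = 2). A minimal sketch; the function name is hypothetical.

```python
def amdahl_speedup(p, s):
    """Overall speedup when a fraction p of the old running time is made s times faster."""
    old_time = 1.0
    new_time = (1 - p) + p / s   # unimproved part + improved part
    return old_time / new_time


print(amdahl_speedup(0.3, 2))  # roughly 1.176, i.e. 1 / 0.85
```

Doubling the speed of 30% of the work therefore gives only about an 18% overall improvement.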
In the case of parallelization, Amdahl's law states that if P is the proportion of a program that can be made parallel (i.e. benefit from parallelization), and (1 − P) is the proportion that cannot be parallelized (remains serial), then the maximum speedup that can be achieved by using N processors is:
$$\text{Speedup}(N) = \frac{1}{(1 - P) + \frac{P}{N}}$$
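Setting S = N in the general formula gives a speedup of 1 / ((1 − P) + P/N). As N grows, P/N shrinks toward zero and the speedup approaches 1 / (1 − P), so the serial fraction sets a hard ceiling. A short sketch of that saturation; P = 0.95 is only an illustrative choice and the function name is hypothetical.

```python
def parallel_speedup(p, n):
    """Amdahl speedup for a parallel fraction p on n processors."""
    return 1.0 / ((1 - p) + p / n)


p = 0.95
for n in (1, 2, 8, 64, 1024):
    print(n, round(parallel_speedup(p, n), 2))
print("limit:", round(1 / (1 - p), 2))  # 20.0
```

Even 1024 processors stay below a 20x speedup when 5% of the program is serial.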
P can be estimated by using the measured speedup SU on a specific number of processors NP using
$$P_{\text{estimated}} = \frac{\frac{1}{\mathit{SU}} - 1}{\frac{1}{\mathit{NP}} - 1}$$
P estimated in this way can then be used in Amdahl's law to predict speedup for a different number of processors.
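The two steps (estimate P from one measurement, then predict for another processor count) can be sketched as follows. The measured value of 2x on 4 processors is a made-up illustration, not data from the text, and the function names are hypothetical.

```python
def estimate_p(su, n_procs):
    """Estimate the parallel fraction P from a speedup `su` measured on
    `n_procs` processors (n_procs must be greater than 1)."""
    return (1 / su - 1) / (1 / n_procs - 1)


def predict_speedup(p, n):
    """Amdahl's law: speedup on n processors for parallel fraction p."""
    return 1 / ((1 - p) + p / n)


p = estimate_p(2.0, 4)        # e.g. a run measured 2x faster on 4 processors
print(p)                      # roughly 0.667, i.e. 2/3
print(predict_speedup(p, 8))  # roughly 2.4
```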
Basic Page Replacement
Find the location of the desired page on disk
Find a free frame:
- If there is a free frame, use it.
- If there is no free frame, use a page replacement algorithm to select a victim frame.
Bring the desired page into the (newly) free frame; update the page and frame tables
Restart the process
Page Replacement: OPT (Optimal Policy)
Replace the page that will not be used for the longest period of time. This requires future knowledge.
3 frames example
Advantage: it gives the lowest possible number of page faults for a given number of frames.
Disadvantage: it is difficult to implement because it requires future knowledge of the reference string.
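OPT can be simulated only when the whole reference string is known in advance, which is why it serves as a benchmark rather than a practical policy. A minimal sketch that follows the basic page replacement steps (use a free frame if one exists, otherwise pick a victim); the reference string is a commonly used textbook string chosen for illustration, and the function name is hypothetical.

```python
def opt_page_faults(refs, n_frames):
    """Count page faults under the optimal (OPT) policy.

    On a fault with all frames full, evict the resident page whose next
    use lies farthest in the future (or that is never used again).
    """
    frames = []
    faults = 0
    for i, page in enumerate(refs):
        if page in frames:
            continue                      # hit: nothing to do
        faults += 1
        if len(frames) < n_frames:
            frames.append(page)           # a free frame is available
            continue
        future = refs[i + 1:]

        def next_use(p):
            # Distance to the next reference; never used again => infinity.
            return future.index(p) if p in future else float("inf")

        victim = max(frames, key=next_use)
        frames[frames.index(victim)] = page   # replace the victim frame
    return faults


refs = [7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1]
print(opt_page_faults(refs, 3))  # 9
```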
CAO Model Question Paper
Unit – 2
Q:1 Discuss the hardwired control design method.
Q:2 Explain the concept of the microprogrammed sequencer.
Q:3 Explain the five-stage instruction cycle with the help of a flowchart.
Q:4 What is Content Addressable Memory? Explain the match logic in CAM.
Q:5 What do you mean by “Next Address Generator” present in a Microprogrammed Control Unit? Explain briefly along with its block diagram.
Q:6 Explain 2D RAM organization with a suitable diagram. (J. P. Hayes, Sec. 6.1.2, Fig. 6.8, Fig. 6.13)
Unit – 3
Q: 1 Discuss the principle of Locality of Reference associated with the Memory Hierarchy.
Q: 2 What are the reasons for using Virtual memory? Distinguish between Paging and Segmentation.
Q: 3 What are LRU and OPT Policies of page replacement? Compare them.
Q: 4 Explain the working of cache memory. What are the relative advantages and disadvantages of direct and associative mapping of cache memory?
Modes of transfer (11.4)
Data transfer between the central computer and the I/O devices may be handled in a variety of modes.
The modes of transfer are:
Programmed I/O
Interrupt-initiated I/O
Direct memory access (DMA)
Programmed I/O
Programmed I/O operations are the result of I/O instructions written in the computer program.
Each data item transfer is initiated by an instruction in the program.
Usually, the transfer is between a CPU register and the peripheral.
Other instructions are needed to transfer the data between the CPU and memory.
Transferring data under program control requires constant monitoring of the peripheral by the CPU.
Once a data transfer is initiated, the CPU is required to monitor the interface to see when a transfer can again be made.
It is up to the programmed instructions executed in the CPU to keep close tabs on everything that is taking place in the interface unit and the I/O device.
In this method, the CPU stays in a program loop until the I/O unit indicates that it is ready for data transfer.
This is a time-consuming process since it keeps the processor busy needlessly.
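The busy-wait behaviour can be sketched as follows. The `Device` class, its `ready()` status check and the fixed delay are hypothetical stand-ins for an interface status register; the point is that every pass through the polling loop is CPU time that does no useful work.

```python
class Device:
    """Hypothetical peripheral: becomes ready after `delay` status checks,
    then delivers one data item."""

    def __init__(self, data, delay):
        self.data = list(data)
        self.delay = delay
        self.countdown = delay

    def ready(self):
        # Each status check counts as time passing for the device.
        if self.countdown > 0:
            self.countdown -= 1
            return False
        return True

    def read(self):
        self.countdown = self.delay   # next item needs another `delay` checks
        return self.data.pop(0)


def programmed_io_read(device, n_items):
    """Read n_items under program control; return (items, wasted_polls)."""
    items, wasted = [], 0
    for _ in range(n_items):
        while not device.ready():     # CPU stays in this loop (busy waiting)
            wasted += 1
        items.append(device.read())   # transfer into a CPU register
    return items, wasted


dev = Device([10, 20, 30], delay=5)
print(programmed_io_read(dev, 3))  # ([10, 20, 30], 15)
```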
Interrupt-Initiated I/O
In the programmed I/O method, the CPU stays in a program loop until the I/O unit indicates that it is ready for data transfer.
This is a time-consuming process since it keeps the processor busy needlessly.
This can be avoided by using the interrupt facility and special commands that tell the interface to issue an interrupt request signal when the data are available from the device.
In the meantime, the CPU can proceed to execute another program.
The interface meanwhile keeps monitoring the device.
When the interface determines that the device is ready for the data transfer, it generates an interrupt request to the computer.
Upon detecting the external interrupt signal, the CPU momentarily stops the task it is processing, branches to a service program to process the I/O transfer, and then returns to the task it was originally performing.
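For contrast with the polling loop, an interrupt-driven CPU keeps executing another program and diverts to a service routine only when a request is pending. This simulation is a sketch under assumed simplifications: time advances in whole instruction cycles, the CPU checks for a pending request at the end of each instruction, each service routine costs a fixed `service_cycles`, and all names are hypothetical.

```python
def run_with_interrupts(total_cycles, request_times, service_cycles=3):
    """Simulate a CPU running another program one instruction per cycle
    while the interface raises requests at the cycles in `request_times`.
    Returns (useful_instructions, cycle at which each transfer finished)."""
    pending = sorted(request_times)
    cycle = 0
    useful = 0
    completed = []
    while cycle < total_cycles:
        useful += 1          # execute one instruction of the other program
        cycle += 1
        # End of the instruction: is an interrupt request pending?
        if pending and pending[0] <= cycle:
            pending.pop(0)
            cycle += service_cycles   # branch to service routine, transfer, return
            completed.append(cycle)
    return useful, completed


print(run_with_interrupts(20, [5, 12]))  # (14, [8, 15])
```

Of 20 cycles, 14 go to the other program and only 6 to servicing the two transfers, instead of being spent in a wait loop.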
Types of Interrupts
External interrupts come from I/O devices, from a timing device, from a circuit monitoring the power supply, or from any other external source. For example, a timeout interrupt.
Internal interrupts arise from illegal or erroneous use of an instruction or data. Internal interrupts are also called traps. For example, an attempt to divide by zero.
The difference between internal and external interrupts:
An internal interrupt is initiated by some exceptional condition caused by the program itself rather than by an external event.
External interrupts depend on external conditions that are independent of the program.
Software Interrupt: A software interrupt is initiated by executing an instruction. It is a special call instruction that behaves like an interrupt rather than a subroutine call. The most common use of a software interrupt is associated with a supervisor call instruction, which provides a means of switching from CPU user mode to supervisor mode.
DMA (Direct Memory Access – 11.6)
Direct memory access is an I/O technique used for high speed data transfer.
In DMA, the interface transfers data into and out of the memory unit through the memory bus .
In DMA, the CPU releases the control of the buses to a device called a DMA controller.
Removing the CPU from the path and letting the peripheral device manage the memory buses directly improves the speed of transfer.
The CPU initiates the transfer by supplying the interface with the starting address and the number of words to be transferred, and then proceeds to execute other tasks.
When the transfer is made, the DMA controller requests memory cycles through the memory bus.
When the request is granted by the memory controller, the DMA controller transfers the data directly into memory.
The CPU merely delays its memory access operation to allow the direct memory I/O transfer.
Many computers combine the interface logic with the requirements for direct memory access into one unit and call it an I/O processor (IOP).
A DMA controller takes over the memory buses to manage the transfer directly between the I/O device and memory using two special control signals, BR and BG.
The DMA controller uses the BR (bus request) signal to ask the CPU to relinquish control of the memory buses.
The CPU then activates the BG (bus grant) signal to inform the external DMA controller that its buses are in the high-impedance state.
The DMA controller then takes control of the memory buses.
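The order of events in a DMA transfer (CPU loads the starting address and word count, DMA raises BR, CPU releases the buses and raises BG, words move straight into memory, BR is removed and the CPU takes the buses back) can be traced in a toy model. Everything here is a hypothetical sketch: the class names, moving the whole block in one go, and a Python list standing in for memory are simplifying assumptions, not a description of real controller hardware.

```python
class CPU:
    """Hypothetical CPU side of the BR/BG handshake."""

    def __init__(self):
        self.bg = False             # BG (bus grant) output
        self.buses_high_z = False   # True while the CPU has released its buses

    def bus_request(self, br):
        # BR asserted: put the buses in high impedance and raise BG.
        # BR removed: lower BG and take the buses back.
        self.buses_high_z = br
        self.bg = br


class DMAController:
    """Hypothetical DMA controller with an address and a word count register."""

    def __init__(self, memory):
        self.memory = memory
        self.address = 0      # address register
        self.word_count = 0   # word count register
        self.br = False       # BR (bus request) output

    def start(self, address, count):
        # The CPU supplies the starting address and the number of words,
        # then goes on with other work.
        self.address, self.word_count = address, count

    def transfer(self, cpu, device_words):
        self.br = True
        cpu.bus_request(self.br)
        if not cpu.bg:
            raise RuntimeError("memory buses not granted")
        words = iter(device_words)   # assumed to hold at least word_count items
        while self.word_count > 0:
            self.memory[self.address] = next(words)  # device -> memory, no CPU
            self.address += 1
            self.word_count -= 1
        self.br = False              # give the buses back to the CPU
        cpu.bus_request(self.br)


memory = [0] * 8
cpu = CPU()
dma = DMAController(memory)
dma.start(address=2, count=3)
dma.transfer(cpu, [7, 8, 9])
print(memory)  # [0, 0, 7, 8, 9, 0, 0, 0]
```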
DMA controller: it needs the usual interface circuitry to communicate with the CPU and the I/O device.
Figure: CPU bus signals for DMA transfer: BR (bus request), BG (bus grant), ABUS (address bus), DBUS (data bus), RD (read), WR (write).
Figure: Block diagram of DMA controller: address bus buffers, data bus buffers, address register, word count register, control register and control logic; signals DS (DMA select), RS (register select), RD, WR, BR, BG, Interrupt, and DMA request / DMA acknowledge to the I/O device.