2. Evolution of computer systems:
Relays and vacuum tubes -> diodes and transistors -> small- and medium-scale
integrated (SSI/MSI) circuits -> large-scale and very large-scale integration (LSI/VLSI).
Increased speed and reliability; reduced H/W cost and size.
Von Neumann model: I/P -> Process -> O/P.
16th century -> the word "computer"; early computing devices: Chinese & Egyptian.
Architecture:
Arrangement of components with one another to execute a task.
Harvard -> separate H/W units which we have to connect to one another.
Von Neumann -> constructed as a single unit; communication is through a bus.
3. Modern computer systems are composed of:
Processor.
Memories.
Functional units.
Interconnection networks.
Peripheral devices and databases.
4. Computer Architecture:
Integration of system components (hardware, software, algorithms, and
languages) to perform large computations.
6. First Generation (1938-1953):
First Electronic analog computer.
First Electronic digital computer.
ENIAC -> Electronic Numerical Integrator and Computer.
Electromechanical relays were used as switching devices(1940).
1950 -> vacuum tubes, interconnected by insulated wires.
Arithmetic was done on a bit-by-bit fixed-point basis, using a single
full adder and one bit of carry flag.
BCD.
Machine language was used in early computers.
First stored-program computers.
EDVAC -> Electronic Discrete Variable Automatic Computer.
7. Second Generation (1952-1963):
Transistors 1948.
TRADIC -> built by Bell Laboratories.
800 transistors used.
Printed circuits appeared.
Coincident-current magnetic core memory was developed and subsequently applied
in many machines.
Assembly languages were used.
Fortran (1956).
Algol-1960.
1959 -> Sperry Rand built the LARC; IBM started the Stretch project.
LARC had an independent I/O processor operating in parallel with one or two
processing units.
COBOL (1959).
Interchangeable disk packs (1963).
Batch processing was popular.
8. Third Generation (1962-1975):
SSI and MSI->as the basic building block.
Multiprogramming was well developed to allow the simultaneous
execution of many program segments interleaved with I/O operations;
intelligent compilers appeared during this period.
Time-sharing O/S (1960s).
Virtual memory was developed by using a hierarchically structured
memory system.
9. Fourth Generation (1972-present):
LSI circuits for both logic and memory sections.
High-level languages are being extended to handle both scalar and vector
data.
Most O/S are time-sharing, using virtual memory.
High-speed mainframes and supercomputers appear in multiprocessor systems.
A high degree of pipelining and multiprocessing is greatly emphasized in
commercial supercomputers.
Massively Parallel Processor (MPP) - 1982.
16,384 bit-slice microprocessors, under the control of one array
controller, for satellite image processing.
10. Trends towards parallel processing:
4 ascending levels:
1. Data processing
2. Information processing
3. Knowledge processing
4. Intelligence processing
12. Data processing:
Occupies the largest space.
Includes numbers, character symbols, and multidimensional measures.
Huge amounts of data are generated daily in all walks of life:
the scientific, business, and government sectors.
Information processing:
A collection of data objects that are related by some syntactic structure or
relation.
It forms a subspace of the data space.
13. KNOWLEDGE:
Consists of information items plus some semantic meaning.
It forms a subspace of the information space.
From an O/S point of view, there are 4 phases:
Batch processing
Multiprogramming
Time sharing
Multiprocessing
The degree of parallelism increases sharply from phase to phase.
14. Definition of parallel processing:
An efficient form of information processing.
Exploitation of concurrent events in the computing process.
Concurrent implies:
Parallelism
Simultaneity
Pipelining
Parallel events may
occur in multiple resources during the same time interval.
Simultaneous events may occur at the same time instant.
Pipelined events may occur in overlapped time spans.
Parallel processing demands concurrent execution.
It is cost-effective and improves system performance.
15. The highest level of parallel processing is conducted among multiple jobs
or programs through multiprogramming, time sharing, and multiprocessing.
Uses parallel-processable algorithms.
Implementation of a parallel algorithm depends on the efficient allocation of
limited H/W-S/W resources to the multiple programs being used to solve a large
computation problem.
The next highest level of parallel processing is conducted among
procedures or tasks within the same program.
16. Decomposition of a program into
multiple tasks.
The 3rd level exploits concurrency among multiple instructions.
Job or program level
Task or procedure level
Inter instruction level
Intra instruction level
Highest (job) level -> handled algorithmically.
Lowest level -> implemented directly by H/W.
The H/W role increases from the high to the low levels.
The S/W role (implementation) increases from the low to the high levels.
19. Superminicomputer VAX-11/780:
Manufactured by Digital Equipment Corporation.
CPU contains the master control of the VAX system.
Sixteen 32-bit general-purpose registers, one of which serves as the program
counter (PC).
A special CPU status register
contains information about the current state of the processor and of the
program being executed.
An ALU with an optional floating-point accelerator.
Some local cache memory with optional dynamic memory.
The CPU, main memory, and I/O subsystems are connected by a common bus (the
SBI, Synchronous Backplane Interconnect).
All I/O devices can communicate with each other through this bus.
Peripheral storage or I/O devices can be connected directly to the SBI
through the unibus and its controller or through a massbus and its
controller.
21. Main memory is divided into 4 units (logical storage units, LSUs).
A storage controller provides multiport connections b/w the CPU and the 4 LSUs.
Peripherals are connected to the system via high-speed I/O channels which
operate asynchronously with the CPU.
It is necessary to balance the processing rates of the various subsystems in
order to avoid bottlenecks and to increase the total system throughput.
Throughput -> the number of instructions performed per unit
time.
22. Parallel processing mechanism in
uniprocessor computers:
Multiplicity of functional units.
Parallelism and pipelining within the CPU.
Overlapped CPU and I/O operations.
Use of a hierarchical memory system.
Balance of subsystem bandwidths.
Multiprogramming and time sharing.
1. Multiplicity of functional units:
Early computers -> only one ALU in the CPU.
The ALU could only perform one function at a time.
CDC 6600 (1964):
Has 10 functional units built into its CPU.
24. These units are independent of each other and may operate simultaneously.
Scoreboard:
Used to keep track of the availability of the functional units.
With 10 functional units and 24 registers available, the
instruction issue rate can be significantly increased.
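The scoreboard's basic bookkeeping can be sketched as a table of unit-busy counts (a toy Python illustration, not the actual CDC 6600 design; unit names and counts here are assumptions, and the real scoreboard also tracks register dependencies):

```python
# Toy sketch of scoreboard bookkeeping: an instruction is issued only when
# a functional unit of the required type is free; otherwise it stalls.

class Scoreboard:
    def __init__(self, units):
        # units: mapping from unit type to how many such units exist,
        # e.g. {"add": 2, "multiply": 1}  (hypothetical counts)
        self.free = dict(units)

    def try_issue(self, unit):
        """Issue to `unit` if one is available; return True on success."""
        if self.free.get(unit, 0) > 0:
            self.free[unit] -= 1
            return True
        return False          # stall: all units of this type are busy

    def release(self, unit):
        """A functional unit finishes and becomes available again."""
        self.free[unit] += 1

sb = Scoreboard({"add": 2, "multiply": 1})
print(sb.try_issue("multiply"))   # True: one multiplier is free
print(sb.try_issue("multiply"))   # False: must stall
sb.release("multiply")
print(sb.try_issue("multiply"))   # True again after release
```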
Another example is the IBM 360/91 (1968),
which has two parallel execution units:
Fixed-point arithmetic.
Floating-point arithmetic.
Within the floating-point E unit are 2 functional units:
Floating-point add-subtract.
Floating-point multiply-divide.
The 360/91 is a highly pipelined, multifunction, scientific uniprocessor.
25. 2. Parallelism and pipelining within the
CPU:
Parallel adders using techniques such as:
Carry-lookahead.
Carry-save.
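The carry-lookahead idea can be sketched in a few lines: compute generate and propagate signals first, from which every carry follows directly rather than rippling through the full adders (a toy 4-bit Python model; in hardware the carries are formed in parallel by dedicated logic, whereas the loop below is only a functional model):

```python
def cla_add(a_bits, b_bits, cin=0):
    """4-bit carry-lookahead addition sketch; bit lists are LSB first."""
    g = [a & b for a, b in zip(a_bits, b_bits)]   # generate signals
    p = [a ^ b for a, b in zip(a_bits, b_bits)]   # propagate signals
    carries = [cin]
    for i in range(len(a_bits)):
        # c[i+1] = g[i] | (p[i] & c[i]); lookahead logic expands these
        # recurrences so all carries come from g, p, and cin at once.
        carries.append(g[i] | (p[i] & carries[i]))
    s = [p[i] ^ carries[i] for i in range(len(a_bits))]
    return s, carries[-1]                          # sum bits, carry-out

# 6 (0110) + 3 (0011) = 9 (1001), written LSB first:
s, cout = cla_add([0, 1, 1, 0], [1, 1, 0, 0])
print(s, cout)   # [1, 0, 0, 1] 0
```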
High-speed multiplier recoding and convergence division are techniques for
exploiting parallelism and sharing of hardware resources for the functions of
multiply and divide.
Various phases of instruction execution are now pipelined -> instruction
fetch, decode, operand fetch, arithmetic-logic execution, and storing of the result.
Instruction prefetch and data buffering techniques have been developed.
Most commercial uniprocessor systems are now pipelined in their CPU, with a
clock rate b/w 10 and 500 ns.
26. 3. Overlapped CPU and I/O operations:
I/O operations can be performed simultaneously with CPU
computations by using separate I/O controllers, channels, or I/O processors.
DMA channel can be used to provide direct information transfer b/w the
I/O devices and the main memory.
DMA is conducted on a cycle-stealing basis, which is transparent to the CPU.
Back-end database machines can be used to manage large databases stored on disks.
27. 4. Use of a hierarchical memory system:
The innermost level is the register file, directly addressable by the ALU.
Cache memory can be used to serve as a buffer b/w the cpu and main
memory.
Block access of the main memory can be achieved through multiway
interleaving across parallel memory modules.
Virtual memory space can be established with the use of disks and tape units
at the outer levels.
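The block-access idea can be sketched with low-order interleaving, the usual scheme for this: consecutive addresses map to consecutive modules, so a block of words spreads across modules and can be fetched in overlapped cycles (a minimal sketch; the module count is illustrative):

```python
def interleave(addr, num_modules):
    """Low-order interleaving: consecutive addresses hit consecutive modules."""
    module = addr % num_modules     # which memory module holds the word
    offset = addr // num_modules    # word position within that module
    return module, offset

# With 4-way interleaving, addresses 0..3 land in 4 different modules,
# so a 4-word block can be accessed in one overlapped memory cycle.
for addr in range(8):
    print(addr, interleave(addr, 4))
```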
29. 5. Balance of subsystem bandwidths:
In general the CPU is the fastest unit in a computer, with a processor cycle
time tp of tens of nanoseconds.
The main memory has a cycle time tm of hundreds of nanoseconds.
The I/O devices are the slowest, with an average access time td of a few
milliseconds.
td > tm > tp
30. Example:
IBM 370/168
td = 5 ms (disk)
tm = 320 ns
tp = 80 ns
With these speed gaps b/w the subsystems, we need to match their
processing bandwidths in order to avoid system bottleneck problems.
The bandwidth of a system is defined as the number of operations performed
per unit time.
The memory bandwidth is measured by the number of memory words that can
be accessed per unit time.
Bm = W/tm (words/s or bytes/s)
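Plugging in the IBM 370/168 figures quoted above gives a concrete number (assuming, for illustration, W = 1 word accessed per memory cycle):

```python
# Memory bandwidth from Bm = W/tm, using the cycle time quoted above.
# W = 1 word per memory cycle is an assumption for illustration.
W = 1                    # words accessed per memory cycle
tm = 320e-9              # memory cycle time in seconds (320 ns)
Bm = W / tm              # words per second
print(f"Bm = {Bm:.2e} words/s")
```

With tm = 320 ns this works out to about 3.1 million words per second, far below what an 80 ns processor cycle could consume, which is exactly the gap interleaving and caching must bridge.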
33. 6. Multiprogramming and time
sharing:
Within the same time interval, there may be multiple processes active in
a computer, competing for memory, I/O, and CPU resources.
Some computer programs are
CPU bound(Computation intensive)
I/O bound (I/O intensive)
Program interleaving is intended to promote better resource
utilization through overlapping of I/O and CPU operations.
36. Whenever a process p1 is tied up with I/O operation, the system scheduler
can switch the cpu to process p2. This allows the simultaneous execution
of several programs in the system.
When p2 is done, the cpu can be switched to p3.
With the overlapped I/O and CPU operations, the CPU wait time is greatly
reduced.
This interleaving of cpu and I/O operations among several programs is
called multiprogramming.
Fig 1.9b.
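The switching described above can be sketched as a scheduler that hands the CPU to the next ready process whenever the running one issues an I/O request (a toy sketch; process names and burst times are hypothetical, and an I/O burst is assumed to proceed on its channel, overlapped with the CPU):

```python
from collections import deque

def run(processes):
    """Each process is (name, bursts); a burst is ("cpu", t) or ("io", t).
    Returns the order in which CPU bursts were dispatched."""
    ready = deque(processes)
    log = []
    while ready:
        name, bursts = ready.popleft()
        kind, t = bursts.pop(0)
        if kind == "cpu":
            log.append(name)   # the CPU runs this burst
        # an "io" burst proceeds on the I/O channel, overlapped with the CPU,
        # so the process simply rejoins the queue for its next burst
        if bursts:
            ready.append((name, bursts))
    return log

# p1 blocks on I/O after its first CPU burst, so the CPU switches to p2,
# then p3, and only later returns to p1 -- multiprogramming in miniature.
order = run([("p1", [("cpu", 2), ("io", 5), ("cpu", 1)]),
             ("p2", [("cpu", 3)]),
             ("p3", [("cpu", 2)])])
print(order)   # ['p1', 'p2', 'p3', 'p1']
```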
37. Time sharing Fig 1.9c:
Multiprogramming on a uniprocessor is centered around the sharing of the CPU
by many programs.
Sometimes a high-priority program may occupy the cpu for too long to
allow others to share.
This problem can be overcome by using a time-sharing o/s.
Time sharing extends multiprogramming by assigning fixed or variable time
slices to multiple programs.
Equal opportunities are given to all programs competing for the use of the
CPU.
The execution time saved with time sharing may be greater than with either
the batch or the multiprogramming processing mode.
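Fixed time slices amount to round-robin scheduling, which can be sketched as follows (a minimal sketch; process names, burst times, and the quantum are illustrative):

```python
from collections import deque

def round_robin(bursts, quantum):
    """bursts: {name: remaining CPU time}. Returns the dispatch order."""
    q = deque(bursts.items())
    order = []
    while q:
        name, remaining = q.popleft()
        order.append(name)            # this program gets one time slice
        remaining -= quantum
        if remaining > 0:
            q.append((name, remaining))   # not finished: back of the queue
    return order

# No program can monopolize the CPU: A needs three slices but B and C
# each get a turn between them.
print(round_robin({"A": 5, "B": 2, "C": 3}, quantum=2))
# ['A', 'B', 'C', 'A', 'C', 'A']
```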
39. 1. Pipeline computers:
The process of executing an instruction in a digital computer involves 4
major steps:
IF->Instruction fetch from the main memory.
ID -> instruction decoding: identify the operation to be performed.
OF -> operand fetch, if needed for the execution.
EX->Execution of the decoded arithmetic logic operation.
In a nonpipelined computer, these 4 steps must be completed before the
next instruction can be issued.
In a pipelined computer, the 4 steps are executed in an overlapped fashion.
The flow from stage to stage is triggered by a common pipeline clock.
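The overlap can be sketched as a timing table: in an ideal 4-stage pipeline, instruction i enters stage k at clock cycle i + k, so n instructions complete in n + 3 cycles instead of 4n (a minimal sketch assuming no hazards or stalls):

```python
STAGES = ["IF", "ID", "OF", "EX"]

def schedule(n_instructions):
    """Return {(instruction, stage): clock cycle} for an ideal 4-stage pipeline.
    Instruction i enters stage k at cycle i + k."""
    return {(i, s): i + k
            for i in range(n_instructions)
            for k, s in enumerate(STAGES)}

sched = schedule(4)
# Instruction 0 reaches EX at cycle 3; instruction 3 reaches EX at cycle 6.
print(sched[(0, "EX")], sched[(3, "EX")])
# Total: n + 3 = 7 cycles for 4 instructions, versus 4n = 16 unpipelined.
print(max(sched.values()) + 1)
```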
41. Both scalar arithmetic pipelines and vector arithmetic pipelines are provided.
The instruction preprocessing unit is itself pipelined, with three stages shown.
The OF stage consists of two independent stages:
One for fetching scalar operands.
One for fetching vector operands.
The scalar registers are fewer in quantity than the vector registers, because each
vector register implies a whole set of component registers.
A scalar processor acts on a single data stream, whereas a vector processor
works on a 1-D array (vector) of numbers (multiple data streams).
SIMD -> an example of vector processing.
Superscalar -> multiple instructions at once, but from the same instruction
stream.
Superscalar should not be confused with MIMD, which is used more in parallel
computing architectures because there are multiple instruction streams
operating independently.
42. 2. Array computers:
An array computer is a synchronous parallel computer with multiple
arithmetic-logic units, called processing elements (PEs), that can operate
in parallel in a lock-step fashion.
The PEs are synchronized to perform the same function at the same time.
An appropriate data-routing mechanism must be established among the
PEs.
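Lock-step operation can be sketched as one broadcast instruction applied by every PE to its own local operands in the same step (a toy sketch; the PE count, operand values, and the two-operation repertoire are illustrative):

```python
def broadcast(instruction, local_a, local_b):
    """All PEs execute the same broadcast instruction on their local operands.
    local_a[i] and local_b[i] model PE i's local memory."""
    ops = {"add": lambda x, y: x + y,
           "mul": lambda x, y: x * y}
    op = ops[instruction]
    # Each PE works on its own operand pair; conceptually all in one step.
    return [op(a, b) for a, b in zip(local_a, local_b)]

# 4 PEs, each holding one component of two vectors: one broadcast "add"
# produces the whole vector sum in lock step.
print(broadcast("add", [1, 2, 3, 4], [10, 20, 30, 40]))   # [11, 22, 33, 44]
```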
44. Scalar and control-type instructions are directly executed in the control unit
(CU).
Each PE consists of an ALU with registers and a local memory.
The PEs are interconnected by a data-routing n/w.
The interconnection pattern to be established for a specific computation is under
program control from the CU.
Vector instructions are broadcast to the PEs for distributed execution over
different component operands fetched directly from the local memories.
Instruction fetch (from local memory or from the control memory) and decoding
are done by the control unit.
The PEs are passive devices without instruction decoding capabilities.
Associative memory, which is content-addressable, will also be treated in
the context of parallel processing.
Array processors designed with associative memories are called associative
processors.
45. 3. Multiprocessor systems:
Research and development of multiprocessor systems are aimed at
improving throughput, reliability, flexibility, and availability.
The system contains two or more processors of approximately comparable
capabilities.
All processors share access to common sets of memory modules, I/O
channels, and peripheral devices.
Most importantly, the entire system must be controlled by a single integrated
O/S providing interactions b/w processors and their programs at various levels.
Besides the shared memories and I/O devices each processor has its own
local memory and private devices.
Interprocessor communications can be done through the shared memories
or through an interrupt n/w.
Multiprocessor H/W system organization is determined primarily by the
interconnection structure to be used b/w the memories and processors.
46. Three different interconnection structures have been practiced in the past:
1. Time-shared common bus.
2. Crossbar switch n/w.
3. Multiport memories.