Elements of a modern computer
 Computing problems: the problems for which a computer system is to be designed and built.
 Algorithms and data structures: special algorithms and data structures are needed to specify the computations and communications involved in the computing problems.
 Hardware resources: processors, memory, and peripheral devices.
 Operating system:
 System software support: programs are written in a high-level language; the source code is translated into object code by a compiler.
• Compiler support: three compiler approaches:
• 1. Preprocessor: uses a sequential compiler.
• 2. Precompiler: requires some program flow analysis and dependence checking to detect parallelism.
• 3. Parallelizing compiler: demands a fully developed parallelizing compiler which can automatically detect parallelism.
Evolution of Computer Architecture
FLYNN’S CLASSIFICATION
Parallel/Vector computers
 Execute programs in MIMD mode.
 Two major classes:
 1. Shared-memory multiprocessors
 2. Message-passing multicomputers
System attributes to performance
 Turnaround time: the time that includes disk and memory accesses, input and output activities, compilation time, OS overhead, and CPU time.
 Clock rate and CPI: the processor is driven by a clock with a constant cycle time t. The inverse of the cycle time is the clock rate (f = 1/t).
 The size of a program is determined by its instruction count (Ic). Different machine instructions may require different numbers of clock cycles to execute, so CPI (cycles per instruction) becomes an important parameter.
Performance factors
 Ic = number of instructions in a given program.
 Thus the CPU time needed to execute the program is the product of three factors:
T = Ic * CPI * t
 Each instruction cycle requires a sequence of events: instruction fetch, decode, operand fetch, execution, and storing of results.
 Of these, only the decode and execute phases are carried out in the CPU; the remaining three operations may require access to memory.
 The memory cycle is the time needed to complete one memory reference.
 Therefore CPI is divided into two components: processor cycles and memory cycles.
 Depending on the instruction type, the complete instruction cycle may involve one to as many as four memory references (one for instruction fetch, two for operand fetch, and one for storing the result).
 Therefore T = Ic * (p + m*k) * t, as illustrated in the sketch below, where:
 Ic = instruction count
 p = number of processor cycles per instruction
 m = number of memory references per instruction
 k = ratio of memory cycle time to processor cycle time
 t = processor cycle time
 T = CPU time
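As a quick illustration, here is a minimal Python sketch that evaluates the CPU-time formula T = Ic * (p + m*k) * t. The parameter values are hypothetical and chosen only for the example.

# Hypothetical example values -- not measurements from any real machine.
Ic = 200_000        # instruction count of the program
p = 4               # average processor cycles per instruction
m = 1.5             # average memory references per instruction
k = 10              # ratio of memory cycle time to processor cycle time
t = 2e-9            # processor cycle time in seconds (500 MHz clock)

CPI = p + m * k     # effective cycles per instruction
T = Ic * CPI * t    # CPU time, i.e. Ic * (p + m*k) * t
print(f"CPI = {CPI}, CPU time T = {T:.6f} s")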
System Attributes
 The five performance factors (Ic, p, m, k, t) are influenced by four system attributes:
 Instruction-set architecture
 Compiler technology
 CPU implementation and control
 Cache and memory hierarchy
 Instruction-set architecture affects Ic and p (processor cycles per instruction).
 Compiler technology affects Ic, p, and m (memory references per instruction).
 CPU implementation and control affects p and t (processor cycle time), and hence the total processor time needed.
 Cache and memory hierarchy affects the memory access latency, i.e. k and t.
MIPS RATE
 MIPS = millions of instructions per second.
 Throughput rate: the number of programs a system can execute per unit time is called the system throughput Ws.
 In a multiprogrammed system, the system throughput is often lower than the CPU throughput Wp because of the additional system overheads caused by I/O, the compiler, and the OS.
MIPS
 Let C be the total number of clock cycles needed to execute a program.
 Then CPU time T = C * t = C / f.
 Furthermore, CPI = C / Ic, so
 T = Ic * CPI * t
 T = Ic * CPI / f
 The processor speed is measured in MIPS, as illustrated in the sketch below.
 MIPS rate = Ic / (T * 10^6)
 = f / (CPI * 10^6)
 = (f * Ic) / (C * 10^6)
 where f is the clock rate.
 CPU throughput Wp = f / (Ic * CPI) programs per second.
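A minimal Python sketch, again with purely hypothetical numbers, confirming that the three expressions for the MIPS rate agree and computing the CPU throughput Wp:

# Hypothetical values -- for illustration only.
f = 500e6            # clock rate in Hz
CPI = 2.0            # average cycles per instruction
Ic = 1_000_000       # instruction count of one program

C = CPI * Ic         # total clock cycles for the program
T = C / f            # CPU time in seconds

mips_from_time  = Ic / (T * 1e6)
mips_from_cpi   = f / (CPI * 1e6)
mips_from_count = (f * Ic) / (C * 1e6)
assert abs(mips_from_time - mips_from_cpi) < 1e-9
assert abs(mips_from_cpi - mips_from_count) < 1e-9
print(f"MIPS rate = {mips_from_cpi:.1f}")

Wp = f / (Ic * CPI)  # CPU throughput in programs per second
print(f"Wp = {Wp:.2f} programs/s")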
Implicit and explicit parallelism
 Explicit parallelism requires more effort by the programmer, who develops a source program using parallel dialects of C, C++, Fortran, or Pascal. Parallelism is specified explicitly in the user program, and the compiler needs to preserve it.
 Implicit parallelism, by contrast, uses a conventional sequential language and relies on a parallelizing compiler to detect the parallelism automatically.
MULTI-PROCESSORS & MULTI COMPUTERS
1. Two categories of parallel computers
a. Shared memory multiprocessor
b. Distributed-memory multicomputer
Shared-memory multiprocessors:
1. The uniform memory access (UMA) model.
2. The non-uniform memory access (NUMA)
3. The cache-only memory architecture (COMA)
Note:
1. These models differ in how the memory and peripheral resources are shared or
distributed
 The physical memory is uniformly shared by all the processors. All
processors have equal access time to all memory words. Each processor
may use its own private cache.
 Peripherals are also shared in the same fashion.
 Multiprocessors are called tightly coupled systems due to high resource
sharing.
 The system interconnect takes the form of a shared bus, a crossbar switch, or a multistage network. All the processors uniformly share the physical memory.
UNIFORM MEMORY ACCESS
1. UMA model is suitable for general-purpose and time-sharing applications.
2. When all processors have equal access time to all the peripheral devices,
the system is called a symmetric multiprocessor.
3. In an asymmetric multiprocessor, only one or a subset of the processors are executive-capable.
4. An executive or master processor can execute the operating system and handle I/O. The remaining processors have no I/O capability and are therefore called attached processors.
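To give a flavour of the tightly coupled, shared-address-space programming style that a UMA multiprocessor supports, here is a minimal Python sketch, purely illustrative, in which several worker threads update one shared array with equal access rights:

import threading

N_WORKERS = 4
shared = [0] * 16           # one physical memory, uniformly visible to all workers
lock = threading.Lock()     # mutual exclusion protecting the shared data

def worker(wid):
    # Every worker reads and writes the same shared memory.
    for i in range(len(shared)):
        with lock:
            shared[i] += wid

threads = [threading.Thread(target=worker, args=(w,)) for w in range(N_WORKERS)]
for th in threads:
    th.start()
for th in threads:
    th.join()

print(shared)               # each cell holds 0 + 1 + 2 + 3 = 6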
NUMA MODEL
 In the NUMA model, the memory access time varies with the location of the memory word. The shared memory is physically distributed among the processors as local memories.
 The collection of all local memories forms a global address space accessible by all
processors.
 It is faster for a processor to access its own local memory; access to remote memory attached to other processors takes longer because of the added delay through the interconnection network.
 There are 3 memory access patterns:
 Fastest is local memory access
 Next is global memory access
 Slowest is access to remote memory
COMA MODEL
 The COMA model is a special case of a NUMA machine in which the distributed main memories are converted to caches.
 There is no memory hierarchy at each processor node.
 All the caches form a global address space, and distributed cache directories assist remote cache access.
 Another variant of the multiprocessor is CC-NUMA (cache-coherent NUMA).
Distributed memory multi-computers
 The system consists of multiple computers, often called nodes, interconnected by a message-passing network.
 Each node is an autonomous computer consisting of:
 a processor,
 local memory, and
 sometimes attached disks or I/O peripherals.
 The message-passing network provides point-to-point static connections among the nodes.
 All local memories are private and are accessible only by the local processor, as sketched below.
 For this reason, traditional multicomputers have been called no-remote-memory-access (NORMA) machines.
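The following minimal Python sketch (using the standard multiprocessing module, purely for illustration) mimics two multicomputer nodes: each process holds only its own private data, and the only way to share a value is to send it as a message over a point-to-point channel:

from multiprocessing import Process, Pipe

def node_a(conn):
    local = [1, 2, 3, 4]          # private local memory of node A
    conn.send(sum(local))         # explicit message to node B
    conn.close()

def node_b(conn):
    partial = conn.recv()         # B can only see A's data via a received message
    print("node B received partial sum:", partial)
    conn.close()

if __name__ == "__main__":
    a_end, b_end = Pipe()         # point-to-point connection between the two nodes
    pa = Process(target=node_a, args=(a_end,))
    pb = Process(target=node_b, args=(b_end,))
    pa.start(); pb.start()
    pa.join(); pb.join()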
MULTI COMPUTER GENERATIONS
 Message-passing multicomputers have gone through two generations of development, and a new generation is emerging.
 1st (1983-1987): based on processor-board technology using hypercube architecture and software-controlled message switching, e.g. the Caltech Cosmic Cube and the Intel iPSC/1.
 2nd (1988-1992): implemented with mesh-connected architecture and a software environment for medium-grain distributed computing, e.g. the Intel Paragon and the Parsys Supernode 1000.
 3rd (1992-1997): expected to be fine-grain multicomputers, e.g. the MIT J-Machine and the Caltech Mosaic.
VECTOR SUPERCOMPUTERS
1. A vector computer is built on top of a scalar processor.
2. The vector processor is attached to the scalar processor as an optional feature.
3. Program and data are first loaded into the main memory through a host computer.
4. All instructions are decoded by the scalar control unit.
5. If the instruction is scalar, it goes to the scalar processor.
6. If it is vector, it is sent to the vector control unit.
VECTOR PROCESSOR MODEL
• The diagram shown is a register-to-register architecture.
• Vector registers are used to hold the vector operands and the intermediate and final results.
• The vector functional pipelines retrieve operands from, and put results into, the vector registers.
• Each vector register is equipped with a component counter which keeps track of the component registers used in successive pipeline cycles.
• The length of each vector register is usually fixed; the Cray series, for example, uses fixed-length vector registers, while the Fujitsu VP2000 configures the register length dynamically.
• In a memory-to-memory architecture, vector operands and results are retrieved from and stored to the main memory directly, e.g. in units of 512 bits as in the Cyber 205.
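As a rough illustration of the register-to-register style (not the behaviour of any particular machine), the Python sketch below strip-mines a long vector addition through fixed-length "vector registers"; the register length of 64 is a hypothetical choice:

import numpy as np

VLEN = 64                                     # hypothetical fixed vector-register length

def vector_add(a, b):
    # Add two long vectors by streaming them through fixed-length vector registers.
    result = np.empty_like(a)
    for start in range(0, len(a), VLEN):      # strip-mining loop
        v1 = a[start:start + VLEN]            # load slice into vector register 1
        v2 = b[start:start + VLEN]            # load slice into vector register 2
        result[start:start + VLEN] = v1 + v2  # pipelined add; result goes back to a register
    return result

x = np.arange(1000.0)
y = np.arange(1000.0)
assert np.allclose(vector_add(x, y), x + y)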
SIMD SUPERCOMPUTERS
Operational Model
An operational model of a SIMD computer is specified by a 5-tuple:
1. M = (N, C, I, M, R)
2. N = the number of processing elements (PEs).
3. C = the set of instructions directly executed by the control unit (CU), including scalar and program-flow-control instructions.
4. I = the set of instructions broadcast by the CU to all PEs for parallel execution. It includes arithmetic, logic, data-routing, masking, and other local operations executed by each active PE over data within that PE.
5. M = the set of masking schemes, where each mask partitions the set of PEs into enabled and disabled subsets.
6. R = the set of data-routing functions, specifying various patterns to be set up in the interconnection network for inter-PE communications.
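A toy Python sketch of this operational model (entirely illustrative, with invented names): the control unit broadcasts one instruction at a time, only the PEs enabled by the current mask apply it to their local data, and a routing function then moves data between PEs:

N = 8                                    # number of processing elements (PEs)
local_data = list(range(N))              # each PE holds its own local operand

def broadcast(op, mask):
    # The CU broadcasts instruction `op`; only PEs enabled by `mask` execute it.
    for pe in range(N):
        if mask[pe]:                     # masking scheme: enabled vs. disabled PEs
            local_data[pe] = op(local_data[pe])

def route_shift_right(data):
    # A simple data-routing function: cyclic shift among the PEs.
    return [data[-1]] + data[:-1]

mask = [pe % 2 == 0 for pe in range(N)]  # enable only the even-numbered PEs
broadcast(lambda x: x * 10, mask)        # broadcast instruction executed in parallel
local_data = route_shift_right(local_data)  # inter-PE communication via the network
print(local_data)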
PRAM AND VLSI MODELS
1. The ideal models provide a convenient framework for developing parallel algorithms without worrying about implementation details or physical constraints.
2. The models can be used to obtain theoretical performance bounds on parallel computers or to estimate VLSI complexity in chip area and execution time before the chip is fabricated.
Parallel Random Access Machine
1. Theoretical models of parallel computers are:
a. Time and Space Complexities
b. NP-Completeness
c. PRAM Models
Parallel Random Access Machine (PRAM)
Time and Space complexity
1. The complexity of an algorithm for solving a problem of size s on a computer is determined by the execution time and the storage space required.
Time Complexity
1. The time complexity is a function of the problem size.
2. The time complexity function in order notation is the asymptotic time complexity of the
algorithm.
3. Usually the worst-case time complexity is considered.
Space Complexity
1. The space complexity is defined as a function of the problem size s.
2. The asymptotic space complexity refers to the data-storage requirement of large problems.
Note:
1. The program storage requirement and the storage for input data are not considered in this.
2. The time complexity of a serial algorithm is simply called serial complexity.
3. The time complexity of a parallel algorithm is called parallel complexity
Parallel Random Access Machine (PRAM)
NP-Completeness
1. An algorithm has a polynomial complexity if there exists a polynomial p(s) such that the
time complexity is O(p(s)) for any problem size “s”.
2. The set of problems having polynomial complexity algorithms is called P-class
(Polynomial Class)
3. The set of problems solvable by nondeterministic algorithms in polynomial time is called the NP-class (nondeterministic polynomial class).
4. Since deterministic algorithms are special cases of nondeterministic ones, P is a subset of NP.
5. The P-class problems are computationally tractable, while the NP − P class problems are intractable.
PRAM Models
1. Conventional uniprocessor computers were modeled as random access machines (RAM) by Shepherdson and Sturgis (1963).
2. A parallel random-access machine (PRAM) model was developed by Fortune and Wyllie for modeling idealized parallel computers with zero synchronization or memory access overhead.
3. This PRAM model will be used for parallel algorithm development and for
scalability and complexity analysis.
4. An N-processor PRAM has a globally addressable memory.
5. The shared memory can be distributed among the processors or centralized in one place.
6. The processors operate in synchronized read-memory, compute, and write-memory cycles.
With shared memory, the model must specify how concurrent reads and concurrent writes of memory are handled.
Four memory update options are possible
1. Exclusive read (ER)  allows only one processor to read from any memory location in each cycle.
2. Exclusive write (EW)  allows at most one processor to write into a memory location at a time.
3. Concurrent read (CR)  allows multiple processors to read the same information from the same memory cell in the same cycle.
4. Concurrent write (CW)  allows simultaneous writes to the same memory location. To avoid confusion, some policy must be set up to resolve the write conflicts.
Note:
 Various combinations of the above options lead to several variants of the PRAM model.
PRAM Models
1. EREW-PRAM model  This model forbids more than one processor from reading or writing the same memory cell simultaneously.
2. CREW-PRAM model  Write conflicts are avoided by mutual exclusion; concurrent reads to the same memory location are allowed.
3. ERCW-PRAM model  This allows exclusive reads and concurrent writes to the same memory location.
4. CRCW-PRAM model  This model allows either concurrent reads or concurrent writes to the same memory location at the same time. The conflicting writes are resolved by one of the following four policies (see the sketch after this list):
a. Common  All simultaneous writes store the same value to the hot-spot memory location.
b. Arbitrary  Any one of the values written may remain; the others are ignored.
c. Minimum  The value written by the processor with the minimum index will remain.
d. Priority  The values being written are combined using some associative function, such as summation or maximum.
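To make these write-conflict policies concrete, here is a small, purely illustrative Python sketch that resolves a set of simultaneous writes aimed at one memory cell under each policy (the processor index determines the winner under the minimum policy):

def resolve_concurrent_writes(writes, policy):
    # writes: list of (processor_index, value) pairs aimed at the same memory cell.
    if policy == "common":
        values = {v for _, v in writes}
        assert len(values) == 1, "common policy assumes all processors write the same value"
        return values.pop()
    if policy == "arbitrary":
        return writes[0][1]               # any one value may remain; the others are ignored
    if policy == "minimum":
        return min(writes)[1]             # value from the processor with the minimum index
    if policy == "priority":
        return sum(v for _, v in writes)  # combine with an associative function (summation)
    raise ValueError(policy)

writes = [(3, 7), (0, 5), (2, 9)]
for policy in ("arbitrary", "minimum", "priority"):
    print(policy, "->", resolve_concurrent_writes(writes, policy))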
PRAM VARIANTS
DISCREPANCY WITH PHYSICAL MODELS
1. The PRAM models idealized parallel computers in which all memory references and program executions by multiple processors are synchronized without extra cost. In reality, such machines do not exist.
2. SIMD machine with shared memory is the closest architecture modeled by PRAM.
However, PRAM allows different instructions to be executed on different processors
simultaneously.
3. PRAM operates in synchronized MIMD mode with shared memory.
4. EREW and CRCW are the most popular models.
5. A CRCW algorithm can run faster than an equivalent EREW algorithm.
6. PRAM models will be used for scalability and performance comparison.
7. PRAM Models can put an upper bound and lower bound on the performance of a
system.
TO RESOLVE CONCURRENT WRITES
1. Common  all simultaneous writes store the same value to the memory location.
2. Arbitrary  any one of the values written may remain; the others are ignored.
3. Minimum  the value written by the processor with the minimum index is selected.
4. Priority  the values written are combined using some associative function, such as summation or maximum.
VLSI COMPLEXITY MODEL
Memory Bound on Chip Area A:
1. Many computations are memory-bound, due to the need to process large data sets.
2. The amount of information processed by the chip can be visualized as information flowing upward across the chip area.
3. Each bit can flow through a unit area of the horizontal chip slice; thus the chip area A bounds the number of memory bits that can be stored on the chip.
I/O Bound on Volume AT:
1. The volume of the rectangular cube is represented by the product AT.
2. As information flows through the chip for a period of time T, the number of input bits cannot exceed the volume AT.
3. This gives an I/O-limited lower bound on the product AT.
ARCHITECTURAL DEVELOPMENT TRACKS
1. The architectures of most existing computers follow certain development tracks.
2. Understanding the features of various tracks provides insights for new architectural
developments. Some of them are:
Multi-Processor Tracks
A Multiprocessor system can be either a shared memory multi-processor or a distributed
memory multi-computer.
Shared Memory Track
1. The diagram represents a track of
multiprocessor development employing a single
address space in the entire system
2. The C.mmp was a UMA multiprocessor.
3. Sixteen PDP 11/40 processors were interconnected to 16 shared-memory modules via a crossbar switch.
ARCHITECTURAL DEVELOPMENT TRACKS
Shared Memory Track
4. Besides the shared memory, a special inter-processor interrupt is provided for fast inter-process communication.
5. The NYU Ultracomputer and the Illinois Cedar projects were developed with a single address space.
6. Both systems used multistage networks as the system interconnect. The major achievements of the Cedar project are in parallel compilers and performance benchmarking experiments.
7. The Stanford Dash is a NUMA multiprocessor with distributed memories forming a global address space.
ARCHITECTURAL DEVELOPMENT TRACKS
Message-Passing Track
1. The Cosmic Cube pioneered the development of message-passing computers.
2. Since then, Intel has produced a series of medium-grain hypercube computers.
3. The nCUBE/2 also assumes a hypercube configuration.
4. The latest Intel system is the Paragon.
5. On the research track, the Mosaic C and the MIT J-Machine are two fine-grain multicomputers.
ARCHITECTURAL DEVELOPMENT TRACKS
Multi-Vector and SIMD Tracks
ARCHITECTURAL DEVELOPMENT TRACKS
Multi-Vector Track
1. These are traditional vector supercomputers.
2. The CDC 7600 was the first vector dual-processor system.
3. Two sub-tracks were derived from the CDC 7600.
4. The Cray 1 pioneered the multi-vector development in 1978.
5. The latest Cray/MPP is a massively parallel system with distributed shared memory.
6. It is intended to work as a back-end accelerator compatible with the existing Cray Y-MP series.
ARCHITECTURAL DEVELOPMENT TRACKS
SIMD Track
1. The Illiac IV pioneered the construction of SIMD computers, even though the array-processor concept can be traced back to the 1960s.
2. The sub-track consisting of the Goodyear MPP, the AMT/DAP610, and the TMC/CM-2 comprises SIMD machines built with bit-slice PEs.
3. The CM-5 is a synchronized MIMD machine executing in a multiple-SIMD mode.
ARCHITECTURAL DEVELOPMENT TRACKS
Multi-Threaded and Data Flow Tracks
ARCHITECTURAL DEVELOPMENT TRACKS
Multi-Threaded Track
1. The multithreading idea was pioneered by Burton Smith (1978) in the HEP system, which extended the concept of scoreboarding of multiple functional units in the CDC 6400.
2. The latest multithreaded multiprocessor projects are the Tera computer and the MIT Alewife.
ARCHITECTURAL DEVELOPMENT TRACKS
Data Flow Track
1. This track was pioneered by Jack Dennis with the "static" dataflow architecture.
2. The concept later inspired the development of "dynamic" dataflow computers.
3. A series of tagged-token architectures was developed at MIT by Arvind and coworkers.