2. (6.2) Central Processing Unit Architecture
· Architecture overview
· Machine organization
– von Neumann
· Speeding up CPU operations
– multiple registers
– pipelining
– superscalar and VLIW
· CISC vs. RISC
3. (6.3) Computer Architecture
· Major components of a computer
– Central Processing Unit (CPU)
– memory
– peripheral devices
· Architecture is concerned with
– internal structures of each
– interconnections
» speed and width
– relative speeds of components
· Want maximum execution speed
– balance is often a critical issue
4. (6.4) Computer Architecture (continued)
· CPU
– performs arithmetic and logical operations
– synchronous operation
– may consider instruction set architecture
» how machine looks to a programmer
– detailed hardware design
5. (6.5) Computer Architecture (continued)
· Memory
– stores programs and data
– organized as
» bit
» byte = 8 bits (smallest addressable location)
» word = 4 bytes (typically; machine dependent)
– instructions consist of operation codes and addresses
[Figure: instruction formats: an op code field followed by one, two, or three address fields]
6. (6.6) Computer Architecture (continued)
· Numeric data representations
– integer (exact representation)
» sign-magnitude (layout: s | magnitude)
» 2's complement
• to negate a value, invert every bit (0 ↔ 1) and add 1
– floating point (approximate representation)
» scientific notation: 0.3481 × 10^6
» inherently imprecise
» IEEE Standard 754-1985 (layout: s | exp | significand)
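A minimal C sketch (not from the slides) showing both ideas: 2's complement negation by inverting bits and adding 1, and the inherent imprecision of binary floating point:

    #include <stdio.h>
    #include <stdint.h>

    int main(void) {
        /* 2's complement: negation = invert every bit, then add 1 */
        int32_t x = 42;
        int32_t neg = (int32_t)(~(uint32_t)x + 1u);
        printf("%d %d\n", (int)x, (int)neg);   /* prints: 42 -42 */

        /* floating point is approximate: 0.1 has no exact binary form */
        printf("%.17f\n", 0.1 + 0.2);          /* prints 0.30000000000000004 */
        return 0;
    }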
7. (6.7) Simple Machine Organization
· Institute for Advanced Studies machine (1947)
– “von Neumann machine”
» ALU performs transfers between memory and
I/O devices
» note two instructions per memory word
[Figure: organization of the von Neumann machine: Main Memory, Input-Output Equipment, Arithmetic-Logic Unit, and Program Control Unit. A 40-bit memory word holds two instructions, each an op code plus an address, with field boundaries at bits 0, 8, 20, 28, and 39]
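The bit positions in the figure suggest 8-bit op codes and 12-bit addresses. A small C sketch of unpacking such a word (the bit-numbering convention, with bit 0 as the most significant, is an assumption):

    #include <stdio.h>
    #include <stdint.h>

    /* Unpack a 40-bit word holding two instructions: bits 0-7 and
       20-27 are op codes, bits 8-19 and 28-39 are addresses. */
    int main(void) {
        uint64_t word = 0x0101402002ull;           /* arbitrary example */
        unsigned left_op    = (word >> 32) & 0xFF;
        unsigned left_addr  = (word >> 20) & 0xFFF;
        unsigned right_op   = (word >> 12) & 0xFF;
        unsigned right_addr =  word        & 0xFFF;
        printf("left:  op=%02X addr=%03X\n", left_op, left_addr);
        printf("right: op=%02X addr=%03X\n", right_op, right_addr);
        return 0;
    }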
8. (6.8) Simple Machine Organization (continued)
· ALU does arithmetic and logical comparisons
– AC = accumulator holds results
– MQ = multiplier-quotient register holds second portion of
long results
– MBR = memory buffer register holds data while
operation executes
9. (6.9) Simple Machine Organization (continued)
· Program control determines what computer does
based on instruction read from memory
– MAR = memory address register holds address of
memory cell to be read
– PC = program counter; address of next instruction
to be read
– IR = instruction register holds instruction being
executed
– IBR = instruction buffer register; holds right half
of instruction read from memory
10. (6.10) Simple Machine Organization (continued)
· Machine operates on fetch-execute cycle
· Fetch
– PC → MAR
– read M(MAR) into MBR
– copy left and right instructions into IR and IBR
· Execute
– address part of IR → MAR
– read M(MAR) into MBR
– execute opcode
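A toy C sketch of this fetch-execute loop for a one-address machine; the op codes and word format are hypothetical, for illustration only:

    #include <stdio.h>
    #include <stdint.h>

    enum { HALT = 0, LOAD = 1, ADD = 2, STORE = 3 };

    int main(void) {
        /* each word: 8-bit op code | 8-bit address */
        uint16_t mem[256] = {
            (LOAD  << 8) | 100,   /* AC = M(100)      */
            (ADD   << 8) | 101,   /* AC = AC + M(101) */
            (STORE << 8) | 102,   /* M(102) = AC      */
            (HALT  << 8)
        };
        mem[100] = 7; mem[101] = 35;

        uint16_t pc = 0, ac = 0;
        for (;;) {
            uint16_t ir = mem[pc++];          /* fetch: PC → MAR, read M(MAR) */
            uint8_t op = ir >> 8, addr = ir & 0xFF;
            if      (op == HALT)  break;      /* execute opcode */
            else if (op == LOAD)  ac = mem[addr];
            else if (op == ADD)   ac += mem[addr];
            else if (op == STORE) mem[addr] = ac;
        }
        printf("M(102) = %d\n", mem[102]);    /* prints 42 */
        return 0;
    }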
12. (6.12) Architecture Families
· Before the mid-1960s, every new machine had a
different instruction set architecture
– programs from previous generation didn’t run on
new machine
– cost of replacing software became too large
· IBM System/360 created family concept
– single instruction set architecture
– wide range of price and performance with same
software
· Performance improvements based on different
detailed implementations
– memory path width (1 byte to 8 bytes)
– faster, more complex CPU design
– greater I/O throughput and overlap
· “Software compatibility” now a major issue
– partially offset by high level language (HLL) software
14. (6.14) Multiple Register Machines
· Initially, machines had only a few registers
– 2 to 8 or 16 common
– registers more expensive than memory
· Most instructions operated between memory
locations
– operands came from and results ended up in
memory, so fewer instructions were needed
» although each was more complex
– means smaller programs and (supposedly)
faster execution
» fewer instructions and data to move between
memory and ALU
· But registers are much faster than memory
– 30 times faster
15. (6.15) Multiple Register Machines (continued)
· Also, many operands are reused within a
short time
– waste time loading operand again the next
time it’s needed
· Depending on mix of instructions and
operand use, having many registers may
lead to less traffic to memory and faster
execution
· Most modern machines use a multiple
register architecture
– maximum number about 512; commonly 32 integer
and 32 floating point
16. (6.16) Pipelining
· One way to speed up CPU is to increase
clock rate
– there are limits on how fast the clock can run and
still complete an instruction
· Another way is to execute more than one
instruction at one time
17. (6.17) Pipelining
· Pipelining breaks instruction execution down
into several stages
– put registers between stages to “buffer” data
and control
– execute one instruction
– as the first instruction enters its second stage, start
the second instruction, etc.
– speedup equals the number of stages, as long as the
pipe is full
18. (6.18) Pipelining (continued)
· Consider an example with 6 stages
– FI = fetch instruction
– DI = decode instruction
– CO = calculate location of operand
– FO = fetch operand
– EI = execute instruction
– WO = write operand (store result)
19. (6.19) Pipelining Example
[Figure: timing diagram of 9 instructions flowing through the 6-stage pipeline]
· Executes 9 instructions in 14 cycles rather than 54 for
sequential execution
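As a worked check (standard pipeline arithmetic, not stated on the slide): with k = 6 stages and n = 9 instructions, the pipelined time is k + (n - 1) = 6 + 8 = 14 cycles, while purely sequential execution takes n × k = 9 × 6 = 54 cycles.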
20. (6.20) Pipelining (continued)
· Hazards to pipelining
– conditional jump
» instruction 3 branches to instruction 15
» pipeline must be flushed and restarted
– later instruction needs operand being
calculated by instruction still in pipeline
» pipeline stalls until result ready
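A tiny C sketch of how hazards stretch the ideal cycle count; the penalty model and hazard counts are illustrative assumptions, not measurements:

    #include <stdio.h>

    int main(void) {
        int k = 6, n = 9;              /* stages, instructions */
        int flushes = 1;               /* assumed: one taken branch flush */
        int stall_cycles = 2;          /* assumed: one 2-cycle data stall */
        int ideal  = k + (n - 1);      /* full-pipe cycle count */
        int actual = ideal + flushes * (k - 1) + stall_cycles;
        printf("ideal %d cycles, with hazards %d cycles\n", ideal, actual);
        return 0;
    }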
22. (6.22) Real-life Problem
· Not all instructions execute in one clock
cycle
– floating point takes longer than integer
– fp divide takes longer than fp multiply which
takes longer than fp add
– typical latencies, in clock cycles
» integer add/subtract 1
» memory reference 1
» fp add 2 (make 2 stages)
» fp (or integer) multiply 6 (make 2 stages)
» fp (or integer) divide 15
· Break floating point unit into a sub-pipeline
– execute up to 6 instructions at once
23. (6.23) Pipelining (continued)
· This is not simple to implement
– note all 6 instructions could finish at the same time!!
24. (6.24) More Speedup
· Pipelined machines issue one instruction
each clock cycle
– how to speed up CPU even more?
· Issue more than one instruction per clock
cycle
25. (6.25) Superscalar Architectures
· Superscalar machines issue a variable
number of instructions each clock cycle, up
to some maximum
– instructions must satisfy some criteria of
independence
» simple choice is maximum of one fp and one
integer instruction per clock
» need separate execution paths for each
possible simultaneous instruction issue
– compiled code from non-superscalar
implementation of same architecture runs
unchanged, but slower
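A minimal C sketch of such an issue rule (at most one integer plus one floating point instruction per clock); the instruction encoding and dependence test are assumptions for illustration:

    #include <stdio.h>
    #include <stdbool.h>

    typedef enum { INT_OP, FP_OP } Kind;
    typedef struct { Kind kind; int dest, src1, src2; } Instr;

    /* b is independent of a if it neither reads nor writes a's result */
    static bool independent(Instr a, Instr b) {
        return b.src1 != a.dest && b.src2 != a.dest && b.dest != a.dest;
    }

    /* issue two per clock only if the kinds differ (separate
       execution paths exist) and the pair is independent */
    static int issue_width(Instr a, Instr b) {
        return (a.kind != b.kind && independent(a, b)) ? 2 : 1;
    }

    int main(void) {
        Instr i0 = { INT_OP, 1, 2, 3 };    /* r1 = r2 op r3  */
        Instr f0 = { FP_OP,  8, 9, 10 };   /* f8 = f9 op f10 */
        printf("issue width: %d\n", issue_width(i0, f0));
        return 0;
    }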
26. (6.26) Superscalar Example
[Figure: superscalar issue of instruction pairs across clock cycles 0-8]
· Each instruction path may be pipelined
27. (6.27) Superscalar Problem
· Instruction-level parallelism
– what if two successive instructions can't be
executed in parallel?
» data dependencies, or two instructions of slow
type
· Design machine to increase multiple
execution opportunities
28. (6.28) VLIW Architectures
· Very Long Instruction Word (VLIW)
architectures store several simple instructions
in one long instruction fetched from memory
– number and type are fixed
» e.g., 2 memory reference, 2 floating point, 1 integer
– need one functional unit for each possible
instruction
» 2 fp units, 1 integer unit, 2 MBRs
» all run synchronized
– each instruction is stored in a single word
» requires wider memory communication paths
» many instructions may be empty, meaning wasted
code space
29. (6.29) VLIW Example
Memory Ref 1   | Memory Ref 2   | FP 1          | FP 2          | Integer
LD F0,0(R1)    | LD F6,8(R1)    |               |               |
LD F10,16(R1)  | LD F14,24(R1)  |               |               | SB R1,R1,#48
LD F18,32(R1)  | LD F22,40(R1)  | AD F4,F0,F2   | AD F8,F6,F2   |
LD F26,48(R1)  |                | AD F12,F10,F2 | AD F16,F14,F2 |
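A rough C sketch of the fixed-slot layout (the fields are assumed for illustration, not taken from any real machine):

    #include <stdio.h>

    typedef struct { unsigned op, reg, addr; } MemOp;        /* memory ref   */
    typedef struct { unsigned op, dest, src1, src2; } AluOp; /* fp / integer */

    /* one long instruction word: 2 memory, 2 floating point, 1 integer */
    typedef struct {
        MemOp mem[2];
        AluOp fp[2];
        AluOp intop;
    } VliwWord;

    int main(void) {
        /* unused slots stay zeroed (no-ops): this is the wasted space */
        VliwWord w = { .mem = { {1, 0, 0}, {1, 6, 8} } };
        printf("word size: %zu bytes\n", sizeof w);
        return 0;
    }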
30. (6.30) Instruction Level Parallelism
· Success of superscalar and VLIW machines
depends on number of instructions that occur
together that can be issued in parallel
– no dependencies
– no branches
· Compilers can help create parallelism
· Speculation techniques try to overcome
branch problems
– assume branch is taken
– execute instructions but don’t let them store
results until status of branch is known
31. (6.31) CISC vs. RISC
· CISC = Complex Instruction Set Computer
· RISC = Reduced Instruction Set Computer
32. (6.32) CISC vs. RISC (continued)
· Historically, machines tend to add features
over time
– instruction opcodes
» IBM 70X, 70X0 series went from 24 opcodes to
185 in 10 years
» over the same period, performance increased 30 times
– addressing modes
– special purpose registers
· Motivations are to
– improve efficiency, since complex instructions
can be implemented in hardware and
execute faster
– make life easier for compiler writers
– support more complex higher-level languages
33. (6.33) CISC vs. RISC
· Examination of actual code indicated many
of these features were not used
· RISC advocates proposed
– simple, limited instruction set
– large number of general purpose registers
» and mostly register operations
– optimized instruction pipeline
· Benefits should include
– faster execution of instructions commonly
used
– faster design and implementation
34. (6.34) CISC vs. RISC
· Comparing some architectures
Machine         Year  Instrs.  Instr. size (bytes)  Addr. modes  Registers
IBM 370/168     1973    208         2 - 6                4          16
VAX 11/780      1978    303         2 - 57              22          16
Intel 80486     1989    235         1 - 11              11           8
Motorola 88000  1988     51         4                    3          32
MIPS R4000      1991     94         4                    1          32
IBM RS/6000     1990    184         4                    2          32
35. (6.35) CISC vs. RISC
· Which approach is right?
· Typically, RISC takes about 1/5 the design time
– but CISC designs have adopted RISC techniques