SlideShare a Scribd company logo
1 of 91
Oct. 2014 Computer Architecture, Background and Motivation Slide 1
Part I
Background and Motivation
Oct. 2014 Computer Architecture, Background and Motivation Slide 2
About This Presentation
This presentation is intended to support the use of the textbook
Computer Architecture: From Microprocessors toSupercomputers,
Oxford University Press, 2005, ISBN 0-19-515455-X. It is updated
regularly by the author as part of his teaching of the upper-division
course ECE 154, Introduction to Computer Architecture, at the
University of California, Santa Barbara. Instructors can use these
slides freely in classroom teaching and for other educational
purposes. Any other use is strictly prohibited. © Behrooz Parhami
Edition Released Revised Revised Revised Revised
First June 2003 July 2004 June 2005 Mar. 2006 Jan. 2007
Jan. 2008 Jan. 2009 Jan. 2011 Oct. 2014
Second
Oct. 2014 Computer Architecture, Background and Motivation Slide 3
I Background and Motivation
Topics in This Part
Chapter 1 Combinational Digital Circuits
Chapter 2 Digital Circuits with Memory
Chapter 3 Computer System Technology
Chapter 4 Computer Performance
Provide motivation, paint the big picture, introduce tools:
• Review components used in building digital circuits
• Present an overview of computer technology
• Understand the meaning of computer performance
(or why a 2 GHz processor isn’t 2 as fast as a 1 GHz model)
Oct. 2014 Computer Architecture, Background and Motivation Slide 4
1 Combinational Digital Circuits
First of two chapters containing a review of digital design:
• Combinational, or memoryless, circuits in Chapter 1
• Sequential circuits, with memory, in Chapter 2
Topics in This Chapter
1.1 Signals, Logic Operators, and Gates
1.2 Boolean Functions and Expressions
1.3 Designing Gate Networks
1.4 Useful Combinational Parts
1.5 Programmable Combinational Parts
1.6 Timing and Circuit Considerations
Oct. 2014 Computer Architecture, Background and Motivation Slide 5
1.1 Signals, Logic Operators, and Gates
Figure 1.1 Some basic elements of digital logic circuits, with
operator signs used in this book highlighted.
x  y

AND
Name XOR
OR
NOT
Graphical
symbol
x  y
Operator
sign and
alternate(s)
x  y
x  y
xy
x  y
x 
x or x
_
x y or xy
Arithmetic
expression
x  y 2xy
x  y  xy
1  x
Output
is 1 iff:
Input is 0
Both inputs
are 1s
At least one
input is 1
Inputs are
not equal
Add: 1 + 1 = 10
Oct. 2014 Computer Architecture, Background and Motivation Slide 6
The Arithmetic Substitution Method
z = 1 – z NOT converted to arithmetic form
xy AND same as multiplication
(when doing the algebra, set zk = z)
x  y = x + y  xy OR converted to arithmetic form
x  y = x + y  2xy XOR converted to arithmetic form
Example: Prove the identity xyz  x  y  z ? 1
LHS = [xyz  x]  [y  z]
= [xyz + 1 – x – (1 – x)xyz]  [1 – y + 1 – z – (1 – y)(1 – z)]
= [xyz + 1 – x]  [1 – yz]
= (xyz + 1 – x) + (1 – yz) – (xyz + 1 – x)(1 – yz)
= 1 + xy2z2 – xyz
= 1 = RHS This is addition,
not logical OR
Oct. 2014 Computer Architecture, Background and Motivation Slide 7
Variations in Gate Symbols
Figure 1.2 Gates with more than two inputs and/or with
inverted signals at input or output.
OR NOR
NAND
AND XNOR
Bubble = Inverter
Oct. 2014 Computer Architecture, Background and Motivation Slide 8
Gates as Control Elements
Figure 1.3 An AND gate and a tristate buffer act as controlled switches
or valves. An inverting buffer is logically the same as a NOT gate.
Enable/Pass signal
e
Data in
x
Data out
x or 0
Data in
x
Enable/Pass signal
e
Data out
x or “high impedance”
(a) AND gate for controlled transfer (b) Tristate buffer
(c) Model for AND switch.
x
e
No data
or x
0
1
x
e
ex
0
1
0
(d) Model for tristate buffer.
Oct. 2014 Computer Architecture, Background and Motivation Slide 9
Wired OR and Bus Connections
Figure 1.4 Wired OR allows tying together of several
controlled signals.
e
e
e Data out
(x, y, z,
or high
impedance)
(b) Wired OR of tristate outputs
e
e
e
Data out
(x, y, z, or 0)
(a) Wired OR of product terms
z
x
y
z
x
y
z
x
y
z
x
y
Oct. 2014 Computer Architecture, Background and Motivation Slide 10
Control/Data Signals and Signal Bundles
Figure 1.5 Arrays of logic gates represented by a single gate symbol.
/
8
/
8
/
8
Compl
/
32
/
k
/
32
Enable
/
k
/
k
/
k
(b) 32 AND gates (c) k XOR gates
(a) 8 NOR gates
Oct. 2014 Computer Architecture, Background and Motivation Slide 11
1.2 Boolean Functions and Expressions
Ways of specifying a logic function
 Truth table: 2n row, “don’t-care” in input or output
 Logic expression: w (x  y  z), product-of-sums,
sum-of-products, equivalent expressions
 Word statement: Alarm will sound if the door
is opened while the security system is engaged,
or when the smoke detector is triggered
 Logic circuit diagram: Synthesis vs analysis
Oct. 2014 Computer Architecture, Background and Motivation Slide 12
Table 1.2 Laws (basic identities) of Boolean algebra.
Name of law OR version AND version
Identity x  0 = x x 1 = x
One/Zero x  1 = 1 x 0 = 0
Idempotent x  x = x x x = x
Inverse x  x  = 1 x x  = 0
Commutative x  y = y  x x y = y x
Associative (x  y)  z = x  (y  z) (x y) z = x (y z)
Distributive x  (y z) = (x  y) (x  z) x (y  z) = (x y)  (x z)
DeMorgan’s (x  y) = x  y  (x y) = x   y 
Manipulating Logic Expressions
Oct. 2014 Computer Architecture, Background and Motivation Slide 13
Proving the Equivalence of Logic Expressions
Example 1.1
 Truth-table method: Exhaustive verification
 Arithmetic substitution
x  y = x + y  xy
x  y = x + y  2xy
 Case analysis: two cases, x = 0 or x = 1
 Logic expression manipulation
Example: x  y ? xyxy
x + y – 2xy ? (1–x)y + x(1–y) – (1–x)yx(1–y)
Oct. 2014 Computer Architecture, Background and Motivation Slide 14
1.3 Designing Gate Networks
 AND-OR, NAND-NAND, OR-AND, NOR-NOR
 Logic optimization: cost, speed, power dissipation
(a) AND-OR circuit
z
x
y
x
y
z
(b) Intermediate circuit (c) NAND-NAND equivalent
z
x
y
x
y
z
z
x
y
x
y
z
Figure 1.6 A two-level AND-OR circuit and two equivalent circuits.
(a  b  c) = abc
Oct. 2014 Computer Architecture, Background and Motivation Slide 15
Seven-Segment Display of Decimal Digits
Figure 1.7 Seven-segment display
of decimal digits. The three open
segments may be optionally used.
The digit 1 can be displayed in two
ways, with the more common right-
side version shown.
Optional segment
I swear I didn’t use a calculator!
Oct. 2014 Computer Architecture, Background and Motivation Slide 16
BCD-to-Seven-Segment Decoder
Example 1.2
Figure 1.8 The logic circuit that generates the enable signal for the
lowermost segment (number 3) in a seven-segment display unit.
x3
x2
x1
x0
Signals to
enable or
turn on the
segments
4-bit input in [0, 9]
e0
e5
e6
e4
e2
e1
e3
1
2
4
5
0
3
6
Oct. 2014 Computer Architecture, Background and Motivation Slide 17
1.4 Useful Combinational Parts
 High-level building blocks
 Much like prefab parts used in building a house
 Arithmetic components (adders, multipliers, ALUs)
will be covered in Part III
 Here we cover three useful parts:
multiplexers, decoders/demultiplexers, encoders
Oct. 2014 Computer Architecture, Background and Motivation Slide 18
Multiplexers
Figure 1.9 Multiplexer (mux), or selector, allows one of several inputs
to be selected and routed to output depending on the binary value of a
set of selection or address signals provided to it.
x
x
y
z
1
0
x
x
z
y
x
x
y
z
1
0
y
/
32
/
32
/
32 1
0
1
0
3
2
z
y1 0
1
0
1
0
y1
y0
y0
(a) 2-to-1 mux (b) Switch view (c) Mux symbol
(d) Mux array (e) 4-to-1 mux with enable (e) 4-to-1 mux design
0
1
y
1
1
1
0
0
0
x
x
x
x
1
0
2
3
x
x
x
x
0
1
2
3
z
e (Enable)
1 0
x2
Computer Architecture, Background and Motivation Slide 19
Decoders/Demultiplexers
Figure 1.10 A decoder allows the selection of one of 2a options using
an a-bit address as input. A demultiplexer (demux) is a decoder that
only selects an output if its enable signal is asserted.
y1
y0
x0
x3
x2
x1
1
0
3
2
y1
y0
x0
x3
x2
x1
e
1
0
3
2
y1
y0
x0
x3
x2
x1
(a) 2-to-4 decoder (b) Decoder symbol
(c) Demultiplexer, or
decoder with “enable”
(Enable)
1
1 0
1
1 0
1
1
1
Oct. 2014
Oct. 2014 Computer Architecture, Background and Motivation Slide 20
Encoders
Figure 1.11 A 2a-to-a encoder outputs an a-bit binary number
equal to the index of the single 1 among its 2a inputs.
(a) 4-to-2 encoder (b) Encoder symbol
x0
x3
x2
x1
y1
y0
1
0
3
2
x0
x3
x2
x1
y1
y0
1 0
1
0
0
0
Oct. 2014 Computer Architecture, Background and Motivation Slide 21
1.5 Programmable Combinational Parts
 Programmable ROM (PROM)
 Programmable array logic (PAL)
 Programmable logic array (PLA)
A programmable combinational part can do the job of
many gates or gate networks
Programmed by cutting existing connections (fuses)
or establishing new connections (antifuses)
Oct. 2014 Computer Architecture, Background and Motivation Slide 22
PROMs
Figure 1.12 Programmable connections and their use in a PROM.
. . .
.
.
.
Inputs
Outputs
(a) Programmable
OR gates
w
x
y
z
(b) Logic equivalent
of part a
w
x
y
z
(c) Programmable read-only
memory (PROM)
Decoder
Oct. 2014 Computer Architecture, Background and Motivation Slide 23
PALs and PLAs
Figure 1.13 Programmable combinational logic: general structure and
two classes known as PAL and PLA devices. Not shown is PROM with
fixed AND array (a decoder) and programmable OR array.
AND
array
(AND
plane)
OR
array
(OR
plane)
. . .
. . .
.
.
.
Inputs
Outputs
(a) General programmable
combinational logic
(b) PAL: programmable
AND array, fixed OR array
8-input
ANDs
(c) PLA: programmable
AND and OR arrays
6-input
ANDs
4-input
ORs
Oct. 2014 Computer Architecture, Background and Motivation Slide 24
1.6 Timing and Circuit Considerations
 Gate delay d: a fraction of, to a few, nanoseconds
 Wire delay, previously negligible, is now important
(electronic signals travel about 15 cm per ns)
 Circuit simulation to verify function and timing
Changes in gate/circuit output, triggered by changes in its
inputs, are not instantaneous
Oct. 2014 Computer Architecture, Background and Motivation Slide 25
Glitching
Figure 1.14 Timing diagram for a circuit that exhibits glitching.
x = 0
y
z
a = x  y
f = a  z
2d 2d
Using the PAL in Fig. 1.13b to implement f = x  y  z
AND-OR
(PAL)
AND-OR
(PAL)
x
y
z
a
f
Oct. 2014 Computer Architecture, Background and Motivation Slide 26
CMOS Transmission Gates
Figure 1.15 A CMOS transmission gate and its use in building
a 2-to-1 mux.
z
x
x
0
1
(a) CMOS transmission gate:
circuit and symbol
(b) Two-input mux built of two
transmission gates
TG
TG
TG
y
P
N
Oct. 2014 Computer Architecture, Background and Motivation Slide 27
2 Digital Circuits with Memory
Second of two chapters containing a review of digital design:
• Combinational (memoryless) circuits in Chapter 1
• Sequential circuits (with memory) in Chapter 2
Topics in This Chapter
2.1 Latches, Flip-Flops, and Registers
2.2 Finite-State Machines
2.3 Designing Sequential Circuits
2.4 Useful Sequential Parts
2.5 Programmable Sequential Parts
2.6 Clocks and Timing of Events
Oct. 2014 Computer Architecture, Background and Motivation Slide 28
2.1 Latches, Flip-Flops, and Registers
Figure 2.1 Latches, flip-flops, and registers.
R
Q
Q
S
D
Q
Q
C
Q
Q
D
C
(a) SR latch (b) D latch
Q
C Q
D
Q
C Q
D
(e) k-bit register
(d) D flip-flop symbol
(c) Master-slave D flip-flop
Q
C Q
D
FF
/ /
k k
Q
C Q
D
FF
R
S
Oct. 2014 Computer Architecture, Background and Motivation Slide 29
Latches vs Flip-Flops
Figure 2.2 Operations of D latch and
negative-edge-triggered D flip-flop.
D
C
D latch: Q
D FF: Q
Setup
time
Setup
time
Hold
time
Hold
time
D
C
Q
Q
D
C
Q
Q
FF
Oct. 2014 Computer Architecture, Background and Motivation Slide 30
Reading and Modifying FFs in the Same Cycle
Figure 2.3 Register-to-register operation with edge-triggered
flip-flops.
/ /
k k
Q
C Q
D
FF
/ /
k k
Q
C Q
D
FF
Computation module
(combinational logic)
Clock
Propagation delay
Combinational delay
Oct. 2014 Computer Architecture, Background and Motivation Slide 31
2.2 Finite-State Machines
Example 2.1
Figure 2.4 State table and state diagram for a vending
machine coin reception unit.
Dime
Dime
Quarter
Dime
Quarter
Dime
Quarter
Dime
Quarter
Reset
Reset
Reset
Reset
Reset
Start
Quarter
S00
S10
S20
S25
S30
S35
S10
S25
S00
S00
S00
S00
S00
S00
S20
S35
S35
S35
S35
S35
S35
S30
S35
S35
------- Input -------
Dime
Quarter
Reset
Current
state
S00
S35
is the initial state
is the final state
Next state
Dime
Quarter
S00
S10
S20
S25
S30
S35
Oct. 2014 Computer Architecture, Background and Motivation Slide 32
Sequential Machine Implementation
Figure 2.5 Hardware realization of Moore and Mealy
sequential machines.
Next-state
logic
State register
/
n
/
m
/
l
Inputs Outputs
Next-state
excitation
signals
Present
state
Output
logic
Only for Mealy machine
Oct. 2014 Computer Architecture, Background and Motivation Slide 33
2.3 Designing Sequential Circuits
Example 2.3
Figure 2.7 Hardware realization of a coin reception unit (Example 2.3).
Output
Q
C Q
D
e
Inputs
Q
C Q
D
Q
C Q
D
FF2
FF1
FF0
q
d
Quarter in
Dime in
Final
state
is 1xx
Oct. 2014 Computer Architecture, Background and Motivation Slide 34
2.4 Useful Sequential Parts
 High-level building blocks
 Much like prefab closets used in building a house
 Other memory components will be covered in
Chapter 17 (SRAM details, DRAM, Flash)
 Here we cover three useful parts:
shift register, register file (SRAM basics), counter
Oct. 2014 Computer Architecture, Background and Motivation Slide 35
Shift Register
Figure 2.8 Register with single-bit left shift and parallel load
capabilities. For logical left shift, serial data in line is connected to 0.
Parallel data in
/
k
/
k
/
k
Shift
Q
C Q
D
FF
1
0
Serial data in
/
k – 1 LSBs
Load
Parallel data out
Serial data out
MSB
0 1 0 0 1 1 1 0
Oct. 2014 Computer Architecture, Background and Motivation Slide 36
Register File and FIFO
Figure 2.9 Register file with random access and FIFO.
Decoder
/
k
/
k
/
h
Write
enable
Read address 0
Read address 1
Read
data 0
Write
data
Read
enable
2 k-bit registers
h
/
k
/
k
/
k
/
k
/
k
/
k
/
k
/
h
Write
address
Muxes
Read
data 1
/
k
/
h
/
h
/
h
/
k
/
h
Write enable
Read
addr 0
/
k
/
k
Read
addr 1
Write
data
Write
addr
Read
data 0
Read enable
Read
data 1
(a) Register file with random access
(b) Graphic symbol
for register file
Q
C Q
D
FF
/
k
Q
C Q
D
FF
Q
C Q
D
FF
Q
C Q
D
FF
/
k
Push
/
k
Input Output
Pop
Full
Empty
(c) FIFO symbol
Oct. 2014 Computer Architecture, Background and Motivation Slide 37
SRAM
Figure 2.10 SRAM memory is simply a large, single-port register file.
Column mux
Row
decoder
/
h
Address
Square or
almost square
memory matrix
Row buffer
Row
Column
g bits data out
/
g
/
h
Write enable
/
g
Data in
Address
Data out
Output
enable
Chip
select
.
.
.
. . .
. . .
(a) SRAM block diagram (b) SRAM read mechanism
Oct. 2014 Computer Architecture, Background and Motivation Slide 38
Binary Counter
Figure 2.11 Synchronous binary counter with initialization capability.
Count register
Mux
Incrementer
0
Input
Load
IncrInit
x + 1
x
0 1
1
c in
cout
Oct. 2014 Computer Architecture, Background and Motivation Slide 39
2.5 Programmable Sequential Parts
 Programmable array logic (PAL)
 Field-programmable gate array (FPGA)
 Both types contain macrocells and interconnects
A programmable sequential part contain gates and
memory elements
Programmed by cutting existing connections (fuses)
or establishing new connections (antifuses)
Oct. 2014 Computer Architecture, Background and Motivation Slide 40
PAL and FPGA
Figure 2.12 Examples of programmable sequential logic.
(a) Portion of PAL with storable output (b) Generic structure of an FPGA
8-input
ANDs
D
C
Q
Q
FF
Mux
Mux
0 1
0 1
I/O blocks
Configurable
logic block
Programmable
connections
CLB
CLB
CLB
CLB
Oct. 2014 Computer Architecture, Background and Motivation Slide 41
2.6 Clocks and Timing of Events
Clock is a periodic signal: clock rate = clock frequency
The inverse of clock rate is the clock period: 1 GHz  1 ns
Constraint: Clock period  tprop + tcomb + tsetup + tskew
Figure 2.13 Determining the required length of the clock period.
Other inputs
Combinational
logic
Clock period
FF1 begins
to change
FF1 change
observed
Must be wide enough
to accommodate
worst-case delays
Clock1 Clock2
Q
C Q
D
FF2
Q
C Q
D
FF1
Oct. 2014 Computer Architecture, Background and Motivation Slide 42
Synchronization
Figure 2.14 Synchronizers are used to prevent timing problems
arising from untimely changes in asynchronous signals.
Asynch
input
Asynch
input
Synch
version
Synch
version
Asynch
input
Synch
version
Clock
(a) Simple synchronizer (b) Two-FF synchronizer
(c) Input and output waveforms
Q
C Q
D
FF
Q
C Q
D
FF2
Q
C Q
D
FF1
Oct. 2014 Computer Architecture, Background and Motivation Slide 43
Level-Sensitive Operation
Figure 2.15 Two-phase clocking with nonoverlapping clock signals.
Combi-
national
logic
1

1
Clock period

Q
C Q
D
Latch
1

Q
C Q
D
Latch
Other inputs
Combi-
national
logic
2

2

Clocks with
nonoverlapping
highs
Other inputs
Q
C Q
Latch
D
Oct. 2014 Computer Architecture, Background and Motivation Slide 44
3 Computer System Technology
Interplay between architecture, hardware, and software
• Architectural innovations influence technology
• Technological advances drive changes in architecture
Topics in This Chapter
3.1 From Components to Applications
3.2 Computer Systems and Their Parts
3.3 Generations of Progress
3.4 Processor and Memory Technologies
3.5 Peripherals, I/O, and Communications
3.6 Software Systems and Applications
Oct. 2014 Computer Architecture, Background and Motivation Slide 45
3.1 From Components to Applications
Figure 3.1 Subfields or views in computer system engineering.
High-
level
view
Computer
designer
Circuit
designer
Application
designer
System
designer
Logic
designer
Software Hardware
Computer organization
Low-
level
view
Application
domains
Electronic
components
Computer architecture
Oct. 2014 Computer Architecture, Background and Motivation Slide 46
What Is (Computer) Architecture?
Figure 3.2 Like a building architect, whose place at the
engineering/arts and goals/means interfaces is seen in this diagram, a
computer architect reconciles many conflicting or competing demands.
Architect
Interface
Interface
Goals
Means
Arts
Engineering
Client’s taste:
mood, style, . . .
Client’s requirements:
function, cost, . . .
The world of arts:
aesthetics, trends, . . .
Construction technology:
material, codes, . . .
Oct. 2014 Computer Architecture, Background and Motivation Slide 47
3.2 Computer Systems and Their Parts
Figure 3.3 The space of computer systems, with what we normally
mean by the word “computer” highlighted.
Computer
Analog
Fixed-function Stored-program
Electronic Nonelectronic
General-purpose Special-purpose
Number cruncher Data manipulator
Digital
Oct. 2014 Computer Architecture, Background and Motivation Slide 48
Price/Performance Pyramid
Figure 3.4 Classifying computers by computational
power and price range.
Embedded
Personal
Workstation
Server
Mainframe
Super $Millions
$100s Ks
$10s Ks
$1000s
$100s
$10s
Differences in scale,
not in substance
Oct. 2014 Computer Architecture, Background and Motivation Slide 49
Automotive Embedded Computers
Figure 3.5 Embedded computers are ubiquitous, yet invisible. They
are found in our automobiles, appliances, and many other places.
Engine
Impact sensors
Navigation &
entertainment
Central
controller
Brakes
Airbags
Oct. 2014 Computer Architecture, Background and Motivation Slide 50
Personal Computers and Workstations
Figure 3.6 Notebooks, a common class of portable computers,
are much smaller than desktops but offer substantially the same
capabilities. What are the main reasons for the size difference?
Oct. 2014 Computer Architecture, Background and Motivation Slide 51
Digital Computer Subsystems
Figure 3.7 The (three, four, five, or) six main units of a digital
computer. Usually, the link unit (a simple bus or a more elaborate
network) is not explicitly included in such diagrams.
Memory
Link Input/Output
To/from network
Processor
Control
Datapath
Input
Output
CPU I/O
Oct. 2014 Computer Architecture, Background and Motivation Slide 52
3.3 Generations of Progress
Table 3.2 The 5 generations of digital computers, and their ancestors.
Generation
(begun)
Processor
technology
Memory
innovations
I/O devices
introduced
Dominant
look & fell
0 (1600s) (Electro-)
mechanical
Wheel, card Lever, dial,
punched card
Factory
equipment
1 (1950s) Vacuum tube Magnetic
drum
Paper tape,
magnetic tape
Hall-size
cabinet
2 (1960s) Transistor Magnetic
core
Drum, printer,
text terminal
Room-size
mainframe
3 (1970s) SSI/MSI RAM/ROM
chip
Disk, keyboard,
video monitor
Desk-size
mini
4 (1980s) LSI/VLSI SRAM/DRAM Network, CD,
mouse,sound
Desktop/
laptop micro
5 (1990s) ULSI/GSI/
WSI, SOC
SDRAM,
flash
Sensor/actuator,
point/click
Invisible,
embedded
Oct. 2014 Computer Architecture, Background and Motivation Slide 53
Figure 3.8 The manufacturing process for an IC part.
IC Production and Yield
15-30
cm
30-60 cm
Silicon
crystal
ingot
Slicer
Processing:
20-30 steps
Blank wafer
with defects
x x
x x x
x x
x x
x x
0.2 cm
Patterned wafer
(100s of simple or scores
of complex processors)
Dicer
Die
~1 cm
Good
die
~1 cm
Die
tester
Microchip
or other part
Mounting
Part
tester
Usable
part
to ship
Oct. 2014 Computer Architecture, Background and Motivation Slide 54
Figure 3.9 Visualizing the dramatic decrease in yield
with larger dies.
Effect of Die Size on Yield
120 dies, 109 good 26 dies, 15 good
Die yield =def (number of good dies) / (total number of dies)
Die yield = Wafer yield  [1 + (Defect density  Die area) / a]–a
Die cost = (cost of wafer) / (total number of dies  die yield)
= (cost of wafer)  (die area / wafer area) / (die yield)
Oct. 2014 Computer Architecture, Background and Motivation Slide 55
3.4 Processor and Memory Technologies
Figure 3.11 Packaging of processor, memory, and other components.
PC board
Backplane
Memory
CPU
Bus
Connector
(b) 3D packaging of the future
(a) 2D or 2.5D packaging now common
Stacked layers
glued together
Interlayer connections
deposited on the
outside of the stack
Die
Oct. 2014 Computer Architecture, Background and Motivation Slide 56
Figure 3.10 Trends in processor performance and DRAM
memory chip capacity (Moore’s law).
Moore’s
Law
1Mb
1990
1980 2000 2010
kIPS
MIPS
GIPS
TIPS
Processor
performance
Calendar year
80286
68000
80386
80486
68040
Pentium
Pentium II
R10000
1.6 / yr
10 / 5 yrs
2 / 18 mos
64Mb
4Mb
64kb
256kb
256Mb
1Gb
16Mb
4 / 3 yrs
Processor
Memory
kb
Mb
Gb
Tb
Memory
chip
capacity
Oct. 2014 Computer Architecture, Background and Motivation Slide 57
Pitfalls of Computer Technology Forecasting
“DOS addresses only 1 MB of RAM because we cannot
imagine any applications needing more.” Microsoft, 1980
“640K ought to be enough for anybody.” Bill Gates, 1981
“Computers in the future may weigh no more than 1.5
tons.” Popular Mechanics
“I think there is a world market for maybe five
computers.” Thomas Watson, IBM Chairman, 1943
“There is no reason anyone would want a computer in
their home.” Ken Olsen, DEC founder, 1977
“The 32-bit machine would be an overkill for a personal
computer.” Sol Libes, ByteLines
Oct. 2014 Computer Architecture, Background and Motivation Slide 58
3.5 Input/Output and Communications
Figure 3.12 Magnetic and optical disk memory units.
(a) Cutaway view of a hard disk drive (b) Some removable storage media
Typically
2-9 cm
Floppy
disk
CD-ROM
Magnetic
tape
cartridge
. .
.
.
.
.
.
.
Oct. 2014 Computer Architecture, Background and Motivation Slide 59
Figure 3.13 Latency and bandwidth characteristics of different
classes of communication links.
Communication
Technologies
3
6
9
12
9 6 3 3
Bandwidth
(b/s)
Latency (s)
10
10
10
10
10 10 10 1 10
Processor
bus
I/O
network
System-area
network
(SAN) Local-area
network
(LAN)
Metro-area
network
(MAN)
Wide-area
network
(WAN)
Geographically distributed
Same geographic location
(ns) (s) (ms) (min) (h)
Oct. 2014 Computer Architecture, Background and Motivation Slide 60
3.6 Software Systems and Applications
Figure 3.15 Categorization of software, with examples in each class.
Software
Application:
word processor,
spreadsheet,
circuit simulator,
. . .
Operating system Translator:
MIPS assembler,
C compiler,
. . .
System
Manager:
virtual memory,
security,
file system,
. . .
Coordinator:
scheduling,
load balancing,
diagnostics,
. . .
Enabler:
disk driver,
display driver,
printing,
. . .
Oct. 2014 Computer Architecture, Background and Motivation Slide 61
Figure 3.14 Models and abstractions in programming.
High- vs Low-Level Programming
Compiler
Assembler
Interpreter
temp=v[i]
v[i]=v[i+1]
v[i+1]=temp
Swap v[i]
and v[i+1]
add $2,$5,$5
add $2,$2,$2
add $2,$4,$2
lw $15,0($2)
lw $16,4($2)
sw $16,0($2)
sw $15,4($2)
jr $31
00a51020
00421020
00821020
8c620000
8cf20004
acf20000
ac620004
03e00008
Very
high-level
language
objectives
or tasks
High-level
language
statements
Assembly
language
instructions,
mnemonic
Machine
language
instructions,
binary (hex)
One task =
many statements
One statement =
several instructions
Mostly one-to-one
More abstract, machine-independent;
easier to write, read, debug, or maintain
More concrete, machine-specific, error-prone;
harder to write, read, debug, or maintain
Oct. 2014 Computer Architecture, Background and Motivation Slide 62
4 Computer Performance
Performance is key in design decisions; also cost and power
• It has been a driving force for innovation
• Isn’t quite the same as speed (higher clock rate)
Topics in This Chapter
4.1 Cost, Performance, and Cost/Performance
4.2 Defining Computer Performance
4.3 Performance Enhancement and Amdahl’s Law
4.4 Performance Measurement vs Modeling
4.5 Reporting Computer Performance
4.6 The Quest for Higher Performance
Oct. 2014 Computer Architecture, Background and Motivation Slide 63
4.1 Cost, Performance, and Cost/Performance
1980
1960 2000 2020
$1
Compute
r
cost
Calendar year
$1 K
$1 M
$1 G
Oct. 2014 Computer Architecture, Background and Motivation Slide 64
Figure 4.1 Performance improvement as a function of cost.
Cost/Performance
Performance
Cost
Superlinear:
economy of
scale
Sublinear:
diminishing
returns
Linear
(ideal?)
Oct. 2014 Computer Architecture, Background and Motivation Slide 65
4.2 Defining Computer Performance
Figure 4.2 Pipeline analogy shows that imbalance between processing
power and I/O capabilities leads to a performance bottleneck.
Processing
Input Output
CPU-bound task
I/O-bound task
Oct. 2014 Computer Architecture, Background and Motivation Slide 66
Six Passenger Aircraft to Be Compared
B 747
DC-8-50
Oct. 2014 Computer Architecture, Background and Motivation Slide 67
Performance of Aircraft: An Analogy
Table 4.1 Key characteristics of six passenger aircraft: all figures
are approximate; some relate to a specific model/configuration of
the aircraft or are averages of cited range of values.
Aircraft Passengers
Range
(km)
Speed
(km/h)
Price
($M)
Airbus A310 250 8 300 895 120
Boeing 747 470 6 700 980 200
Boeing 767 250 12 300 885 120
Boeing 777 375 7 450 980 180
Concorde 130 6 400 2 200 350
DC-8-50 145 14 000 875 80
Speed of sound  1220 km / h
Oct. 2014 Computer Architecture, Background and Motivation Slide 68
Different Views of Performance
Performance from the viewpoint of a passenger: Speed
Note, however, that flight time is but one part of total travel time.
Also, if the travel distance exceeds the range of a faster plane,
a slower plane may be better due to not needing a refueling stop
Performance from the viewpoint of an airline: Throughput
Measured in passenger-km per hour (relevant if ticket price were
proportional to distance traveled, which in reality it is not)
Airbus A310 250  895 = 0.224 M passenger-km/hr
Boeing 747 470  980 = 0.461 M passenger-km/hr
Boeing 767 250  885 = 0.221 M passenger-km/hr
Boeing 777 375  980 = 0.368 M passenger-km/hr
Concorde 130  2200 = 0.286 M passenger-km/hr
DC-8-50 145  875 = 0.127 M passenger-km/hr
Performance from the viewpoint of FAA: Safety
Oct. 2014 Computer Architecture, Background and Motivation Slide 69
Cost Effectiveness: Cost/Performance
Table 4.1 Key characteristics of six passenger
aircraft: all figures are approximate; some relate to
a specific model/configuration of the aircraft or are
averages of cited range of values.
Aircraft Passen-
gers
Range
(km)
Speed
(km/h)
Price
($M)
A310 250 8 300 895 120
B 747 470 6 700 980 200
B 767 250 12 300 885 120
B 777 375 7 450 980 180
Concorde 130 6 400 2 200 350
DC-8-50 145 14 000 875 80
Cost /
Performance
536
434
543
489
1224
630
Smaller
values
better
Throughput
(M P km/hr)
0.224
0.461
0.221
0.368
0.286
0.127
Larger
values
better
Oct. 2014 Computer Architecture, Background and Motivation Slide 70
Concepts of Performance and Speedup
Performance = 1 / Execution time is simplified to
Performance = 1 / CPU execution time
(Performance of M1) / (Performance of M2) = Speedup of M1 over M2
= (Execution time of M2) / (Execution time M1)
Terminology: M1 is x times as fast as M2 (e.g., 1.5 times as fast)
M1 is 100(x – 1)% faster than M2 (e.g., 50% faster)
CPU time = Instructions  (Cycles per instruction)  (Secs per cycle)
= Instructions  CPI / (Clock rate)
Instruction count, CPI, and clock rate are not completely independent,
so improving one by a given factor may not lead to overall execution
time improvement by the same factor.
Oct. 2014 Computer Architecture, Background and Motivation Slide 71
Elaboration on the CPU Time Formula
CPU time = Instructions  (Cycles per instruction)  (Secs per cycle)
= Instructions  Average CPI / (Clock rate)
Clock period
Clock rate: 1 GHz = 109 cycles / s (cycle time 10–9 s = 1 ns)
200 MHz = 200  106 cycles / s (cycle time = 5 ns)
Average CPI: Is calculated based on the dynamic instruction mix
and knowledge of how many clock cycles are needed
to execute various instructions (or instruction classes)
Instructions: Number of instructions executed, not number of
instructions in our program (dynamic count)
Oct. 2014 Computer Architecture, Background and Motivation Slide 72
Dynamic Instruction Count
250 instructions
for i = 1, 100 do
20 instructions
for j = 1, 100 do
40 instructions
for k = 1, 100 do
10 instructions
endfor
endfor
endfor
How many instructions
are executed in this
program fragment?
Each “for” consists of two instructions:
increment index, check exit condition
2 + 40 + 1200 instructions
100 iterations
124,200 instructions in all
2 + 10 instructions
100 iterations
1200 instructions in all
2 + 20 + 124,200 instructions
100 iterations
12,422,200 instructions in all
12,422,450 Instructions
for i = 1, n
while x > 0
Static count = 326
Oct. 2014 Computer Architecture, Background and Motivation Slide 73
Figure 4.3 Faster steps do not necessarily
mean shorter travel time.
Faster Clock  Shorter Running Time
1 GHz
2 GHz
4 steps
Solution
20 steps
Suppose addition takes 1 ns
Clock period = 1 ns; 1 cycle
Clock period = ½ ns; 2 cycles
In this example, addition time
does not improve in going from
1 GHz to 2 GHz clock
Oct. 2014 Computer Architecture, Background and Motivation Slide 74
0
10
20
30
40
50
0 10 20 30 40 50
Enhancement factor (p)
Speedup
(s
)
f = 0
f = 0.1
f = 0.05
f = 0.02
f = 0.01
4.3 Performance Enhancement: Amdahl’s Law
Figure 4.4 Amdahl’s law: speedup achieved if a fraction f of a
task is unaffected and the remaining 1 – f part runs p times as fast.
s =
 min(p, 1/f)
1
f+(1–f)/p
f = fraction
unaffected
p = speedup
of the rest
Oct. 2014 Computer Architecture, Background and Motivation Slide 75
Example 4.1
Amdahl’s Law Used in Design
A processor spends 30% of its time on flp addition, 25% on flp mult,
and 10% on flp division. Evaluate the following enhancements, each
costing the same to implement:
a. Redesign of the flp adder to make it twice as fast.
b. Redesign of the flp multiplier to make it three times as fast.
c. Redesign the flp divider to make it 10 times as fast.
Solution
a. Adder redesign speedup = 1 / [0.7 + 0.3 / 2] = 1.18
b. Multiplier redesign speedup = 1 / [0.75 + 0.25 / 3] = 1.20
c. Divider redesign speedup = 1 / [0.9 + 0.1 / 10] = 1.10
What if both the adder and the multiplier are redesigned?
Oct. 2014 Computer Architecture, Background and Motivation Slide 76
Example 4.2
Amdahl’s Law Used in Management
Members of a university research group frequently visit the library.
Each library trip takes 20 minutes. The group decides to subscribe
to a handful of publications that account for 90% of the library trips;
access time to these publications is reduced to 2 minutes.
a. What is the average speedup in access to publications?
b. If the group has 20 members, each making two weekly trips to
the library, what is the justifiable expense for the subscriptions?
Assume 50 working weeks/yr and $25/h for a researcher’s time.
Solution
a. Speedup in publication access time = 1 / [0.1 + 0.9 / 10] = 5.26
b. Time saved = 20  2  50  0.9 (20 – 2) = 32,400 min = 540 h
Cost recovery = 540  $25 = $13,500 = Max justifiable expense
Oct. 2014 Computer Architecture, Background and Motivation Slide 77
4.4 Performance Measurement vs Modeling
Figure 4.5 Running times of six programs on three machines.
Execution time
Program
A E F
B C D
Machine 1
Machine 2
Machine 3
Oct. 2014 Computer Architecture, Background and Motivation Slide 78
Generalized Amdahl’s Law
Original running time of a program = 1 = f1 + f2 + . . . + fk
New running time after the fraction fi is speeded up by a factor pi
f1 f2 fk
+ + . . . +
p1 p2 pk
Speedup formula
1
S =
f1 f2 fk
+ + . . . +
p1 p2 pk
If a particular fraction
is slowed down rather
than speeded up,
use sj fj instead of fj /pj ,
where sj > 1 is the
slowdown factor
Oct. 2014 Computer Architecture, Background and Motivation Slide 79
Performance Benchmarks
Example 4.3
You are an engineer at Outtel, a start-up aspiring to compete with Intel
via its new processor design that outperforms the latest Intel processor
by a factor of 2.5 on floating-point instructions. This level of performance
was achieved by design compromises that led to a 20% increase in the
execution time of all other instructions. You are in charge of choosing
benchmarks that would showcase Outtel’s performance edge.
a. What is the minimum required fraction f of time spent on floating-point
instructions in a program on the Intel processor to show a speedup of
2 or better for Outtel?
Solution
a. We use a generalized form of Amdahl’s formula in which a fraction f
is speeded up by a given factor (2.5) and the rest is slowed down by
another factor (1.2): 1/ [1.2(1 – f) + f /2.5]  2  f  0.875
Oct. 2014 Computer Architecture, Background and Motivation Slide 80
Performance Estimation
Average CPI = All instruction classes (Class-i fraction)  (Class-i CPI)
Machine cycle time = 1 / Clock rate
CPU execution time = Instructions  (Average CPI) / (Clock rate)
Table 4.3 Usage frequency, in percentage, for various
instruction classes in four representative applications.
Application 
Instr’n class 
Data
compression
C language
compiler
Reactor
simulation
Atomic motion
modeling
A: Load/Store 25 37 32 37
B: Integer 32 28 17 5
C: Shift/Logic 16 13 2 1
D: Float 0 0 34 42
E: Branch 19 13 9 10
F: All others 8 9 6 4
Oct. 2014 Computer Architecture, Background and Motivation Slide 81
CPI and IPS Calculations
Example 4.4 (2 of 5 parts)
Consider two implementations M1 (600 MHz) and M2 (500 MHz) of
an instruction set containing three classes of instructions:
Class CPI for M1 CPI for M2 Comments
F 5.0 4.0 Floating-point
I 2.0 3.8 Integer arithmetic
N 2.4 2.0 Nonarithmetic
a. What are the peak performances of M1 and M2 in MIPS?
b. If 50% of instructions executed are class-N, with the rest divided
equally among F and I, which machine is faster? By what factor?
Solution
a. Peak MIPS for M1 = 600 / 2.0 = 300; for M2 = 500 / 2.0 = 250
b. Average CPI for M1 = 5.0 / 4 + 2.0 / 4 + 2.4 / 2 = 2.95;
for M2 = 4.0/4 + 3.8/4 + 2.0/2 = 2.95  M1 is faster; factor 1.2
Oct. 2014 Computer Architecture, Background and Motivation Slide 82
MIPS Rating Can Be Misleading
Example 4.5
Two compilers produce machine code for a program on a machine
with two classes of instructions. Here are the number of instructions:
Class CPI Compiler 1 Compiler 2
A 1 600M 400M
B 2 400M 400M
a. What are run times of the two programs with a 1 GHz clock?
b. Which compiler produces faster code and by what factor?
c. Which compiler’s output runs at a higher MIPS rate?
Solution
a. Running time 1 (2) = (600M  1 + 400M  2) / 109 = 1.4 s (1.2 s)
b. Compiler 2’s output runs 1.4 / 1.2 = 1.17 times as fast
c. MIPS rating 1, CPI = 1.4 (2, CPI = 1.5) = 1000 / 1.4 = 714 (667)
Oct. 2014 Computer Architecture, Background and Motivation Slide 83
4.5 Reporting Computer Performance
Table 4.4 Measured or estimated execution times for three programs.
Time on
machine X
Time on
machine Y
Speedup of
Y over X
Program A 20 200 0.1
Program B 1000 100 10.0
Program C 1500 150 10.0
All 3 prog’s 2520 450 5.6
Analogy: If a car is driven to a city 100 km away at 100 km/hr
and returns at 50 km/hr, the average speed is not (100 + 50) / 2
but is obtained from the fact that it travels 200 km in 3 hours.
Oct. 2014 Computer Architecture, Background and Motivation Slide 84
Table 4.4 Measured or estimated execution times for three programs.
Time on
machine X
Time on
machine Y
Speedup of
Y over X
Program A 20 200 0.1
Program B 1000 100 10.0
Program C 1500 150 10.0
Geometric mean does not yield a measure of overall speedup,
but provides an indicator that at least moves in the right direction
Comparing the Overall Performance
Speedup of
X over Y
10
0.1
0.1
Arithmetic mean
Geometric mean
6.7
2.15
3.4
0.46
Oct. 2014 Computer Architecture, Background and Motivation Slide 85
Effect of Instruction Mix on Performance
Example 4.6 (1 of 3 parts)
Consider two applications DC and RS and two machines M1 and M2:
Class Data Comp. Reactor Sim. M1’s CPI M2’s CPI
A: Ld/Str 25% 32% 4.0 3.8
B: Integer 32% 17% 1.5 2.5
C: Sh/Logic 16% 2% 1.2 1.2
D: Float 0% 34% 6.0 2.6
E: Branch 19% 9% 2.5 2.2
F: Other 8% 6% 2.0 2.3
a. Find the effective CPI for the two applications on both machines.
Solution
a. CPI of DC on M1: 0.25  4.0 + 0.32  1.5 + 0.16  1.2 + 0  6.0 +
0.19  2.5 + 0.08  2.0 = 2.31
DC on M2: 2.54 RS on M1: 3.94 RS on M2: 2.89
Oct. 2014 Computer Architecture, Background and Motivation Slide 86
4.6 The Quest for Higher Performance
State of available computing power ca. the early 2000s:
Gigaflops on the desktop
Teraflops in the supercomputer center
Petaflops on the drawing board
Note on terminology (see Table 3.1)
Prefixes for large units:
Kilo = 103, Mega = 106, Giga = 109, Tera = 1012, Peta = 1015
For memory:
K = 210 = 1024, M = 220, G = 230, T = 240, P = 250
Prefixes for small units:
micro = 106, nano = 109, pico = 1012, femto = 1015
Oct. 2014 Computer Architecture, Background and Motivation Slide 87
Figure 3.10 Trends in processor
performance and DRAM memory
chip capacity (Moore’s law).
Performance Trends and Obsolescence
1Mb
1990
1980 2000 2010
kIPS
MIPS
GIPS
TIPS
Processor
performance
Calendar year
80286
68000
80386
80486
68040
Pentium
Pentium II
R10000
1.6 / yr
10 / 5 yrs
2 / 18 mos
64Mb
4Mb
64kb
256kb
256Mb
1Gb
16Mb
4 / 3 yrs
Processor
Memory
kb
Mb
Gb
Tb
Memory
chip
capacity
“Can I call you back? We
just bought a new computer
and we’re trying to set it up
before it’s obsolete.”
Oct. 2014 Computer Architecture, Background and Motivation Slide 88
Figure 4.7 Exponential growth of supercomputer performance.
Super-
computers
1990
1980 2000 2010
Supercomputer
performance
Calendar year
Cray
X-MP
Y-MP
CM-2
MFLOPS
GFLOPS
TFLOPS
PFLOPS
Vector
supercomputers
CM-5
CM-5
$240M MPPs
$30M MPPs
Massively parallel
processors
Oct. 2014 Computer Architecture, Background and Motivation Slide 89
Figure 4.8 Milestones in the DOE’s Accelerated Strategic Computing
Initiative (ASCI) program with extrapolation up to the PFLOPS level.
The Most Powerful Computers
2000
1995 2005 2010
Performance
(TFLOPS)
Calendar year
ASCI Red
ASCI Blue
ASCI White
1+ TFLOPS, 0.5 TB
3+ TFLOPS, 1.5 TB
10+ TFLOPS, 5 TB
30+ TFLOPS, 10 TB
100+ TFLOPS, 20 TB
1
10
100
1000 Plan Develop Use
ASCI
ASCI Purple
ASCI Q
Oct. 2014 Computer Architecture, Background and Motivation Slide 90
Figure 25.1
Trend in
computational
performance
per watt of
power used
in general-
purpose
processors
and DSPs.
Performance is Important, But It Isn’t Everything
1990
1980 2000 2010
kIPS
MIPS
GIPS
TIPS
Performance
Calendar year
Absolute
processor
performance
GP processor
performance
per Watt
DSP performance
per Watt
Oct. 2014 Computer Architecture, Background and Motivation Slide 91
Roadmap for the Rest of the Book
Ch. 5-8: A simple ISA,
variations in ISA
Ch. 9-12: ALU design
Ch. 13-14: Data path
and control unit design
Ch. 15-16: Pipelining
and its limits
Ch. 17-20: Memory
(main, mass, cache, virtual)
Ch. 21-24: I/O, buses,
interrupts, interfacing
Fasten your seatbelts
as we begin our ride!
Ch. 25-28: Vector and
parallel processing

More Related Content

Similar to f37-book-intarch-pres-pt1.ppt

Computer organiztion3
Computer organiztion3Computer organiztion3
Computer organiztion3
Umang Gupta
 

Similar to f37-book-intarch-pres-pt1.ppt (20)

Amit vish
Amit vishAmit vish
Amit vish
 
Lcdf4 chap 03_p2
Lcdf4 chap 03_p2Lcdf4 chap 03_p2
Lcdf4 chap 03_p2
 
Video lectures for bba
Video lectures for bbaVideo lectures for bba
Video lectures for bba
 
CA.ppt
CA.pptCA.ppt
CA.ppt
 
decoder and encoder
 decoder and encoder decoder and encoder
decoder and encoder
 
Deep Learning: R with Keras and TensorFlow
Deep Learning: R with Keras and TensorFlowDeep Learning: R with Keras and TensorFlow
Deep Learning: R with Keras and TensorFlow
 
sonam Kumari python.ppt
sonam Kumari python.pptsonam Kumari python.ppt
sonam Kumari python.ppt
 
dokumen.tips_verilog-basic-ppt.pdf
dokumen.tips_verilog-basic-ppt.pdfdokumen.tips_verilog-basic-ppt.pdf
dokumen.tips_verilog-basic-ppt.pdf
 
python-2021.pdf
python-2021.pdfpython-2021.pdf
python-2021.pdf
 
Computer organiztion3
Computer organiztion3Computer organiztion3
Computer organiztion3
 
Combinational and sequential logic
Combinational and sequential logicCombinational and sequential logic
Combinational and sequential logic
 
python.ppt
python.pptpython.ppt
python.ppt
 
python.ppt
python.pptpython.ppt
python.ppt
 
Hardware Description Language
Hardware Description Language Hardware Description Language
Hardware Description Language
 
Chapter1.ppt
Chapter1.pptChapter1.ppt
Chapter1.ppt
 
computer logic and digital design chapter 1
computer logic and digital design chapter 1computer logic and digital design chapter 1
computer logic and digital design chapter 1
 
Implementation of an arithmetic logic using area efficient carry lookahead adder
Implementation of an arithmetic logic using area efficient carry lookahead adderImplementation of an arithmetic logic using area efficient carry lookahead adder
Implementation of an arithmetic logic using area efficient carry lookahead adder
 
Matlab workshop
Matlab workshopMatlab workshop
Matlab workshop
 
Chapter1
Chapter1Chapter1
Chapter1
 
Floating point ALU using VHDL implemented on FPGA
Floating point ALU using VHDL implemented on FPGAFloating point ALU using VHDL implemented on FPGA
Floating point ALU using VHDL implemented on FPGA
 

Recently uploaded

Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
amitlee9823
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
only4webmaster01
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
amitlee9823
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
amitlee9823
 
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
amitlee9823
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
MarinCaroMartnezBerg
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
amitlee9823
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
amitlee9823
 
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
ZurliaSoop
 

Recently uploaded (20)

Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Predicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science ProjectPredicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science Project
 

f37-book-intarch-pres-pt1.ppt

  • 1. Oct. 2014 Computer Architecture, Background and Motivation Slide 1 Part I Background and Motivation
  • 2. Oct. 2014 Computer Architecture, Background and Motivation Slide 2 About This Presentation This presentation is intended to support the use of the textbook Computer Architecture: From Microprocessors toSupercomputers, Oxford University Press, 2005, ISBN 0-19-515455-X. It is updated regularly by the author as part of his teaching of the upper-division course ECE 154, Introduction to Computer Architecture, at the University of California, Santa Barbara. Instructors can use these slides freely in classroom teaching and for other educational purposes. Any other use is strictly prohibited. © Behrooz Parhami Edition Released Revised Revised Revised Revised First June 2003 July 2004 June 2005 Mar. 2006 Jan. 2007 Jan. 2008 Jan. 2009 Jan. 2011 Oct. 2014 Second
  • 3. Oct. 2014 Computer Architecture, Background and Motivation Slide 3 I Background and Motivation Topics in This Part Chapter 1 Combinational Digital Circuits Chapter 2 Digital Circuits with Memory Chapter 3 Computer System Technology Chapter 4 Computer Performance Provide motivation, paint the big picture, introduce tools: • Review components used in building digital circuits • Present an overview of computer technology • Understand the meaning of computer performance (or why a 2 GHz processor isn’t 2 as fast as a 1 GHz model)
  • 4. Oct. 2014 Computer Architecture, Background and Motivation Slide 4 1 Combinational Digital Circuits First of two chapters containing a review of digital design: • Combinational, or memoryless, circuits in Chapter 1 • Sequential circuits, with memory, in Chapter 2 Topics in This Chapter 1.1 Signals, Logic Operators, and Gates 1.2 Boolean Functions and Expressions 1.3 Designing Gate Networks 1.4 Useful Combinational Parts 1.5 Programmable Combinational Parts 1.6 Timing and Circuit Considerations
  • 5. Oct. 2014 Computer Architecture, Background and Motivation Slide 5 1.1 Signals, Logic Operators, and Gates Figure 1.1 Some basic elements of digital logic circuits, with operator signs used in this book highlighted. x  y  AND Name XOR OR NOT Graphical symbol x  y Operator sign and alternate(s) x  y x  y xy x  y x  x or x _ x y or xy Arithmetic expression x  y 2xy x  y  xy 1  x Output is 1 iff: Input is 0 Both inputs are 1s At least one input is 1 Inputs are not equal Add: 1 + 1 = 10
  • 6. Oct. 2014 Computer Architecture, Background and Motivation Slide 6 The Arithmetic Substitution Method z = 1 – z NOT converted to arithmetic form xy AND same as multiplication (when doing the algebra, set zk = z) x  y = x + y  xy OR converted to arithmetic form x  y = x + y  2xy XOR converted to arithmetic form Example: Prove the identity xyz  x  y  z ? 1 LHS = [xyz  x]  [y  z] = [xyz + 1 – x – (1 – x)xyz]  [1 – y + 1 – z – (1 – y)(1 – z)] = [xyz + 1 – x]  [1 – yz] = (xyz + 1 – x) + (1 – yz) – (xyz + 1 – x)(1 – yz) = 1 + xy2z2 – xyz = 1 = RHS This is addition, not logical OR
  • 7. Oct. 2014 Computer Architecture, Background and Motivation Slide 7 Variations in Gate Symbols Figure 1.2 Gates with more than two inputs and/or with inverted signals at input or output. OR NOR NAND AND XNOR Bubble = Inverter
  • 8. Oct. 2014 Computer Architecture, Background and Motivation Slide 8 Gates as Control Elements Figure 1.3 An AND gate and a tristate buffer act as controlled switches or valves. An inverting buffer is logically the same as a NOT gate. Enable/Pass signal e Data in x Data out x or 0 Data in x Enable/Pass signal e Data out x or “high impedance” (a) AND gate for controlled transfer (b) Tristate buffer (c) Model for AND switch. x e No data or x 0 1 x e ex 0 1 0 (d) Model for tristate buffer.
  • 9. Oct. 2014 Computer Architecture, Background and Motivation Slide 9 Wired OR and Bus Connections Figure 1.4 Wired OR allows tying together of several controlled signals. e e e Data out (x, y, z, or high impedance) (b) Wired OR of tristate outputs e e e Data out (x, y, z, or 0) (a) Wired OR of product terms z x y z x y z x y z x y
  • 10. Oct. 2014 Computer Architecture, Background and Motivation Slide 10 Control/Data Signals and Signal Bundles Figure 1.5 Arrays of logic gates represented by a single gate symbol. / 8 / 8 / 8 Compl / 32 / k / 32 Enable / k / k / k (b) 32 AND gates (c) k XOR gates (a) 8 NOR gates
  • 11. Oct. 2014 Computer Architecture, Background and Motivation Slide 11 1.2 Boolean Functions and Expressions Ways of specifying a logic function  Truth table: 2n row, “don’t-care” in input or output  Logic expression: w (x  y  z), product-of-sums, sum-of-products, equivalent expressions  Word statement: Alarm will sound if the door is opened while the security system is engaged, or when the smoke detector is triggered  Logic circuit diagram: Synthesis vs analysis
  • 12. Oct. 2014 Computer Architecture, Background and Motivation Slide 12 Table 1.2 Laws (basic identities) of Boolean algebra. Name of law OR version AND version Identity x  0 = x x 1 = x One/Zero x  1 = 1 x 0 = 0 Idempotent x  x = x x x = x Inverse x  x  = 1 x x  = 0 Commutative x  y = y  x x y = y x Associative (x  y)  z = x  (y  z) (x y) z = x (y z) Distributive x  (y z) = (x  y) (x  z) x (y  z) = (x y)  (x z) DeMorgan’s (x  y) = x  y  (x y) = x   y  Manipulating Logic Expressions
  • 13. Oct. 2014 Computer Architecture, Background and Motivation Slide 13 Proving the Equivalence of Logic Expressions Example 1.1  Truth-table method: Exhaustive verification  Arithmetic substitution x  y = x + y  xy x  y = x + y  2xy  Case analysis: two cases, x = 0 or x = 1  Logic expression manipulation Example: x  y ? xyxy x + y – 2xy ? (1–x)y + x(1–y) – (1–x)yx(1–y)
  • 14. Oct. 2014 Computer Architecture, Background and Motivation Slide 14 1.3 Designing Gate Networks  AND-OR, NAND-NAND, OR-AND, NOR-NOR  Logic optimization: cost, speed, power dissipation (a) AND-OR circuit z x y x y z (b) Intermediate circuit (c) NAND-NAND equivalent z x y x y z z x y x y z Figure 1.6 A two-level AND-OR circuit and two equivalent circuits. (a  b  c) = abc
  • 15. Oct. 2014 Computer Architecture, Background and Motivation Slide 15 Seven-Segment Display of Decimal Digits Figure 1.7 Seven-segment display of decimal digits. The three open segments may be optionally used. The digit 1 can be displayed in two ways, with the more common right- side version shown. Optional segment I swear I didn’t use a calculator!
  • 16. Oct. 2014 Computer Architecture, Background and Motivation Slide 16 BCD-to-Seven-Segment Decoder Example 1.2 Figure 1.8 The logic circuit that generates the enable signal for the lowermost segment (number 3) in a seven-segment display unit. x3 x2 x1 x0 Signals to enable or turn on the segments 4-bit input in [0, 9] e0 e5 e6 e4 e2 e1 e3 1 2 4 5 0 3 6
  • 17. Oct. 2014 Computer Architecture, Background and Motivation Slide 17 1.4 Useful Combinational Parts  High-level building blocks  Much like prefab parts used in building a house  Arithmetic components (adders, multipliers, ALUs) will be covered in Part III  Here we cover three useful parts: multiplexers, decoders/demultiplexers, encoders
  • 18. Oct. 2014 Computer Architecture, Background and Motivation Slide 18 Multiplexers Figure 1.9 Multiplexer (mux), or selector, allows one of several inputs to be selected and routed to output depending on the binary value of a set of selection or address signals provided to it. x x y z 1 0 x x z y x x y z 1 0 y / 32 / 32 / 32 1 0 1 0 3 2 z y1 0 1 0 1 0 y1 y0 y0 (a) 2-to-1 mux (b) Switch view (c) Mux symbol (d) Mux array (e) 4-to-1 mux with enable (e) 4-to-1 mux design 0 1 y 1 1 1 0 0 0 x x x x 1 0 2 3 x x x x 0 1 2 3 z e (Enable) 1 0 x2
  • 19. Computer Architecture, Background and Motivation Slide 19 Decoders/Demultiplexers Figure 1.10 A decoder allows the selection of one of 2a options using an a-bit address as input. A demultiplexer (demux) is a decoder that only selects an output if its enable signal is asserted. y1 y0 x0 x3 x2 x1 1 0 3 2 y1 y0 x0 x3 x2 x1 e 1 0 3 2 y1 y0 x0 x3 x2 x1 (a) 2-to-4 decoder (b) Decoder symbol (c) Demultiplexer, or decoder with “enable” (Enable) 1 1 0 1 1 0 1 1 1 Oct. 2014
  • 20. Oct. 2014 Computer Architecture, Background and Motivation Slide 20 Encoders Figure 1.11 A 2a-to-a encoder outputs an a-bit binary number equal to the index of the single 1 among its 2a inputs. (a) 4-to-2 encoder (b) Encoder symbol x0 x3 x2 x1 y1 y0 1 0 3 2 x0 x3 x2 x1 y1 y0 1 0 1 0 0 0
  • 21. Oct. 2014 Computer Architecture, Background and Motivation Slide 21 1.5 Programmable Combinational Parts  Programmable ROM (PROM)  Programmable array logic (PAL)  Programmable logic array (PLA) A programmable combinational part can do the job of many gates or gate networks Programmed by cutting existing connections (fuses) or establishing new connections (antifuses)
  • 22. Oct. 2014 Computer Architecture, Background and Motivation Slide 22 PROMs Figure 1.12 Programmable connections and their use in a PROM. . . . . . . Inputs Outputs (a) Programmable OR gates w x y z (b) Logic equivalent of part a w x y z (c) Programmable read-only memory (PROM) Decoder
  • 23. Oct. 2014 Computer Architecture, Background and Motivation Slide 23 PALs and PLAs Figure 1.13 Programmable combinational logic: general structure and two classes known as PAL and PLA devices. Not shown is PROM with fixed AND array (a decoder) and programmable OR array. AND array (AND plane) OR array (OR plane) . . . . . . . . . Inputs Outputs (a) General programmable combinational logic (b) PAL: programmable AND array, fixed OR array 8-input ANDs (c) PLA: programmable AND and OR arrays 6-input ANDs 4-input ORs
  • 24. Oct. 2014 Computer Architecture, Background and Motivation Slide 24 1.6 Timing and Circuit Considerations  Gate delay d: a fraction of, to a few, nanoseconds  Wire delay, previously negligible, is now important (electronic signals travel about 15 cm per ns)  Circuit simulation to verify function and timing Changes in gate/circuit output, triggered by changes in its inputs, are not instantaneous
  • 25. Oct. 2014 Computer Architecture, Background and Motivation Slide 25 Glitching Figure 1.14 Timing diagram for a circuit that exhibits glitching. x = 0 y z a = x  y f = a  z 2d 2d Using the PAL in Fig. 1.13b to implement f = x  y  z AND-OR (PAL) AND-OR (PAL) x y z a f
  • 26. Oct. 2014 Computer Architecture, Background and Motivation Slide 26 CMOS Transmission Gates Figure 1.15 A CMOS transmission gate and its use in building a 2-to-1 mux. z x x 0 1 (a) CMOS transmission gate: circuit and symbol (b) Two-input mux built of two transmission gates TG TG TG y P N
  • 27. Oct. 2014 Computer Architecture, Background and Motivation Slide 27 2 Digital Circuits with Memory Second of two chapters containing a review of digital design: • Combinational (memoryless) circuits in Chapter 1 • Sequential circuits (with memory) in Chapter 2 Topics in This Chapter 2.1 Latches, Flip-Flops, and Registers 2.2 Finite-State Machines 2.3 Designing Sequential Circuits 2.4 Useful Sequential Parts 2.5 Programmable Sequential Parts 2.6 Clocks and Timing of Events
  • 28. Oct. 2014 Computer Architecture, Background and Motivation Slide 28 2.1 Latches, Flip-Flops, and Registers Figure 2.1 Latches, flip-flops, and registers. R Q Q S D Q Q C Q Q D C (a) SR latch (b) D latch Q C Q D Q C Q D (e) k-bit register (d) D flip-flop symbol (c) Master-slave D flip-flop Q C Q D FF / / k k Q C Q D FF R S
  • 29. Oct. 2014 Computer Architecture, Background and Motivation Slide 29 Latches vs Flip-Flops Figure 2.2 Operations of D latch and negative-edge-triggered D flip-flop. D C D latch: Q D FF: Q Setup time Setup time Hold time Hold time D C Q Q D C Q Q FF
  • 30. Oct. 2014 Computer Architecture, Background and Motivation Slide 30 Reading and Modifying FFs in the Same Cycle Figure 2.3 Register-to-register operation with edge-triggered flip-flops. / / k k Q C Q D FF / / k k Q C Q D FF Computation module (combinational logic) Clock Propagation delay Combinational delay
  • 31. Oct. 2014 Computer Architecture, Background and Motivation Slide 31 2.2 Finite-State Machines Example 2.1 Figure 2.4 State table and state diagram for a vending machine coin reception unit. Dime Dime Quarter Dime Quarter Dime Quarter Dime Quarter Reset Reset Reset Reset Reset Start Quarter S00 S10 S20 S25 S30 S35 S10 S25 S00 S00 S00 S00 S00 S00 S20 S35 S35 S35 S35 S35 S35 S30 S35 S35 ------- Input ------- Dime Quarter Reset Current state S00 S35 is the initial state is the final state Next state Dime Quarter S00 S10 S20 S25 S30 S35
  • 32. Oct. 2014 Computer Architecture, Background and Motivation Slide 32 Sequential Machine Implementation Figure 2.5 Hardware realization of Moore and Mealy sequential machines. Next-state logic State register / n / m / l Inputs Outputs Next-state excitation signals Present state Output logic Only for Mealy machine
  • 33. Oct. 2014 Computer Architecture, Background and Motivation Slide 33 2.3 Designing Sequential Circuits Example 2.3 Figure 2.7 Hardware realization of a coin reception unit (Example 2.3). Output Q C Q D e Inputs Q C Q D Q C Q D FF2 FF1 FF0 q d Quarter in Dime in Final state is 1xx
  • 34. Oct. 2014 Computer Architecture, Background and Motivation Slide 34 2.4 Useful Sequential Parts  High-level building blocks  Much like prefab closets used in building a house  Other memory components will be covered in Chapter 17 (SRAM details, DRAM, Flash)  Here we cover three useful parts: shift register, register file (SRAM basics), counter
  • 35. Oct. 2014 Computer Architecture, Background and Motivation Slide 35 Shift Register Figure 2.8 Register with single-bit left shift and parallel load capabilities. For logical left shift, serial data in line is connected to 0. Parallel data in / k / k / k Shift Q C Q D FF 1 0 Serial data in / k – 1 LSBs Load Parallel data out Serial data out MSB 0 1 0 0 1 1 1 0
  • 36. Oct. 2014 Computer Architecture, Background and Motivation Slide 36 Register File and FIFO Figure 2.9 Register file with random access and FIFO. Decoder / k / k / h Write enable Read address 0 Read address 1 Read data 0 Write data Read enable 2 k-bit registers h / k / k / k / k / k / k / k / h Write address Muxes Read data 1 / k / h / h / h / k / h Write enable Read addr 0 / k / k Read addr 1 Write data Write addr Read data 0 Read enable Read data 1 (a) Register file with random access (b) Graphic symbol for register file Q C Q D FF / k Q C Q D FF Q C Q D FF Q C Q D FF / k Push / k Input Output Pop Full Empty (c) FIFO symbol
  • 37. Oct. 2014 Computer Architecture, Background and Motivation Slide 37 SRAM Figure 2.10 SRAM memory is simply a large, single-port register file. Column mux Row decoder / h Address Square or almost square memory matrix Row buffer Row Column g bits data out / g / h Write enable / g Data in Address Data out Output enable Chip select . . . . . . . . . (a) SRAM block diagram (b) SRAM read mechanism
  • 38. Oct. 2014 Computer Architecture, Background and Motivation Slide 38 Binary Counter Figure 2.11 Synchronous binary counter with initialization capability. Count register Mux Incrementer 0 Input Load IncrInit x + 1 x 0 1 1 c in cout
  • 39. Oct. 2014 Computer Architecture, Background and Motivation Slide 39 2.5 Programmable Sequential Parts  Programmable array logic (PAL)  Field-programmable gate array (FPGA)  Both types contain macrocells and interconnects A programmable sequential part contain gates and memory elements Programmed by cutting existing connections (fuses) or establishing new connections (antifuses)
  • 40. Oct. 2014 Computer Architecture, Background and Motivation Slide 40 PAL and FPGA Figure 2.12 Examples of programmable sequential logic. (a) Portion of PAL with storable output (b) Generic structure of an FPGA 8-input ANDs D C Q Q FF Mux Mux 0 1 0 1 I/O blocks Configurable logic block Programmable connections CLB CLB CLB CLB
  • 41. Oct. 2014 Computer Architecture, Background and Motivation Slide 41 2.6 Clocks and Timing of Events Clock is a periodic signal: clock rate = clock frequency The inverse of clock rate is the clock period: 1 GHz  1 ns Constraint: Clock period  tprop + tcomb + tsetup + tskew Figure 2.13 Determining the required length of the clock period. Other inputs Combinational logic Clock period FF1 begins to change FF1 change observed Must be wide enough to accommodate worst-case delays Clock1 Clock2 Q C Q D FF2 Q C Q D FF1
  • 42. Oct. 2014 Computer Architecture, Background and Motivation Slide 42 Synchronization Figure 2.14 Synchronizers are used to prevent timing problems arising from untimely changes in asynchronous signals. Asynch input Asynch input Synch version Synch version Asynch input Synch version Clock (a) Simple synchronizer (b) Two-FF synchronizer (c) Input and output waveforms Q C Q D FF Q C Q D FF2 Q C Q D FF1
  • 43. Oct. 2014 Computer Architecture, Background and Motivation Slide 43 Level-Sensitive Operation Figure 2.15 Two-phase clocking with nonoverlapping clock signals. Combi- national logic 1  1 Clock period  Q C Q D Latch 1  Q C Q D Latch Other inputs Combi- national logic 2  2  Clocks with nonoverlapping highs Other inputs Q C Q Latch D
  • 44. Oct. 2014 Computer Architecture, Background and Motivation Slide 44 3 Computer System Technology Interplay between architecture, hardware, and software • Architectural innovations influence technology • Technological advances drive changes in architecture Topics in This Chapter 3.1 From Components to Applications 3.2 Computer Systems and Their Parts 3.3 Generations of Progress 3.4 Processor and Memory Technologies 3.5 Peripherals, I/O, and Communications 3.6 Software Systems and Applications
  • 45. Oct. 2014 Computer Architecture, Background and Motivation Slide 45 3.1 From Components to Applications Figure 3.1 Subfields or views in computer system engineering. High- level view Computer designer Circuit designer Application designer System designer Logic designer Software Hardware Computer organization Low- level view Application domains Electronic components Computer architecture
  • 46. Oct. 2014 Computer Architecture, Background and Motivation Slide 46 What Is (Computer) Architecture? Figure 3.2 Like a building architect, whose place at the engineering/arts and goals/means interfaces is seen in this diagram, a computer architect reconciles many conflicting or competing demands. Architect Interface Interface Goals Means Arts Engineering Client’s taste: mood, style, . . . Client’s requirements: function, cost, . . . The world of arts: aesthetics, trends, . . . Construction technology: material, codes, . . .
  • 47. Oct. 2014 Computer Architecture, Background and Motivation Slide 47 3.2 Computer Systems and Their Parts Figure 3.3 The space of computer systems, with what we normally mean by the word “computer” highlighted. Computer Analog Fixed-function Stored-program Electronic Nonelectronic General-purpose Special-purpose Number cruncher Data manipulator Digital
  • 48. Oct. 2014 Computer Architecture, Background and Motivation Slide 48 Price/Performance Pyramid Figure 3.4 Classifying computers by computational power and price range. Embedded Personal Workstation Server Mainframe Super $Millions $100s Ks $10s Ks $1000s $100s $10s Differences in scale, not in substance
  • 49. Oct. 2014 Computer Architecture, Background and Motivation Slide 49 Automotive Embedded Computers Figure 3.5 Embedded computers are ubiquitous, yet invisible. They are found in our automobiles, appliances, and many other places. Engine Impact sensors Navigation & entertainment Central controller Brakes Airbags
  • 50. Oct. 2014 Computer Architecture, Background and Motivation Slide 50 Personal Computers and Workstations Figure 3.6 Notebooks, a common class of portable computers, are much smaller than desktops but offer substantially the same capabilities. What are the main reasons for the size difference?
  • 51. Oct. 2014 Computer Architecture, Background and Motivation Slide 51 Digital Computer Subsystems Figure 3.7 The (three, four, five, or) six main units of a digital computer. Usually, the link unit (a simple bus or a more elaborate network) is not explicitly included in such diagrams. Memory Link Input/Output To/from network Processor Control Datapath Input Output CPU I/O
  • 52. Oct. 2014 Computer Architecture, Background and Motivation Slide 52 3.3 Generations of Progress Table 3.2 The 5 generations of digital computers, and their ancestors. Generation (begun) Processor technology Memory innovations I/O devices introduced Dominant look & fell 0 (1600s) (Electro-) mechanical Wheel, card Lever, dial, punched card Factory equipment 1 (1950s) Vacuum tube Magnetic drum Paper tape, magnetic tape Hall-size cabinet 2 (1960s) Transistor Magnetic core Drum, printer, text terminal Room-size mainframe 3 (1970s) SSI/MSI RAM/ROM chip Disk, keyboard, video monitor Desk-size mini 4 (1980s) LSI/VLSI SRAM/DRAM Network, CD, mouse,sound Desktop/ laptop micro 5 (1990s) ULSI/GSI/ WSI, SOC SDRAM, flash Sensor/actuator, point/click Invisible, embedded
  • 53. Oct. 2014 Computer Architecture, Background and Motivation Slide 53 Figure 3.8 The manufacturing process for an IC part. IC Production and Yield 15-30 cm 30-60 cm Silicon crystal ingot Slicer Processing: 20-30 steps Blank wafer with defects x x x x x x x x x x x 0.2 cm Patterned wafer (100s of simple or scores of complex processors) Dicer Die ~1 cm Good die ~1 cm Die tester Microchip or other part Mounting Part tester Usable part to ship
  • 54. Oct. 2014 Computer Architecture, Background and Motivation Slide 54 Figure 3.9 Visualizing the dramatic decrease in yield with larger dies. Effect of Die Size on Yield 120 dies, 109 good 26 dies, 15 good Die yield =def (number of good dies) / (total number of dies) Die yield = Wafer yield  [1 + (Defect density  Die area) / a]–a Die cost = (cost of wafer) / (total number of dies  die yield) = (cost of wafer)  (die area / wafer area) / (die yield)
  • 55. Oct. 2014 Computer Architecture, Background and Motivation Slide 55 3.4 Processor and Memory Technologies Figure 3.11 Packaging of processor, memory, and other components. PC board Backplane Memory CPU Bus Connector (b) 3D packaging of the future (a) 2D or 2.5D packaging now common Stacked layers glued together Interlayer connections deposited on the outside of the stack Die
  • 56. Oct. 2014 Computer Architecture, Background and Motivation Slide 56 Figure 3.10 Trends in processor performance and DRAM memory chip capacity (Moore’s law). Moore’s Law 1Mb 1990 1980 2000 2010 kIPS MIPS GIPS TIPS Processor performance Calendar year 80286 68000 80386 80486 68040 Pentium Pentium II R10000 1.6 / yr 10 / 5 yrs 2 / 18 mos 64Mb 4Mb 64kb 256kb 256Mb 1Gb 16Mb 4 / 3 yrs Processor Memory kb Mb Gb Tb Memory chip capacity
  • 57. Oct. 2014 Computer Architecture, Background and Motivation Slide 57 Pitfalls of Computer Technology Forecasting “DOS addresses only 1 MB of RAM because we cannot imagine any applications needing more.” Microsoft, 1980 “640K ought to be enough for anybody.” Bill Gates, 1981 “Computers in the future may weigh no more than 1.5 tons.” Popular Mechanics “I think there is a world market for maybe five computers.” Thomas Watson, IBM Chairman, 1943 “There is no reason anyone would want a computer in their home.” Ken Olsen, DEC founder, 1977 “The 32-bit machine would be an overkill for a personal computer.” Sol Libes, ByteLines
  • 58. Oct. 2014 Computer Architecture, Background and Motivation Slide 58 3.5 Input/Output and Communications Figure 3.12 Magnetic and optical disk memory units. (a) Cutaway view of a hard disk drive (b) Some removable storage media Typically 2-9 cm Floppy disk CD-ROM Magnetic tape cartridge . . . . . . . .
  • 59. Oct. 2014 Computer Architecture, Background and Motivation Slide 59 Figure 3.13 Latency and bandwidth characteristics of different classes of communication links. Communication Technologies 3 6 9 12 9 6 3 3 Bandwidth (b/s) Latency (s) 10 10 10 10 10 10 10 1 10 Processor bus I/O network System-area network (SAN) Local-area network (LAN) Metro-area network (MAN) Wide-area network (WAN) Geographically distributed Same geographic location (ns) (s) (ms) (min) (h)
  • 60. Oct. 2014 Computer Architecture, Background and Motivation Slide 60 3.6 Software Systems and Applications Figure 3.15 Categorization of software, with examples in each class. Software Application: word processor, spreadsheet, circuit simulator, . . . Operating system Translator: MIPS assembler, C compiler, . . . System Manager: virtual memory, security, file system, . . . Coordinator: scheduling, load balancing, diagnostics, . . . Enabler: disk driver, display driver, printing, . . .
  • 61. Oct. 2014 Computer Architecture, Background and Motivation Slide 61 Figure 3.14 Models and abstractions in programming. High- vs Low-Level Programming Compiler Assembler Interpreter temp=v[i] v[i]=v[i+1] v[i+1]=temp Swap v[i] and v[i+1] add $2,$5,$5 add $2,$2,$2 add $2,$4,$2 lw $15,0($2) lw $16,4($2) sw $16,0($2) sw $15,4($2) jr $31 00a51020 00421020 00821020 8c620000 8cf20004 acf20000 ac620004 03e00008 Very high-level language objectives or tasks High-level language statements Assembly language instructions, mnemonic Machine language instructions, binary (hex) One task = many statements One statement = several instructions Mostly one-to-one More abstract, machine-independent; easier to write, read, debug, or maintain More concrete, machine-specific, error-prone; harder to write, read, debug, or maintain
  • 62. Oct. 2014 Computer Architecture, Background and Motivation Slide 62 4 Computer Performance Performance is key in design decisions; also cost and power • It has been a driving force for innovation • Isn’t quite the same as speed (higher clock rate) Topics in This Chapter 4.1 Cost, Performance, and Cost/Performance 4.2 Defining Computer Performance 4.3 Performance Enhancement and Amdahl’s Law 4.4 Performance Measurement vs Modeling 4.5 Reporting Computer Performance 4.6 The Quest for Higher Performance
  • 63. Oct. 2014 Computer Architecture, Background and Motivation Slide 63 4.1 Cost, Performance, and Cost/Performance 1980 1960 2000 2020 $1 Compute r cost Calendar year $1 K $1 M $1 G
  • 64. Oct. 2014 Computer Architecture, Background and Motivation Slide 64 Figure 4.1 Performance improvement as a function of cost. Cost/Performance Performance Cost Superlinear: economy of scale Sublinear: diminishing returns Linear (ideal?)
  • 65. Oct. 2014 Computer Architecture, Background and Motivation Slide 65 4.2 Defining Computer Performance Figure 4.2 Pipeline analogy shows that imbalance between processing power and I/O capabilities leads to a performance bottleneck. Processing Input Output CPU-bound task I/O-bound task
  • 66. Oct. 2014 Computer Architecture, Background and Motivation Slide 66 Six Passenger Aircraft to Be Compared B 747 DC-8-50
  • 67. Oct. 2014 Computer Architecture, Background and Motivation Slide 67 Performance of Aircraft: An Analogy Table 4.1 Key characteristics of six passenger aircraft: all figures are approximate; some relate to a specific model/configuration of the aircraft or are averages of cited range of values. Aircraft Passengers Range (km) Speed (km/h) Price ($M) Airbus A310 250 8 300 895 120 Boeing 747 470 6 700 980 200 Boeing 767 250 12 300 885 120 Boeing 777 375 7 450 980 180 Concorde 130 6 400 2 200 350 DC-8-50 145 14 000 875 80 Speed of sound  1220 km / h
  • 68. Oct. 2014 Computer Architecture, Background and Motivation Slide 68 Different Views of Performance Performance from the viewpoint of a passenger: Speed Note, however, that flight time is but one part of total travel time. Also, if the travel distance exceeds the range of a faster plane, a slower plane may be better due to not needing a refueling stop Performance from the viewpoint of an airline: Throughput Measured in passenger-km per hour (relevant if ticket price were proportional to distance traveled, which in reality it is not) Airbus A310 250  895 = 0.224 M passenger-km/hr Boeing 747 470  980 = 0.461 M passenger-km/hr Boeing 767 250  885 = 0.221 M passenger-km/hr Boeing 777 375  980 = 0.368 M passenger-km/hr Concorde 130  2200 = 0.286 M passenger-km/hr DC-8-50 145  875 = 0.127 M passenger-km/hr Performance from the viewpoint of FAA: Safety
  • 69. Oct. 2014 Computer Architecture, Background and Motivation Slide 69 Cost Effectiveness: Cost/Performance Table 4.1 Key characteristics of six passenger aircraft: all figures are approximate; some relate to a specific model/configuration of the aircraft or are averages of cited range of values. Aircraft Passen- gers Range (km) Speed (km/h) Price ($M) A310 250 8 300 895 120 B 747 470 6 700 980 200 B 767 250 12 300 885 120 B 777 375 7 450 980 180 Concorde 130 6 400 2 200 350 DC-8-50 145 14 000 875 80 Cost / Performance 536 434 543 489 1224 630 Smaller values better Throughput (M P km/hr) 0.224 0.461 0.221 0.368 0.286 0.127 Larger values better
  • 70. Oct. 2014 Computer Architecture, Background and Motivation Slide 70 Concepts of Performance and Speedup Performance = 1 / Execution time is simplified to Performance = 1 / CPU execution time (Performance of M1) / (Performance of M2) = Speedup of M1 over M2 = (Execution time of M2) / (Execution time M1) Terminology: M1 is x times as fast as M2 (e.g., 1.5 times as fast) M1 is 100(x – 1)% faster than M2 (e.g., 50% faster) CPU time = Instructions  (Cycles per instruction)  (Secs per cycle) = Instructions  CPI / (Clock rate) Instruction count, CPI, and clock rate are not completely independent, so improving one by a given factor may not lead to overall execution time improvement by the same factor.
  • 71. Oct. 2014 Computer Architecture, Background and Motivation Slide 71 Elaboration on the CPU Time Formula CPU time = Instructions  (Cycles per instruction)  (Secs per cycle) = Instructions  Average CPI / (Clock rate) Clock period Clock rate: 1 GHz = 109 cycles / s (cycle time 10–9 s = 1 ns) 200 MHz = 200  106 cycles / s (cycle time = 5 ns) Average CPI: Is calculated based on the dynamic instruction mix and knowledge of how many clock cycles are needed to execute various instructions (or instruction classes) Instructions: Number of instructions executed, not number of instructions in our program (dynamic count)
  • 72. Oct. 2014 Computer Architecture, Background and Motivation Slide 72 Dynamic Instruction Count 250 instructions for i = 1, 100 do 20 instructions for j = 1, 100 do 40 instructions for k = 1, 100 do 10 instructions endfor endfor endfor How many instructions are executed in this program fragment? Each “for” consists of two instructions: increment index, check exit condition 2 + 40 + 1200 instructions 100 iterations 124,200 instructions in all 2 + 10 instructions 100 iterations 1200 instructions in all 2 + 20 + 124,200 instructions 100 iterations 12,422,200 instructions in all 12,422,450 Instructions for i = 1, n while x > 0 Static count = 326
  • 73. Oct. 2014 Computer Architecture, Background and Motivation Slide 73 Figure 4.3 Faster steps do not necessarily mean shorter travel time. Faster Clock  Shorter Running Time 1 GHz 2 GHz 4 steps Solution 20 steps Suppose addition takes 1 ns Clock period = 1 ns; 1 cycle Clock period = ½ ns; 2 cycles In this example, addition time does not improve in going from 1 GHz to 2 GHz clock
  • 74. Oct. 2014 Computer Architecture, Background and Motivation Slide 74 0 10 20 30 40 50 0 10 20 30 40 50 Enhancement factor (p) Speedup (s ) f = 0 f = 0.1 f = 0.05 f = 0.02 f = 0.01 4.3 Performance Enhancement: Amdahl’s Law Figure 4.4 Amdahl’s law: speedup achieved if a fraction f of a task is unaffected and the remaining 1 – f part runs p times as fast. s =  min(p, 1/f) 1 f+(1–f)/p f = fraction unaffected p = speedup of the rest
  • 75. Oct. 2014 Computer Architecture, Background and Motivation Slide 75 Example 4.1 Amdahl’s Law Used in Design A processor spends 30% of its time on flp addition, 25% on flp mult, and 10% on flp division. Evaluate the following enhancements, each costing the same to implement: a. Redesign of the flp adder to make it twice as fast. b. Redesign of the flp multiplier to make it three times as fast. c. Redesign the flp divider to make it 10 times as fast. Solution a. Adder redesign speedup = 1 / [0.7 + 0.3 / 2] = 1.18 b. Multiplier redesign speedup = 1 / [0.75 + 0.25 / 3] = 1.20 c. Divider redesign speedup = 1 / [0.9 + 0.1 / 10] = 1.10 What if both the adder and the multiplier are redesigned?
  • 76. Oct. 2014 Computer Architecture, Background and Motivation Slide 76 Example 4.2 Amdahl’s Law Used in Management Members of a university research group frequently visit the library. Each library trip takes 20 minutes. The group decides to subscribe to a handful of publications that account for 90% of the library trips; access time to these publications is reduced to 2 minutes. a. What is the average speedup in access to publications? b. If the group has 20 members, each making two weekly trips to the library, what is the justifiable expense for the subscriptions? Assume 50 working weeks/yr and $25/h for a researcher’s time. Solution a. Speedup in publication access time = 1 / [0.1 + 0.9 / 10] = 5.26 b. Time saved = 20  2  50  0.9 (20 – 2) = 32,400 min = 540 h Cost recovery = 540  $25 = $13,500 = Max justifiable expense
  • 77. Oct. 2014 Computer Architecture, Background and Motivation Slide 77 4.4 Performance Measurement vs Modeling Figure 4.5 Running times of six programs on three machines. Execution time Program A E F B C D Machine 1 Machine 2 Machine 3
  • 78. Oct. 2014 Computer Architecture, Background and Motivation Slide 78 Generalized Amdahl’s Law Original running time of a program = 1 = f1 + f2 + . . . + fk New running time after the fraction fi is speeded up by a factor pi f1 f2 fk + + . . . + p1 p2 pk Speedup formula 1 S = f1 f2 fk + + . . . + p1 p2 pk If a particular fraction is slowed down rather than speeded up, use sj fj instead of fj /pj , where sj > 1 is the slowdown factor
  • 79. Oct. 2014 Computer Architecture, Background and Motivation Slide 79 Performance Benchmarks Example 4.3 You are an engineer at Outtel, a start-up aspiring to compete with Intel via its new processor design that outperforms the latest Intel processor by a factor of 2.5 on floating-point instructions. This level of performance was achieved by design compromises that led to a 20% increase in the execution time of all other instructions. You are in charge of choosing benchmarks that would showcase Outtel’s performance edge. a. What is the minimum required fraction f of time spent on floating-point instructions in a program on the Intel processor to show a speedup of 2 or better for Outtel? Solution a. We use a generalized form of Amdahl’s formula in which a fraction f is speeded up by a given factor (2.5) and the rest is slowed down by another factor (1.2): 1/ [1.2(1 – f) + f /2.5]  2  f  0.875
  • 80. Oct. 2014 Computer Architecture, Background and Motivation Slide 80 Performance Estimation Average CPI = All instruction classes (Class-i fraction)  (Class-i CPI) Machine cycle time = 1 / Clock rate CPU execution time = Instructions  (Average CPI) / (Clock rate) Table 4.3 Usage frequency, in percentage, for various instruction classes in four representative applications. Application  Instr’n class  Data compression C language compiler Reactor simulation Atomic motion modeling A: Load/Store 25 37 32 37 B: Integer 32 28 17 5 C: Shift/Logic 16 13 2 1 D: Float 0 0 34 42 E: Branch 19 13 9 10 F: All others 8 9 6 4
  • 81. Oct. 2014 Computer Architecture, Background and Motivation Slide 81 CPI and IPS Calculations Example 4.4 (2 of 5 parts) Consider two implementations M1 (600 MHz) and M2 (500 MHz) of an instruction set containing three classes of instructions: Class CPI for M1 CPI for M2 Comments F 5.0 4.0 Floating-point I 2.0 3.8 Integer arithmetic N 2.4 2.0 Nonarithmetic a. What are the peak performances of M1 and M2 in MIPS? b. If 50% of instructions executed are class-N, with the rest divided equally among F and I, which machine is faster? By what factor? Solution a. Peak MIPS for M1 = 600 / 2.0 = 300; for M2 = 500 / 2.0 = 250 b. Average CPI for M1 = 5.0 / 4 + 2.0 / 4 + 2.4 / 2 = 2.95; for M2 = 4.0/4 + 3.8/4 + 2.0/2 = 2.95  M1 is faster; factor 1.2
  • 82. Oct. 2014 Computer Architecture, Background and Motivation Slide 82 MIPS Rating Can Be Misleading Example 4.5 Two compilers produce machine code for a program on a machine with two classes of instructions. Here are the number of instructions: Class CPI Compiler 1 Compiler 2 A 1 600M 400M B 2 400M 400M a. What are run times of the two programs with a 1 GHz clock? b. Which compiler produces faster code and by what factor? c. Which compiler’s output runs at a higher MIPS rate? Solution a. Running time 1 (2) = (600M  1 + 400M  2) / 109 = 1.4 s (1.2 s) b. Compiler 2’s output runs 1.4 / 1.2 = 1.17 times as fast c. MIPS rating 1, CPI = 1.4 (2, CPI = 1.5) = 1000 / 1.4 = 714 (667)
  • 83. Oct. 2014 Computer Architecture, Background and Motivation Slide 83 4.5 Reporting Computer Performance Table 4.4 Measured or estimated execution times for three programs. Time on machine X Time on machine Y Speedup of Y over X Program A 20 200 0.1 Program B 1000 100 10.0 Program C 1500 150 10.0 All 3 prog’s 2520 450 5.6 Analogy: If a car is driven to a city 100 km away at 100 km/hr and returns at 50 km/hr, the average speed is not (100 + 50) / 2 but is obtained from the fact that it travels 200 km in 3 hours.
  • 84. Oct. 2014 Computer Architecture, Background and Motivation Slide 84 Table 4.4 Measured or estimated execution times for three programs. Time on machine X Time on machine Y Speedup of Y over X Program A 20 200 0.1 Program B 1000 100 10.0 Program C 1500 150 10.0 Geometric mean does not yield a measure of overall speedup, but provides an indicator that at least moves in the right direction Comparing the Overall Performance Speedup of X over Y 10 0.1 0.1 Arithmetic mean Geometric mean 6.7 2.15 3.4 0.46
  • 85. Oct. 2014 Computer Architecture, Background and Motivation Slide 85 Effect of Instruction Mix on Performance Example 4.6 (1 of 3 parts) Consider two applications DC and RS and two machines M1 and M2: Class Data Comp. Reactor Sim. M1’s CPI M2’s CPI A: Ld/Str 25% 32% 4.0 3.8 B: Integer 32% 17% 1.5 2.5 C: Sh/Logic 16% 2% 1.2 1.2 D: Float 0% 34% 6.0 2.6 E: Branch 19% 9% 2.5 2.2 F: Other 8% 6% 2.0 2.3 a. Find the effective CPI for the two applications on both machines. Solution a. CPI of DC on M1: 0.25  4.0 + 0.32  1.5 + 0.16  1.2 + 0  6.0 + 0.19  2.5 + 0.08  2.0 = 2.31 DC on M2: 2.54 RS on M1: 3.94 RS on M2: 2.89
  • 86. Oct. 2014 Computer Architecture, Background and Motivation Slide 86 4.6 The Quest for Higher Performance State of available computing power ca. the early 2000s: Gigaflops on the desktop Teraflops in the supercomputer center Petaflops on the drawing board Note on terminology (see Table 3.1) Prefixes for large units: Kilo = 103, Mega = 106, Giga = 109, Tera = 1012, Peta = 1015 For memory: K = 210 = 1024, M = 220, G = 230, T = 240, P = 250 Prefixes for small units: micro = 106, nano = 109, pico = 1012, femto = 1015
  • 87. Oct. 2014 Computer Architecture, Background and Motivation Slide 87 Figure 3.10 Trends in processor performance and DRAM memory chip capacity (Moore’s law). Performance Trends and Obsolescence 1Mb 1990 1980 2000 2010 kIPS MIPS GIPS TIPS Processor performance Calendar year 80286 68000 80386 80486 68040 Pentium Pentium II R10000 1.6 / yr 10 / 5 yrs 2 / 18 mos 64Mb 4Mb 64kb 256kb 256Mb 1Gb 16Mb 4 / 3 yrs Processor Memory kb Mb Gb Tb Memory chip capacity “Can I call you back? We just bought a new computer and we’re trying to set it up before it’s obsolete.”
  • 88. Oct. 2014 Computer Architecture, Background and Motivation Slide 88 Figure 4.7 Exponential growth of supercomputer performance. Super- computers 1990 1980 2000 2010 Supercomputer performance Calendar year Cray X-MP Y-MP CM-2 MFLOPS GFLOPS TFLOPS PFLOPS Vector supercomputers CM-5 CM-5 $240M MPPs $30M MPPs Massively parallel processors
  • 89. Oct. 2014 Computer Architecture, Background and Motivation Slide 89 Figure 4.8 Milestones in the DOE’s Accelerated Strategic Computing Initiative (ASCI) program with extrapolation up to the PFLOPS level. The Most Powerful Computers 2000 1995 2005 2010 Performance (TFLOPS) Calendar year ASCI Red ASCI Blue ASCI White 1+ TFLOPS, 0.5 TB 3+ TFLOPS, 1.5 TB 10+ TFLOPS, 5 TB 30+ TFLOPS, 10 TB 100+ TFLOPS, 20 TB 1 10 100 1000 Plan Develop Use ASCI ASCI Purple ASCI Q
  • 90. Oct. 2014 Computer Architecture, Background and Motivation Slide 90 Figure 25.1 Trend in computational performance per watt of power used in general- purpose processors and DSPs. Performance is Important, But It Isn’t Everything 1990 1980 2000 2010 kIPS MIPS GIPS TIPS Performance Calendar year Absolute processor performance GP processor performance per Watt DSP performance per Watt
  • 91. Oct. 2014 Computer Architecture, Background and Motivation Slide 91 Roadmap for the Rest of the Book Ch. 5-8: A simple ISA, variations in ISA Ch. 9-12: ALU design Ch. 13-14: Data path and control unit design Ch. 15-16: Pipelining and its limits Ch. 17-20: Memory (main, mass, cache, virtual) Ch. 21-24: I/O, buses, interrupts, interfacing Fasten your seatbelts as we begin our ride! Ch. 25-28: Vector and parallel processing